Decision tree
Bayesian data analysis
- Design the generative model (data story)
- Condition on observed data (update)
- Evaluate the fit of the model (critique)
Model
- Domain knowledge
- Data collection
Joint probability for all (observable and unobservable) variables
Conditioning
- Compute the posterior distribution
- Interpret the inference results
Evaluating the fit
- How well does the model fit the data?
- Are the substantive conclusions reasonable?
- How sensitive are results to model assumptions?
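These three steps can be sketched end to end in Turing. The coin-flip model, prior, and data below are assumed purely for illustration; they are not part of the lecture.

```julia
# Minimal sketch of the workflow with a hypothetical coin-flip model.
using Turing, Statistics

# 1. Model: a joint distribution p(θ, y) built from the data story.
@model function coin(y)
    θ ~ Beta(1, 1)               # prior on the probability of heads
    for i in eachindex(y)
        y[i] ~ Bernoulli(θ)      # likelihood for each observed flip
    end
end

y = [1, 0, 1, 1, 0, 1, 1, 1]     # hypothetical observations

# 2. Conditioning: approximate the posterior p(θ | y) by MCMC.
chain = sample(coin(y), NUTS(), 1_000)

# 3. Evaluating the fit: compare replicated data with the observed data.
θs = vec(Array(chain[:θ]))
y_rep = [rand(Bernoulli(θ), length(y)) for θ in θs]
println("observed mean: ", mean(y), "   replicated mean: ", mean(mean.(y_rep)))
```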
The garden of forking data
- Count all the ways the data can happen, according to the assumptions (see the counting sketch below).
- Assumptions with more ways that are consistent with the data are more plausible.
From "Statistical Rethinking"
General notation for statistical inference
- Conclusions about a large population
- Based on a sample from the population
- Two kinds of estimands:
- potentially observable quantities (future observations)
- unobservable quantities — latent variables, model parameters
Parameters, data, and predictions
- $y$ — available observations
- $\theta$ — model parameters (discovering the laws)
- $\tilde y$ — unknown observations (predicting the future)
Observational units and variables
- Data is a set of $n$ objects or units, $y=(y_1, ..., y_n)$
- Each $y_i$ may be a vector
- $y_i$ are called outcomes, and are ‘random’ variables
Exchangeability
- Permutation of indices in $(y_1, y_2, ..., y_n)$ should not change the results
- Otherwise, the indices convey information about the observations (when might that be useful?)
- Data is modelled as i.i.d.: independently and identically distributed given $\theta$, with density $p(y_i|\theta)$
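- One way this arises: modelling the $y_i$ as i.i.d. given $\theta$ and mixing over the prior gives a joint density that is symmetric in the indices,
$$p(y_1, ..., y_n) = \int \left[ \prod_{i=1}^{n} p(y_i|\theta) \right] p(\theta) d\theta$$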
Explanatory variables
- $x=(x_1, ..., x_n)$ — ‘features’, ‘predictors’, observations which we do not model as random.
- When $x$ carries enough information, the model should be exchangeable given $x$.
- Variables may move between $x$ and $y$, depending on the problem.
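A sketch of how this looks in practice: in the hypothetical linear regression below, $x$ enters the likelihood but is never given a distribution; only $y$ is modelled as random (the data and priors are made up for illustration).

```julia
using Turing

@model function linreg(x, y)
    α ~ Normal(0, 10)
    β ~ Normal(0, 10)
    σ ~ truncated(Normal(0, 5), 0, Inf)
    for i in eachindex(y)
        y[i] ~ Normal(α + β * x[i], σ)   # x is conditioned on, not modelled
    end
end

x = [1.0, 2.0, 3.0, 4.0, 5.0]            # predictors, treated as fixed
y = [2.1, 3.9, 6.2, 8.1, 9.8]            # outcomes (made-up data)
chain = sample(linreg(x, y), NUTS(), 1_000)
```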
Bayesian inference
- Statements are made about probabilities
- Probabilities are conditioned on observations: $p(\tilde y|y)$ or $p(\theta|y)$
Notation for Bayesian inference
- $p(\cdot|\cdot)$ — conditional probability density
- $p(\cdot)$ — marginal probability density
- $\Pr(\cdot)$ — probability of an event, $\Pr(\theta > 2) = \int_{\theta>2}p(\theta)d\theta$.
- Standard distributions have names: $\mathcal{N}(\theta|\mu, \sigma^2)$ or $\theta \sim \mathcal{N}(\mu, \sigma^2)$
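The event-probability notation made concrete: $\Pr(\theta > 2)$ for $\theta \sim \mathcal{N}(\mu, \sigma^2)$, computed from the named distribution and by numerical integration. The values $\mu = 1$, $\sigma = 2$ and the use of the QuadGK package are assumptions for this illustration only.

```julia
using Distributions, QuadGK

μ, σ = 1.0, 2.0
d = Normal(μ, σ)

ccdf(d, 2.0)                          # Pr(θ > 2) from the closed form
quadgk(θ -> pdf(d, θ), 2.0, Inf)[1]   # ∫_{θ > 2} p(θ) dθ, numerically
```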
Bayes’ rule
- Model is a joint distribution of $\theta$ and $y$: $p(\theta, y)$
- $p(\theta, y) = p(\theta)p(y|\theta)$
- Conditional density via Bayes’ rule:
$$p(\theta|y) = \frac {p(\theta, y)} {p(y)} = \frac {p(\theta)p(y|\theta)} {p(y)}$$
where
$$p(y) = \int p(\theta)p(y|\theta)d\theta$$
- We only need unnormalized posterior density:
$$p(\theta|y) \propto p(\theta)p(y|\theta)$$
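A grid sketch of the unnormalized posterior for a hypothetical binomial example: $\theta$ is a success probability with a $\mathrm{Beta}(2, 2)$ prior, and $y = 6$ successes are observed out of $n = 9$ trials (all numbers assumed for illustration).

```julia
using Distributions

y, n = 6, 9
θ_grid = range(0, 1; length=1_000)

prior      = pdf.(Beta(2, 2), θ_grid)
likelihood = pdf.(Binomial.(n, θ_grid), y)
unnorm     = prior .* likelihood             # p(θ) p(y|θ), unnormalized

p_y       = sum(unnorm) * step(θ_grid)       # grid approximation of p(y)
posterior = unnorm ./ p_y                    # normalized posterior density
```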
Predictive distribution
- Prior predictive distribution (before seeing the data):
$$p(y)= \int p(y, \theta) d\theta$$
- Posterior predictive distribution:
$$p(\tilde y | y) = \int p(\tilde y|y, \theta)p(\theta|y)d\theta = \int p(\tilde y|\theta)p(\theta|y)d\theta$$
- $\tilde y$ is conditionally independent of $y$ given $\theta$: $p(\tilde y| \theta, y) = p(\tilde y|\theta)$
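The posterior predictive can be simulated by composition: draw $\theta$ from $p(\theta|y)$, then $\tilde y$ from $p(\tilde y|\theta)$. The sketch below continues the hypothetical binomial example above, where the posterior is available in closed form as $\mathrm{Beta}(2 + y, 2 + n - y)$ (the conjugate beta-binomial result).

```julia
using Distributions, Statistics

y, n = 6, 9
posterior = Beta(2 + y, 2 + n - y)

# each draw: θ ~ p(θ|y), then ỹ ~ p(ỹ|θ) for 9 new trials
ỹ = [rand(Binomial(n, rand(posterior))) for _ in 1:10_000]
mean(ỹ)                                  # posterior predictive mean
```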
Hands-on
- Setting up Julia, Turing, and notebook environments.
- Turing tutorials.
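A possible smoke test for the setup; the project name and package list below are assumptions, not the course's official instructions.

```julia
using Pkg
Pkg.activate("bda-course"; shared=true)   # hypothetical shared project name
Pkg.add(["Turing", "StatsPlots"])

using Turing, Statistics

@model function demo()
    θ ~ Normal(0, 1)                      # prior-only model, no data
end

chain = sample(demo(), NUTS(), 500)
mean(chain[:θ])                           # should be close to 0 if setup works
```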