Applied Bayesian Data Analysis

Fundamentals

David Tolpin, david.tolpin@gmail.com

Concepts

  • Generative model
  • Bayes’ theorem
  • Prior, conditional, posterior distribution

With borrowings from "Statistical rethinking"

Mainstream statistics

Decision tree

Galileo vs. Aristotle

Bayesian data analysis

  1. Design the generative model (data story)
  2. Condition on observed data (update)
  3. Evaluate the fit of the model (critique)

Model

  • Domain knowledge
  • Data collection

Joint probability for all (observable and unobservable) variables
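
A minimal sketch of such a model in Turing.jl; the name `globe`, the Beta prior, and the Bernoulli likelihood are illustrative choices anticipating the globe toss example below:

    using Turing

    # Joint distribution over the unobservable proportion `p` and the
    # observable toss outcomes `y` (1 = water, 0 = land).
    @model function globe(y)
        p ~ Beta(1, 1)        # prior over the unobservable parameter
        y .~ Bernoulli(p)     # likelihood of each observation given p
    end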

Conditioning

  • Compute the posterior distribution
  • Interpret the inference results
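
A minimal conditioning sketch with MCMC, assuming the illustrative `globe` model above; the data and sampler settings are placeholders, not a prescribed workflow:

    using Turing, Statistics

    data = [1, 1, 0, 1, 1, 1, 0, 1, 0, 1]       # illustrative tosses: 1 = water, 0 = land
    chain = sample(globe(data), NUTS(), 1_000)  # draw from the posterior p(p | data)
    mean(chain[:p])                             # posterior mean of the proportion of water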

Evaluating the fit

  • How well does the model fit the data?
  • Are the substantive conclusions reasonable?
  • How sensitive are results to model assumptions?

The garden of forked data

  • Count all the ways data can happen, according to assumptions.
  • Assumptions with more ways to produce the data are more plausible.

From "Statistical rethinking"

General notation for statistical inference

  • Conclusions about a large population
  • Based on a sample from the population
  • Two kinds of estimands:
    1. potentially observable quantities (future observations)
    2. unobservable quantities — latent variables, model parameters

Parameters, data, and predictions

  • $y$ — available observations
  • $\theta$ — model parameters (discovering the laws)
  • $\tilde y$ — unknown observations (predicting the future)

Observational units and variables

  • Data is a set of $n$ objects or units, $y=(y_1, ..., y_n)$
  • Each $y_i$ may be a vector
  • $y_i$ are called outcomes, and are ‘random’ variables

Exchangeability

  • Permutation of indices in $(y_1, y_2, ..., y_n)$ should not change the results
  • Otherwise, the indices convey information about the observations (why might this be useful?)
  • Data are modelled as i.i.d. — independently and identically distributed given the parameters $\theta$, which have a prior distribution $p(\theta)$

Explanatory variables

  • $x=(x_1, ..., x_n)$ — ‘features’, ‘predictors’, observations which we do not model as random.
  • When $x$ carries enough information, the model should be exchangeable conditional on $x$.
  • Variables may move between $x$ and $y$, depending on the problem.

Globe toss

From "Statistical rethinking"

Bayesian inference

  • Statements are made in terms of probability
  • Probabilities are conditioned on observations: $p(\tilde y|y)$ or $p(\theta|y)$

Notation for Bayesian inference

  • $p(\cdot|\cdot)$ — conditional probability density
  • $p(\cdot)$ — marginal probability density
  • $\Pr(\cdot)$ — probability of an event, $\Pr(\theta > 2) = \int_{\theta>2}p(\theta)d\theta$.
  • Standard distributions have names: $p(\theta) = \mathcal{N}(\theta|\mu, \sigma^2)$ or $\theta \sim \mathcal{N}(\mu, \sigma^2)$
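
A small sketch of the event-probability notation with a named distribution; the Normal parameters are illustrative:

    using Distributions

    d = Normal(0, 1.5)    # θ ~ N(0, 1.5²); Normal takes the standard deviation
    ccdf(d, 2)            # Pr(θ > 2) = ∫_{θ>2} p(θ) dθ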

Bayes’ rule

  • Model is a joint distribution of $\theta$ and $y$: $p(\theta, y)$
  • $p(\theta, y) = p(\theta)p(y|\theta)$
  • Conditional density via Bayes’ rule: $$p(\theta|y) = \frac {p(\theta, y)} {p(y)} = \frac {p(\theta)p(y|\theta)} {p(y)}$$ where $$p(y) = \int p(\theta)p(y|\theta)d\theta$$
  • We only need unnormalized posterior density: $$p(\theta|y) \propto p(\theta)p(y|\theta)$$
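
A grid-approximation sketch of Bayes’ rule for the globe toss (6 waters in 9 tosses, uniform prior; the numbers are illustrative):

    using Distributions

    θ = range(0, 1, length = 101)             # grid over the parameter
    prior = pdf.(Uniform(0, 1), θ)            # p(θ)
    likelihood = pdf.(Binomial.(9, θ), 6)     # p(y|θ): 6 waters in 9 tosses
    unnorm = prior .* likelihood              # unnormalized posterior p(θ) p(y|θ)
    posterior = unnorm ./ sum(unnorm)         # normalize by summing over the grid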

Predictive distribution

  • Prior predictive distribution (before seeing the data): $$p(y)= \int p(y, \theta) d\theta$$
  • Posterior predictive distribution: $$p(\tilde y | y) = \int p(\tilde y|y, \theta)p(\theta|y)d\theta = \int p(\tilde y|\theta)p(\theta|y)d\theta$$
  • $\tilde y$ is conditionally independent of $y$ given $\theta$: $p(\tilde y| \theta, y) = p(\tilde y|\theta)$
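
A sketch of posterior predictive sampling, reusing the grid posterior from the previous sketch: draw $\theta$ from $p(\theta|y)$, then simulate $\tilde y$ from $p(\tilde y|\theta)$:

    using Distributions, StatsBase

    θ_draws = sample(θ, Weights(posterior), 10_000)  # θ ~ p(θ|y) from the grid posterior
    y_pred = rand.(Binomial.(9, θ_draws))            # ỹ ~ p(ỹ|θ): 9 new tosses per draw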

Readings

  1. Statistical rethinking — chapters 1 and 2.
  2. Bayesian Data Analysis — chapter 1.
  3. Probabilistic Models of Cognition — chapters 1, 2, and 3.

Hands-on

  • Setting up Julia, Turing, and notebook environments.
  • Turing tutorials.
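
A possible setup sketch from the Julia REPL; Pluto is one notebook option, the course environment may differ:

    using Pkg
    Pkg.add(["Turing", "Pluto"])    # probabilistic programming and a notebook environment
    # using Pluto; Pluto.run()      # start the notebook server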