Applied Bayesian Data Analysis

Fundamentals

David Tolpin, david.tolpin@gmail.com

Concepts

  • Generative model
  • Bayes’ theorem
  • Prior, conditional, posterior distribution

With borrowings from "Statistical rethinking"

Mainstream statistics

Decision tree

Galileo vs. Aristotle

Bayesian data analysis

  1. Design the generative model (data story)
  2. Condition on observed data (update)
  3. Evaluate the fit of the model (critique)

Model

  • Domain knowledge
  • Data collection

Joint probability for all (observable and unobservable) variables
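
A minimal sketch of such a model in Turing.jl; the name `globe`, the Beta prior, and the Bernoulli likelihood are illustrative choices anticipating the globe toss example below:

    using Turing

    # Joint distribution over the unobservable proportion `p` and the
    # observable toss outcomes `y` (1 = water, 0 = land).
    @model function globe(y)
        p ~ Beta(1, 1)        # prior over the unobservable parameter
        y .~ Bernoulli(p)     # likelihood of each observation given p
    end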

Conditioning

  • Compute the posterior distribution
  • Interpret the inference results
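
A minimal conditioning sketch with MCMC, assuming the illustrative `globe` model above; the data and sampler settings are placeholders, not a prescribed workflow:

    using Turing, Statistics

    data = [1, 1, 0, 1, 1, 1, 0, 1, 0, 1]       # illustrative tosses: 1 = water, 0 = land
    chain = sample(globe(data), NUTS(), 1_000)  # draw from the posterior p(p | data)
    mean(chain[:p])                             # posterior mean of the proportion of water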

Evaluating the fit

  • How well does the model fit the data?
  • Are the substantive conclusions reasonable?
  • How sensitive are results to model assumptions?

The garden of forked data

  • Count all the ways data can happen, according to assumptions.
  • Assumptions with more ways to produce the data are more plausible.

From "Statistical rethinking"

General notation for statistical inference

  • Conclusions about a large population
  • Based on a sample from the population
  • Two kinds of estimands:
    1. potentially observable quantities (future observations)
    2. unobservable quantities — latent variables, model parameters

Parameters, data, and predictions

  • $y$ — available observations
  • $\theta$ — model parameters (discovering the laws)
  • $\tilde y$ — unknown observations (predicting the future)

Observational units and variables

  • Data is a set of $n$ objects or units, $y=(y_1, ..., y_n)$
  • Each $y_i$ may be a vector
  • $y_i$ are called outcomes, and are ‘random’ variables

Exchangeability

  • Permutation of indices in $(y_1, y_2, ..., y_n)$ should not change the results
  • Otherwise, the indices convey information about the observations (why might this be useful?)
  • Data are modelled as i.i.d. — independently and identically distributed given the parameters $\theta$, which have a prior distribution $p(\theta)$

Explanatory variables

  • $x=(x_1, ..., x_n)$ — ‘features’, ‘predictors’, observations which we do not model as random.
  • When $x$ carries enough information, the model should be exchangeable conditional on $x$.
  • Variables may move between $x$ and $y$, depending on the problem.

Globe toss

From "Statistical rethinking"

Bayesian inference

  • Statements are made in terms of probability
  • Probabilities are conditioned on observations: $p(\tilde y|y)$ or $p(\theta|y)$

Notation for Bayesian inference

  • $p(\cdot|\cdot)$ — conditional probability density
  • $p(\cdot)$ — marginal probability density
  • $\Pr(\cdot)$ — probability of an event, $\Pr(\theta > 2) = \int_{\theta>2}p(\theta)d\theta$.
  • Standard distributions have names: $p(\theta) = \mathcal{N}(\theta|\mu, \sigma^2)$ or $\theta \sim \mathcal{N}(\mu, \sigma^2)$
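
A small sketch of the event-probability notation with a named distribution; the Normal parameters are illustrative:

    using Distributions

    d = Normal(0, 1.5)    # θ ~ N(0, 1.5²); Normal takes the standard deviation
    ccdf(d, 2)            # Pr(θ > 2) = ∫_{θ>2} p(θ) dθ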

Bayes’ rule

  • Model is a joint distribution of $\theta$ and $y$: $p(\theta, y)$
  • $p(\theta, y) = p(\theta)p(y|\theta)$
  • Conditional density via Bayes’ rule: $$p(\theta|y) = \frac {p(\theta, y)} {p(y)} = \frac {p(\theta)p(y|\theta)} {p(y)}$$ where $$p(y) = \int p(\theta)p(y|\theta)d\theta$$
  • We only need unnormalized posterior density: $$p(\theta|y) \propto p(\theta)p(y|\theta)$$
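
A grid-approximation sketch of Bayes’ rule for the globe toss (6 waters in 9 tosses, uniform prior; the numbers are illustrative):

    using Distributions

    θ = range(0, 1, length = 101)             # grid over the parameter
    prior = pdf.(Uniform(0, 1), θ)            # p(θ)
    likelihood = pdf.(Binomial.(9, θ), 6)     # p(y|θ): 6 waters in 9 tosses
    unnorm = prior .* likelihood              # unnormalized posterior p(θ) p(y|θ)
    posterior = unnorm ./ sum(unnorm)         # normalize by summing over the grid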

Predictive distribution

  • Prior predictive distribution (before seeing the data): $$p(y)= \int p(y, \theta) d\theta$$
  • Posterior predictive distribution: $$p(\tilde y | y) = \int p(\tilde y|y, \theta)p(\theta|y)d\theta = \int p(\tilde y|\theta)p(\theta|y)d\theta$$
  • $\tilde y$ is conditionally independent of $y$ given $\theta$: $p(\tilde y| \theta, y) = p(\tilde y|\theta)$
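
A sketch of posterior predictive sampling, reusing the grid posterior from the previous sketch: draw $\theta$ from $p(\theta|y)$, then simulate $\tilde y$ from $p(\tilde y|\theta)$:

    using Distributions, StatsBase

    θ_draws = sample(θ, Weights(posterior), 10_000)  # θ ~ p(θ|y) from the grid posterior
    y_pred = rand.(Binomial.(9, θ_draws))            # ỹ ~ p(ỹ|θ): 9 new tosses per draw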

Readings

  1. Statistical rethinking — chapters 1 and 2.
  2. Bayesian Data Analysis — chapter 1.
  3. Probabilistic Models of Cognition — chapters 1, 2, and 3.

Hands-on

  • Setting up Julia, Turing, and notebook environments.
  • Turing tutorials.
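
A possible setup sketch from the Julia REPL; Pluto is one notebook option, the course environment may differ:

    using Pkg
    Pkg.add(["Turing", "Pluto"])    # probabilistic programming and a notebook environment
    # using Pluto; Pluto.run()      # start the notebook server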