Applied Bayesian Data Analysis

Priors

David Tolpin, david.tolpin@gmail.com

Concepts

  • Conjugacy
  • Informative, non-informative, semi-informative priors
  • Pivotal quantities

Conjugacy

A boring mathematical concept:

  • $\mathcal{F}$ — class of sampling distributions $p(y|\theta)$
  • $\mathcal{P}$ — class of prior distributions for $\theta$
  • $\mathcal{P}$ is conjugate for $\mathcal{F}$ if $$p(\theta|y) \in \mathcal{P}\mbox{ for all }p(\cdot|\theta) \in \mathcal{F}\mbox{ and }p(\cdot) \in \mathcal{P}$$
  • There are natural families of $\mathcal{P}$
  • Conjugate priors/posteriors are interpretable

Example: binomial model

\begin{aligned} \theta &\sim \mathrm{Prior} \\ y_{1:n}&\sim \mathrm{Bernoulli}(\theta) \end{aligned}

How to choose $\mathrm{Prior}$?

  • $p(\theta|y) \propto p(y, \theta) = p(\theta)p(y|\theta)$
  • $p(y_{1:n}|\theta) = \theta^k(1 - \theta)^{n-k}$, where $k$ is the number of successes
  • if $p(\theta) \propto \theta^a(1-\theta)^b$
    then $p(\theta|y) \propto \theta^{a + k}(1 - \theta)^{b + n - k}$ — same form

Example: binomial model

  • $\mathrm{Beta}(\theta|\alpha, \beta) = \frac 1 {\mathrm{B}(\alpha, \beta)} \theta^{\alpha-1} (1 - \theta)^{\beta - 1}$
  • $\mathrm{Beta}(\alpha, \beta)$ is the conjugate prior for $\mathrm{Bernoulli}(\theta)$
    • $\alpha$ — number of ‘prior’ successes (heads),
    • $\beta$ — number of ‘prior’ failures (tails).
  • $\alpha=\beta=1$ — uniform $[0, 1]$ prior.
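
A minimal sketch of the conjugate update in Python (the coin counts are made up for illustration):

```python
# Conjugate Beta-Bernoulli update: the posterior stays in the Beta family.
def update_beta(alpha, beta, ys):
    """Update a Beta(alpha, beta) prior with Bernoulli observations ys (0/1)."""
    k = sum(ys)                          # number of successes
    return alpha + k, beta + len(ys) - k

# Uniform prior Beta(1, 1), then observe 7 heads and 3 tails.
ys = [1] * 7 + [0] * 3
a, b = update_beta(1, 1, ys)
print(a, b)             # Beta(8, 4)
print(a / (a + b))      # posterior mean = 8/12, about 0.667
```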

Exponential families

  • $\mathcal{F}$ is an exponential family if $$p(y_i|\theta) = f(y_i)g(\theta)e^{\phi(\theta)^\top u(y_i)}$$
  • $\phi(\theta)$ — natural parameter
  • likelihood of a set $y=(y_1, ..., y_n)$ is $$p(y|\theta) \propto g(\theta)^ne^{\phi(\theta)^\top t(y)}$$ where $t(y) = \sum_{i=1}^n u(y_i)$
  • $t(y)$ is a sufficient statistic for $\theta$: all we need to know about the data
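
For Bernoulli observations, $u(y_i)=y_i$ and $t(y)=\sum_i y_i$: two datasets with the same number of successes have identical likelihoods. A quick check (illustrative values):

```python
import math

def bernoulli_loglik(theta, ys):
    """Log-likelihood of i.i.d. Bernoulli(theta) observations ys (0/1)."""
    return sum(y * math.log(theta) + (1 - y) * math.log(1 - theta) for y in ys)

# Two different datasets with the same sufficient statistic t(y) = 3, n = 5:
ys1 = [1, 1, 1, 0, 0]
ys2 = [0, 1, 0, 1, 1]
diff = bernoulli_loglik(0.3, ys1) - bernoulli_loglik(0.3, ys2)
print(abs(diff))   # ~0: the likelihood depends on the data only through t(y)
```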

Exp. family conjugates

  • If $p(\theta) \propto g(\theta)^\eta e^{\phi(\theta)^\top\nu}$,
  • then $p(\theta|y) \propto g(\theta)^{\eta+n}e^{\phi(\theta)^\top(\nu+t(y))}$.
  • $p(\theta|y)$ has the same form, so $p(\theta)$ is conjugate to $p(y|\theta)$.
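
As a concrete instance (a sketch, not from the slides): for the Poisson model, $g(\theta)=e^{-\theta}$, $\phi(\theta)=\log\theta$, $u(y)=y$, so the conjugate prior $\propto \theta^{\nu}e^{-\eta\theta}$ is a Gamma kernel with shape $\nu+1$ and rate $\eta$, and the update is $(\eta,\nu)\to(\eta+n,\nu+t(y))$. A numerical check against a grid posterior (hyperparameter values are arbitrary):

```python
import numpy as np

# Poisson data; Gamma(shape = nu + 1, rate = eta) is the conjugate prior.
ys = np.array([2, 0, 3, 1, 4])
n, t = len(ys), ys.sum()            # t(y) = sum of counts = 10
eta, nu = 1.0, 1.0                  # prior hyperparameters (illustrative)

# Conjugate update: eta -> eta + n, nu -> nu + t(y).
eta_post, nu_post = eta + n, nu + t

# Grid check: prior kernel * likelihood kernel on a uniform grid.
theta = np.linspace(1e-6, 15, 20000)
kernel = theta**nu * np.exp(-eta * theta) * theta**t * np.exp(-n * theta)
grid_mean = (theta * kernel).sum() / kernel.sum()

# Gamma(shape = nu_post + 1, rate = eta_post) has mean (nu_post + 1) / eta_post.
print(grid_mean, (nu_post + 1) / eta_post)   # both are 2.0 up to grid error
```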

Exp. family members

  • Bernoulli
  • Normal, $\propto \frac 1 \sigma e^{-\frac 1 {2\sigma^2} {(y-\mu)^2}}$
  • Poisson, $\propto \theta^y e^{-\theta}$
  • Exponential, $\propto \theta e^{-y\theta}$
  • ...

Specifying priors

  • Prior $p(\theta) = \int p(\theta|y)p(y)\,dy$ is the marginal of $\theta$ over all possible observations.
  • Posterior is a compromise between prior and conditional:
    • $\mathbb{E}(\theta) = \mathbb{E}(\mathbb{E}(\theta|y))$
    • $\mathrm{var}(\theta) = \mathbb{E}(\mathrm{var}(\theta|y)) + \mathrm{var}(\mathbb{E}(\theta|y))$
      • $\mathbb{E}(\mathrm{var}(\theta|y))$ — ‘unexplained’ variation
      • $\mathrm{var}(\mathbb{E}(\theta|y))$ — ‘explained’ variation
  • Posterior variance is on average smaller than prior
  • If posterior variance is greater, look for a problem
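
The variance decomposition can be checked by simulation; a sketch with the Beta-Binomial model (prior and sample-size values are arbitrary), where the posterior mean and variance are available in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, n = 3.0, 2.0, 20        # Beta(3, 2) prior, 20 tosses per replication

theta = rng.beta(a, b, size=200_000)   # theta ~ prior
y = rng.binomial(n, theta)             # y | theta ~ Binomial(n, theta)

# Posterior is Beta(a + y, b + n - y); closed-form mean and variance.
ap, bp = a + y, b + n - y
post_mean = ap / (ap + bp)
post_var = ap * bp / ((ap + bp) ** 2 * (ap + bp + 1))

prior_var = a * b / ((a + b) ** 2 * (a + b + 1))
print(prior_var)                            # var(theta) = 0.04
print(post_var.mean() + post_mean.var())    # E(var) + var(E), matches above
print(post_var.mean() < prior_var)          # posterior var smaller on average
```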

Informative priors

  • Prior defines the ‘population’
  • Or, prior defines the ‘state of knowledge’
  • Example: coin flip
    • 9+1 coins from the same batch
    • each tossed once: 5 fell on heads, 4 on tails
    • Prior for the 10th coin: $\mathrm{Beta}(5, 4)$

Non-informative priors

  • No prior information, (almost) all distributions are possible
  • May also be used for ‘regularization’, that is, making the model work
  • Examples:
    • $\mathrm{Beta}(1, 1)$ — uniform prior
    • $\mathrm{Normal}(0, 1000)$ — regularization

Non-informative priors

Pivotal quantity, location, scale

  • Location:
    • $p(y-\theta|\theta) = f(u)$, $u = y - \theta$
    • $y - \theta$ — pivotal quantity, $\theta$ — location parameter
    • $p(\theta) \propto C$
  • Scale:
    • $p(\frac y \theta|\theta) = f(u)$, $u = \frac y \theta$
    • $\frac y \theta$ — pivotal quantity, $\theta$ — scale parameter
    • $p(\log \theta) \propto C$, $p(\theta) \propto \frac 1 \theta$
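
For the location case, a quick simulation sketch (assuming a Normal sampling model, which is not specified above): the distribution of the pivotal quantity $y-\theta$ does not depend on $\theta$, so no location is privileged a priori:

```python
import numpy as np

rng = np.random.default_rng(1)

# y ~ Normal(theta, 1): the pivotal quantity u = y - theta has the same
# distribution whatever the location theta is.
for theta in (0.0, 5.0, -100.0):
    u = rng.normal(theta, 1.0, size=100_000) - theta
    print(u.mean(), u.std())   # approximately 0 and 1 for every theta
```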

Weakly-informative priors

  • Some information
  • Less information than in the data
  • Examples:
    • Covered by water: $\mathrm{Uniform}(0.5, 1)$
    • Salary: $\mathrm{Exponential}(11\,500₪)$

Readings

  1. Bayesian Data Analysis — Chapter 2: Single-parameter models.
  2. Statistical Rethinking — Sections 2.3: Components of the model, and 2.4: Making the model go.

Hands-on

  • Circle or square?