Applied Bayesian Data Analysis

Model evaluation

Concepts:

  • log-predictive density
  • cross validation
  • information criteria
  • model expansion

David Tolpin, david.tolpin@gmail.com

Measures of predictive accuracy

  • Point prediction
  • Probabilistic prediction

Point prediction

  • Mean squared error (L2): $\frac 1 n \sum_{i=1}^n (y_i - E(y_i|\theta))^2$
  • Mean absolute error (L1): $\frac 1 n \sum_{i=1}^n |y_i - E(y_i|\theta)|$ (see the sketch after this list)
  • ... (black magic)
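A minimal numpy sketch of both point-prediction errors; `y` and `y_pred` are hypothetical arrays holding the observations and the point predictions $E(y_i|\theta)$:

```python
import numpy as np

# Hypothetical observations and posterior point predictions E(y_i | theta)
y = np.array([2.1, 0.4, 3.3, 1.8])
y_pred = np.array([2.0, 0.9, 3.0, 1.5])

mse = np.mean((y - y_pred) ** 2)   # L2: mean squared error
mae = np.mean(np.abs(y - y_pred))  # L1: mean absolute error
print(mse, mae)
```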

Probabilistic prediction

  • Scoring rule must be proper and local
  • The logarithmic score is the proper local scoring rule: $L = \log p(y|\theta)$

Single data point

  • Ideal measure: out-of-sample performance
  • $$\log p_{post}(\tilde y_i) = \log \mathbb{E}_{post}\left(p(\tilde y_i|\theta)\right) = \log \int p(\tilde y_i|\theta)\, p_{post}(\theta)\, d\theta$$

Averaging over future data

  • Expected Log Predictive Density: $$\mathrm{elpd} = E_f(\log p_{post}(\tilde y_i))$$
  • Expected log pointwise predictive density: $$\mathrm{elppd} = \sum_{i=1}^n E_f(\log p_{post}(\tilde y_i))$$
  • (related to cross-validation)
  • If we have a point estimate $\hat \theta$, we can consider $E_f(\log p(\tilde y|\hat \theta))$
  • For independent data given parameters: $p(\tilde y|\hat \theta) = \prod_{i=1}^n p(\tilde y_i|\hat \theta)$

Log Pointwise Predictive Density

  • Marginalize over $\theta$: $$\mathrm{lppd} = \sum_{i=1}^n \log \int p(y_i|\theta) p_{post}(\theta)d\theta$$
  • Approximate using draws: $$\mathrm{lppd} = \sum_{i=1}^n \log \left(\frac 1 S \sum_{s=1}^S p(y_i|\theta^s)\right)$$
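A minimal sketch of the draw-based approximation, assuming a hypothetical $S \times n$ matrix `log_lik` with entries $\log p(y_i|\theta^s)$ (as could be extracted from any fitted MCMC object):

```python
import numpy as np
from scipy.special import logsumexp

# Hypothetical pointwise log-likelihood draws: log_lik[s, i] = log p(y_i | theta^s)
S, n = 1000, 20
rng = np.random.default_rng(0)
log_lik = rng.normal(-1.0, 0.3, size=(S, n))

# lppd = sum_i log( (1/S) sum_s p(y_i | theta^s) ), evaluated stably in log space
lppd = np.sum(logsumexp(log_lik, axis=0) - np.log(S))
print(lppd)
```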

Cross-validation and information criteria

A meeting with Enrico Fermi

In desperation I asked Fermi whether he was not impressed by the agreement between our calculated numbers and his measured numbers. He replied, “How many arbitrary parameters did you use for your calculations?” I thought for a moment about our cut-off procedures and said, “Four.” He said, “I remember my friend Johnny von Neumann used to say, with four parameters I can fit an elephant, and with five I can make him wiggle his trunk.” With that, the conversation was over. I thanked Fermi for his time and trouble, and sadly took the next bus back to Ithaca to tell the bad news to the students.

Estimating out-of-sample accuracy using sample data

  • Within-sample predictive accuracy
  • Cross-validation
  • Adjusted within-sample predictive accuracy

Leave-one-out cross validation

  • Fit the posterior on all data except $y_i$: $\mathrm{lppd}_{LOO-CV} = \sum_{i=1}^n \log p_{post(-i)} (y_i)$
  • Approximation by draws: $\mathrm{lppd}_{LOO-CV} = \sum_{i=1}^n \log \left(\frac 1 S \sum_{s=1}^S p(y_i|\theta^{is})\right)$
  • Bias correction: $b = \mathrm{lppd} - \overline {\mathrm{lppd}}_{-i}$
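A brute-force sketch of LOO-CV under stated assumptions: `fit_posterior_draws` and `log_lik_point` are hypothetical helpers that refit the model on the reduced data and evaluate $\log p(y_i|\theta^{is})$ on the resulting draws:

```python
import numpy as np
from scipy.special import logsumexp

def lppd_loo_cv(y, fit_posterior_draws, log_lik_point, S=1000):
    """Refit the model n times, each time leaving out y_i.

    fit_posterior_draws(y_train, S) -> S draws from p(theta | y_train)  (hypothetical helper)
    log_lik_point(y_i, draws)       -> log p(y_i | theta^s) per draw    (hypothetical helper)
    """
    n = len(y)
    lppd_i = np.empty(n)
    for i in range(n):
        draws = fit_posterior_draws(np.delete(y, i), S)   # posterior without y_i
        log_p = log_lik_point(y[i], draws)                # log p(y_i | theta^{is})
        lppd_i[i] = logsumexp(log_p) - np.log(S)          # log of the posterior-mean density
    return np.sum(lppd_i)
```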

Information criteria

  • $\mathrm{*IC} = -2 \cdot (\mathrm{lppd} - \mathrm{correction})$
  • $-2 \cdot$ — for historical reasons (compare the log density of the normal distribution: $- \log \sqrt{2\pi} - \frac {x^2} 2$)

Akaike IC (AIC)

$$\mathrm{elpd}_{AIC} = \log p(y|\hat \theta_{mle}) - k$$

where $k$ is the number of parameters
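A one-line sketch, assuming the maximized log likelihood $\log p(y|\hat \theta_{mle})$ and the parameter count $k$ are already known:

```python
def aic(log_lik_mle: float, k: int) -> float:
    """AIC = -2 * (log p(y | theta_mle) - k); smaller is better."""
    return -2.0 * (log_lik_mle - k)
```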

Deviance IC and effective number of parameters

  • 'Bayesian' AIC: $\hat \theta_{Bayes} = E(\theta|y)$
  • $\mathrm{elpd}_{DIC} = \log p(y|\hat \theta_{Bayes}) - p_{DIC}$
  • $p_{DIC}$ — effective number of parameters
  • $p_{DIC} = 2\left(\log p(y|\hat \theta_{Bayes}) - E_{post}(\log p(y|\theta))\right)$
  • Approximation by draws: $p_{DIC} = 2\left(\log p(y|\hat \theta_{Bayes}) - \frac 1 S \sum_{s=1}^S \log p(y|\theta^s)\right)$
  • Alternative form: $p_{DIC_{alt}} = 2 \mathrm{var}_{post} (\log p(y|\theta))$
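A sketch of the draw-based DIC computation, assuming a hypothetical `(S, d)` array of posterior draws and a hypothetical callable `log_lik(theta)` returning $\log p(y|\theta)$ for the full data:

```python
import numpy as np

def dic(draws, log_lik):
    """DIC and effective number of parameters p_DIC from posterior draws."""
    theta_bayes = draws.mean(axis=0)                     # hat(theta)_Bayes = E(theta | y)
    log_p_mean = log_lik(theta_bayes)                    # log p(y | hat(theta)_Bayes)
    log_p_draws = np.array([log_lik(t) for t in draws])  # log p(y | theta^s) for each draw
    p_dic = 2.0 * (log_p_mean - log_p_draws.mean())      # effective number of parameters
    elpd_dic = log_p_mean - p_dic
    return -2.0 * elpd_dic, p_dic
```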

Watanabe-Akaike IC (WAIC)

Even more 'Bayesian:'

  • $$\mathrm{elppd}_{WAIC} = \mathrm{lppd} - p_{WAIC}$$ where $\mathrm{lppd} = \sum_{i=1}^n \log \int p(y_i|\theta) p_{post}(\theta)d\theta$
    Approximated by draws: $\mathrm{lppd} = \sum_{i=1}^n \log \left(\frac 1 S \sum_{s=1}^S p(y_i|\theta^s)\right)$
  • Approximation to cross-validation: $p_{WAIC} = 2 \sum_{i=1}^n\left(\log E_{post} (p(y_i|\theta)) - E_{post}(\log p(y_i|\theta))\right)$
  • Approximation by draws: $p_{WAIC} = 2 \sum_{i=1}^n \left( \log (\frac 1 S \sum_{s=1}^S p(y_i|\theta^s)) - \frac 1 S \sum_{s=1}^S \log p(y_i|\theta^s)\right)$
  • Alternative form: $p_{WAIC_2} = \sum_{i=1}^n \mathrm{var}_{post}(\log p(y_i|\theta))$
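A sketch that computes both variants of $p_{WAIC}$ and the criterion from a hypothetical `(S, n)` pointwise log-likelihood matrix, reusing the lppd computation shown earlier:

```python
import numpy as np
from scipy.special import logsumexp

def waic(log_lik):
    """WAIC from log_lik[s, i] = log p(y_i | theta^s) (hypothetical input matrix)."""
    S = log_lik.shape[0]
    lppd_i = logsumexp(log_lik, axis=0) - np.log(S)        # log (1/S) sum_s p(y_i | theta^s)
    p_waic = 2.0 * np.sum(lppd_i - log_lik.mean(axis=0))   # cross-validation approximation
    p_waic2 = np.sum(log_lik.var(axis=0, ddof=1))          # alternative: pointwise posterior variance
    elppd_waic = np.sum(lppd_i) - p_waic
    return -2.0 * elppd_waic, p_waic, p_waic2
```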

Readings

  1. Bayesian Data Analysis — chapter 7: evaluating, expanding and comparing models.
  2. Statistical rethinking — chapter 7: Ulysses' compass.

Hands-on