Applied Bayesian Data Analysis

Model evaluation

Concepts:

  • log-predictive density
  • cross validation
  • information criteria
  • model expansion

David Tolpin, david.tolpin@gmail.com

Measures of predictive accuracy

  • Point prediction
  • Probabilistic prediction

Point prediction

  • Mean squared error (L2): $\frac 1 n \sum_{i=1}^n (y_i - E(y_i|\theta))^2$
  • Mean absolute error (L1): $\frac 1 n \sum_{i=1}^n |y_i - E(y_i|\theta)|$ (see the sketch after this list)
  • ... (black magic)
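A minimal numpy sketch of both point-prediction errors; `y` and `y_pred` are hypothetical arrays holding the observations and the point predictions $E(y_i|\theta)$:

```python
import numpy as np

# Hypothetical observations and posterior point predictions E(y_i | theta)
y = np.array([2.1, 0.4, 3.3, 1.8])
y_pred = np.array([2.0, 0.9, 3.0, 1.5])

mse = np.mean((y - y_pred) ** 2)   # L2: mean squared error
mae = np.mean(np.abs(y - y_pred))  # L1: mean absolute error
print(mse, mae)
```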

Probabilistic prediction

  • Scoring rule must be proper and local
  • The logarithmic score is the proper local scoring rule: $L = \log p(y|\theta)$

Single data point

  • Ideal measure: out-of-sample performance
  • $$\log p_{post}(\tilde y_i) = \log \mathbb{E}_{post}\left(p(\tilde y_i|\theta)\right) = \log \int p(\tilde y_i|\theta)\, p_{post}(\theta)\, d\theta$$

Averaging over future data

  • Expected Log Predictive Density: $$\mathrm{elpd} = E_f(\log p_{post}(\tilde y_i))$$
  • Expected log pointwise predictive density: $$\mathrm{elppd} = \sum_{i=1}^n E_f(\log p_{post}(\tilde y_i))$$
  • (related to cross-validation)
  • If we have a point estimate $\hat \theta$, we can consider $E_f(\log p(\tilde y|\hat \theta))$
  • For independent data given parameters: $p(\tilde y|\hat \theta) = \prod_{i=1}^n p(\tilde y_i|\hat \theta)$

Log Pointwise Predictive Density

  • Marginalize over $\theta$: $$\mathrm{lppd} = \sum_{i=1}^n \log \int p(y_i|\theta) p_{post}(\theta)d\theta$$
  • Approximate using draws: $$\mathrm{lppd} = \sum_{i=1}^n \log \left(\frac 1 S \sum_{s=1}^S p(y_i|\theta^s)\right)$$
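A minimal sketch of the draw-based approximation, assuming a hypothetical $S \times n$ matrix `log_lik` with entries $\log p(y_i|\theta^s)$ (as could be extracted from any fitted MCMC object):

```python
import numpy as np
from scipy.special import logsumexp

# Hypothetical pointwise log-likelihood draws: log_lik[s, i] = log p(y_i | theta^s)
S, n = 1000, 20
rng = np.random.default_rng(0)
log_lik = rng.normal(-1.0, 0.3, size=(S, n))

# lppd = sum_i log( (1/S) sum_s p(y_i | theta^s) ), evaluated stably in log space
lppd = np.sum(logsumexp(log_lik, axis=0) - np.log(S))
print(lppd)
```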

Cross-validation and information criteria

A meeting with Enrico Fermi

In desperation I asked Fermi whether he was not impressed by the agreement between our calculated numbers and his measured numbers. He replied, “How many arbitrary parameters did you use for your calculations?” I thought for a moment about our cut-off procedures and said, “Four.” He said, “I remember my friend Johnny von Neumann used to say, with four parameters I can fit an elephant, and with five I can make him wiggle his trunk.” With that, the conversation was over. I thanked Fermi for his time and trouble, and sadly took the next bus back to Ithaca to tell the bad news to the students.

Estimating out-of-sample accuracy using sample data

  • Within-sample predictive accuracy
  • Cross-validation
  • Adjusted within-sample predictive accuracy

Leave-one-out cross validation

  • Fit the posterior on all data except $y_i$: $\mathrm{lppd}_{LOO-CV} = \sum_{i=1}^n \log p_{post(-i)} (y_i)$
  • Approximation by draws: $\mathrm{lppd}_{LOO-CV} = \sum_{i=1}^n \log \left(\frac 1 S \sum_{s=1}^S p(y_i|\theta^{is})\right)$
  • Bias correction: $b = \mathrm{lppd} - \overline {\mathrm{lppd}}_{-i}$
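A brute-force sketch of LOO-CV under stated assumptions: `fit_posterior_draws` and `log_lik_point` are hypothetical helpers that refit the model on the reduced data and evaluate $\log p(y_i|\theta^{is})$ on the resulting draws:

```python
import numpy as np
from scipy.special import logsumexp

def lppd_loo_cv(y, fit_posterior_draws, log_lik_point, S=1000):
    """Refit the model n times, each time leaving out y_i.

    fit_posterior_draws(y_train, S) -> S draws from p(theta | y_train)  (hypothetical helper)
    log_lik_point(y_i, draws)       -> log p(y_i | theta^s) per draw    (hypothetical helper)
    """
    n = len(y)
    lppd_i = np.empty(n)
    for i in range(n):
        draws = fit_posterior_draws(np.delete(y, i), S)   # posterior without y_i
        log_p = log_lik_point(y[i], draws)                # log p(y_i | theta^{is})
        lppd_i[i] = logsumexp(log_p) - np.log(S)          # log of the posterior-mean density
    return np.sum(lppd_i)
```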

Information criteria

  • $\mathrm{*IC} = -2 \cdot (\mathrm{lppd} - \mathrm{correction})$
  • $-2 \cdot$ — for historical reasons (compare the log density of the normal distribution: $- \log \sqrt{2\pi} - \frac {x^2} 2$)

Akaike IC (AIC)

$$\mathrm{elpd}_{AIC} = \log p(y|\hat \theta_{mle}) - k$$

where $k$ is the number of parameters
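A one-line sketch, assuming the maximized log likelihood $\log p(y|\hat \theta_{mle})$ and the parameter count $k$ are already known:

```python
def aic(log_lik_mle: float, k: int) -> float:
    """AIC = -2 * (log p(y | theta_mle) - k); smaller is better."""
    return -2.0 * (log_lik_mle - k)
```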

Deviance IC and effective number of parameters

  • 'Bayesian' AIC: $\hat \theta_{Bayes} = E(\theta|y)$
  • $\mathrm{elpd}_{DIC} = \log p(y|\hat \theta_{Bayes}) - p_{DIC}$
  • $p_{DIC}$ — effective number of parameters
  • $p_{DIC} = 2\left(\log p(y|\hat \theta_{Bayes}) - E_{post}(\log p(y|\theta))\right)$
  • Approximation by draws: $p_{DIC} = 2\left(\log p(y|\hat \theta_{Bayes}) - \frac 1 S \sum_{s=1}^S \log p(y|\theta^s)\right)$
  • Alternative form: $p_{DIC_{alt}} = 2 \mathrm{var}_{post} (\log p(y|\theta))$
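A sketch of the draw-based DIC computation, assuming a hypothetical `(S, d)` array of posterior draws and a hypothetical callable `log_lik(theta)` returning $\log p(y|\theta)$ for the full data:

```python
import numpy as np

def dic(draws, log_lik):
    """DIC and effective number of parameters p_DIC from posterior draws."""
    theta_bayes = draws.mean(axis=0)                     # hat(theta)_Bayes = E(theta | y)
    log_p_mean = log_lik(theta_bayes)                    # log p(y | hat(theta)_Bayes)
    log_p_draws = np.array([log_lik(t) for t in draws])  # log p(y | theta^s) for each draw
    p_dic = 2.0 * (log_p_mean - log_p_draws.mean())      # effective number of parameters
    elpd_dic = log_p_mean - p_dic
    return -2.0 * elpd_dic, p_dic
```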

Watanabe-Akaike IC (WAIC)

Even more 'Bayesian:'

  • $$\mathrm{elppd}_{WAIC} = \mathrm{lppd} - p_{WAIC}$$ where $\mathrm{lppd} = \sum_{i=1}^n \log \int p(y_i|\theta) p_{post}(\theta)d\theta$
    Approximated by draws: $\mathrm{lppd} = \sum_{i=1}^n \log \left(\frac 1 S \sum_{s=1}^S p(y_i|\theta^s)\right)$
  • Approximation to cross-validation: $p_{WAIC} = 2 \sum_{i=1}^n\left(\log E_{post} (p(y_i|\theta)) - E_{post}(\log p(y_i|\theta))\right)$
  • Approximation by draws: $p_{WAIC} = 2 \sum_{i=1}^n \left( \log (\frac 1 S \sum_{s=1}^S p(y_i|\theta^s)) - \frac 1 S \sum_{s=1}^S \log p(y_i|\theta^s)\right)$
  • Alternative form: $p_{WAIC_2} = \sum_{i=1}^n \mathrm{var}_{post}(\log p(y_i|\theta))$
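A sketch that computes both variants of $p_{WAIC}$ and the criterion from a hypothetical `(S, n)` pointwise log-likelihood matrix, reusing the lppd computation shown earlier:

```python
import numpy as np
from scipy.special import logsumexp

def waic(log_lik):
    """WAIC from log_lik[s, i] = log p(y_i | theta^s) (hypothetical input matrix)."""
    S = log_lik.shape[0]
    lppd_i = logsumexp(log_lik, axis=0) - np.log(S)        # log (1/S) sum_s p(y_i | theta^s)
    p_waic = 2.0 * np.sum(lppd_i - log_lik.mean(axis=0))   # cross-validation approximation
    p_waic2 = np.sum(log_lik.var(axis=0, ddof=1))          # alternative: pointwise posterior variance
    elppd_waic = np.sum(lppd_i) - p_waic
    return -2.0 * elppd_waic, p_waic, p_waic2
```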

Readings

  1. Bayesian Data Analysis — chapter 7: evaluating, expanding and comparing models.
  2. Statistical rethinking — chapter 7: Ulysses' compass.

Hands-on