Applied Bayesian Data Analysis
Priors

David Tolpin, david.tolpin@gmail.com

Concepts

- Conjugacy
- Informative, non-informative, weakly-informative priors
- Pivotal quantities

Conjugacy

A boring mathematical concept: a prior $p(\theta)$ is conjugate to a likelihood $p(y|\theta)$ if the posterior $p(\theta|y)$ belongs to the same family of distributions as the prior.

Example: binomial model

$$\begin{aligned}
\theta &\sim \mathrm{Prior} \\
y_{1:n} &\sim \mathrm{Bernoulli}(\theta)
\end{aligned}$$

How to choose $\mathrm{Prior}$?
- $p(\theta|y) \propto p(y, \theta) = p(\theta)p(y|\theta)$
- $p(y_{1:n}|\theta) = \theta^k(1 - \theta)^{n-k}$, where $k = \sum_{i=1}^n y_i$ is the number of successes
- if $p(\theta) \propto \theta^a(1-\theta)^b$, then $p(\theta|y) \propto \theta^{a + k}(1 - \theta)^{b + n - k}$ — same form

Example: binomial model

- $\mathrm{Beta}(\theta|\alpha, \beta) = \frac 1 {\mathrm{B}(\alpha, \beta)} \theta^{\alpha-1} (1 - \theta)^{\beta - 1}$
- $\mathrm{Beta}(\alpha, \beta)$ is the conjugate prior for $\mathrm{Bernoulli}(\theta)$
- $\alpha$ — number of ‘prior’ successes (heads)
- $\beta$ — number of ‘prior’ failures (tails)
- $\alpha=\beta=1$ — uniform $[0, 1]$ prior
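
A minimal sketch of the conjugate update in Python (the flips are made-up data for illustration):

    import numpy as np
    from scipy import stats

    alpha, beta = 1.0, 1.0           # Beta(1, 1): uniform prior
    y = np.array([1, 0, 1, 1, 0])    # hypothetical coin flips, 1 = heads
    k, n = y.sum(), len(y)

    # conjugate update: posterior is Beta(alpha + k, beta + n - k)
    posterior = stats.beta(alpha + k, beta + n - k)
    print(posterior.mean(), posterior.interval(0.9))

No integration is needed: conjugacy turns posterior inference into parameter arithmetic.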

Exponential families

- $\mathcal{F}$ is an exponential family if
  $$p(y_i|\theta) = f(y_i)g(\theta)e^{\phi(\theta)^\top u(y_i)}$$
- $\phi(\theta)$ — natural parameter
- likelihood of set $y=(y_1, ..., y_n)$ is
  $$p(y|\theta) \propto g(\theta)^n e^{\phi(\theta)^\top t(y)}$$
  where $t(y) = \sum_{i=1}^n u(y_i)$
- $t(y)$ is a sufficient statistic for $\theta$: all we need to know about the data
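
For instance, the Bernoulli likelihood can be written in this form:
$$p(y_i|\theta) = \theta^{y_i}(1-\theta)^{1-y_i} = (1-\theta)\,e^{y_i \log\frac{\theta}{1-\theta}},$$
so $f(y_i) = 1$, $g(\theta) = 1-\theta$, $\phi(\theta) = \log\frac{\theta}{1-\theta}$, $u(y_i) = y_i$, and $t(y) = \sum_{i=1}^n y_i = k$, the number of successes.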

Exp. family conjugates

- If $p(\theta) \propto g(\theta)^\eta e^{\phi(\theta)^\top\nu}$,
- then $p(\theta|y) \propto g(\theta)^{\eta+n}e^{\phi(\theta)^\top(\nu+t(y))}$.
- $p(\theta|y)$ has the same form, so $p(\theta)$ is conjugate to $p(y|\theta)$.
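
Instantiating this for the Bernoulli form above recovers the Beta prior:
$$p(\theta) \propto (1-\theta)^\eta e^{\nu\log\frac{\theta}{1-\theta}} = \theta^\nu(1-\theta)^{\eta-\nu},$$
i.e. $\mathrm{Beta}(\nu+1, \eta-\nu+1)$; the update $(\eta,\nu) \to (\eta+n, \nu+t(y))$ is exactly the update $(\alpha, \beta) \to (\alpha+k, \beta+n-k)$ seen earlier.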

Exp. family members

- Bernoulli, $\propto \theta^y(1-\theta)^{1-y}$
- Normal, $\propto \frac 1 \sigma e^{-\frac 1 {2\sigma^2} {(y-\mu)^2}}$
- Poisson, $\propto \theta^y e^{-\theta}$
- Exponential, $\propto \theta e^{-y\theta}$
- ...
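
Each member comes with a conjugate prior of the general form above; for Poisson, $g(\theta) = e^{-\theta}$ and $\phi(\theta) = \log\theta$ give a Gamma prior. A sketch checking the conjugate Gamma-Poisson update against a brute-force grid posterior (counts and prior parameters are arbitrary):

    import numpy as np
    from scipy import stats

    a, b = 2.0, 1.0                  # Gamma(shape a, rate b) prior on theta
    y = np.array([3, 1, 4, 2, 2])    # synthetic Poisson counts
    n, t = len(y), y.sum()

    # conjugate update: posterior is Gamma(a + t, b + n)
    post = stats.gamma(a + t, scale=1/(b + n))

    # brute-force grid posterior for comparison
    theta = np.linspace(1e-3, 10, 2000)
    unnorm = stats.gamma.pdf(theta, a, scale=1/b) \
             * stats.poisson.pmf(y[:, None], theta).prod(axis=0)
    grid_mean = (theta * unnorm).sum() / unnorm.sum()

    print(post.mean(), grid_mean)    # agree up to grid error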

Specifying priors

- Prior $p(\theta) = \int_Y p(\theta|y)p(y)\,dy$ is the marginal of $\theta$ over all possible observations.
- Posterior is a compromise between the prior and the conditional $p(y|\theta)$:
  - $\mathbb{E}(\theta) = \mathbb{E}(\mathbb{E}(\theta|y))$
  - $\mathrm{var}(\theta) = \mathbb{E}(\mathrm{var}(\theta|y)) + \mathrm{var}(\mathbb{E}(\theta|y))$
    - $\mathbb{E}(\mathrm{var}(\theta|y))$ — ‘unexplained’ variation
    - $\mathrm{var}(\mathbb{E}(\theta|y))$ — ‘explained’ variation
- Posterior variance is on average smaller than prior variance.
- If the posterior variance is greater, look for a problem.
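
A quick simulation of this decomposition for the Beta-Bernoulli model (prior parameters and sample size are arbitrary):

    import numpy as np
    rng = np.random.default_rng(0)

    alpha, beta, n, S = 2.0, 2.0, 10, 100_000

    # draw (theta, k) from the joint: theta ~ Beta, k|theta ~ Binomial(n, theta)
    theta = rng.beta(alpha, beta, S)
    k = rng.binomial(n, theta)

    # posterior for each simulated dataset is Beta(alpha + k, beta + n - k)
    a, b = alpha + k, beta + n - k
    post_mean = a / (a + b)
    post_var = a*b / ((a + b)**2 * (a + b + 1))

    prior_var = alpha*beta / ((alpha + beta)**2 * (alpha + beta + 1))
    print(prior_var, post_var.mean() + post_mean.var())  # two sides of the identity
    print((post_var < prior_var).mean())                  # fraction of datasets with smaller posterior variance

The two numbers on the first line agree up to Monte Carlo error.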

Informative priors

- Prior defines the ‘population’
- Or, prior defines the ‘state of knowledge’
- Example: coin flip
  - 9+1 coins from the same batch
  - each of the first 9 coins was tossed once: 5 fell on heads, 4 on tails
  - prior for the 10th coin: $\mathrm{Beta}(5, 4)$

Non-informative priors

- No prior information, (almost) all distributions are possible
- May also be used for ‘regularization’, that is, for making the model work
- Examples:
  - $\mathrm{Beta}(1, 1)$ — uniform prior
  - $\mathrm{Normal}(0, 1000)$ — regularization

Non-informative priors

Pivotal quantity, location, scale

- Location:
  - $p(y-\theta|\theta) = f(u)$, $u = y - \theta$
  - $y - \theta$ — pivotal quantity, $\theta$ — location parameter
  - $p(\theta) \propto C$
- Scale:
  - $p(\frac y \theta|\theta) = f(u)$, $u = \frac y \theta$
  - $\frac y \theta$ — pivotal quantity, $\theta$ — scale parameter
  - $p(\log \theta) \propto C$, $p(\theta) \propto \frac 1 \theta$
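
The last step is a change of variables: a flat density on $\log\theta$ transforms as
$$p(\theta) = p(\log\theta)\left|\frac{d\log\theta}{d\theta}\right| \propto \frac 1 \theta.$$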
 

Weakly-informative priors

- Some information
- Less information than in the data
- Examples:
  - Fraction of the Earth covered by water: $\mathrm{Uniform}(0.5, 1)$
  - Salary: $\mathrm{Exponential}$ with mean $11\,500\,₪$