Applied Bayesian Data Analysis

Introduction to regression models

Concepts:

  • Linear regression
  • Effect, treatment, control

David Tolpin, david.tolpin@gmail.com

Conditional modeling

  • $y$ — outcome variable
  • $x=(x_1, ..., x_k)$ — explanatory variables, predictors
  • $X$ — matrix of predictors $n \times k$

Linear model

  • $\mathrm{E}(y_i|\beta, X) = \beta_1 x_{i1} + ... + \beta_k x_{ik}$
  • Often $x_{i1}\equiv 1$
  • Ordinary linear regression: $\mathrm{var}(y_i|\theta, X) = \sigma^2\ \forall i$
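In matrix form, $\mathrm{E}(y|\beta, X) = X\beta$. A minimal Julia sketch (predictors and coefficients made up) of how a constant first column makes $\beta_1$ the intercept:

n = 100
x2, x3 = randn(n), randn(n)   # hypothetical predictors
X = hcat(ones(n), x2, x3)     # first column ≡ 1, so β₁ acts as the intercept
β = [1.0, -2.0, 0.5]
Ey = X * β                    # E(yᵢ|β, X) = β₁ + β₂xᵢ₂ + β₃xᵢ₃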

Modeling with linear model

  1. Define $x$ and $y$ so that $y$ is approximately linear in $x$
  2. Set up prior on parameters $\theta=(\beta_1, ..., \beta_k, \sigma)$
  3. Objective: $p(\theta|X, y)$

Bayesian linear regression

  • Model: $y|\beta, \sigma, X \sim \mathcal{N}(X\beta, \sigma^2I)$
  • Noninformative prior: $p(\beta, \sigma^2|X) \propto \sigma^{-2}$

BLR: Posterior

  • Conditional $$\beta|\sigma, y \sim \mathcal{N}(\hat \beta, V_\beta \sigma^2)$$ where
    • $\hat \beta = (X^\top X)^{-1}X^{\top}y$
    • $V_\beta = (X^{\top}X)^{-1}$
  • Marginal $$\sigma^2|y \sim \mbox{Inv-}\chi^2(n-k, s^2)$$ where $s^2 = \frac 1 {n-k}(y-X\hat \beta)^{\top}(y-X\hat \beta)$
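These formulas can be sampled directly; a sketch (helper name mine), using that $\mbox{Inv-}\chi^2(\nu, s^2)$ equals $\mbox{InverseGamma}(\nu/2, \nu s^2/2)$ in Distributions.jl terms:

using Distributions, LinearAlgebra

# One joint draw from p(β, σ²|X, y) under the prior p(β, σ²|X) ∝ σ⁻².
function blr_posterior_sample(X, y)
    n, k = size(X)
    β̂ = X \ y                           # (XᵀX)⁻¹Xᵀy
    Vβ = inv(X' * X)
    s² = sum(abs2, y - X * β̂) / (n - k)
    σ² = rand(InverseGamma((n - k) / 2, (n - k) * s² / 2))   # σ²|y
    β = rand(MvNormal(β̂, Symmetric(σ² .* Vβ)))               # β|σ², y
    return β, σ²
end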

BLR: Posterior predictive

Analytic form (for insights):

  • $\mathrm{E}(\tilde y|\sigma, y) = \tilde X\hat \beta$
  • $\mathrm{var}(\tilde y|\sigma, y) = (I + \tilde XV_\beta\tilde X^{\top})\sigma^2$
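Simulation composes the two posterior factors: draw $(\beta, \sigma^2)$, then $\tilde y \sim \mathcal{N}(\tilde X\beta, \sigma^2 I)$. A sketch reusing the hypothetical blr_posterior_sample above:

using Distributions, LinearAlgebra

# One draw from p(ỹ|X̃, X, y); repeat to trace out the predictive distribution.
function blr_predict(X, y, X̃)
    β, σ² = blr_posterior_sample(X, y)
    return rand(MvNormal(X̃ * β, σ² * I))
end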

BLR: Model checking

  • Residuals plot
  • Correlation between residuals and fitted values
  • Any other statistics of interest
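A minimal sketch of the first two checks at a point estimate (helper name mine; β̂ could be the posterior mean):

using Statistics

# Residual diagnostics for a fitted linear model.
function check_residuals(X, y, β̂)
    fitted = X * β̂
    resid = y .- fitted
    # near-zero correlation is expected under ordinary linear regression
    return cor(resid, fitted)
end
# A residuals plot is then scatter(fitted, resid), e.g. with Plots.jl.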

Linear regression in Turing


using Turing, LinearAlgebra

@model function linear_regression(x, y)
    # Prior on the observation variance σ².
    σ² ~ truncated(Normal(0, 100); lower=0)

    # Prior on the intercept.
    intercept ~ Normal(0, sqrt(3))

    # Priors on the coefficients: zero mean, covariance 10I (std √10 each).
    nfeatures = size(x, 2)
    coefficients ~ MvNormal(zeros(nfeatures), 10 * I)

    # Linear predictor and Gaussian likelihood with variance σ².
    mu = intercept .+ x * coefficients
    y ~ MvNormal(mu, σ² * I)
end
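Conditioning the model on data and sampling might look like this (synthetic data, illustrative sampler settings; imports as in the listing above):

x = randn(100, 3)
y = x * [1.0, -2.0, 0.5] .+ 0.3 .* randn(100)
chain = sample(linear_regression(x, y), NUTS(), 1_000)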

Example: Howell dataset

"height","weight","age","male"
151.765,47.8256065,63,1
139.7,36.4858065,63,0
136.525,31.864838,65,0
156.845,53.0419145,41,1
145.415,41.276872,51,0
163.83,62.992589,35,1
149.225,38.2434755,32,0
168.91,55.4799715,27,1
147.955,34.869885,19,0
165.1,54.487739,54,1
154.305,49.89512,47,0
151.13,41.220173,66,1
144.78,36.0322145,73,0
... 
Slides by Richard McElreath
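A sketch of fitting the Turing model above to this data (file name and adult filter assumed; height is roughly linear in weight only for adults):

using CSV, DataFrames, Turing

df = CSV.read("Howell1.csv", DataFrame)   # file name assumed
adults = df[df.age .>= 18, :]
x = reshape(adults.weight, :, 1)
chain = sample(linear_regression(x, adults.height), NUTS(), 1_000)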

Unequal variances

  • Earlier we set: $y \sim \mathcal{N}(X\beta, \sigma^2 I)$
  • In general: $y \sim \mathcal{N}(X\beta, \Sigma)$
  • Weighted LR: $\Sigma_{ii} = \sigma^2/w_i$, $\Sigma_{ij} = 0$ for $i \neq j$
  • Example:
    • $x$ — company's worth
    • $y$ — average salary
    • $w$ — number of employees
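A Turing sketch of the weighted likelihood, mirroring the earlier model (priors illustrative):

using Turing, LinearAlgebra

@model function weighted_regression(x, y, w)
    σ² ~ truncated(Normal(0, 100); lower=0)
    intercept ~ Normal(0, sqrt(3))
    coefficients ~ MvNormal(zeros(size(x, 2)), 10 * I)
    mu = intercept .+ x * coefficients
    # observation i has variance σ²/wᵢ: more employees ⇒ less noisy average salary
    for i in eachindex(y)
        y[i] ~ Normal(mu[i], sqrt(σ² / w[i]))
    end
end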

Goals of regression

Two (at least) different goals

  1. Predicting $\tilde y$ for new $\tilde X$
  2. Estimating treatment effects $\beta$

Choosing explanatory variables

  • No collinearity ($x_1 \not\approx ax_2$)
  • Non-linear relations ($\log(x_1), x_1, x_1^2$)
  • Indicator variables ($x_1 \in \{0, 1\}$)
  • Ordered categorical variables $\leftrightarrow$ continuous variables
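Illustrative predictor construction for the last three points (all columns made up):

n = 50
income = exp.(randn(n))   # hypothetical positive predictor
male = rand(0:1, n)       # indicator variable
grade = rand(1:5, n)      # ordered category entered as a numeric score
X = hcat(ones(n), log.(income), income, income .^ 2, male, grade)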

Readings

  1. Bayesian Data Analysis — chapter 14: Introduction to regression models.
  2. Statistical Rethinking — chapter 4: Geocentric models.

Hands-on

  • Turing tutorial on linear regression.