Applied Bayesian Data Analysis

Introduction to regression models

Concepts:

  • Linear regression
  • Effect, treatment, control

David Tolpin, david.tolpin@gmail.com

Conditional modeling

  • $y$ — outcome variable
  • $x=(x_1, ..., x_k)$ — explanatory variables, predictors
  • $X$ — matrix of predictors $n \times k$

Linear model

  • $\mathrm{E}(y_i|\beta, X) = \beta_1 x_{i1} + ... + \beta_k x_{ik}$
  • Often $x_{i1}\equiv 1$
  • Ordinary linear regression: $\mathrm{var}(y_i|\theta, X) = \sigma^2\ \forall i$
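In matrix form, $\mathrm{E}(y|\beta, X) = X\beta$. A minimal Julia sketch (predictors and coefficients made up) of how a constant first column makes $\beta_1$ the intercept:

n = 100
x2, x3 = randn(n), randn(n)   # hypothetical predictors
X = hcat(ones(n), x2, x3)     # first column ≡ 1, so β₁ acts as the intercept
β = [1.0, -2.0, 0.5]
Ey = X * β                    # E(yᵢ|β, X) = β₁ + β₂xᵢ₂ + β₃xᵢ₃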

Modeling with linear model

  1. Define $x$ and $y$ so that $y$ is approximately linear in $x$
  2. Set up prior on parameters $\theta=(\beta_1, ..., \beta_k, \sigma)$
  3. Objective: $p(\theta|X, y)$

Bayesian linear regression

  • Model: $y|\beta, \sigma, X \sim \mathcal{N}(X\beta, \sigma^2I)$
  • Noninformative prior: $p(\beta, \sigma^2|X) \propto \sigma^{-2}$

BLR: Posterior

  • Conditional $$\beta|\sigma, y \sim \mathcal{N}(\hat \beta, V_\beta \sigma^2)$$ where
    • $\hat \beta = (X^\top X)^{-1}X^{\top}y$
    • $V_\beta = (X^{\top}X)^{-1}$
  • Marginal $$\sigma^2|y \sim \mbox{Inv-}\chi^2(n-k, s^2)$$ where $s^2 = \frac 1 {n-k}(y-X\hat \beta)^{\top}(y-X\hat \beta)$
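These formulas can be sampled directly; a sketch (helper name mine), using that $\mbox{Inv-}\chi^2(\nu, s^2)$ equals $\mbox{InverseGamma}(\nu/2, \nu s^2/2)$ in Distributions.jl terms:

using Distributions, LinearAlgebra

# One joint draw from p(β, σ²|X, y) under the prior p(β, σ²|X) ∝ σ⁻².
function blr_posterior_sample(X, y)
    n, k = size(X)
    β̂ = X \ y                           # (XᵀX)⁻¹Xᵀy
    Vβ = inv(X' * X)
    s² = sum(abs2, y - X * β̂) / (n - k)
    σ² = rand(InverseGamma((n - k) / 2, (n - k) * s² / 2))   # σ²|y
    β = rand(MvNormal(β̂, Symmetric(σ² .* Vβ)))               # β|σ², y
    return β, σ²
end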

BLR: Posterior predictive

Analytic form (for insights):

  • $\mathrm{E}(\tilde y|\sigma, y) = \tilde X\hat \beta$
  • $\mathrm{var}(\tilde y|\sigma, y) = (I + \tilde XV_\beta\tilde X^{\top})\sigma^2$
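Simulation composes the two posterior factors: draw $(\beta, \sigma^2)$, then $\tilde y \sim \mathcal{N}(\tilde X\beta, \sigma^2 I)$. A sketch reusing the hypothetical blr_posterior_sample above:

using Distributions, LinearAlgebra

# One draw from p(ỹ|X̃, X, y); repeat to trace out the predictive distribution.
function blr_predict(X, y, X̃)
    β, σ² = blr_posterior_sample(X, y)
    return rand(MvNormal(X̃ * β, σ² * I))
end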

BLR: Model checking

  • Residuals plot
  • Correlation between residuals and fitted values
  • Any other statistics of interest
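A minimal sketch of the first two checks at a point estimate (helper name mine; β̂ could be the posterior mean):

using Statistics

# Residual diagnostics for a fitted linear model.
function check_residuals(X, y, β̂)
    fitted = X * β̂
    resid = y .- fitted
    # near-zero correlation is expected under ordinary linear regression
    return cor(resid, fitted)
end
# A residuals plot is then scatter(fitted, resid), e.g. with Plots.jl.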

Linear regression in Turing


using Turing, LinearAlgebra

@model function linear_regression(x, y)
    # Prior on the observation variance σ².
    σ² ~ truncated(Normal(0, 100); lower=0)

    # Prior on the intercept.
    intercept ~ Normal(0, sqrt(3))

    # Priors on the coefficients: zero mean, covariance 10I (std √10 each).
    nfeatures = size(x, 2)
    coefficients ~ MvNormal(zeros(nfeatures), 10 * I)

    # Linear predictor and Gaussian likelihood with variance σ².
    mu = intercept .+ x * coefficients
    y ~ MvNormal(mu, σ² * I)
end
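Conditioning the model on data and sampling might look like this (synthetic data, illustrative sampler settings; imports as in the listing above):

x = randn(100, 3)
y = x * [1.0, -2.0, 0.5] .+ 0.3 .* randn(100)
chain = sample(linear_regression(x, y), NUTS(), 1_000)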

Example: Howell dataset

"height","weight","age","male"
151.765,47.8256065,63,1
139.7,36.4858065,63,0
136.525,31.864838,65,0
156.845,53.0419145,41,1
145.415,41.276872,51,0
163.83,62.992589,35,1
149.225,38.2434755,32,0
168.91,55.4799715,27,1
147.955,34.869885,19,0
165.1,54.487739,54,1
154.305,49.89512,47,0
151.13,41.220173,66,1
144.78,36.0322145,73,0
... 
Slides by Richard McElreath
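A sketch of fitting the Turing model above to this data (file name and adult filter assumed; height is roughly linear in weight only for adults):

using CSV, DataFrames, Turing

df = CSV.read("Howell1.csv", DataFrame)   # file name assumed
adults = df[df.age .>= 18, :]
x = reshape(adults.weight, :, 1)
chain = sample(linear_regression(x, adults.height), NUTS(), 1_000)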

Unequal variances

  • Earlier we set: $y \sim \mathcal{N}(X\beta, \sigma^2 I)$
  • In general: $y \sim \mathcal{N}(X\beta, \Sigma)$
  • Weighted LR: $\Sigma_{ii} = \sigma^2/w_i$, $\Sigma_{ij} = 0$ for $i \neq j$
  • Example:
    • $x$ — company's worth
    • $y$ — average salary
    • $w$ — number of employees
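A Turing sketch of the weighted likelihood, mirroring the earlier model (priors illustrative):

using Turing, LinearAlgebra

@model function weighted_regression(x, y, w)
    σ² ~ truncated(Normal(0, 100); lower=0)
    intercept ~ Normal(0, sqrt(3))
    coefficients ~ MvNormal(zeros(size(x, 2)), 10 * I)
    mu = intercept .+ x * coefficients
    # observation i has variance σ²/wᵢ: more employees ⇒ less noisy average salary
    for i in eachindex(y)
        y[i] ~ Normal(mu[i], sqrt(σ² / w[i]))
    end
end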

Goals of regression

Two (at least) different goals

  1. Predicting $\tilde y$ for new $\tilde X$
  2. Estimating treatment effects $\beta$

Choosing explanatory variables

  • No collinearity ($x_1 \not\approx ax_2$)
  • Non-linear relations ($\log(x_1), x_1, x_1^2$)
  • Indicator variables ($x_1 \in \{0, 1\}$)
  • Ordered categorical variables $\leftrightarrow$ continuous variables
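Illustrative predictor construction for the last three points (all columns made up):

n = 50
income = exp.(randn(n))   # hypothetical positive predictor
male = rand(0:1, n)       # indicator variable
grade = rand(1:5, n)      # ordered category entered as a numeric score
X = hcat(ones(n), log.(income), income, income .^ 2, male, grade)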

Readings

  1. Bayesian Data Analysis — chapter 14: Introduction to regression models.
  2. Statistical Rethinking — chapter 4: Geocentric models.

Hands-on

  • Turing tutorial on linear regression.