Summary
The course teaches practical skills for the analysis of data, the core subject of Data Science, together with the theoretical foundations behind these skills. Thanks to advances in machine learning, complex dependencies can be learned from data. Bayesian data analysis builds on machine learning and the Bayesian approach to probability to perform inference in complex probabilistic models. At the center of Bayesian data analysis lies the concept of a generative probabilistic model, which describes the process through which the data is, or could be, generated. Inference is then performed on the model given the data, allowing us to make predictions both about future, yet unseen data and about unobservable phenomena that affect the data. Uncertainty is modeled naturally within the Bayesian framework.
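As a hedged illustration of the paragraph above, here is a minimal sketch in plain Python (standard library only) of a generative model and inference on it. The model is an assumption chosen for simplicity: an unknown coin bias `theta` with a uniform prior, and flips drawn as Bernoulli(`theta`). `simulate` runs the model forward to generate data; `grid_posterior` inverts it by grid approximation; `predict_heads` makes a prediction about a future, unseen flip. All names are hypothetical and not taken from the course materials.

```python
import random

def simulate(theta, n, rng):
    """Generative direction: draw n coin flips (1 = heads) given bias theta."""
    return [1 if rng.random() < theta else 0 for _ in range(n)]

def grid_posterior(data, grid_size=101):
    """Inference direction: posterior over theta on a grid, uniform prior."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    heads = sum(data)
    tails = len(data) - heads
    # Unnormalised posterior = prior (constant) * Bernoulli likelihood
    weights = [t ** heads * (1 - t) ** tails for t in grid]
    total = sum(weights)
    return grid, [w / total for w in weights]

def predict_heads(grid, post):
    """Posterior predictive probability that the next flip is heads."""
    return sum(t * p for t, p in zip(grid, post))

if __name__ == "__main__":
    rng = random.Random(0)
    data = simulate(0.7, 20, rng)  # pretend the true bias is 0.7
    grid, post = grid_posterior(data)
    print("heads:", sum(data), "of", len(data))
    print("P(next flip is heads):", round(predict_heads(grid, post), 3))
```

Grid approximation only scales to a handful of parameters; it is used here because it makes the "prior times likelihood, then normalise" step explicit.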
During the course we will learn to specify probabilistic generative models for several important classes of data science problems, such as generalised linear models, hierarchical models, and mixture models, and to perform inference on these models using modern tools and inference algorithms. We will also explore model checking, comparison, and selection. The homework assignments develop hands-on skills in Bayesian data modelling and analysis.
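Tools such as Turing and Stan automate inference for models where a grid is impractical. As a hedged sketch of what such an inference algorithm does, here is a random-walk Metropolis sampler in plain Python for the posterior of a coin bias `theta` under a uniform prior. This textbook algorithm is chosen for illustration only; it is not what either tool necessarily uses (Stan, for example, defaults to NUTS, a Hamiltonian Monte Carlo variant). The function names, step size, and sample counts are illustrative assumptions.

```python
import math
import random

def log_posterior(theta, heads, tails):
    """Log of the unnormalised posterior: uniform prior on (0, 1), Bernoulli likelihood."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # outside the prior support: zero posterior density
    return heads * math.log(theta) + tails * math.log(1.0 - theta)

def metropolis(heads, tails, n_samples=20000, step=0.1, seed=0):
    """Random-walk Metropolis with a Gaussian proposal of standard deviation `step`."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, tails)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        proposed = log_posterior(proposal, heads, tails)
        # Symmetric proposal, so accept with probability min(1, posterior ratio);
        # 1 - random() lies in (0, 1], which keeps the log finite
        if math.log(1.0 - rng.random()) < proposed - current:
            theta, current = proposal, proposed
        samples.append(theta)
    return samples

if __name__ == "__main__":
    samples = metropolis(heads=14, tails=6)[2000:]  # discard warm-up draws
    mean = sum(samples) / len(samples)
    # The exact posterior here is Beta(15, 7), with mean 15/22
    print("MCMC mean:", round(mean, 3), "exact mean:", round(15 / 22, 3))
```

Because this model is conjugate, the sampler's estimate can be checked against the exact Beta posterior mean, which is a useful habit when trying out any inference algorithm.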
How we learn
We meet weekly for 3 hours on Zoom (Wed 16:00-19:00). The first two hours are mostly theory. In the last hour we solve practical problems together, in a Jupyter notebook or a Unix/X11 terminal.
We will have 4 homework assignments, each combining theoretical and programming exercises. Homework assignments should be done in pairs.
We use Slack for announcements, questions, and discussions.
Lectures
- Introduction, recording
- Fundamentals, recording
- Building blocks, recording
- Priors, Approximate computation, recording
- Hierarchical models, recording
- Hierarchical models, cont., recording
- Model checking, recording
- Model evaluation, recording
- Regression models, recording
- Mixture models, recording
- Differential equations, recording
- Roundup, recording
Homework
Homework should be submitted in pairs via Moodle. You may submit either a Jupyter (.ipynb) or Pluto (.jl) notebook. If your solution requires external files (data or images), put the files online and load them via their URLs.
- Julia and Turing, notebook template, submission deadline Apr 6th, 2021, 23:59.
- Hierarchical models, notebook template, City of Norfolk salaries dataset, submission deadline May 18th, 2021, 23:59. Partial solution: City of Norfolk salaries analysis (expanded or folded).
- Checking, comparing, and evaluating models, notebook template, submission deadline June 15th, 2021, 23:59.
- Roundup, notebook template, Iris data, reedfrogs data, submission deadline August 1st, 2021, 23:59.
Links
Communication
Repositories
Bibliography
- Richard McElreath. Statistical Rethinking
- Gelman et al. Bayesian Data Analysis
- Goodman and Tenenbaum. Probabilistic Models of Cognition
- Bishop. Pattern Recognition and Machine Learning
- Turing and Julia
- Stan
- Infergo
- Anything else you find useful about probabilistic programming.
Previous years
About
Taught and maintained by David Tolpin.