Computational Statistics (Stat GR6104)

Spring 2025

This is a Ph.D.-level seminar course in computational statistics. A link to the most recent previous iteration of this course is here.

Note: instructor permission is required to take this class for students outside of the Statistics Ph.D. program.

Time: Tu 2:10
Place: JLG L7.119
Professor: Liam Paninski; Email: liam at stat dot columbia dot edu. Hours by appointment.

Course goals (partially adapted from the preface of Givens and Hoeting's book): Computation plays a central role in modern statistics and machine learning. This course aims to cover topics needed to develop a broad working knowledge of modern computational statistics. We seek to develop a practical understanding of how and why existing methods work, enabling effective use of modern statistical methods. Achieving these goals requires familiarity with diverse topics in statistical computing, computational statistics, computer science, and numerical analysis. Our choice of topics reflects our view of what is central to this evolving field, and what will be interesting and useful. A key theme is scalability to high-dimensional problems, which are central to many recent applications.
Some important topics will be omitted because high-quality solutions are already available in most software. For example, the generation of pseudo-random numbers is a classic topic, but existing methods built into standard software packages will suffice for our needs. On the other hand, we will spend a bit of time on some classical numerical linear algebra ideas, because choosing the right method for solving a linear system (for example) can have a huge impact on the time it takes to solve a problem in practice, particularly if there is some special structure that we can exploit.
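As a small illustration of the point about special structure, here is a minimal Python sketch (not part of the course materials) comparing a generic dense solve with an O(n) solve that exploits tridiagonal structure, using the classical Thomas algorithm. The function name `solve_tridiagonal` and the test problem are hypothetical choices made for this example. The sketch assumes n >= 2 and a diagonally dominant matrix, since it does no pivoting.

```python
import numpy as np

def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve A x = rhs for tridiagonal A in O(n) time (Thomas algorithm).

    lower: sub-diagonal (length n-1), diag: main diagonal (length n),
    upper: super-diagonal (length n-1). No pivoting is performed, so this
    sketch assumes A is diagonally dominant (or symmetric positive definite).
    """
    n = len(diag)
    c = np.zeros(n - 1)  # modified super-diagonal
    d = np.zeros(n)      # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward sweep: eliminate the sub-diagonal one row at a time.
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom
    # Back substitution.
    x = np.zeros(n)
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

# Hypothetical test problem: a diagonally dominant tridiagonal system.
n = 500
rng = np.random.default_rng(0)
lower = -np.ones(n - 1)
upper = -np.ones(n - 1)
diag = 2.5 * np.ones(n)
rhs = rng.standard_normal(n)

# Structured solve: O(n) time and memory.
x_fast = solve_tridiagonal(lower, diag, upper, rhs)

# Generic dense solve: O(n^3) time and O(n^2) memory for the same answer.
A = np.diag(diag) + np.diag(lower, -1) + np.diag(upper, 1)
x_dense = np.linalg.solve(A, rhs)

max_diff = np.max(np.abs(x_fast - x_dense))
```

Both approaches should agree up to floating-point error, but the structured solve never forms the n-by-n matrix. In practice one would typically call a library routine for banded systems rather than hand-code the recursion; the loop is written out here only to make the idea concrete.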

Audience: The course will be aimed at first- and second-year students in the Statistics Ph.D. program. Students from other departments or programs are welcome, space permitting; instructor permission required.

Background: The level of mathematics expected does not extend much beyond standard calculus and linear algebra. Breadth of mathematical training is more helpful than depth; we prefer to focus on the big picture of how algorithms work and to sweep under the rug some of the nitty-gritty numerical details. The expected level of statistics is equivalent to that obtained by a graduate student in his or her first year of study of the theory of statistics and probability. An understanding of maximum likelihood methods, Bayesian methods, elementary asymptotic theory, Markov chains, and linear models is most important.

Programming: With respect to computer programming, good students can learn as they go. We'll largely forgo language-specific examples, algorithms, and coding; I won't be teaching much programming per se, but will instead focus on the overarching ideas and techniques. For the projects, I recommend you choose a high-level, interactive package that permits the flexible design of graphical displays and includes supporting statistics and probability functions, e.g., R or Python.

Evaluation: Final grades will be based on class participation and a student project.

Topics:
Deterministic optimization
- Newton-Raphson, conjugate gradients, preconditioning, quasi-Newton methods, Fisher scoring, EM and its variants
- Numerical recipes for linear algebra: matrix inverse, LU, Cholesky decompositions, low-rank updates, SVD, banded matrices, Toeplitz matrices and the FFT, Kronecker products (separable matrices), sparse matrix solvers
- Convex analysis: convex functions, duality, KKT conditions, interior point methods, projected gradients, augmented Lagrangian methods, convex relaxations
- Applications: support vector machines, splines, Gaussian processes, isotonic regression, LASSO and LARS regression

Graphical models: dynamic programming, hidden Markov models, forward-backward algorithm, Kalman filter, Markov random fields

Stochastic optimization: Robbins-Monro and Kiefer-Wolfowitz algorithms, simulated annealing, stochastic gradient methods

Deterministic integration: Gaussian quadrature, quasi-Monte Carlo. Application: expectation propagation

Monte Carlo methods
- Rejection sampling, importance sampling, variance reduction methods (Rao-Blackwellization, stratified sampling)
- MCMC methods: Gibbs sampling, Metropolis-Hastings, Langevin methods, Hamiltonian Monte Carlo, slice sampling. Implementation issues: burn-in, monitoring convergence
- Sequential Monte Carlo (particle filtering)
- Variational and stochastic variational inference

References:
Givens and Hoeting (2005), Computational Statistics
Robert and Casella (2004), Monte Carlo Statistical Methods
Boyd and Vandenberghe (2004), Convex Optimization
Press et al, Numerical Recipes
Sun and Yuan (2006), Optimization Theory and Methods
Fletcher (2000), Practical Methods of Optimization
Searle (2006), Matrix Algebra Useful for Statistics
Spall (2003), Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control
Shewchuk (1994), An Introduction to the Conjugate Gradient Method Without the Agonizing Pain
Boyd et al (2011), Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers

Schedule

Date	Topic	Reading	Notes
Jan 21	Introduction
Jan 28	Gaussian processes and the Kalman model	Loper et al '20 for fast one-dimensional GP inference	See Rasmussen and Williams (2006) for more background on GP regression. Also notes by John Cunningham, Gardner et al '19, and some nice demos by Goertler et al '19 and Agnihotri and Batra '20.
Feb 4	State space models	Paninski et al '10, Sarkka + Garcia-Fernandez '19	Grootendorst '24 tutorial
Feb 4	Bayesian optimization	Frazier et al '18, Eriksson et al '19
Feb 11	Preconditioning and conjugate gradients	Shewchuk '94, Gardner et al '19	Chan and Ng (1996) on PCG for Toeplitz systems
Feb 11	Approximate Newton methods	Lacotte et al '21
Feb 11	Scaling quasi-Newton methods	Goldfarb et al '21, Martens + Grosse '20
Feb 18	No class
Feb 25	Expectation maximization and variational inference	Dempster et al (1977), Neal and Hinton (1999), Blei et al (2016)	Generalizations: Knoblauch et al (2022)
Feb 25, Mar 4	Prox methods and acceleration	Bubeck (2014), Bach et al (2011), Schmidt et al (2011)	Nesterov (1998), Devolder et al (2014), nice lecture notes from Y. Chen (2019)
Feb 25	LASSO methods	Efron et al (2004), Zou et al (2007), Friedman et al (2010), Bradley et al (2011), Tibshirani et al (2012)	More reading: Bach et al (2011), Kim et al (2009), Tseng (2001), Osborne et al (2000)
Feb 25	Stochastic gradient descent	Bottou et al (2018)	More reading: Wilson et al (2018), Zhang et al (2015)
Mar 4	Optimal transport	Peyre and Cuturi (2020), Nutz (2022), Arjovsky et al (2017)
Mar 11	2-minute project idea presentations
Mar 18	Spring break
Mar 25	Monte Carlo tree search	Browne et al '12, Kocsis and Szepesvari '06, Silver et al '18, Zhou et al '24
Mar 25	Normalizing flows	Papamakarios et al '21, van den Berg et al '18
Apr 1	Robust multivariate methods	Pensia et al '24
Apr 1	Implicit regularization of gradient descent in non-convex problems	Ma et al '19, Chen et al '19
Apr 8	Sequential Monte Carlo	Doucet and Johansen (2011), Pitt and Shephard (1999), Naesseth et al (2017)	Further reading collected by A. Doucet; Kantas et al (2014), Gabrie et al (2022), Wu et al (2023)
Apr 8	Policy gradients	Schulman et al '18
Apr 15	Amortized inference	Cremer et al '18, Marino et al '18
Apr 22	No class
Apr 29	Spatiotemporal models	Prates et al '22
Apr 29	Control variates and variance reduction for stochastic differential equations	Hinds and Tretyakov '22
May 6, 13	Project presentations		Send me your report as a PDF by May 15.