Cours chaire FSMP:

An optimization perspective on sampling


An optimization perspective on sampling

Instructor: Philippe Rigollet

Room: 15-16-101 (Jussieu)

Course description

Sampling is a fundamental question in statistics and machine learning, most notably in Bayesian methods. Sampling and optimization present many similarities, some obvious, others more mysterious. In particular, the seminar work of Jordan, Kinderlehrer and Otto (’98) has unveiled a beautiful connection between the Brownian motion and the heat equation on the one hand, and optimal transport on the other. They showed that certain stochastic processes may be viewed as gradient descent over the Wasserstein space of probability distributions. This connection opens the perspective of a novel approach to sampling that leverages the rich toolbox of optimization to derive and analyze sampling algorithms. The goal of this course is to bring together the many ingredients that make this perspective possible starting from the basics and building to some of the most recent advances in sampling.



Past lectures are accurate, future lectures are tentative (time permitting).
Lecture Topic(s) Description
1Motivation The course is largely motiated by a common goal: compute summary statistics from posterior distributions. We primarily use the Langevin diffusions as a viable strategy to sample from a posterior, and ultimately compute such summary statistics. Indeed, it does not require knowledge of the normalizing constant of the target distribution.
Optimization We covered a quick introduction to optimization of strongly convex functions using gradient flows. Then we relaxed strong convexity to a Polyak-Łojasiewicz condition. We also discussed that in continuous time, which is the focus of this course, smoothness does not appear like it does for gradient descent.
Markov semigroups (1) Markov semigroups are a canonical tool to study the convergence of Markov processes. We introduced the main notions: semigroup, infinetisimal generator, Kolmogorov's backward equation and Kolmogorov's forward equation (called Fokker-Planck in this course), and studied the stationary distribution from Fokker Planck. All notions were computed for Langevin as a concrete example.
2Markov semigroups (2) Focusing on reversible Markov processes, we defined the Dirichlet form and the Dirichlet energy and showed that reversible Markov presses can be seen as a gradient flows of the Dirichlet energy. Then we discussed Poincare and log-Sobolev inequalities and how they lead to exponential convergence to stationarity in chi-squared and KL respectively. We showed that log-Sobolev implies Poincare and briefly discussed connections to concentration and pointing to van Handel's notes for more details.
3Optimal transport.Definition of the Wasserstein(-2) distance and some structural properties (duality, uniqueness, Brenier's theorem) gathered under the name of Fundamental theorem of optimal transport.
4Dynamics on Wasserstein spaceWasserstein curves and geodesics; continuity equation; The Wasserstein space as a Riemannian manifold
5Wasserstein gradient flowsWe show that the Langevin diffusion can be interpreted as Wasserstein flow of the KL divergence using Otto calculus. We show that Log-Sobolev inequality corresponds to a PL condition in this geometry and we recover exponential convergence of the KL. We also use this new framework to show that such ineuqalities hold whenever the potential is strongly convex, which, in turn, implies strong convexity of the KL in this geometry.
6Interacting particlesThis new perspective on sampling opens the possibility of algorithms that consist in deterministcally implementing the Wasserstein gradient flow. We review two of them: SVGD and Variational inference.

Target audience:

Graduate students with a solid grasp of probability and researchers interested in these topics.

Scribe notes:

Theo Dumont was kind enough to scribe (part of) these lectures. These notes are not proofread and I did not participate in their writing: this is all Theo's work. Please direct any questions or comments to him directly (email in the following overleaf

Additional reading:


Markov semigroups:

Basic of optimal transport:


[May 31] To receive announcements from the class (if any), please enter your contact information here.