Course description
Sampling is a fundamental question in statistics and machine learning, most notably in Bayesian methods. Sampling and optimization present many similarities, some obvious, others more mysterious. In particular, the seminal work of Jordan, Kinderlehrer and Otto ('98) unveiled a beautiful connection between Brownian motion and the heat equation on the one hand, and optimal transport on the other. They showed that certain stochastic processes may be viewed as gradient descent over the Wasserstein space of probability distributions. This connection opens the perspective of a novel approach to sampling that leverages the rich toolbox of optimization to derive and analyze sampling algorithms. The goal of this course is to bring together the many ingredients that make this perspective possible, starting from the basics and building to some of the most recent advances in sampling.
Lectures
Lecture | Topic(s) | Description |
---|---|---|
1 | Motivation | The course is largely motivated by a common goal: computing summary statistics from posterior distributions. We primarily use the Langevin diffusion as a viable strategy to sample from a posterior and, ultimately, to compute such summary statistics. Indeed, it does not require knowledge of the normalizing constant of the target distribution (a discretized sketch of this algorithm follows the table). |
 | Optimization | We covered a quick introduction to the optimization of strongly convex functions using gradient flows. We then relaxed strong convexity to a Polyak-Łojasiewicz (PL) condition. We also discussed that in continuous time, which is the focus of this course, smoothness does not play the role it does for gradient descent (the two-line convergence argument is worked out after the table). |
 | Markov semigroups (1) | Markov semigroups are a canonical tool to study the convergence of Markov processes. We introduced the main notions: semigroup, infinitesimal generator, Kolmogorov's backward equation, and Kolmogorov's forward equation (called the Fokker-Planck equation in this course), and derived the stationary distribution from the Fokker-Planck equation. All notions were computed for the Langevin diffusion as a concrete example (these computations are summarized after the table). |
2 | Markov semigroups (2) | Focusing on reversible Markov processes, we defined the Dirichlet form and the Dirichlet energy and showed that reversible Markov processes can be seen as gradient flows of the Dirichlet energy. We then discussed Poincaré and log-Sobolev inequalities and how they lead to exponential convergence to stationarity in chi-squared divergence and KL divergence, respectively (the statements are recalled after the table). We showed that log-Sobolev implies Poincaré and briefly discussed connections to concentration, pointing to van Handel's notes for more details. |
3 | Optimal transport | Definition of the Wasserstein(-2) distance and some structural properties (duality, uniqueness, Brenier's theorem) gathered under the name of the fundamental theorem of optimal transport (the definition is recalled after the table). |
4 | Dynamics on Wasserstein space | Wasserstein curves and geodesics; the continuity equation; the Wasserstein space as a Riemannian manifold |
5 | Wasserstein gradient flows | We show that the Langevin diffusion can be interpreted as the Wasserstein gradient flow of the KL divergence using Otto calculus. We show that the log-Sobolev inequality corresponds to a PL condition in this geometry and we recover exponential convergence of the KL (see the computation after the table). We also use this new framework to show that such inequalities hold whenever the potential is strongly convex, since strong convexity of the potential implies strong (geodesic) convexity of the KL in this geometry. |
6 | Interacting particles | This new perspective on sampling opens the possibility of algorithms that consist in deterministically implementing the Wasserstein gradient flow. We review two of them: Stein variational gradient descent (SVGD) and variational inference (an SVGD sketch follows the table). |
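Below are a few worked examples referenced in the table above. To make the sampling strategy from Lecture 1 concrete, here is a minimal sketch of the Euler-Maruyama discretization of the Langevin diffusion (the unadjusted Langevin algorithm), assuming a generic target proportional to exp(-V); the function names, step size, and burn-in length are illustrative choices, not part of the course material.

```python
import numpy as np

def ula_sample(grad_V, x0, step=1e-2, n_steps=10_000, rng=None):
    """Unadjusted Langevin algorithm: Euler-Maruyama discretization of
    dX_t = -grad V(X_t) dt + sqrt(2) dB_t, targeting pi proportional to
    exp(-V). Only grad V is needed, never the normalizing constant of pi."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    samples = np.empty((n_steps, x.size))
    for k in range(n_steps):
        noise = rng.standard_normal(x.size)
        x = x - step * grad_V(x) + np.sqrt(2 * step) * noise
        samples[k] = x
    return samples

# Example: V(x) = ||x||^2 / 2, so pi is a standard Gaussian; the posterior
# mean (a summary statistic) estimated from the chain should be near zero.
samples = ula_sample(lambda x: x, x0=np.zeros(2))
print(samples[2000:].mean(axis=0))  # discard burn-in, then average
```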
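The continuous-time convergence argument from the optimization part of Lecture 1 fits in two lines. The notation below (gradient flow, PL constant $\mu$, minimum value $f^\star$) is one standard choice and may differ from the one used in class.

```latex
% Polyak-Lojasiewicz (PL) condition:
%   \tfrac{1}{2}\,\|\nabla f(x)\|^2 \ge \mu\,\bigl(f(x) - f^\star\bigr) \quad \text{for all } x.
% Along the gradient flow \dot{x}_t = -\nabla f(x_t):
\frac{d}{dt}\bigl(f(x_t) - f^\star\bigr)
  = \langle \nabla f(x_t), \dot{x}_t\rangle
  = -\|\nabla f(x_t)\|^2
  \le -2\mu\,\bigl(f(x_t) - f^\star\bigr),
% so Gronwall's lemma gives
f(x_t) - f^\star \le e^{-2\mu t}\,\bigl(f(x_0) - f^\star\bigr).
% No smoothness assumption is needed, in contrast with discrete-time
% gradient descent.
```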
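For Lecture 1's semigroup material, all the objects can be written out explicitly for the Langevin diffusion; the normalization below (diffusion coefficient $\sqrt{2}$, no factor $\tfrac12$ in front of the Laplacian) is an assumed convention.

```latex
% Langevin diffusion and its Markov semigroup:
dX_t = -\nabla V(X_t)\,dt + \sqrt{2}\,dB_t,
\qquad
P_t f(x) = \mathbb{E}\bigl[f(X_t)\mid X_0 = x\bigr].
% Infinitesimal generator:
L f = -\langle \nabla V, \nabla f\rangle + \Delta f .
% Kolmogorov's backward equation:   \partial_t P_t f = L\,P_t f .
% Kolmogorov's forward (Fokker-Planck) equation for the law \rho_t of X_t:
\partial_t \rho_t = \nabla\!\cdot\!\bigl(\rho_t \nabla V\bigr) + \Delta \rho_t .
% Setting the right-hand side to zero and using \nabla \pi = -\pi \nabla V
% shows that \pi \propto e^{-V} is a stationary distribution.
```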
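For Lecture 2, the two functional inequalities and the convergence rates they yield can be stated as follows; normalizations of the constants vary across references, so the ones below (in the spirit of Bakry, Gentil, and Ledoux) should be read as one possible convention.

```latex
% Dirichlet form of the (reversible) Langevin diffusion with stationary measure \pi:
\mathcal{E}(f,g) = -\int f\,Lg\;d\pi = \int \langle \nabla f, \nabla g\rangle \,d\pi .
% Poincare inequality (constant C_P) and chi-squared convergence:
\operatorname{Var}_\pi(f) \le C_P\,\mathcal{E}(f,f)
\quad\Longrightarrow\quad
\chi^2(\rho_t\,\|\,\pi) \le e^{-2t/C_P}\,\chi^2(\rho_0\,\|\,\pi).
% Log-Sobolev inequality (constant C_{LS}) and KL convergence:
\operatorname{Ent}_\pi(f^2) \le 2\,C_{LS}\,\mathcal{E}(f,f)
\quad\Longrightarrow\quad
\operatorname{KL}(\rho_t\,\|\,\pi) \le e^{-2t/C_{LS}}\,\operatorname{KL}(\rho_0\,\|\,\pi).
% With these conventions, log-Sobolev implies Poincare with C_P \le C_{LS}.
```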
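The definition from Lecture 3, together with Brenier's theorem, reads as follows for measures on $\mathbb{R}^d$ with finite second moments.

```latex
% Wasserstein-2 distance:
W_2^2(\mu,\nu) = \inf_{\gamma \in \Pi(\mu,\nu)} \int \|x-y\|^2 \, d\gamma(x,y),
% where \Pi(\mu,\nu) is the set of couplings of \mu and \nu, i.e. joint
% distributions with first marginal \mu and second marginal \nu.
% Brenier's theorem: if \mu has a density, the optimal coupling is unique
% and induced by a map,
\gamma^\star = (\mathrm{id}, \nabla\varphi)_{\#}\,\mu
\quad \text{with } \varphi \text{ convex},
% so that W_2^2(\mu,\nu) = \int \|x - \nabla\varphi(x)\|^2 \, d\mu(x).
```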
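The identification at the heart of Lecture 5 can be summarized by the identities below; writing the dissipation in terms of a relative Fisher information $\operatorname{FI}$ is a notational choice of this summary.

```latex
% Wasserstein gradient flow of F(\rho) = KL(\rho \,\|\, \pi), with \pi \propto e^{-V}:
\partial_t \rho_t
  = \nabla\!\cdot\!\Bigl(\rho_t\,\nabla \log\tfrac{\rho_t}{\pi}\Bigr)
  = \nabla\!\cdot\!\bigl(\rho_t \nabla V\bigr) + \Delta \rho_t ,
% i.e. exactly the Fokker-Planck equation of the Langevin diffusion.
% Dissipation along the flow (Otto calculus):
\frac{d}{dt}\operatorname{KL}(\rho_t\,\|\,\pi)
  = -\int \Bigl\|\nabla \log\tfrac{\rho_t}{\pi}\Bigr\|^2 d\rho_t
  =: -\operatorname{FI}(\rho_t\,\|\,\pi).
% The log-Sobolev inequality
%   \operatorname{KL}(\rho\,\|\,\pi) \le \tfrac{1}{2\alpha}\operatorname{FI}(\rho\,\|\,\pi)
% is a PL condition for KL in this geometry, hence
%   \operatorname{KL}(\rho_t\,\|\,\pi) \le e^{-2\alpha t}\operatorname{KL}(\rho_0\,\|\,\pi);
% it holds in particular when V is \alpha-strongly convex, since the latter
% makes KL(\cdot\,\|\,\pi) \alpha-strongly convex along Wasserstein geodesics.
```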
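For the particle methods of Lecture 6, here is a minimal numpy sketch of one SVGD update with a Gaussian kernel; the bandwidth, step size, and function names are illustrative choices and not the course's notation.

```python
import numpy as np

def svgd_step(x, grad_log_p, step=1e-1, bandwidth=1.0):
    """One SVGD update for particles x of shape (n, d):
    phi(x_i) = (1/n) sum_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
    with the Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 h^2))."""
    diffs = x[:, None, :] - x[None, :, :]           # diffs[j, i] = x_j - x_i
    sq_dists = np.sum(diffs ** 2, axis=-1)          # ||x_j - x_i||^2
    k = np.exp(-sq_dists / (2 * bandwidth ** 2))    # k[j, i] = k(x_j, x_i)
    grads = grad_log_p(x)                           # shape (n, d)
    attract = k.T @ grads                           # sum_j k(x_j, x_i) grad log p(x_j)
    repulse = -np.einsum('ji,jid->id', k, diffs) / bandwidth ** 2
    return x + step * (attract + repulse) / x.shape[0]

# Example: drive 100 particles toward a standard Gaussian target,
# for which grad log p(x) = -x.
rng = np.random.default_rng(0)
particles = rng.normal(3.0, 1.0, size=(100, 2))
for _ in range(500):
    particles = svgd_step(particles, lambda x: -x)
print(particles.mean(axis=0))  # should be close to the origin
```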
Target audience:
Graduate students with a solid grasp of probability and researchers interested in these topics.
Scribe notes:
Theo Dumont was kind enough to scribe (part of) these lectures. These notes are not proofread and I did not participate in their writing: this is all Theo's work. Please direct any questions or comments to him directly (email in the following Overleaf).
CLICK HERE FOR SCRIBE NOTES
Additional reading:
Optimization:
- Sébastien Bubeck. Convex Optimization: Algorithms and Complexity. arXiv:1405.4980
- Hamed Karimi, Julie Nutini, and Mark Schmidt. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition. arXiv:1608.04636
- Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
Markov semigroups:
- Ramon van Handel. Probability in High Dimension. Lecture notes, 2016.
- Dominique Bakry, Ivan Gentil, and Michel Ledoux. Analysis and geometry of Markov diffusion operators. Springer, 2014.
Basics of optimal transport:
- Cédric Villani. Topics in Optimal Transportation. American Mathematical Society, 2003.
- Cédric Villani. Optimal Transport: Old and New. Springer-Verlag, 2009.
- Filippo Santambrogio. Optimal Transport for Applied Mathematicians. Birkhäuser-Springer, 2015.
- Luigi Ambrosio and Nicola Gigli. A user's guide to optimal transport. Lecture Notes in Math., pages 1–155. Springer, 2013.
- Nicola Gigli. Second order analysis on $(\mathscr{P}_2(M), W_2)$. Mem. Amer. Math. Soc., 2012.
ANNOUNCEMENTS
[May 31] To receive announcements from the class (if any), please enter your contact information here.