Publication:

Optimal Transport in Statistical Inference and Computation

Loading...
Thumbnail Image

Date

2019-05-17

Published Version

Published Version

Journal Title

Journal ISSN

Volume Title

Publisher

The Harvard community has made this article openly available. Please share how this access benefits you.

Research Projects

Organizational Units

Journal Issue

Citation

Bernton, Espen. 2019. Optimal Transport in Statistical Inference and Computation. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.

Abstract

This manuscript discusses four different topics at the intersection of statistics and optimal transport. Chapters 2 and 3 focus on different features of using the Wasserstein distance as a tool for inference in parametric models. Chapters 4 and 5 apply techniques and algorithms from the optimal transport literature to develop new theoretical results and methods in the field of Monte Carlo simulation. Each chapter is self-contained, and their corresponding abstracts are given below. Chapter 2: Statistical inference can be performed by minimizing, over the parameter space, the Wasserstein distance between model distributions and the empirical distribution of the data. We study asymptotic properties of such minimum Wasserstein distance estimators, complementing results derived by Bassetti, Bodini and Regazzini in 2006. In particular, our results cover the misspecified setting, in which the data-generating process is not assumed to be part of the family of distributions described by the model. Our results are motivated by recent applications of minimum Wasserstein estimators to complex generative models. We discuss some difficulties arising in the numerical approximation of these estimators. Two of our numerical examples (g-and-k and sum of log-Normals) are taken from the literature on approximate Bayesian computation, and have likelihood functions that are not analytically tractable. Two other examples involve misspecified models. Chapter 3: A growing number of generative statistical models do not permit the numerical evaluation of their likelihood functions. Approximate Bayesian computation (ABC) has become a popular approach to overcome this issue, in which one simulates synthetic data sets given parameters and compares summaries of these data sets with the corresponding observed values. We propose to avoid the use of summaries and the ensuing loss of information by instead using the Wasserstein distance between the empirical distributions of the observed and synthetic data. This generalizes the well-known approach of using order statistics within ABC to arbitrary dimensions. We describe how recently developed approximations of the Wasserstein distance allow the method to scale to realistic data sizes, and propose a new distance based on the Hilbert space-filling curve. We provide a theoretical study of the proposed method, describing consistency as the threshold goes to zero while the observations are kept fixed, and concentration properties as the number of observations grows. Various extensions to time series data are discussed. The approach is illustrated on various examples, including univariate and multivariate g-and-k distributions, a toggle switch model from systems biology, a queueing model, and a L'evy-driven stochastic volatility model. Chapter 4: Algorithms based on discretizing Langevin diffusion have become popular tools for sampling from high-dimensional distributions. We develop novel connections between such Monte Carlo algorithms, the theory of Wasserstein gradient flow, and the operator splitting approach to solving PDEs. In particular, we show that a proximal version of the Unadjusted Langevin Algorithm corresponds to a scheme that alternates between solving the Wasserstein gradient flows of the entropy and potential energy functionals on the space of probability measures. Using this composite minimization perspective, we derive some new non-asymptotic results on the convergence properties of this algorithm. Chapter 5: Consider a reference Markov process with initial distribution $\pi_{0}$ and transition kernels ${M_{t}}{t\in[1:T]}$, for some $T\in\mathbb{N}$. Assume that you are given distribution $\pi{T}$, which is not equal to the marginal distribution of the reference process at time $T$. In this scenario, Schr"odinger addressed the problem of identifying the Markov process with initial distribution $\pi_{0}$ and terminal distribution equal to $\pi_{T}$ which is the closest to the reference process in terms of Kullback-Leibler divergence. This special case of the so-called Schr"odinger bridge problem can be solved efficiently using the iterative proportional fitting procedure (IPFP), also known as Sinkhorn's algorithm. We leverage these ideas to develop novel Monte Carlo schemes, termed Schr"odinger bridge samplers, to approximate a target distribution $\pi$ on $\mathbb{R}^{d}$ and to unbiasedly estimate its normalizing constant. This is achieved by iteratively modifying the transition kernels of a reference Markov chain to obtain a process whose marginal distribution at time $T$ becomes closer to $\pi_T = \pi$, via regression-based approximations of the corresponding IPFP recursions. We empirically demonstrate its performance in several applications, and make connections with other problems arising in the optimal transport, optimal control and physics literatures.

Description

Other Available Sources

Research Data

Keywords

Optimal transport, Wasserstein distance, parameter estimation, generative models, likelihood-free inference, Bayesian computation, Wasserstein gradient flow, Langevin Monte Carlo, Schrödinger bridges, sequential Monte Carlo

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service

Endorsement

Review

Supplemented By

Related Stories