## Optimal Transport in Statistical Inference and Computation

##### Citation

Bernton, Espen. 2019. Optimal Transport in Statistical Inference and Computation. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.

##### Abstract

This manuscript discusses four different topics at the intersection of statistics and optimal transport. Chapters 2 and 3 focus on different features of using the Wasserstein distance as a tool for inference in parametric models. Chapters 4 and 5 apply techniques and algorithms from the optimal transport literature to develop new theoretical results and methods in the field of Monte Carlo simulation. Each chapter is self-contained, and their corresponding abstracts are given below.

Chapter 2: Statistical inference can be performed by minimizing, over the parameter space, the Wasserstein distance between model distributions and the empirical distribution of the data.

We study asymptotic properties of such minimum Wasserstein distance estimators, complementing results derived by Bassetti, Bodini and Regazzini in 2006.

In particular, our results cover the misspecified setting, in which the data-generating process is not assumed to be part of the family of distributions described by the model.

Our results are motivated by recent applications of minimum Wasserstein estimators to complex generative models.

We discuss some difficulties arising in the numerical approximation of these estimators.

Two of our numerical examples (g-and-k and sum of log-Normals) are taken from the literature on approximate Bayesian computation, and have likelihood functions that are not analytically tractable.

Two other examples involve misspecified models.
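The estimator discussed above can be illustrated in a few lines for a toy Gaussian location-scale model; this is a minimal sketch and not code from the dissertation. In one dimension the Wasserstein distance between empirical distributions reduces to comparing sorted samples, which `scipy.stats.wasserstein_distance` computes directly, and fixing the simulation noise (common random numbers) makes the objective deterministic so a standard optimizer can be applied.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=500)   # observed sample
z = rng.standard_normal(500)                      # fixed noise: common random numbers

def objective(theta):
    """1-Wasserstein distance between a synthetic model sample and the data."""
    loc, log_scale = theta
    synthetic = loc + np.exp(log_scale) * z       # sample from N(loc, scale^2)
    return wasserstein_distance(data, synthetic)

# Nelder-Mead copes with the piecewise-linear, non-smooth objective
result = minimize(objective, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
loc_hat, scale_hat = result.x[0], np.exp(result.x[1])
```

With 500 observations the recovered location and scale land close to the data-generating values (2.0 and 1.5), illustrating the consistency properties the chapter studies.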

Chapter 3: A growing number of generative statistical models do not permit the numerical evaluation of their likelihood functions.

Approximate Bayesian computation (ABC) has become a popular approach to overcome this issue, in which one simulates synthetic data sets given parameters and compares summaries of these data sets with the corresponding observed values. We propose to avoid the use of summaries and the ensuing loss of information by instead using the Wasserstein distance between the empirical distributions of the observed and synthetic data.

This generalizes the well-known approach of using order statistics within ABC to arbitrary dimensions.

We describe how recently developed approximations of the Wasserstein distance allow the method to scale to realistic data sizes, and propose a new distance based on the Hilbert space-filling curve. We provide a theoretical study of the proposed method, describing consistency as the threshold goes to zero while the observations are kept fixed, and concentration properties as the number of observations grows.

Various extensions to time series data are discussed.

The approach is illustrated on a range of examples, including univariate and multivariate g-and-k distributions, a toggle switch model from systems biology, a queueing model, and a Lévy-driven stochastic volatility model.
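The core idea of replacing summary statistics with a distance between empirical distributions can be sketched as a rejection-ABC loop for a toy Gaussian location model; this is an illustrative reconstruction (prior, threshold, and sample sizes are arbitrary choices), not the dissertation's implementation.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(1)
y_obs = rng.normal(3.0, 1.0, size=200)        # observed data; location unknown

n_prop, eps = 2000, 0.25                      # number of proposals, ABC threshold
accepted = []
for _ in range(n_prop):
    theta = rng.uniform(-10.0, 10.0)          # draw a parameter from a flat prior
    y_sim = rng.normal(theta, 1.0, size=200)  # simulate a synthetic data set
    # accept when the empirical distributions are close in Wasserstein distance,
    # with no hand-crafted summaries in between
    if wasserstein_distance(y_obs, y_sim) < eps:
        accepted.append(theta)

posterior_mean = float(np.mean(accepted))
```

In one dimension this amounts to comparing order statistics of the observed and synthetic samples, which is the special case the chapter generalizes to arbitrary dimensions.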

Chapter 4: Algorithms based on discretizing Langevin diffusion have become popular tools for sampling from high-dimensional distributions.

We develop novel connections between such Monte Carlo algorithms, the theory of Wasserstein gradient flow, and the operator splitting approach to solving PDEs. In particular, we show that a proximal version of the Unadjusted Langevin Algorithm corresponds to a scheme that alternates between solving the Wasserstein gradient flows of the entropy and potential energy functionals on the space of probability measures. Using this composite minimization perspective, we derive some new non-asymptotic results on the convergence properties of this algorithm.
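The Unadjusted Langevin Algorithm at the center of this chapter can be written down in a few lines; the proximal variant and the gradient-flow analysis are beyond this sketch. The target here is a standard Gaussian, so the potential gradient is simply the identity (an illustrative choice).

```python
import numpy as np

rng = np.random.default_rng(2)

def grad_U(x):
    """Gradient of the potential U(x) = x**2 / 2, so pi ∝ exp(-U) is N(0, 1)."""
    return x

# ULA discretizes the Langevin diffusion dX_t = -∇U(X_t) dt + sqrt(2) dB_t:
#   x_{k+1} = x_k - h ∇U(x_k) + sqrt(2h) ξ_k,  ξ_k ~ N(0, 1)
h, n_steps = 0.05, 20000                  # step size and chain length
x, samples = 0.0, np.empty(n_steps)
for k in range(n_steps):
    x = x - h * grad_U(x) + np.sqrt(2.0 * h) * rng.standard_normal()
    samples[k] = x

burned = samples[2000:]                   # discard burn-in
```

Because the chain is never Metropolis-adjusted, its stationary distribution carries an O(h) bias relative to the target; quantifying such non-asymptotic errors is exactly what the gradient-flow perspective above is used for.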

Chapter 5: Consider a reference Markov process with initial distribution $\pi_{0}$ and transition kernels $\{M_{t}\}_{t\in[1:T]}$, for some $T\in\mathbb{N}$. Assume that one is given a distribution $\pi_{T}$, which is not equal to the marginal distribution of the reference process at time $T$. In this scenario, Schrödinger addressed the problem of identifying the Markov process with initial distribution $\pi_{0}$ and terminal distribution equal to $\pi_{T}$ which is the closest to the reference process in terms of Kullback-Leibler divergence. This special case of the so-called Schrödinger bridge problem can be solved efficiently using the iterative proportional fitting procedure (IPFP), also known as Sinkhorn's algorithm.

We leverage these ideas to develop novel Monte Carlo schemes, termed Schrödinger bridge samplers, to approximate a target distribution $\pi$ on $\mathbb{R}^{d}$ and to unbiasedly estimate its normalizing constant. This is achieved by iteratively modifying the transition kernels of a reference Markov chain to obtain a process whose marginal distribution at time $T$ becomes closer to $\pi_T = \pi$, via regression-based approximations of the corresponding IPFP recursions. We empirically demonstrate its performance in several applications, and make connections with other problems arising in the optimal transport, optimal control and physics literatures.
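In the static, discrete setting, the IPFP/Sinkhorn recursion mentioned above alternately rescales the rows and columns of a Gibbs kernel until the coupling has the prescribed marginals. The sketch below shows this for entropic optimal transport between two histograms on a grid; the grid, cost, and regularization strength are illustrative choices, not values from the dissertation.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iter=500):
    """IPFP / Sinkhorn: rescale the Gibbs kernel K = exp(-C/eps) so that the
    coupling P_ij = u_i K_ij v_j has row marginal a and column marginal b."""
    K = np.exp(-C / eps)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        v = b / (K.T @ u)     # fit the column marginal
        u = a / (K @ v)       # fit the row marginal
    return u[:, None] * K * v[None, :]

# toy problem: two Gaussian-shaped histograms on a grid in [0, 1]
x = np.linspace(0.0, 1.0, 50)
a = np.exp(-(x - 0.3) ** 2 / 0.01); a /= a.sum()
b = np.exp(-(x - 0.7) ** 2 / 0.01); b /= b.sum()
C = (x[:, None] - x[None, :]) ** 2        # squared-distance cost matrix
P = sinkhorn(a, b, C)
transport_cost = float((P * C).sum())     # entropic approximation of W2^2
```

Each half-iteration matches one marginal exactly, and the two-sided fixed point is the KL-closest coupling to the Gibbs kernel; the samplers above replace these exact rescalings with regression-based approximations on the chain's transition kernels.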

##### Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

##### Citable link to this page

http://nrs.harvard.edu/urn-3:HUL.InstRepos:42029749

##### Collections

- FAS Theses and Dissertations
