Publication:
Semi-definite Programming for Statistical Estimation: Power and Limitations

Date

2023-05-08

Citation

Venkat, Prayaag. 2023. Semi-definite Programming for Statistical Estimation: Power and Limitations. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

The goal of this thesis is to contribute towards a computational complexity theory of statistical inference problems. In recent years, researchers have built evidence in favor of an emerging hypothesis that the class of semi-definite programming (SDP) algorithms is optimal among computationally efficient algorithms for a certain family of estimation problems. In this thesis, we present four main research efforts that refine this hypothesis and begin to go beyond it:

Optimal algorithms for private and robust estimation: We give the first polynomial-time algorithms for privately and robustly estimating a Gaussian distribution with optimal dependence on the dimension in the sample complexity. This adds the fundamental problem of private statistical estimation to a growing list of problems for which SDPs are optimal among polynomial-time algorithms.

Limitations of SDPs: Given $n$ independent standard Gaussian points in dimension $d$, for what values of $(n, d)$ does there exist, with high probability, an origin-symmetric ellipsoid that simultaneously passes through all of the points? Based on strong numerical evidence, it was conjectured that the ellipsoid fitting problem transitions from feasible to infeasible as the number of points $n$ increases, with a sharp threshold at $n \sim d^2/4$; we resolve this conjecture up to logarithmic factors. A corollary of this result is that a canonical SDP-based algorithm fails to solve inference problems involving low-rank matrix decompositions, independent component analysis, and principal component analysis.

New algorithms for discrepancy certification: We initiate the study of the algorithmic problem of certifying lower bounds on the discrepancy of random matrices, which has connections to conjecturally hard average-case problems such as negatively-spiked PCA, the number-balancing problem, and refuting random constraint satisfaction problems. We give the first polynomial-time algorithms with non-trivial guarantees, strictly outperforming a canonical SDP-based algorithm. Our algorithms are among the first to harness the power of lattice basis reduction techniques to solve statistical estimation problems.

Fast spectral algorithms: We study the algorithmic problem of estimating the mean of a heavy-tailed random vector in high dimensions given i.i.d. samples. The goal is to design an efficient estimator that attains the optimal sub-gaussian error bound, assuming only that the random vector has bounded mean and covariance. Polynomial-time solutions to this problem were known, but they have high runtime due to their use of SDPs. We give a fast spectral algorithm for this problem that also has optimal statistical performance. Our work establishes yet another fundamental statistical estimation problem for which the power of SDPs is matched by simpler, more practical algorithms.
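The ellipsoid fitting question in the abstract can be probed numerically on small instances. An origin-symmetric ellipsoid through points $x_1, \dots, x_n$ corresponds to a positive semi-definite matrix $X$ with $x_i^\top X x_i = 1$ for every $i$. The sketch below is not the SDP and not necessarily the construction analyzed in the thesis. It is a simple sufficient check, assuming NumPy is available: it computes the minimum-norm solution of the linear constraints and tests whether that particular solution happens to be positive semi-definite. The function name `min_norm_ellipsoid_fit` and the tolerance are illustrative choices, not taken from the source.

```python
import numpy as np


def min_norm_ellipsoid_fit(points, tol=1e-8):
    """Try to fit an origin-symmetric ellipsoid {x : x^T X x = 1} through points.

    Returns (X, fits, is_psd). If both flags are True, X certifies that a
    fitting ellipsoid exists. If is_psd is False, nothing is concluded: a
    different solution of the linear constraints might still be PSD.
    """
    points = np.asarray(points, dtype=float)
    n, d = points.shape
    # Row i is vec(x_i x_i^T), so (A @ vec(X))[i] equals x_i^T X x_i.
    A = np.einsum("ij,ik->ijk", points, points).reshape(n, d * d)
    # lstsq returns the minimum-norm least-squares solution of A vec(X) = 1.
    x_vec, *_ = np.linalg.lstsq(A, np.ones(n), rcond=None)
    X = x_vec.reshape(d, d)
    X = (X + X.T) / 2  # symmetrizing leaves every x^T X x unchanged
    residual = np.max(np.abs(A @ X.reshape(-1) - 1.0))
    min_eig = np.linalg.eigvalsh(X).min()
    return X, bool(residual <= tol), bool(min_eig >= -tol)


if __name__ == "__main__":
    # Hypothetical small experiment: n Gaussian points in dimension d.
    rng = np.random.default_rng(0)
    d, n = 20, 40
    pts = rng.standard_normal((n, d))
    _, fits, is_psd = min_norm_ellipsoid_fit(pts)
    print(f"d={d}, n={n}: constraints satisfied={fits}, PSD={is_psd}")
```

Because the check is only sufficient, a negative answer here does not show infeasibility. Deciding feasibility exactly requires solving the semi-definite feasibility problem itself.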

Keywords

Computer science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
