Publication: Overcoming the Curse: Statistical Theory and Methods in High Dimensions
Date
2021-05-07
Authors
Wang, Wenshuo
The Harvard community has made this article openly available.
Citation
Wang, Wenshuo. 2021. Overcoming the Curse: Statistical Theory and Methods in High Dimensions. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
Abstract
Battling the curse of dimensionality is a common theme in modern statistics. This dissertation contributes to this effort by surveying theory and methods in high-dimensional statistics developed by the author and collaborators, specifically in the areas of regression and sampling. In regression analysis, statisticians try to relate a response variable Y to a set of potential explanatory variables X = (X1, ..., Xp), and start by trying to identify the variables that contribute to this relationship. In statistical terms, this goal can be posed as identifying the Xj’s on which Y is conditionally dependent. Sometimes it is of value to test this simultaneously for every j, a task more commonly known as variable selection. Classic techniques fail to accomplish these tasks when the number of features is too large. To address this challenge, the conditional randomization test and model-X knockoffs are two recently proposed methods that respectively perform conditional independence testing and variable selection using knowledge of X’s distribution. Chapter 1 proposes the first generic knockoff sampling algorithm, which is the point of departure for implementing knockoffs, and investigates its optimality in terms of time complexity. Chapter 2 examines the power of the conditional randomization test and knockoffs in the high-dimensional regime in which the ratio of the dimension p to the sample size n converges to a positive constant. In sampling, statisticians try to draw samples from a target distribution. It is quite demanding to obtain representative samples from a high-dimensional distribution while keeping the sample size at a reasonable level. Sequential Monte Carlo and sequential quasi-Monte Carlo methods are powerful computational tools on this front. Chapter 3 studies these methods and derives new convergence rates in terms of the number of samples n.
Keywords
Statistics
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service