Publication: Problems in Variable Selection: False Discovery Rate Control and Variational Inference
Abstract
Variable selection plays a key role in modern high-dimensional statistics. This dissertation provides a comprehensive survey of theory and methods developed by the author and collaborators, with a specific focus on two areas: false discovery rate (FDR) control and Bayesian variable selection. In the domain of FDR control, we introduce a data-splitting (DS) method that asymptotically controls the FDR while maintaining high power. Furthermore, a Multiple Data Splitting (MDS) method is proposed to stabilize the selection result and boost the power. In Chapter 1, we apply both DS and MDS to generalized linear models, where they appear to be more robust in finite samples than existing methods. Chapter 2 provides further discussion of the proposed method, and Chapter 3 compares its power with that of two existing methods: the model-X knockoff and the Gaussian mirror. In Bayesian variable selection, the posterior is typically high-dimensional and analytically intractable. Exact inference methods based on sampling, such as Markov Chain Monte Carlo (MCMC), can encounter challenges related to mixing. Variational inference has emerged as an attractive alternative for approximating the posterior distribution. By recasting the sampling problem as an optimization problem, variational inference can significantly reduce computational time. In Chapter 4, we apply variational inference to group variable selection with a spike-and-slab prior and propose an efficient parameter-expanded coordinate-ascent algorithm to obtain the optimal variational Bayes approximation. The proposed method performs well in both simulations and a real-data example.
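
To make the data-splitting idea concrete, the following is a minimal sketch of one common formulation and not necessarily the dissertation's exact procedure. It assumes a linear model with more observations per half than predictors, ordinary least squares on each half, a mirror statistic built from the sign agreement and combined magnitude of the two coefficient estimates, and a data-driven cutoff chosen from the estimated false discovery proportion. The function names (mirror_statistics, fdr_threshold, data_split_select) and the particular mirror statistic are illustrative assumptions, not taken from the abstract.

```python
import numpy as np

def mirror_statistics(b1, b2):
    # Assumed form: large and positive when both halves agree in sign
    # and magnitude; roughly symmetric about zero for null variables.
    return np.sign(b1 * b2) * (np.abs(b1) + np.abs(b2))

def fdr_threshold(m, q):
    # Scan candidate cutoffs in increasing order and return the first one
    # whose estimated false discovery proportion is at most q.
    for t in np.sort(np.abs(m[m != 0])):
        fdp_hat = np.sum(m <= -t) / max(np.sum(m >= t), 1)
        if fdp_hat <= q:
            return t
    return np.inf  # no cutoff meets the target level; select nothing

def data_split_select(X, y, q=0.1, seed=0):
    # Split the sample at random into two halves (hypothetical helper).
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    perm = rng.permutation(n)
    h1, h2 = perm[: n // 2], perm[n // 2:]
    # Independent coefficient estimates from each half (OLS is an assumption).
    b1 = np.linalg.lstsq(X[h1], y[h1], rcond=None)[0]
    b2 = np.linalg.lstsq(X[h2], y[h2], rcond=None)[0]
    m = mirror_statistics(b1, b2)
    return np.flatnonzero(m >= fdr_threshold(m, q))
```

Because a single random split makes the selected set depend on the split, repeating data_split_select over several seeds and aggregating the results is the motivation for the multiple-splitting variant mentioned above; the aggregation rule itself is not specified in this abstract and is left out of the sketch.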