Publication:

Problems in Variable Selection: False Discovery Rate Control and Variational Inference

Date

2024-05-07

Citation

Lin, Buyu. 2024. Problems in Variable Selection: False Discovery Rate Control and Variational Inference. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Variable selection plays a key role in modern high-dimensional statistics. This dissertation provides a comprehensive survey of theory and methods developed by the author and collaborators, with a specific focus on two areas: false discovery rate (FDR) control and Bayesian variable selection. For FDR control, we introduce a data-splitting (DS) method that asymptotically controls the FDR while maintaining high power, and we propose a Multiple Data Splitting (MDS) method to stabilize the selection result and boost the power. In Chapter 1, we apply both DS and MDS to generalized linear models, where they appear to be more robust in finite samples than existing methods. Chapter 2 provides follow-up discussion of the proposed method, and Chapter 3 compares its power with that of two existing methods: the model-X knockoff and the Gaussian mirror. In Bayesian variable selection, the posterior is typically high-dimensional and analytically intractable. Exact inference methods based on sampling, such as Markov chain Monte Carlo (MCMC), can encounter challenges related to mixing. Variational inference has emerged as an attractive alternative for approximating the posterior distribution. By recasting the sampling problem as an optimization problem, variational inference can significantly reduce computation time. In Chapter 4, we apply variational inference to group variable selection with a spike-and-slab prior and propose an efficient parameter-expanded coordinate-ascent algorithm to obtain the optimal variational Bayes approximation. The proposed method performs well in both simulations and a real data example.

Keywords

Statistics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.
