Publication: Methods on Model Selection: Bayes Factor Approximation and False Discovery Rate Control
Date
2020-05-13
The Harvard community has made this article openly available.
Citation
Dai, Chenguang. 2020. Methods on Model Selection: Bayes Factor Approximation and False Discovery Rate Control. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
Abstract
Model selection can be an art. In many scientific fields, including genetics, climate science, and the social sciences, a proper model can simplify the computation in an analysis and make the results easier to interpret. In this dissertation, we present two methods, each based on a different guiding principle, to facilitate model selection.
The first method addresses the computational challenge in Bayesian model comparison. We show that the Bayes factor can be approximated using the Wang-Landau algorithm, based on a mixture formulation between the posterior distribution and a user-defined surrogate distribution. The proposed Wang-Landau mixture method is applicable as long as an effective Markov kernel that leaves the posterior invariant is available. We also discuss further refinements, including accelerating convergence with the momentum method and facilitating global jumps between the posterior and the surrogate with multiple-try Metropolis.
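To make the mixture idea concrete, the following is a minimal toy sketch and not the dissertation's algorithm. It assumes a one-dimensional unnormalized posterior `log_q` with known normalizing constant (so the answer can be checked), a normal surrogate `log_s`, a simple component-switching move, and a step size of `min(1, 10 / t)`; all of these names and choices are illustrative assumptions. It estimates a single log marginal likelihood; a Bayes factor would be the ratio of two such quantities. The momentum and multiple-try refinements mentioned above are omitted.

```python
import math
import random

SURROGATE_SD = 1.5  # assumed spread of the surrogate, wider than the posterior


def log_q(x):
    # Toy unnormalized log posterior: 3 * exp(-x^2 / 2), so Z = 3 * sqrt(2 * pi).
    return math.log(3.0) - 0.5 * x * x


def log_s(x):
    # Normalized log density of the surrogate N(0, SURROGATE_SD^2).
    return -0.5 * (x / SURROGATE_SD) ** 2 - math.log(SURROGATE_SD * math.sqrt(2 * math.pi))


def estimate_log_z(n_iter=50000, seed=0):
    """Estimate log Z by adapting the log weights of a two-component mixture."""
    rng = random.Random(seed)
    log_dens = (log_q, log_s)
    x, k = 0.0, 0        # k = 0: posterior component, k = 1: surrogate component
    theta = [0.0, 0.0]   # Wang-Landau log weights, one per component
    for t in range(1, n_iter + 1):
        # Local move that leaves the current component invariant.
        if k == 0:
            prop = x + rng.gauss(0.0, 1.0)  # random-walk Metropolis step
            if math.log(1.0 - rng.random()) < log_q(prop) - log_q(x):
                x = prop
        else:
            x = rng.gauss(0.0, SURROGATE_SD)  # exact draw from the surrogate
        # Jump move: propose switching component while keeping x fixed.
        j = 1 - k
        log_ratio = (log_dens[j](x) - theta[j]) - (log_dens[k](x) - theta[k])
        if math.log(1.0 - rng.random()) < log_ratio:
            k = j
        # Wang-Landau update: penalize whichever component was just visited.
        theta[k] += min(1.0, 10.0 / t)
    # Both components are visited equally often when theta[0] - theta[1] = log Z.
    return theta[0] - theta[1]


if __name__ == "__main__":
    print("estimated log Z:", estimate_log_z())
    print("true log Z:     ", math.log(3.0 * math.sqrt(2 * math.pi)))
```

Because the surrogate is normalized, the difference of the two adapted log weights settles near the log normalizing constant of the posterior component. The estimate is stochastic and depends on the seed, the number of iterations, and the step-size schedule.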
The second method targets a desirable frequentist property in feature selection. Specifically, we construct a statistic via data splitting to rank the importance of each feature. The statistic has a useful property: it is symmetric about 0 for null features and relatively large for relevant features. We show that, by carefully choosing a data-dependent cutoff, we can achieve asymptotic false discovery rate control under proper conditions. The proposed method does not require computing p-values and is applicable to a wide class of statistical models, including the linear model, the generalized linear model, and the Gaussian graphical model.
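The symmetry property suggests how a data-dependent cutoff can work: for null features, the number of statistics below -t approximates the number of null statistics above t, so their ratio to the total count above t estimates the false discovery proportion. The sketch below illustrates this under stated assumptions; the abstract does not give the exact form of the statistic or the cutoff. Here `beta1` and `beta2` are assumed to be coefficient estimates fitted on two disjoint halves of the data, the statistic is taken to be sign(b1 * b2) * (|b1| + |b2|), and the function names are hypothetical.

```python
import math


def mirror_statistics(beta1, beta2):
    """Combine two coefficient estimates per feature into one signed statistic.

    Assumed form: sign(b1 * b2) * (|b1| + |b2|). Estimates that agree in sign
    give a large positive value; for a null feature the sign is roughly a coin flip.
    """
    stats = []
    for b1, b2 in zip(beta1, beta2):
        prod = b1 * b2
        sign = (prod > 0) - (prod < 0)
        stats.append(sign * (abs(b1) + abs(b2)))
    return stats


def fdr_cutoff(stats, q):
    """Smallest t > 0 whose estimated false discovery proportion is at most q."""
    for t in sorted(abs(m) for m in stats if m != 0):
        n_neg = sum(1 for m in stats if m < -t)  # proxy for false positives above t
        n_pos = sum(1 for m in stats if m > t)   # features that would be selected
        if n_neg / max(n_pos, 1) <= q:
            return t
    return math.inf  # no usable cutoff: select nothing


def select_features(beta1, beta2, q=0.1):
    """Return indices of features whose statistic exceeds the cutoff."""
    stats = mirror_statistics(beta1, beta2)
    tau = fdr_cutoff(stats, q)
    return [j for j, m in enumerate(stats) if m > tau]


if __name__ == "__main__":
    # Hypothetical estimates from the two halves of a data set.
    b1 = [2.5, 2.0, 1.5, -0.25, 0.2, 0.15, 0.1, 3.0]
    b2 = [2.5, 2.0, 1.5, 0.25, 0.2, -0.15, 0.1, 3.0]
    print(select_features(b1, b2, q=0.25))
```

No p-values appear anywhere in this procedure: only the signs and magnitudes of the two sets of estimates are used.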
Keywords
Model selection, Bayes factor, False discovery rate
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service