Publication:
Methods on Model Selection: Bayes Factor Approximation and False Discovery Rate Control

Date

2020-05-13

Citation

Dai, Chenguang. 2020. Methods on Model Selection: Bayes Factor Approximation and False Discovery Rate Control. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.

Abstract

Model selection can be an art. In many scientific fields, including genetics, climate science, and the social sciences, a proper model can simplify the computation in an analysis and make the results easier to interpret. In this dissertation, we articulate two methods, under two different guiding principles, to facilitate model selection. The first method addresses the computational challenge in Bayesian model comparison. We show that the Bayes factor can be approximated using the Wang-Landau algorithm, based on a mixture formulation between the posterior distribution and a user-defined surrogate distribution. The proposed Wang-Landau mixture method is applicable as long as an effective Markov kernel that leaves the posterior invariant is available. We also discuss further refinements, including accelerating convergence with the momentum method and facilitating global jumps between the posterior and the surrogate with multiple-try Metropolis. The second method targets a desirable frequentist property in feature selection. Specifically, we use data splitting to construct a statistic that ranks the importance of each feature. The statistic has a useful property: it is symmetric about 0 for null features and relatively large for relevant features. We show that, by carefully choosing a data-dependent cutoff, we can achieve asymptotic false discovery rate control under proper conditions. The proposed method does not require computing p-values and is applicable to a wide class of statistical models, including the linear model, the generalized linear model, and the Gaussian graphical model.
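
The abstract does not spell out how the data-dependent cutoff is chosen. The Python sketch below shows one plausible rule that exploits the stated symmetry: since null statistics are symmetric about 0, the number of statistics below -t can stand in for the number of null statistics above t. The function name `select_features`, its arguments `stats` and `q`, and the exact form of the rule are illustrative assumptions, not taken from the dissertation.

```python
import numpy as np

def select_features(stats, q=0.1):
    """Pick a cutoff from per-feature statistics and return (selected indices, cutoff).

    Assumption (not from the source): the cutoff is the smallest t > 0 whose
    estimated false discovery proportion,
        #{j : stats[j] < -t} / max(#{j : stats[j] > t}, 1),
    is at most the target level q.
    """
    stats = np.asarray(stats, dtype=float)
    # Candidate cutoffs: the distinct nonzero magnitudes, in increasing order.
    candidates = np.unique(np.abs(stats[stats != 0]))
    for t in candidates:
        # Symmetry of null statistics: the left-tail count estimates
        # how many null features fall in the right tail.
        est_false = np.sum(stats < -t)
        n_selected = np.sum(stats > t)
        if est_false / max(n_selected, 1) <= q:
            return np.flatnonzero(stats > t), t
    # No nonzero statistic at all: select nothing.
    return np.array([], dtype=int), np.inf

# Hypothetical statistics: four clearly positive values among small symmetric noise.
example_stats = [5.0, 4.0, 3.5, 3.0, 0.4, -0.3, 0.2, -0.5, 0.1, -0.2]
selected, cutoff = select_features(example_stats, q=0.25)
```

In this sketch a larger `q` permits a smaller cutoff and therefore more selections. How the per-feature statistics are built from the two halves of the data is left out here, since the abstract does not specify it.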

Keywords

Model selection, Bayes factor, False discovery rate

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
