Publication: Expediting Scientific Discoveries With Bayesian Statistical Methods
Date
2017-05-05
The Harvard community has made this article openly available.
Abstract
This thesis develops Bayesian statistical methodology for solving scientific problems and studies the relevant statistical computational methods in depth. It has four chapters. The first three are motivated by a fundamental biological process; the last evaluates Bayesian computational algorithms that exploit modern parallel computing architectures. Each chapter is self-contained and written in the format of a journal paper, with technical details given in the corresponding appendix.
In chapter one, we study the molecular mechanism underlying the protein transportation process through data obtained from single-molecule experiments that use fluorescence imaging to track molecular behaviors. The experimental data consist of hundreds of stochastic time traces from the fluorescence recordings of the experimental system. We introduce a Bayesian hierarchical model on top of hidden Markov models (HMMs) to analyze these data and use the statistical results to answer the biological questions. Besides resolving the biological puzzles and delineating the regulating roles of different molecular complexes, our statistical results enable us to propose a more detailed mechanism for the late stages of the protein targeting process.
In chapter two, we introduce a Matlab package for Bayesian analysis of ensembles of single-molecule fluorescence traces from replicated experiments. The Bayesian hierarchical hidden Markov model proposed in chapter one provides a principled way of extracting the common dynamics of observed traces from experimental replicates. Numerical examples demonstrate the wide applicability of the package: traces with low signal-to-noise ratios, traces with rare events, and heterogeneous traces with an unknown number of hidden states and different numbers of observations.
In chapter three, we propose a consistent method of estimating the order of hidden Markov models based on the marginal likelihood, which is obtained by integrating out both the parameters and the hidden states. We prove the consistency of the marginal likelihood method under weak regularity conditions that are satisfied by a broad class of models. An R package has been built so that practitioners can apply the proposed methodology. Comprehensive simulation studies compare the proposed method with the most widely adopted current method, the Bayesian information criterion (BIC), and demonstrate the effectiveness of the marginal likelihood method.
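For readers unfamiliar with the BIC baseline mentioned in chapter three, the following is a minimal sketch of how a BIC score for a candidate HMM order could be computed. It is not the thesis's marginal likelihood method and is not taken from its R package. It assumes a discrete-emission HMM, and every function name and parameter here (`hmm_log_likelihood`, `n_free_params`, `bic`) is hypothetical. In practice the log-likelihood would be evaluated at fitted maximum-likelihood parameters; the fitting step is omitted.

```python
import math


def hmm_log_likelihood(obs, init, trans, emit):
    """Log-likelihood of a discrete-emission HMM via the scaled forward algorithm.

    obs   : list of observed symbols (ints)
    init  : init[k] = P(first state = k)
    trans : trans[j][k] = P(next state = k | current state = j)
    emit  : emit[k][s] = P(symbol s | state k)
    """
    n_states = len(init)
    alpha = [init[k] * emit[k][obs[0]] for k in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transition matrix, then weight by emission.
            alpha = [
                sum(alpha[j] * trans[j][k] for j in range(n_states)) * emit[k][obs[t]]
                for k in range(n_states)
            ]
        # Rescale to avoid underflow; the scale factors multiply to the likelihood.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


def n_free_params(n_states, n_symbols):
    """Free parameters: initial distribution, transition rows, emission rows."""
    return (n_states - 1) + n_states * (n_states - 1) + n_states * (n_symbols - 1)


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion; lower values are preferred."""
    return -2.0 * log_lik + n_params * math.log(n_obs)
```

Under this sketch, one would fit an HMM for each candidate number of states, compute `bic(...)` for each fit, and select the order with the smallest score.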
In chapter four, we study parallelisable Markov chain Monte Carlo (MCMC) algorithms, which generate multiple proposals at each iteration and parallelise the evaluations of the likelihood function across different cores. We give a simple-to-use criterion, the generalized effective sample size, for evaluating and comparing general parallelisable MCMC algorithms. The formula is easy to implement using moment estimators.
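As background for the criterion named in chapter four, the following is a minimal sketch of the standard single-chain effective sample size, estimated from sample autocorrelations (a moment estimator). It is not the generalized effective sample size defined in the thesis, whose formula is not given in this abstract. The function names and the truncation rule (stop at the first non-positive autocorrelation) are assumptions made for illustration.

```python
def autocorrelation(x, lag, mean, var):
    """Sample autocorrelation of x at the given lag (biased moment estimator)."""
    n = len(x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var


def effective_sample_size(x, max_lag=None):
    """Estimate n / (1 + 2 * sum of autocorrelations) for a scalar chain x.

    The sum is truncated at the first non-positive autocorrelation,
    a simple heuristic assumed here for illustration.
    """
    n = len(x)
    if max_lag is None:
        max_lag = n // 2
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0.0:
        # A constant chain carries no autocorrelation information.
        return float(n)
    tau = 1.0  # integrated autocorrelation time
    for lag in range(1, max_lag + 1):
        rho = autocorrelation(x, lag, mean, var)
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return n / tau
```

A strongly autocorrelated chain yields an estimate far below its length, while a nearly independent chain yields an estimate close to its length, which is what makes this kind of quantity useful for comparing samplers.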
The thesis concludes with brief discussions of several interesting open questions related to the material in chapters one through four.
Keywords
Statistics
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service