Bayesian Statistical Framework for High-Dimensional Count Data and its Application in Microbiome Studies

Ren, Boyu

Abstract

High-dimensional count data arising from multinomial sampling are ubiquitous in microbiome studies. This dissertation aims to develop a flexible Bayesian framework for modeling high-dimensional count data that provides reliable, automatic inference for biological questions in microbiome studies.
In Chapter 1, we present a nonparametric Bayesian model for dependent distributions that describes multiple species sampling sequences simultaneously. Our marginal prior for each sampling sequence is a normalized Gamma process, and the dependence between sequences is represented by low-dimensional latent factors. The resulting posterior samples of the model parameters can be used to evaluate uncertainty in analyses routinely applied in microbiome studies, such as ordination.
In Chapter 2, we extend the latent factor model of Chapter 1 to enable estimation of covariate effects. We prove analytically and show numerically that the augmented model is identifiable and that it accurately separates the effects of covariates from those of latent factors. We provide techniques for transforming model parameters into interpretable results. An application to a longitudinal microbiome dataset illustrates the use of this model in microbiome studies.
Chapter 3 focuses on a bioinformatics tool that simulates realistic microbiome data and benchmarks statistical tools for microbiome studies. We model the counts as over-dispersed Poisson outcomes using a hierarchical lognormal distribution. We then propose a heuristic algorithm that generates data resembling real microbiome data. A benchmark of a previously published method illustrates that the simulated data provide an accurate characterization of the method.
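
The over-dispersion mentioned for Chapter 3 can be illustrated with a minimal sketch. It assumes a single taxon whose Poisson rate is drawn from a lognormal distribution. The function names and the parameter values `mu` and `sigma` are hypothetical and chosen only for illustration. This is not the dissertation's simulator or its heuristic algorithm.

```python
import math
import random
import statistics


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) variate with Knuth's multiplication method.

    Adequate for the small rates used here; not suitable for very large rates.
    """
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_counts(n, mu, sigma, seed=0):
    """Draw n counts where log(rate) ~ Normal(mu, sigma) and count ~ Poisson(rate)."""
    rng = random.Random(seed)
    return [sample_poisson(rng.lognormvariate(mu, sigma), rng) for _ in range(n)]


# Hypothetical parameters, chosen only to make the effect visible.
counts = simulate_counts(n=5000, mu=1.0, sigma=1.0)
mean = statistics.fmean(counts)
variance = statistics.pvariance(counts)
print(f"mean={mean:.2f} variance={variance:.2f}")
```

For a plain Poisson variable the variance equals the mean. Under this hierarchy the marginal variance is the mean plus the variance of the random rate, so the printed variance should clearly exceed the printed mean.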

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

Citable link to this page

http://nrs.harvard.edu/urn-3:HUL.InstRepos:40046490

Collections

FAS Theses and Dissertations [6136]
