Publication:

Bayesian Statistical Framework for High-Dimensional Count Data and its Application in Microbiome Studies

Date

2017-05-10

Published Version

The Harvard community has made this article openly available.

Abstract

High-dimensional count data arising from multinomial sampling are ubiquitous in microbiome studies. This dissertation aims to develop a flexible Bayesian framework for modeling high-dimensional count data that provides reliable and automatic inference for biological questions in microbiome studies. In Chapter 1, we present a nonparametric Bayesian model for dependent distributions that describes multiple species sampling sequences simultaneously. Our marginal prior for each sampling sequence is a normalized Gamma process, and the dependence between the sequences is represented by low-dimensional latent factors. The resulting posterior samples of model parameters can be used to evaluate uncertainty in analyses routinely applied in microbiome studies, such as ordination. In Chapter 2, we extend the latent factor model of Chapter 1 to enable estimation of covariate effects. We show analytically and numerically that this augmented model is identifiable and that it accurately separates the effects of covariates from those of latent factors. We provide techniques for transforming model parameters into interpretable results. An application to a longitudinal microbiome dataset illustrates the use of this model in microbiome studies. Chapter 3 focuses on a bioinformatics tool that simulates realistic microbiome data and benchmarks statistical tools for microbiome studies. We model the counts as over-dispersed Poisson outcomes using a hierarchical lognormal distribution. We then propose a heuristic algorithm that generates data resembling real microbiome data. A benchmark of a previously published method illustrates that the simulated data provide an accurate characterization of the method.
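
As a rough illustration of the Chapter 3 modeling idea, the sketch below draws counts from a Poisson distribution whose log-rate is itself normally distributed, which is one common way to obtain over-dispersed counts. It is not the dissertation's simulator or its heuristic algorithm: the function name `simulate_poisson_lognormal`, the two-level hierarchy, and all parameter values are assumptions chosen only for illustration.

```python
import numpy as np


def simulate_poisson_lognormal(n_samples, n_taxa, mu_mean=1.0, mu_sd=1.0,
                               sigma=1.0, seed=0):
    """Draw a samples-by-taxa count matrix from a Poisson-lognormal hierarchy.

    Hypothetical hierarchy (an assumption, not taken from the dissertation):
        mu_j        ~ Normal(mu_mean, mu_sd)     taxon-level mean log-abundance
        log(r_ij)   ~ Normal(mu_j, sigma)        sample-specific log-rate
        y_ij        ~ Poisson(r_ij)              observed count
    """
    rng = np.random.default_rng(seed)
    # One mean log-abundance per taxon.
    mu = rng.normal(mu_mean, mu_sd, size=n_taxa)
    # Log-rates vary around the taxon mean; mu broadcasts across samples.
    log_rate = rng.normal(mu, sigma, size=(n_samples, n_taxa))
    # Poisson sampling conditional on the lognormal rate.
    counts = rng.poisson(np.exp(log_rate))
    return counts, mu


if __name__ == "__main__":
    counts, mu = simulate_poisson_lognormal(n_samples=500, n_taxa=20)
    # For a plain Poisson the variance equals the mean; the extra lognormal
    # layer inflates the variance, so these ratios tend to exceed 1.
    ratio = counts.var(axis=0) / np.maximum(counts.mean(axis=0), 1e-12)
    print("variance-to-mean ratio per taxon:", np.round(ratio, 2))
```

Conditional on the rate, each count is Poisson, but marginally the variance exceeds the mean because the rate itself varies between samples. That excess is the over-dispersion the abstract refers to.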

Keywords

Biology, Biostatistics; Biology, Bioinformatics; Statistics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
