Publication:
Statistical Methods for Sequence-Based Microbial Community Assays

Date

2017-09-07

Citation

Schwager, Emma Holdrich. 2017. Statistical Methods for Sequence-Based Microbial Community Assays. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.

Abstract

The human microbiome comprises the totality of micro-organisms residing in and on the human body. It has recently been the subject of intensive research into how this microbial community is involved in disease, either directly (as in periodontitis or bacterial vaginosis) or indirectly (as in obesity or type II diabetes). The implication of microbial involvement in these and other diseases suggests that the microbiome can be used as a therapeutic agent because, unlike the human genome, it is both measurable and plastic. Studies of the microbiome typically collect data using sequencing methodologies, such as 16S rRNA gene sequencing (which sequences a single gene universal among bacteria), whole metagenome shotgun sequencing (which sequences all DNA in a given sample), or metatranscriptomic sequencing (which sequences all RNA in a given sample). The abundance data generated by these technologies have unique characteristics that must be accounted for in any statistical analysis. In particular, microbiome data tend to be highly zero-inflated, often having 80% or more zeros; high-dimensional, often having orders of magnitude more features than samples; and compositional, because the abundances are constrained by the total number of sequencing reads in a sample. In this dissertation, I address these three challenges in two key areas of microbiome analysis: detecting microbial interactions and pre-computing study power. I develop a Bayesian correlation-detection method appropriate for relative abundance data to explore ecological interactions between taxa. I use this method to elucidate the community ecological structure of the human microbiome at the species level, laying the foundation for further understanding of how communities behave in the host and how they respond to perturbations. I also use simulation to provide a set of guidelines for practitioners performing pre-study power analysis in microbial epidemiology.

Keywords

Microbiome studies

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
