Statistical Methods for Sequence-Based Microbial Community Assays
Schwager, Emma Holdrich
Citation: Schwager, Emma Holdrich. 2017. Statistical Methods for Sequence-Based Microbial Community Assays. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
Abstract: The human microbiome comprises the totality of micro-organisms residing in and on the human body. In recent years, it has been the subject of intensive research into how this microbial community is involved in disease, either directly (as in periodontitis or bacterial vaginosis) or indirectly (as in obesity or type II diabetes). Microbial involvement in these and other diseases suggests that the microbiome could be harnessed therapeutically, because, unlike the human genome, it is not only measurable but also plastic. Studies of the microbiome typically collect data with sequencing technologies such as 16S rRNA gene sequencing (which sequences a single marker gene shared by all bacteria), whole metagenome shotgun sequencing (which sequences all DNA in a sample), or metatranscriptomic sequencing (which sequences all RNA in a sample). The abundance data generated by these technologies have distinctive characteristics that must be accounted for in any statistical analysis. In particular, microbiome data tend to be highly zero-inflated, often containing 80% or more zeros; high-dimensional, often having orders of magnitude more features than samples; and compositional, because the abundances are constrained by the total number of sequencing reads in a sample. In this dissertation, I address these three challenges in two key areas of microbiome analysis: detecting microbial interactions and calculating study power before data collection. I develop a Bayesian correlation-detection method, appropriate for relative abundance data, to explore ecological interactions between taxa. I use this method to elucidate the ecological structure of the human microbiome at the species level, laying the foundation for further understanding of how communities behave in the host and how they respond to perturbations. I also use simulation to provide a set of guidelines for practitioners performing pre-study power analysis in microbial epidemiology.
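Why compositionality matters for correlation detection can be illustrated with a toy simulation. This is a minimal sketch, not the dissertation's Bayesian method: it assumes independent log-normal "true" abundances (a hypothetical model chosen only for illustration), rescales each sample to relative abundances, and draws a fixed number of reads per sample. Because the relative abundances in each sample sum to 1, a taxon's covariances with all other taxa must sum to minus its own variance. This constraint tends to push pairwise correlations negative even when the underlying abundances are independent. All function names below are hypothetical.

```python
import math
import random


def simulate_absolute(n_samples, n_taxa, sigma=2.0, seed=0):
    """Hypothetical toy model: independent log-normal 'true' abundances."""
    rng = random.Random(seed)
    return [[rng.lognormvariate(0.0, sigma) for _ in range(n_taxa)]
            for _ in range(n_samples)]


def close(row):
    """Closure: rescale one sample so its abundances sum to 1."""
    total = sum(row)
    return [x / total for x in row]


def sample_reads(proportions, depth, rng):
    """Multinomial sampling of `depth` reads from one sample's proportions."""
    counts = [0] * len(proportions)
    for taxon in rng.choices(range(len(proportions)), weights=proportions, k=depth):
        counts[taxon] += 1
    return counts


def covariance(xs, ys):
    """Sample covariance with an n - 1 denominator."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation built from the sample covariance."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


def mean_offdiag_correlation(table):
    """Average Pearson correlation over all distinct pairs of taxa (columns)."""
    cols = list(zip(*table))
    d = len(cols)
    vals = [correlation(cols[i], cols[j]) for i in range(d) for j in range(i + 1, d)]
    return sum(vals) / len(vals)


if __name__ == "__main__":
    rng = random.Random(1)
    absolute = simulate_absolute(n_samples=200, n_taxa=5)
    relative = [close(row) for row in absolute]
    counts = [sample_reads(p, depth=500, rng=rng) for p in relative]
    n_cells = len(counts) * len(counts[0])
    zero_fraction = sum(c == 0 for row in counts for c in row) / n_cells
    # Absolute-scale correlations should hover near zero; relative-scale ones are
    # typically negative purely because of the sum-to-one constraint.
    print("mean correlation, absolute:", round(mean_offdiag_correlation(absolute), 3))
    print("mean correlation, relative:", round(mean_offdiag_correlation(relative), 3))
    print("fraction of zero read counts:", round(zero_fraction, 3))
```

With only five taxa the negative bias is pronounced. As the number of taxa grows, the forced bias per pair shrinks, but sparse counts and many more features than samples (the other two challenges named above) make correlation estimates noisy in their own right.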
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:42061500
Collection: FAS Theses and Dissertations