Publication: Statistical Methods for Population Structure Discovery in Meta-Analyzed 'Omics Studies
Date
2019-09-12
Authors
Ma, Siyuan
The Harvard community has made this article openly available.
Citation
Ma, Siyuan. 2019. Statistical Methods for Population Structure Discovery in Meta-Analyzed 'Omics Studies. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
Abstract
Applications of molecular methods in public health require accuracy and replication of findings across multiple complementary studies. Statistical methods for meta-analysis are thus increasingly important for synthesizing and validating 'omics markers of relevance at population scale. In this dissertation, I present several such methods applicable to human genomic molecular epidemiology and the human microbiome.
Chapter 1 investigates the use of transcriptomic biomarkers generally, and in colorectal cancer (CRC) specifically, when meta-analyzed to provide reproducible tumor subtypes. We propose a new method for identifying consistent transcriptome population structures, and when applied to CRC, our findings contradict previous reports of discrete CRC tumor subtypes. Instead, we identify a more robust and biologically interpretable model of continuous population "gradients" across transcriptomes.
Chapter 2 extends, formalizes, and applies this methodology in the new context of microbial community profiles. Unlike human transcriptional biomarkers, microbiome profiles are compositional and sparse, which makes their meta-analysis particularly challenging. We provide a new model for microbial profile meta-analysis, validate it using synthetic and population data, and use it to identify human gut microbial patterns consistently associated with inflammatory bowel disease.
Lastly, in Chapter 3, we present a hierarchical model of microbial community count observations, suitable for simulation of such data at population scale. Our model has specialized components targeting characteristics unique to microbiome data, including sparsity, joint effects of biological and sequencing variation, and ecological feature dependencies, and is capable of simulating mock microbial counts that recapitulate the population structures in training template communities.
Together, the methods presented in this dissertation provide a robust framework for the construction and validation of meta-analyzed, population-scale molecular profiles. All have been theoretically justified, quantitatively evaluated, applied to specific biological problems, and disseminated to the research community as open-source software implementations. We hope that these methods and findings will be of broad applicability in human transcriptional and microbial epidemiology, and will inform future population study designs and analysis practices.
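As a rough illustration of the kind of layered count simulation described for Chapter 3, the following toy sketch separates a "biological" layer (sparse latent abundances) from a "sequencing" layer (read sampling). It is not the dissertation's model: the zero-inflated log-normal abundances, the multinomial read sampling, the function name `simulate_counts`, and all parameter names and defaults are assumptions chosen for illustration, and ecological feature dependencies are omitted entirely.

```python
import random


def simulate_counts(n_samples, n_features, read_depth,
                    zero_prob=0.6, mu=0.0, sigma=1.0, seed=0):
    """Toy simulator: zero-inflated log-normal abundances, then multinomial reads.

    All names and defaults are hypothetical; this is an illustration only.
    """
    rng = random.Random(seed)
    table = []
    for _ in range(n_samples):
        # Biological layer: each feature is absent with probability zero_prob;
        # otherwise its latent abundance is drawn from a log-normal distribution.
        abundance = [
            0.0 if rng.random() < zero_prob else rng.lognormvariate(mu, sigma)
            for _ in range(n_features)
        ]
        if sum(abundance) == 0.0:
            # Keep at least one feature present so the composition is defined.
            abundance[rng.randrange(n_features)] = rng.lognormvariate(mu, sigma)
        # Sequencing layer: assign read_depth reads to features in proportion
        # to their latent abundance (a multinomial draw).
        counts = [0] * n_features
        for j in rng.choices(range(n_features), weights=abundance, k=read_depth):
            counts[j] += 1
        table.append(counts)
    return table


# Five mock samples over eight features, 1000 reads each.
counts = simulate_counts(n_samples=5, n_features=8, read_depth=1000)
```

Each row of `counts` sums to the requested read depth, so the output is compositional in the same sense as sequencing data: only relative abundances are observed. Features drawn as absent in the biological layer always receive zero reads, while rare but present features may also receive zero reads through sampling, which is one simple way that biological and sequencing variation can jointly produce sparsity.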
Keywords
Microbiome, Meta-analysis, Batch effect, Unsupervised analysis
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service