Statistical Methods for Population Structure Discovery in Meta-Analyzed 'Omics Studies
Citation: Ma, Siyuan. 2019. Statistical Methods for Population Structure Discovery in Meta-Analyzed 'Omics Studies. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
Abstract: Applications of molecular methods in public health require accurate findings that replicate across multiple complementary studies. Statistical methods for meta-analysis are thus increasingly important for synthesizing and validating relevant 'omics markers at population scale. In this dissertation I present several such methods applicable to human genomic molecular epidemiology and the human microbiome.
Chapter 1 investigates the use of meta-analyzed transcriptomic biomarkers to define reproducible tumor subtypes, both generally and in colorectal cancer (CRC) specifically. We propose a new method for identifying consistent transcriptome population structures. When applied to CRC, our findings contradict previous reports of discrete CRC tumor subtypes; instead, we identify a more robust and biologically interpretable model of continuous population "gradients" across transcriptomes.
Chapter 2 extends, formalizes, and applies this methodology in the new context of microbial community profiles. Unlike human transcriptional biomarkers, microbiome profiles are compositional and sparse, which makes their meta-analysis particularly difficult. We provide a new model for microbial profile meta-analysis, validate it using synthetic and population data, and use it to identify human gut microbial patterns consistently associated with inflammatory bowel disease.
Lastly, in Chapter 3, we present a hierarchical model of microbial community count observations, suitable for simulating such data at population scale. The model has specialized components targeting characteristics unique to microbiome data, including sparsity, the joint effects of biological and sequencing variation, and ecological dependencies among features, and it can simulate mock microbial counts that recapitulate the population structures of the template communities used to train it.
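To make the three named components concrete, the following is a minimal toy sketch, not the dissertation's model: it assumes correlated log-normal abundances for ecological feature dependencies, independent feature absences for sparsity, and multinomial read sampling at a fixed depth for sequencing variation. The function `simulate_counts` and all of its parameters are hypothetical names chosen for illustration; the real model is trained on template communities, which this sketch does not attempt.

```python
import numpy as np


def simulate_counts(n_samples, mean_log_abund, cov, zero_prob, depth, seed=0):
    """Hypothetical toy generator of microbial count tables (samples x features).

    Assumed structure (illustrative only):
      biological variation + feature dependence -> correlated log-normal abundances
      sparsity -> each feature absent from a sample with probability zero_prob
      sequencing variation -> multinomial sampling of `depth` reads per sample
    """
    rng = np.random.default_rng(seed)
    mean_log_abund = np.asarray(mean_log_abund, dtype=float)
    n_features = mean_log_abund.size

    # Correlated log-scale abundances; off-diagonal entries of cov encode dependencies.
    log_abund = rng.multivariate_normal(mean_log_abund, cov, size=n_samples)
    abund = np.exp(log_abund)

    # Sparsity: zero out features independently with probability zero_prob.
    present = rng.random((n_samples, n_features)) >= zero_prob
    abund = abund * present

    counts = np.zeros((n_samples, n_features), dtype=int)
    for i in range(n_samples):
        total = abund[i].sum()
        if total == 0:
            continue  # every feature absent: leave this sample as all zeros
        # Sequencing: draw `depth` reads from the sample's relative abundances.
        counts[i] = rng.multinomial(depth, abund[i] / total)
    return counts
```

For example, `simulate_counts(50, np.zeros(4), 0.5 * np.eye(4), 0.3, 1000)` returns a 50 x 4 integer table whose non-empty rows each sum to 1000 reads. Separating the abundance layer from the read-sampling layer is what lets a hierarchical simulator vary biological and technical noise independently.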
Together, the methods presented in this dissertation provide a robust framework for constructing and validating meta-analyzed, population-scale molecular profiles. All have been theoretically justified, quantitatively evaluated, applied to specific biological problems, and disseminated to the research community as open-source software implementations. We hope that these methods and findings will be broadly applicable in human transcriptional and microbial epidemiology, and that they will inform future population study designs and analysis practices.
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:42013132