Statistical Methods for Population Structure Discovery in Meta-Analyzed 'Omics Studies

Ma, Siyuan

Citation

Ma, Siyuan. 2019. Statistical Methods for Population Structure Discovery in Meta-Analyzed 'Omics Studies. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.

Abstract

Applications of molecular methods in public health require accuracy and replication of findings across multiple complementary studies. Statistical methods for meta-analysis are thus increasingly important for synthesizing and validating 'omics markers of relevance at population scale. In this dissertation I present several such methods applicable to human genomic molecular epidemiology and the human microbiome.
Chapter 1 investigates the use of transcriptomic biomarkers generally, and in colorectal cancer (CRC) specifically, when meta-analyzed to provide reproducible tumor subtypes. We propose a new method for identifying consistent transcriptome population structures, and when applied to CRC, our findings contradict previous reports of discrete CRC tumor subtypes. Instead, we identify a more robust and biologically interpretable model of continuous population "gradients" across transcriptomes.
Chapter 2 extends, formalizes, and applies this methodology in the new context of microbial community profiles. Unlike human transcriptional biomarkers, microbiome profiles are compositional and sparse, which makes their meta-analysis particularly challenging. We provide a new model for microbial profile meta-analysis, validate it using synthetic and population data, and use it to identify human gut microbial patterns consistently associated with inflammatory bowel disease.
Lastly, in Chapter 3, we present a hierarchical model of microbial community count observations, suitable for simulation of such data at population scale. Our model has specialized components targeting characteristics unique to microbiome data, including sparsity, joint effects of biological and sequencing variation, and ecological feature dependencies, and is capable of simulating mock microbial counts that recapitulate the population structures in training template communities.
Together, the methods presented in this dissertation provide a robust framework for the construction and validation of meta-analyzed, population-scale molecular profiles. All have been theoretically justified, quantitatively evaluated, applied to specific biological problems, and disseminated to the research community as open-source software implementations. We hope that these methods and findings will be of broad applicability in human transcriptional and microbial epidemiology, and will inform future population study designs and analysis practices.

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

Citable link to this page

http://nrs.harvard.edu/urn-3:HUL.InstRepos:42013132

Collections

FAS Theses and Dissertations
