Publication:
Statistical Methods for Population Structure Discovery in Meta-Analyzed 'Omics Studies

Date

2019-09-12

Citation

Ma, Siyuan. 2019. Statistical Methods for Population Structure Discovery in Meta-Analyzed 'Omics Studies. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.

Abstract

Applications of molecular methods in public health require accuracy and replication of findings across multiple complementary studies. Statistical methods for meta-analysis are thus increasingly important for synthesizing and validating 'omics markers of relevance at population scale. In this dissertation, I present several such methods applicable to human genomic molecular epidemiology and the human microbiome. Chapter 1 investigates the use of transcriptomic biomarkers generally, and in colorectal cancer (CRC) specifically, when meta-analyzed to provide reproducible tumor subtypes. We propose a new method for identifying consistent transcriptome population structures, and when applied to CRC, our findings contradict previous reports of discrete CRC tumor subtypes. Instead, we identify a more robust and biologically interpretable model of continuous population "gradients" across transcriptomes. Chapter 2 extends, formalizes, and applies this methodology in the new context of microbial community profiles. Unlike human transcriptional biomarkers, microbiome profiles are compositional and sparse, which makes meta-analysis particularly difficult. We provide a new model for microbial profile meta-analysis, validate it using synthetic and population data, and use it to identify human gut microbial patterns consistently associated with inflammatory bowel disease. Lastly, in Chapter 3, we present a hierarchical model of microbial community count observations, suitable for simulation of such data at population scale. Our model has specialized components targeting characteristics unique to microbiome data, including sparsity, joint effects of biological and sequencing variation, and ecological feature dependencies, and is capable of simulating mock microbial counts that recapitulate the population structures in training template communities.
Together, the methods presented in this dissertation provide a robust framework for construction and validation of meta-analyzed, population-scale molecular profiles. All have been theoretically justified, quantitatively evaluated, applied to specific biological problems, and disseminated to the research community as open-source software implementations. We hope that these methods and findings will be of broad applicability in human transcriptional and microbial epidemiology, and will inform future population study designs and analysis practices.

Keywords

Microbiome, Meta-analysis, Batch effect, Unsupervised analysis

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
