Statistical Methods for Multi-Omics Data
Abstract
The volume and diversity of biological data are growing tremendously as a result of novel technologies, decreasing costs, and a widespread belief that integrating many types of large-scale biological data can help elucidate the underlying mechanisms of complex disease. These multi-omics data can originate from the genome, epigenome, transcriptome, metabolome, and more. Their scale and complexity demand statistical methods that keep the analyses accurate, powerful, computationally feasible, and interpretable. This dissertation addresses three different issues in the analysis of multi-omics data. Chapter 1 presents a joint test for DNA methylation-environment interaction. Combining epigenetic and environmental data can improve the detection of genetic determinants of disease, and our joint test has optimal or nearly optimal power compared with both the standard marginal test for the effect of DNA methylation and the standard interaction test for methylation-environment interaction. Chapter 2 explores how a wide variety of methods for imputing missing data affect canonical correlation analysis (CCA). CCA is particularly useful in multi-omics studies because it explores the association between two multivariate sets of variables, such as single-nucleotide polymorphisms (SNPs) and gene expression. Chapter 3 presents rescaled linkage disequilibrium (LD) Score regression, an extension of standard LD Score regression (LDSC) that estimates the contribution of common variants to the variance of a trait from the summary statistics of a genome-wide association study (GWAS) whose analysis is not ordinary least squares regression. This extension matters when phenotypes are binary or censored survival traits and the GWAS analysis uses logistic regression or Cox proportional hazards regression. In summary, this dissertation tackles three diverse problems in the analysis of multi-omics data, each with the goal of elucidating the underlying biological mechanisms of disease.
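For orientation, the following is a minimal sketch of *standard* LD Score regression, not the rescaled method of Chapter 3, whose details the abstract does not give. It assumes the usual LDSC model of Bulik-Sullivan et al. (2015), E[chi2_j] = N * h2 * ell_j / M + N * a + 1, where chi2_j is the GWAS test statistic for SNP j, ell_j its LD Score, N the sample size, M the number of SNPs, and a a confounding term. The heritability estimate comes from the regression slope. The function names and the synthetic numbers below are hypothetical, and the sketch uses plain unweighted least squares, whereas the published LDSC software also applies regression weights and a block jackknife for standard errors.

```python
from typing import Sequence, Tuple


def ols_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (intercept, slope) of the ordinary least squares line y ~ x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def ldsc_h2(chi2: Sequence[float], ld_scores: Sequence[float],
            n_samples: int, n_snps: int) -> Tuple[float, float]:
    """Hypothetical helper: standard LDSC heritability from an unweighted fit.

    Under E[chi2_j] = N * h2 * ell_j / M + N * a + 1, the slope of chi2 on
    ell is N * h2 / M, so h2 = slope * M / N. The intercept estimates 1 + N * a.
    """
    intercept, slope = ols_fit(ld_scores, chi2)
    h2 = slope * n_snps / n_samples
    return h2, intercept


# Synthetic, noise-free illustration (made-up values, not real GWAS data):
N, M, true_h2, a = 50_000, 1_000_000, 0.4, 0.0
ell = [10.0, 25.0, 50.0, 100.0, 200.0]
chi2 = [N * true_h2 * l / M + N * a + 1 for l in ell]
h2_hat, intercept = ldsc_h2(chi2, ell, N, M)
print(round(h2_hat, 6), round(intercept, 6))  # 0.4 1.0
```

Because the synthetic statistics lie exactly on the model line, the fit recovers the assumed h2 and an intercept of 1 (no confounding). With real, noisy statistics, the intercept above 1 is what LDSC uses to separate confounding from polygenic signal.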