Publication:

Statistical Methods for Multi-Omics Data

Date

2018-05-09

Citation

Slade, Emily Marie. 2018. Statistical Methods for Multi-Omics Data. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.

Abstract

The volume and diversity of biological data are growing tremendously as a result of novel technologies, decreasing costs, and a widespread belief that integrating many types of large-scale biological data can help elucidate the underlying mechanisms of complex disease. These multi-omics data can originate from the genome, epigenome, transcriptome, metabolome, and more. The scale and complexity of these data require statistical attention to make their analyses accurate, powerful, computationally feasible, and interpretable. This dissertation discusses three different issues in the analysis of multi-omics data. Chapter 1 presents a joint test for DNA methylation-environment interaction. Combining epigenetic and environmental data can allow for better detection of genetic determinants of disease, and our test has optimal or nearly optimal power compared with both the standard marginal test for the effect of DNA methylation and the standard interaction test for methylation-environment interaction. Chapter 2 explores the impact of a wide variety of imputation methods for missing data in canonical correlation analysis. Canonical correlation analysis is particularly useful in the multi-omics sphere because it explores the association between two multivariate sets of variables, such as SNPs and gene expression. Chapter 3 presents rescaled LD Score regression (LDSC), a method for estimating the contribution of common variants to the variance of a trait using summary statistics from a genome-wide association study (GWAS) whose analysis is not ordinary least squares regression. This is an important extension of standard LDSC for use when phenotypes are binary or censored survival traits and the GWAS analysis is logistic regression or Cox proportional hazards regression. In summary, this dissertation tackles three diverse problems in the analysis of multi-omics data, each with the goal of elucidating the underlying biological mechanisms of disease.

Keywords

multi-omics data, DNA methylation, missing data, canonical correlation analysis, LD score regression

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
