Publication:

Integrative Statistical Methods for Multi-omics Data

Date

2020-11-23

Citation

Feng, Helian. 2020. Integrative Statistical Methods for Multi-omics Data. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Multi-omics data, including the genome, epigenome, transcriptome, and metabolome, each provide information on a distinct aspect of human health. Integrating them yields insights that would not be available when each is studied independently, and thus helps elucidate the underlying mechanisms of complex disease. The heterogeneous nature of multi-omics data makes them informative, but it also makes integrating and interpreting them accurately and efficiently a statistical challenge. Transcriptome-wide association study (TWAS) tests are one class of statistical methods that combine different ‘omics data types using hypothesized biological relationships: in this case, the “central dogma” that DNA encodes mRNA, which is translated into proteins that then influence disease processes. TWAS statistics test the association between transcript expression levels and disease risk by first using eQTL reference data to build multi-marker predictors of expression, and then testing the association between the genetically predicted expression and disease risk in a large GWAS. This dissertation focuses on improving the power and expanding the utility of TWAS. As originally proposed, TWAS was performed with single-tissue eQTL expression data and single-trait GWAS data (individual or summary level). Chapter 1 presents a new TWAS pipeline that integrates data on the genetic regulation of expression levels across multiple tissues. Our pipeline improves power over standard single-tissue tests by generating cross-tissue expression features using sparse canonical correlation analysis (sCCA) and then combining evidence for expression-outcome association across cross-tissue and single-tissue features using the aggregate Cauchy association test. Chapter 2 extends multi-trait genetic association methods for single SNPs to multi-SNP TWAS tests and evaluates the performance of several such methods in simulations and real data applications.
Joint TWAS with multiple phenotypes improves the power to detect genes associated with phenotypes regulated through similar pathways. Chapter 3 combines the methods of the first two chapters and proposes a pipeline for cross-cancer, cross-tissue TWAS analysis. We applied the multi-trait and cross-tissue TWAS methods to test for association between 11 separate cancers and predicted gene expression in each of 49 GTEx tissues. The results demonstrate the effectiveness of the pipeline in improving power and detecting functionally relevant genes. In summary, this dissertation extends TWAS along two dimensions and integrates the two into an effective pipeline for the analysis of multi-omics data, with the goal of elucidating the underlying biological mechanisms of disease.
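
To make the combination step mentioned for Chapter 1 concrete, the following is a minimal sketch of the standard aggregate Cauchy association test (ACAT) formula, which maps each p-value to a Cauchy variate, takes a weighted average, and converts the result back to a p-value. It is not taken from the dissertation: the function name `acat_combine`, the default of equal weights, and the example p-values are illustrative assumptions.

```python
import math


def acat_combine(p_values, weights=None):
    """Combine p-values with the aggregate Cauchy association test (ACAT).

    Hypothetical helper for illustration; equal weights are assumed
    when none are supplied.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values) or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative and match p_values")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not all be zero")

    # Each p-value becomes a standard Cauchy variate under the null;
    # the normalized weighted sum is again approximately standard Cauchy.
    statistic = sum(
        (w / total) * math.tan((0.5 - p) * math.pi)
        for p, w in zip(p_values, weights)
    )

    # Upper-tail probability of the standard Cauchy distribution.
    return 0.5 - math.atan(statistic) / math.pi


# Example with made-up p-values, e.g. one cross-tissue feature and two
# single-tissue features for the same gene.
combined = acat_combine([0.01, 0.20, 0.50])
print(f"combined p-value: {combined:.4f}")
```

Because the Cauchy transform is dominated by the smallest p-values, a single strong signal among several features is largely preserved in the combined result, which is one reason this kind of test suits settings where only a few tissues carry signal.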

Keywords

Genetic epidemiology, GWAS, Multi-omics Data, Multivariate data integration, Statistical Genetics, Transcriptome-wide Association Study, Biostatistics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service