Publication:
Modern Statistical Methods for Genetics and Genomic Studies

Date

2021-03-05

Published Version

Citation

Li, Xihao. 2021. Modern Statistical Methods for Genetics and Genomic Studies. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Recent scientific advances in genetics and genomics have enabled the characterization and prediction of functional genomic elements across the human genome. Two complementary lines of evidence are now available: biological evidence, which assesses different aspects of the functional consequences of genetic variants through a diverse set of in silico functional annotations; and genetic evidence, which assesses how genetic variants are associated with complex phenotypes or traits in large-scale sequencing studies. In this dissertation, we present novel statistical methods that perform integrative analysis of data arising from these complementary lines of evidence, both to better understand the functional annotation landscape of coding and noncoding genetic variants and to uncover the genetic architecture of human diseases and traits.
In Chapter 1, we propose Multi-dimensional Annotation Class Integrative Estimation (MACIE), an unsupervised multivariate mixed model framework capable of integrating annotations of diverse origin to assess multi-dimensional functional roles for both coding and noncoding variants. MACIE summarizes these diverse and complementary functional annotations into measures that can predict the multi-faceted biological functions of any given genetic variant. When a variant has multiple aspects of functionality, MACIE therefore provides richer and more interpretable information than existing one-dimensional scores. Applied to a variety of independent coding and noncoding datasets, MACIE demonstrates powerful and robust performance in discriminating between functional and non-functional variants. We also show an application of MACIE to fine-mapping using GWAS summary statistics for lipids from the European Network for Genetic and Genomic Epidemiology Consortium.
Large-scale whole-genome sequencing (WGS) studies have enabled the analysis of rare variants (RVs) associated with complex phenotypes. Commonly used RV association tests (RVATs) have limited scope to leverage variant functions. In Chapter 2, we propose STAAR (variant-Set Test for Association using Annotation infoRmation), a scalable and powerful RVAT method that incorporates both variant categories and multiple complementary annotations using a dynamic weighting scheme. STAAR accounts for population structure and relatedness, and is scalable to very large cohort and biobank WGS studies of continuous and dichotomous traits. We apply STAAR to identify RVs associated with four lipid traits using data from the Trans-Omics for Precision Medicine (TOPMed) program. We discover and replicate novel RV associations, including disruptive missense RVs of NPC1L1 and an intergenic region near APOC1P1 associated with low-density lipoprotein cholesterol.
Meta-analysis of WGS studies offers a way to leverage large sample sizes for the discovery of coding and noncoding RVs associated with complex human traits. Existing RV meta-analysis approaches are not scalable when applied to WGS data because of the very large number of RVs whose summary-level information needs to be stored and shared. In Chapter 3, we extend the method of Chapter 2 and propose MetaSTAAR, a powerful and resource-efficient RV meta-analysis framework that scales to large cohort and biobank WGS studies with hundreds of millions of RVs across the genome, while accounting for relatedness and population structure for both quantitative and dichotomous traits. Through meta-analysis of four lipid traits from 14 studies of the TOPMed program, we demonstrate that MetaSTAAR performs resource-efficient RV meta-analysis at scale and identifies several conditionally significant RV associations with lipids.
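
The abstract does not spell out how the annotation-weighted tests of Chapter 2 are pooled into a single result. As an illustration only, the sketch below shows one generic way to combine several p-values for the same variant set, the Cauchy combination rule. The function name `cauchy_combine`, the equal default weights, and the example p-values are assumptions made for this sketch; it should not be read as the dissertation's implementation.

```python
import math


def cauchy_combine(pvalues, weights=None):
    """Combine p-values with the Cauchy combination rule (illustrative sketch).

    Each p-value is mapped to a standard Cauchy quantile, the quantiles are
    averaged with the given weights, and the average is mapped back to a
    p-value. Names and default weights are assumptions for illustration.
    """
    if not pvalues:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p < 1.0) for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        # Assumption: equal weight for every test when none are supplied.
        weights = [1.0] * len(pvalues)
    if len(weights) != len(pvalues):
        raise ValueError("weights and pvalues must have the same length")
    total = sum(weights)
    # Weighted average of Cauchy-transformed p-values.
    stat = sum(w * math.tan((0.5 - p) * math.pi)
               for w, p in zip(weights, pvalues)) / total
    # Upper tail probability of a standard Cauchy variable at `stat`.
    return 0.5 - math.atan(stat) / math.pi


if __name__ == "__main__":
    # Hypothetical p-values from tests of one variant set under
    # different annotation weightings.
    example = [0.01, 0.5, 0.2]
    print(cauchy_combine(example))
```

With a single input the rule returns that p-value unchanged, and a strongly significant input tends to dominate the combined result, which is why rules of this kind are used when only some of the weightings are expected to be informative.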

Keywords

Functional Annotation, Meta-Analysis, Rare Variants, Statistical Genetics, Whole-Genome Sequencing, Biostatistics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
