Publication:
Methods for Estimating Hidden Structure and Network Transitions in Genomics

Date

2017-05-04

Abstract

The explosion of data arising from advances in high-throughput sequencing has allowed scientists to study genomics in far greater detail than before. However, this high-resolution picture of cells often makes it difficult to see the higher-level functions and features in the biology that lead to phenotypic outcomes. Identifying the structure hidden in genomic data is critical to separating the data patterns that we consider artifactual, such as batch effects or population structure, from those that we consider signal. In chapter 2, we address the problem of estimating genetic similarity more accurately, which is important for inferring population structure in a sample. We exploit the relative informativeness of rare variants to make this measurement more precise. We then show that this precision can be used to easily test assumptions of homogeneity and identify cryptically related individuals. In chapter 3, we propose a method in transcriptomics that similarly identifies and controls for unwanted latent structure. Batch effects have been widely described in the literature, but we specifically consider the impact of batch on coexpression, a concept critical to gene network inference. Our method controls for this effect with a regression approach that estimates a reduced number of parameters describing the coexpression matrix as a function of the covariates. Finally, in chapter 4, we demonstrate an approach for finding transcription factor drivers of cell state transitions using gene regulatory network (GRN) models. The best way to characterize the rewiring that occurs in GRNs between phenotypic states is unclear, and gold standards are nearly non-existent. We propose an approach that estimates a matrix describing the change in the network adjacency matrix between two states, and we demonstrate it by applying it to four separate studies of chronic obstructive pulmonary disease (COPD). Together, these chapters present three contributions to our understanding of genomic data.
Fundamentally, each method described here estimates a specific type of hidden underlying structure in a complex, high-dimensional setting. In each context, estimating this structure allows us to better understand how genomic features lead to phenotype.
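
To make the rare-variant idea from chapter 2 concrete, the following is a minimal sketch of a pairwise similarity score in which sharing a rare variant counts for more than sharing a common one. It is not the estimator developed in the thesis. The function name `rare_weighted_similarity`, the haploid 0/1 coding, the `-log(frequency)` weight, and the toy data are all assumptions chosen for illustration.

```python
from math import log


def rare_weighted_similarity(genotypes):
    """Pairwise similarity that up-weights shared rare variants.

    genotypes: list of rows, one per individual; each row holds 0/1
    indicators for carrying the minor allele at each variant
    (a haploid toy coding, assumed here for simplicity).
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Number of carriers of the minor allele at each variant.
    counts = [sum(row[k] for row in genotypes) for k in range(m)]
    # Hypothetical weight: -log(carrier frequency), so rarer variants
    # weigh more. Variants with fewer than two carriers (nothing to
    # share) or carried by everyone (uninformative) get weight zero.
    weights = [-log(c / n) if 2 <= c < n else 0.0 for c in counts]
    total = sum(weights)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or total == 0.0:
                continue
            # Sum the weights of variants carried by both individuals.
            shared = sum(
                w
                for w, a, b in zip(weights, genotypes[i], genotypes[j])
                if a and b
            )
            sim[i][j] = shared / total
    return sim


# Toy data: six individuals, three variants.
# Variant 0 is rare (carried by individuals 0 and 1 only),
# variant 1 is common (carried by individuals 0 through 4),
# variant 2 is a singleton (carried by individual 5 only).
toy = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
]
sim = rare_weighted_similarity(toy)
```

In this toy example, individuals 0 and 1 share the rare variant and so score higher with each other than either does with individual 2, with whom they share only the common variant. A pair that stands out in this way is the kind of signal that could flag cryptic relatedness, although a real analysis would need diploid genotypes and a calibrated null distribution.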

Keywords

Biology, Biostatistics; Statistics; Biology, Bioinformatics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
