Publication: Methods for Estimating Hidden Structure and Network Transitions in Genomics
Date
2017-05-04
The Harvard community has made this article openly available.
Abstract
The explosion of data from advances in high-throughput sequencing has allowed scientists to study genomics in far greater detail. However, this high-resolution picture of cells often obscures the higher-level biological functions and features that lead to phenotypic outcomes. Identifying the structure hidden in genomic data is critical to separating the patterns we consider artifactual, such as batch effects or population structure, from those we consider signal.
In chapter 2, we address the problem of estimating genetic similarity more accurately, which is important for inferring population structure in a sample. We exploit the relatively high informativeness of rare variants to make this measurement more precise. We then show that this precision makes it easy to test assumptions of homogeneity and to identify cryptically related individuals. In chapter 3, we propose a method in transcriptomics that similarly identifies and controls for unwanted latent structure. Batch effects have been widely described in the literature, but we specifically consider the impact of batch on coexpression, a concept critical to gene network inference. Our method uses regression to control for this effect, estimating a reduced number of parameters that describe the coexpression matrix as a function of the covariates. Finally, in chapter 4, we demonstrate an approach for finding transcription factor drivers of cell state transitions using gene regulatory network (GRN) models. The best way to characterize the rewiring that occurs in GRNs between phenotypic states is unclear, and gold standards are nearly nonexistent. We propose an approach that estimates a matrix describing the change in the network adjacency matrix between two states, and we demonstrate it on four separate studies of chronic obstructive pulmonary disease (COPD).
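To illustrate why rare variants carry more information about genetic similarity, here is a minimal sketch of the widely used frequency-standardized genetic relationship matrix (GRM). This is a swapped-in, standard technique and not the chapter 2 estimator. Each variant is centered by twice its allele frequency p and scaled by sqrt(2p(1 - p)), so a shared rare allele contributes far more to a pair's similarity than a shared common allele. The function names `standardized_grm` and `flag_related_pairs`, the 0.1 relatedness threshold, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def standardized_grm(genotypes):
    """Frequency-standardized genetic relationship matrix (illustrative sketch).

    genotypes: (n_individuals, n_variants) array of 0/1/2 allele counts.
    Scaling each variant by sqrt(2p(1-p)) upweights rare variants, because
    sharing an allele with small p is less likely to happen by chance.
    """
    X = np.asarray(genotypes, dtype=float)
    p = X.mean(axis=0) / 2.0                  # estimated allele frequency per variant
    keep = (p > 0) & (p < 1)                  # drop monomorphic variants (zero variance)
    X, p = X[:, keep], p[keep]
    Z = (X - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return Z @ Z.T / Z.shape[1]               # average over variants of z_i * z_j


def flag_related_pairs(grm, threshold=0.1):
    """Return pairs (i, j), i < j, whose off-diagonal relatedness exceeds a
    hypothetical threshold; unrelated pairs should fall near zero."""
    i, j = np.triu_indices(grm.shape[0], k=1)
    hits = grm[i, j] > threshold
    return list(zip(i[hits].tolist(), j[hits].tolist()))


# Toy usage on simulated data (hypothetical): 50 individuals, 2,000 variants,
# where individual 1 is an exact duplicate of individual 0.
rng = np.random.default_rng(0)
freqs = rng.uniform(0.01, 0.5, size=2000)
G = rng.binomial(2, freqs, size=(50, 2000))
G[1] = G[0]

grm = standardized_grm(G)
pairs = flag_related_pairs(grm)
print(pairs)  # should include (0, 1), the duplicated pair
```

Because unrelated pairs scatter tightly around zero when many variants are used, a pair with unusually high off-diagonal similarity stands out. This is the intuition behind using a precise similarity measure to spot cryptic relatedness, though the chapter's own estimator and test may differ.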
Together, these chapters make three contributions to our understanding of genomic data. Fundamentally, each method described here estimates a specific type of hidden underlying structure in complex, high-dimensional settings. In each context, estimating this structure helps us better understand how genomic features lead to phenotype.
Keywords
Biology, Biostatistics, Statistics, Bioinformatics
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.