Publication: Quantitative Methods for Analyzing Structure in Genomes, Self-Assembly, and Random Matrices
Date
2016-05-20
The Harvard community has made this article openly available.
Citation
Huntley, Miriam. 2016. Quantitative Methods for Analyzing Structure in Genomes, Self-Assembly, and Random Matrices. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
Abstract
This dissertation presents my graduate work analyzing biological structure. My research spans three areas, which I discuss in turn. First, I present my work studying how the genome folds. The three-dimensional structure of the genome inside the nucleus is of great biological importance, yet many questions remain about how the genetic material is folded. To probe this, we performed Hi-C experiments to create the highest-resolution dataset (to date) of genome-wide contacts in the nucleus. Analysis of these data uncovered an array of fundamental structures in the folded genome. We discovered approximately 10,000 loops in the human genome, each of which brings a pair of loci that lie far apart along the DNA strand (up to millions of base pairs) into close proximity. We found that contiguous stretches of DNA are segregated into self-associating contact domains. These domains are associated with distinct patterns of histone marks and segregate into six nuclear subcompartments. We found that these spatial structures are deeply connected to genome regulation and cell function, suggesting that understanding and characterizing the 3D structure of the genome is crucial for a complete description of biology.

Second, I present my work on self-assembly. Many biological structures are formed via "bottom-up" assembly, in which a collection of subunits assembles into a complex arrangement. In this work we developed a theory that predicts the fundamental complexity limits of such systems. Using an information-theoretic framework, we calculated the capacity, that is, the maximum amount of information that can be encoded and decoded in systems of specific interactions, which points to possible future directions for improving experimental realizations of self-assembly.

Lastly, I present work examining the statistical structure of noisy data. Experimental datasets are a combination of signal and randomness, and data analysis algorithms such as Principal Component Analysis (PCA) all seek to extract the signal. We used random matrix theory to demonstrate that even when a dataset contains too much noise for PCA to succeed, the signal can still be recovered with the use of prior information.
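The claim that noise can be "too much for PCA" has a precise form in random matrix theory. As an illustration only, the sketch below uses the textbook rank-one spiked Wigner model, Y = snr · v vᵀ + W, where v is a unit-norm hidden signal and W is symmetric Gaussian (GOE) noise whose bulk spectrum fills [-2, 2]. This model, the function names `spiked_wigner_overlap` and `bbp_prediction`, and the parameter values are assumptions chosen for illustration, not the dissertation's setup. In the large-n limit (the BBP transition), the top eigenvector of Y carries no information about v when snr ≤ 1, while for snr > 1 its squared overlap with v tends to 1 - 1/snr². The sketch shows only where plain PCA fails; it does not implement the prior-information recovery method, which the abstract does not specify.

```python
import numpy as np


def spiked_wigner_overlap(n, snr, seed=0):
    """Return (top eigenvalue, squared overlap with the hidden signal) for Y = snr * v v^T + W."""
    rng = np.random.default_rng(seed)
    # Hidden signal: random +/-1 entries scaled to unit norm (illustrative choice).
    v = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)
    # GOE noise: off-diagonal variance 1/n, so the bulk spectrum lies on [-2, 2].
    g = rng.standard_normal((n, n))
    w = (g + g.T) / np.sqrt(2 * n)
    y = snr * np.outer(v, v) + w
    # PCA on a symmetric matrix: take the eigenvector of the largest eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(y)  # eigenvalues in ascending order
    top_vec = eigvecs[:, -1]
    return float(eigvals[-1]), float(np.dot(top_vec, v) ** 2)


def bbp_prediction(snr):
    """Large-n limit of (top eigenvalue, squared overlap) for this spiked model."""
    if snr <= 1.0:
        return 2.0, 0.0  # spike is hidden in the noise bulk; PCA recovers nothing
    return snr + 1.0 / snr, 1.0 - 1.0 / snr**2


if __name__ == "__main__":
    for snr in (0.5, 1.5, 3.0):
        lam, overlap = spiked_wigner_overlap(n=2000, snr=snr)
        lam_pred, overlap_pred = bbp_prediction(snr)
        print(
            f"snr={snr}: top eigenvalue {lam:.2f} (limit {lam_pred:.2f}), "
            f"squared overlap {overlap:.2f} (limit {overlap_pred:.2f})"
        )
```

Below the threshold the top eigenvalue stays at the edge of the noise bulk near 2 and the squared overlap is close to zero, so PCA alone cannot locate the signal; any recovery in that regime has to draw on additional structure, such as prior information about v.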
Keywords
Biophysics, General, Biology, Bioinformatics, Mathematics
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service