Hidden Markov Models Predict Epigenetic Chromatin Domains
Citation: Larson, Jessica. 2012. Hidden Markov Models Predict Epigenetic Chromatin Domains. Doctoral dissertation, Harvard University.
Abstract: Epigenetics is an important layer of transcriptional control necessary for cell-type-specific gene regulation. We developed computational methods to analyze the combinatorial effects and large-scale organization of genome-wide epigenetic mark distributions. Throughout this dissertation, we show that regions containing multiple genes with similar epigenetic patterns occur throughout the genome, suggesting the presence of several chromatin domains. In Chapter 1, we develop a hidden Markov model (HMM) for detecting the types and locations of epigenetic domains from multiple histone modifications. We apply this method to a published ChIP-seq dataset of five histone modification marks in mouse embryonic stem cells. We detect domains of consistent epigenetic patterns from the ChIP-seq data, providing new insights into the role of epigenetics in long-range gene regulation. In Chapter 2, we extend our model to investigate genome-wide patterns of histone modifications in multiple human cell lines. We find that chromatin states can accurately classify cell differentiation stage, and that three cancer cell lines are classified as differentiated cells. We also find that genes whose chromatin states change dynamically with differentiation stage are not randomly distributed across the genome but tend to be embedded in multi-gene chromatin domains. Moreover, many specialized gene clusters are associated with stably occupied domains. In the final chapter, we develop a more sophisticated, tiered HMM that adds a domain structure to our chromatin annotation. We find that a model with three domains and five sub-states per domain best fits our data. Each sub-state has a distinct epigenetic pattern while remaining consistent with the functional characteristics and expression profile of its domain. The majority of the genome (including most introns and intergenic regions) has low epigenetic signal and is assigned to a single domain.
Our model outperforms current chromatin state models owing to its greater domain coherence and interpretability.
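To make the idea of a tiered HMM concrete, here is a minimal sketch of how such a model might be decoded. The abstract does not specify the emission model or the transition structure, so this assumes binarized histone-mark calls per genomic bin with independent Bernoulli emissions, and a hypothetical tiered transition matrix in which each domain owns a group of sub-states. Leaving a domain is rarer than switching sub-states within it. The function names and parameters (`build_tiered_transitions`, `p_stay_domain`, `p_stay_sub`) are illustrative only and are not the dissertation's code. Viterbi decoding returns the most likely sub-state per bin, and the domain label is recovered as `state // n_sub`.

```python
import numpy as np

def build_tiered_transitions(n_domains, n_sub, p_stay_domain, p_stay_sub):
    """Hypothetical tiered transition matrix; state index = domain * n_sub + sub.
    Assumes n_domains > 1 and n_sub > 1."""
    K = n_domains * n_sub
    A = np.zeros((K, K))
    for i in range(K):
        d, s = divmod(i, n_sub)
        for j in range(K):
            d2, s2 = divmod(j, n_sub)
            if d2 == d and s2 == s:
                # stay in the same domain and the same sub-state
                A[i, j] = p_stay_domain * p_stay_sub
            elif d2 == d:
                # move to another sub-state within the same domain
                A[i, j] = p_stay_domain * (1 - p_stay_sub) / (n_sub - 1)
            else:
                # leave the domain: spread mass over all sub-states of other domains
                A[i, j] = (1 - p_stay_domain) / ((n_domains - 1) * n_sub)
    return A  # each row sums to 1

def bernoulli_log_emissions(X, P):
    """X: (T, M) binary mark calls; P: (K, M) P(mark present | state).
    Returns (T, K) log-likelihoods under independent Bernoulli marks."""
    return X @ np.log(P).T + (1 - X) @ np.log(1 - P).T

def viterbi(log_pi, log_A, log_B):
    """Most likely state path in log space. log_B: (T, K)."""
    T, K = log_B.shape
    delta = np.empty((T, K))
    back = np.zeros((T, K), dtype=int)
    delta[0] = log_pi + log_B[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A  # rows: previous state, cols: next state
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

if __name__ == "__main__":
    # Toy example: 2 domains x 2 sub-states, 2 marks, 10 bins (made-up numbers).
    n_domains, n_sub = 2, 2
    A = build_tiered_transitions(n_domains, n_sub, p_stay_domain=0.9, p_stay_sub=0.8)
    P = np.array([[0.05, 0.05], [0.10, 0.05],   # domain 0: low signal
                  [0.90, 0.90], [0.90, 0.10]])  # domain 1: marked
    X = np.array([[0, 0]] * 5 + [[1, 1]] * 5, dtype=float)
    log_pi = np.log(np.full(n_domains * n_sub, 1.0 / (n_domains * n_sub)))
    path = viterbi(log_pi, np.log(A), bernoulli_log_emissions(X, P))
    print("sub-states:", path)
    print("domains:   ", path // n_sub)
```

With three domains and five sub-states per domain, the configuration the abstract reports as the best fit, the same code yields a 15 × 15 transition matrix. Grouping sub-states this way is one simple way to encourage long, coherent domain segments, because the domain-switch probability can be set much lower than the within-domain sub-state switch probability.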
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:10087396
Collection: FAS Theses and Dissertations