Publication: Statistical methods for transcription factor footprinting in 3D genome assays
Abstract
Chromatin loops are drivers of gene regulation, bringing distal enhancers into close proximity with their target genes. Although a subset of these long-range contacts is mediated by the architectural factors CTCF and cohesin, the mechanisms underlying the majority of these regulatory contacts remain unclear. Transcription factors and other proteins are known to contribute to DNA looping, yet no methods currently exist for simultaneously profiling 3D genome structure and the DNA-binding proteins involved. This is a significant limitation for evaluating the impact of transcription factors on 3D contacts and for constructing high-resolution protein occupancy maps of regulatory regions. This dissertation seeks to bridge this gap by (1) developing statistical methods that quantitatively assess DNA-binding protein occupancy in 3D genome assays and (2) leveraging these new methods to (i) investigate the relationship between protein binding and genome architecture and (ii) provide high-resolution maps of DNA-binding proteins at enhancers and promoters.
Chapter 1 proposes a statistical method that determines CTCF binding in CTCF MNase HiChIP assays. Beyond its use in profiling genome structure, MNase has also been used to determine transcription factor and nucleosome occupancy at high resolution in assays such as CUT&RUN and MNase-seq. This is possible because of its endo-exonuclease activity: it cuts regions unprotected by proteins and chews back the fragments until it reaches protein-protected DNA. This enables inference of both protein size and location, since nucleosomes protect more than twice as much DNA as a transcription factor and thus yield significantly longer DNA fragments. We leverage short, transcription factor (TF)-protected fragments to pinpoint locations of CTCF binding at base-pair resolution. We then use TF-protected fragments at CTCF binding sites to implement a novel fragment-level view of CTCF-mediated chromatin looping dynamics. With this approach, we determine that fully extruded chromatin loops between convergent CTCF-bound sites are rare genome-wide and that, in addition to CTCF, active regulatory elements hinder cohesin-mediated loop extrusion. This supports a model in which partially extruded chromatin loops can enable distal enhancer-promoter contacts.
Chapter 2 expands upon the method developed in Chapter 1 by broadly mapping locations of DNA-binding transcription factors in Micro-C. Unlike Chapter 1, Chapter 2 does not rely on a ChIP step to profile a single protein’s occupancy. Not being limited to one protein, CTCF, enables additional novel insights into protein occupancy and its relation to 3D contacts and transcription. We find that expression level is tightly linked with the size of the nucleosome-depleted region at the transcription start site (TSS) and with the presence of a large TF complex at the promoter, immediately upstream of the TSS; unexpressed genes exhibit a TSS obstructed by a nucleosome and a lack of TF binding. Furthermore, the TF-sized proteins upstream of the TSS at expressed genes appear to facilitate long-range, cohesin-independent looping contacts, which may partially explain why gene expression is largely maintained when cohesin is degraded. Further investigation into which specific TFs may enable these long-range cohesin-independent contacts identified transcription factor motif families, such as the KLF/SP and NF-Y families, as likely candidates for cohesin-independent looping factors.
Chapter 3 applies the approach developed in Chapter 2 to gain a high-resolution view of the MYC oncogene and its distal cell-type-specific enhancers. This analysis identifies MYC distal enhancers as regions highly occupied by TF-enhancer assemblies, which depend on RNA for their coalescence. This chapter reveals the cooperation between TF binding and RNA required for gene regulation and genome structure, and presents a framework for developing high-resolution maps of regulatory architecture.
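As a rough illustration of the fragment-length reasoning behind Chapter 1 (short fragments suggest TF-sized protection, long fragments suggest nucleosome-sized protection), the sketch below bins fragments by length and tallies the midpoints of the short ones. It is not the statistical method developed in the dissertation. The cutoffs `TF_MAX_BP` and `NUC_MIN_BP`, the function names, and the `(start, end)` input format are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical length cutoffs in base pairs. These are illustrative
# assumptions, not values taken from the dissertation.
TF_MAX_BP = 80     # at or below this length: treated as TF-sized protection
NUC_MIN_BP = 120   # at or above this length: treated as nucleosome-sized


def classify_fragment(start, end):
    """Label a fragment by its length (end - start, half-open coordinates)."""
    length = end - start
    if length <= 0:
        raise ValueError("fragment end must be greater than start")
    if length <= TF_MAX_BP:
        return "tf_sized"
    if length >= NUC_MIN_BP:
        return "nucleosome_sized"
    return "ambiguous"


def short_fragment_midpoints(fragments):
    """Count the midpoints of TF-sized fragments at each position.

    A pile-up of midpoints at one position is a crude hint of a small
    protected footprint there; a real analysis would model this
    statistically rather than simply counting.
    """
    counts = Counter()
    for start, end in fragments:
        if classify_fragment(start, end) == "tf_sized":
            counts[(start + end) // 2] += 1
    return counts


if __name__ == "__main__":
    # Made-up coordinates, used only to exercise the functions.
    example = [(100, 150), (110, 140), (0, 147)]
    print(short_fragment_midpoints(example))
```

Fragments between the two cutoffs are labeled ambiguous and left out of the count, which keeps the two size classes from blurring into each other; where those boundaries should sit in practice depends on the assay and digestion conditions.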