Publication:

Statistical methods for transcription factor footprinting in 3D genome assays

Date

2025-06-05

The Harvard community has made this article openly available.

Citation

Sept, Corriene Elinor. 2025. Statistical methods for transcription factor footprinting in 3D genome assays. Doctoral Dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Chromatin loops are drivers of gene regulation, bringing distal enhancers into close proximity with their target genes. While a subset of these long-range contacts is mediated by the architectural factors CTCF and cohesin, the mechanisms underlying the majority of these regulatory contacts remain unclear. Although transcription factors and other proteins are known to contribute to DNA looping, no methods currently exist for simultaneously profiling 3D genome structure and the DNA-binding proteins involved. This presents a significant limitation for evaluating transcription factor impact on 3D contacts and constructing high-resolution protein occupancy maps of regulatory regions. This dissertation seeks to bridge this gap by (1) developing statistical methods that quantitatively assess DNA-binding protein occupancy in 3D genome assays and (2) leveraging these new methods to (i) investigate the relationship between protein binding and genome architecture and (ii) provide high-resolution maps of DNA-binding proteins at enhancers and promoters.

Chapter 1 proposes a statistical method that determines CTCF binding in CTCF MNase HiChIP assays. MNase, beyond its use in profiling genome structure, has also been used to determine transcription factor and nucleosome occupancy at high resolution in assays like CUT&RUN and MNase-seq. This is possible because of its endo-exonuclease activity: it cuts regions unprotected by proteins and chews back the fragments until it reaches protein-protected DNA. This enables inference of both protein size and location, since nucleosomes protect more than twice the DNA of a transcription factor, thus yielding significantly longer DNA fragments. We leverage short, transcription factor (TF)-protected fragments to pinpoint locations of CTCF binding at base-pair resolution. We then use TF-protected fragments at CTCF binding sites to implement a novel fragment-level view of CTCF-mediated chromatin looping dynamics. With this approach, we determine that fully extruded chromatin loops between convergent CTCF-bound sites are rare genome-wide and that, in addition to CTCF, active regulatory elements hinder cohesin-mediated loop extrusion. This supports a model by which the partially extruded chromatin loop can enable distal enhancer-promoter contacts.

Chapter 2 expands upon the method developed in Chapter 1 by broadly mapping locations of DNA-binding transcription factors in Micro-C. Unlike Chapter 1, Chapter 2 does not rely on a ChIP step to profile one protein's occupancy. Not being limited to a single protein, CTCF, enables additional novel insights into protein occupancy and its relation to 3D contacts and transcription. We find that expression level is tightly linked with the size of the nucleosome-depleted region at the TSS and with the presence of a large TF complex at the promoter, immediately upstream of the TSS; unexpressed genes exhibit a TSS obstructed by a nucleosome and a lack of TF binding. Furthermore, the TF-sized proteins upstream of the TSS at expressed genes appear to facilitate long-range, cohesin-independent looping contacts, which may partially explain why gene expression is largely maintained when cohesin is degraded. Further investigation into whether specific TFs may enable these long-range cohesin-independent contacts identified transcription factor motif families, such as KLF/SP and NF-Y, as likely candidates for cohesin-independent looping factors.

Chapter 3 applies the approach developed in Chapter 2 to gain a high-resolution view of the MYC oncogene and its distal cell-type-specific enhancers. This analysis identifies MYC distal enhancers as regions highly occupied by TF-enhancer assemblies, which depend on RNA for their coalescence. This chapter reveals the cooperation between TF binding and RNA required for gene regulation and genome structure, and presents a framework for developing high-resolution maps of regulatory architecture.
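As a rough illustration of the fragment-length idea described in the abstract (short fragments suggest TF-sized protection, long fragments suggest nucleosome-sized protection), the sketch below labels fragments by length. It is not the dissertation's statistical method. The cutoffs `TF_MAX_LEN` and `NUCLEOSOME_MIN_LEN`, the function names, and the example coordinates are all assumptions made for illustration; the abstract gives no numeric thresholds.

```python
from collections import Counter

# Hypothetical cutoffs in base pairs (assumptions, not values from the dissertation).
TF_MAX_LEN = 80           # assumed upper bound for TF-protected fragments
NUCLEOSOME_MIN_LEN = 120  # assumed lower bound for nucleosome-protected fragments


def classify_fragment(start: int, end: int) -> str:
    """Label a fragment (0-based start, exclusive end) by its protected length."""
    length = end - start
    if length <= 0:
        raise ValueError("fragment end must be greater than start")
    if length <= TF_MAX_LEN:
        return "tf_sized"
    if length >= NUCLEOSOME_MIN_LEN:
        return "nucleosome_sized"
    # Lengths between the two cutoffs are left unassigned in this sketch.
    return "ambiguous"


def summarize(fragments):
    """Count how many fragments fall into each size class."""
    return Counter(classify_fragment(s, e) for s, e in fragments)


# Made-up example coordinates with lengths 45, 150, and 100 bp.
example_fragments = [(1000, 1045), (1200, 1350), (2000, 2100)]
print(summarize(example_fragments))
```

A hard cutoff like this only conveys the intuition; the dissertation describes statistical methods for quantifying occupancy, which this sketch does not attempt to reproduce.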

Keywords

3D genome, Epigenomics, Gene regulation, Protein footprinting, Genetics, Bioinformatics, Biostatistics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
