High-resolution dissection of cis-regulatory elements
Abstract
Gene expression is a highly regulated process that governs almost all aspects of life. Precise gene regulation ensures that every cell selectively expresses a subset of all genes to perform specialized cellular functions. Consequently, dysregulated gene expression often leads to disease and is implicated in aging. A key step of eukaryotic gene regulation is transcriptional regulation, in which proteins such as transcription factors (TFs) and nucleosomes bind to cis-regulatory elements (CREs) in the genome to control transcription of target genes. Our understanding of gene regulation therefore depends heavily on our ability to observe the action of regulatory proteins on genomic DNA, and scientists have spent decades building ever-better technologies to measure protein-DNA interactions at regulatory elements. This dissertation presents technological advances that allow measurement of protein-DNA interactions at cis-regulatory elements with significantly improved cellular (cell-type/state-resolved), genomic (single base-pair), and molecular (single-molecule) resolution.
First, we describe PRINT and seq2PRINT, computational methods for tracking transcription factor and nucleosome binding within cis-regulatory regions using single-cell ATAC-seq (scATAC-seq) data. PRINT detects footprints of DNA-binding proteins across spatial scales by using machine learning to accurately model enzymatic sequence bias and signal dispersion. Applied to pseudo-bulked scATAC-seq data, PRINT produces cell-type- and cell-state-resolved TF and nucleosome footprint landscapes across hundreds of thousands of CREs genome-wide, even in systems with complex cell-type composition. Building on PRINT, we describe seq2PRINT, a deep-learning model that uses local DNA sequence as its sole input to predict footprint patterns at the same locus. By extracting sequence features learned during training, seq2PRINT can accurately predict TF binding events at single base-pair resolution.
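The core intuition behind a footprint is that a bound protein shields its DNA from Tn5, so insertions are depleted at the binding site relative to its flanks; correcting for Tn5 sequence bias keeps sites that Tn5 simply dislikes from masquerading as footprints. The sketch below is a deliberately simplified, hypothetical center-versus-flank score, not PRINT's actual model: it assumes a per-base `bias` array is available from some sequence-bias model, and the function name and the window parameters (`center_radius`, `flank_width`, `pseudocount`) are illustrative choices. It omits the learned signal-dispersion model and the statistical testing that the text attributes to PRINT.

```python
import numpy as np


def footprint_score(observed, bias, pos, center_radius=10, flank_width=20, pseudocount=1.0):
    """Toy center-versus-flank footprint score at one position (hypothetical).

    observed: Tn5 insertion counts per base pair (1-D array).
    bias:     expected relative insertion propensity per base pair, assumed to
              come from some sequence-bias model (illustrative input; must be > 0).
    Returns log2((flank + pc) / (center + pc)) on bias-corrected signal, so
    larger values mean stronger central depletion. Dispersion modeling and
    significance testing are deliberately omitted.
    """
    # Divide out the expected insertion propensity at each base.
    corrected = np.asarray(observed, dtype=float) / np.asarray(bias, dtype=float)
    c0, c1 = pos - center_radius, pos + center_radius + 1  # center window [c0, c1)
    if c0 - flank_width < 0 or c1 + flank_width > corrected.size:
        raise ValueError("windows extend past the end of the region")
    center = corrected[c0:c1].mean()
    left = corrected[c0 - flank_width:c0].mean()
    right = corrected[c1:c1 + flank_width].mean()
    # A protein footprint should be depleted relative to BOTH flanks,
    # so compare the center against the weaker flank.
    flank = min(left, right)
    return float(np.log2((flank + pseudocount) / (center + pseudocount)))
```

For example, a 21-bp central dip from 8 to 1 insertions per base scores about 2.17 with uniform bias, but scores 0 if the bias track dips by the same factor, because the depletion is then fully explained by enzyme preference rather than protein protection.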
Aided by low-rank adaptation (LoRA), we can scale seq2PRINT to hundreds of samples or cell types/states. With PRINT and seq2PRINT, we reveal complex dynamics of TF and nucleosome binding within CREs during human hematopoiesis with unprecedented cell-state and genomic resolution. We show that many CREs display distinct combinations of TF binding across cell types that are undetectable in traditional accessibility-based analyses. We further use PRINT and seq2PRINT to characterize murine hematopoietic stem cell (HSC) aging, revealing widespread reorganization of CREs and identifying age-associated TF cooperation.
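Low-rank adaptation scales a model to many conditions by keeping one shared weight matrix W and learning, for each condition c, only a small update B_c A_c of rank r. The sketch below shows that mechanism for a single linear layer in NumPy. Which layers of seq2PRINT carry adapters, the rank used, and how conditions are defined are not stated here, so the class name `LoRALinear`, the rank, and the condition labels are hypothetical; training is not shown.

```python
import numpy as np


class LoRALinear:
    """Linear layer with a shared frozen weight and per-condition low-rank updates.

    Effective weight for condition c: W + B_c @ A_c, with A_c of shape (r, d_in)
    and B_c of shape (d_out, r). Only A_c and B_c would be trained per condition.
    """

    def __init__(self, W, rank):
        self.W = np.asarray(W, dtype=float)  # (d_out, d_in), shared across conditions
        self.rank = rank
        self.adapters = {}  # condition name -> (A, B)

    def add_condition(self, name, rng):
        d_out, d_in = self.W.shape
        A = rng.normal(scale=0.01, size=(self.rank, d_in))
        # B starts at zero, so a new condition initially reproduces the base model.
        B = np.zeros((d_out, self.rank))
        self.adapters[name] = (A, B)

    def forward(self, x, name):
        A, B = self.adapters[name]
        # x: (batch, d_in). The low-rank path never forms the full d_out x d_in update.
        return x @ self.W.T + (x @ A.T) @ B.T

    def params_per_condition(self):
        d_out, d_in = self.W.shape
        return self.rank * (d_in + d_out)
```

With illustrative dimensions of 256 x 256 and rank 4, the shared weight has 65,536 parameters while each condition adds only 4 x (256 + 256) = 2,048. Three hundred conditions then cost about 614,400 adapter parameters instead of roughly 19.7 million for 300 full copies of the layer.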
Second, we describe TDAC-seq, a technology that achieves single-molecule long-read profiling of chromatin accessibility and protein footprints using the double-stranded DNA deaminase DddA11. We show that DddA11 introduces C-to-T mutations in DNA regions that are accessible and unprotected by proteins, enabling accessibility and footprint measurements with nanopore sequencing. We further show that TDAC-seq simultaneously reads out chromatin organization and genetic perturbations, such as deletions introduced by CRISPR-Cas9 or A-to-G mutations introduced by an adenine base editor (ABE). This enables high-throughput pooled CRISPR or ABE screens in which the effect of each deletion or editing outcome on local chromatin organization can be assessed individually.
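Because deamination marks accessible bases directly on each molecule, a single read can be converted into a per-base accessibility profile. The sketch below illustrates that idea under simplifying assumptions that are not from the source: the read is already aligned to the reference without indels, each read reports deamination on one strand (C-to-T for the top strand, G-to-A in reference orientation for the bottom strand), and accessibility is summarized as the fraction of edited target sites in a sliding window. The function names and `half_window` are hypothetical; a real pipeline would also handle nanopore indels, sequencing error, and deaminase sequence preferences.

```python
def edit_calls(ref, read, strand="top"):
    """Per-position deamination calls for one read aligned gap-free to `ref`.

    Returns 1 (deaminated), 0 (unedited target base) or None (uninformative).
    Top-strand deamination reads as C->T; bottom-strand deamination appears
    as G->A in reference orientation.
    """
    if len(ref) != len(read):
        raise ValueError("this sketch assumes a gap-free alignment")
    target, converted = ("C", "T") if strand == "top" else ("G", "A")
    calls = []
    for r, q in zip(ref.upper(), read.upper()):
        if r != target:
            calls.append(None)  # not a deamination target on this strand
        elif q == converted:
            calls.append(1)     # converted: accessible, unprotected base
        elif q == target:
            calls.append(0)     # unconverted target base
        else:
            calls.append(None)  # other mismatch, treated as sequencing error
    return calls


def infer_strand(ref, read):
    """Assign the read to the strand whose conversion type it shows more often."""
    top = sum(c == 1 for c in edit_calls(ref, read, "top"))
    bottom = sum(c == 1 for c in edit_calls(ref, read, "bottom"))
    return "top" if top >= bottom else "bottom"


def accessibility_track(calls, half_window=10):
    """Fraction of informative target sites that are edited, in a centered window."""
    n = len(calls)
    track = []
    for i in range(n):
        window = calls[max(0, i - half_window):min(n, i + half_window + 1)]
        informative = [c for c in window if c is not None]
        track.append(sum(informative) / len(informative) if informative else None)
    return track
```

In this toy representation, a short run of unedited target sites inside an otherwise heavily edited stretch hints at a bound protein, while long stretches with a low edited fraction suggest closed or nucleosome-occupied chromatin. Because perturbations such as Cas9 deletions are visible on the same read, each molecule's profile can in principle be grouped by its editing outcome.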
In summary, the work presented in this dissertation resolves several long-standing technological challenges in measuring chromatin-level processes in gene regulation. These technologies enable cell-type/state-resolved, single base-pair tracking of regulatory factor binding, as well as single-molecule long-read measurement of chromatin organization, providing powerful new tools for studying gene regulation.