Publication: Characterizing Regulatory Elements and Non-Coding Variants in the Human Genome
Date
2023-05-02
Citation
Siraj, Layla. 2023. Characterizing Regulatory Elements and Non-Coding Variants in the Human Genome. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
Abstract
Regulatory elements and the non-coding variants within them govern the spatiotemporal expression of genes as part of coordinated sets of networks, mediated by the combinatorial binding of transcription factors. The mechanisms by which non-coding variants affect the regulatory ability of the elements in which they reside, as well as the structural organization of regulatory elements, remain poorly understood. In this dissertation, I approach the characterization of regulatory elements and non-coding variants through three distinct and complementary perspectives.
In chapter 2, I employ a functional approach. I present and analyze a rich resource of functional effect data for over 300,000 fine-mapped complex trait variants and robust controls. I demonstrate that massively parallel reporter assays (MPRAs) provide salient functional effect information for variants residing in endogenous regulatory elements. I present mechanistic evidence for epistasis between non-coding variants and dissect cases of multiple causal variants, both across independent signals and within the same signal. I also characterize the contribution of each individual nucleotide across the entire regulatory element for 164 loci and uncover new sequence motifs that contribute to regulatory element activity.
In chapter 3, I employ a positional and biochemical approach. In characterizing regulatory elements by the transcription factor (TF) binding sites that lie within them, I ultimately uncover serious confounding effects of cut coverage and residual enzymatic bias that hamper the ability to infer TF binding from ATAC-seq data. I also present a framework for ascertaining residual bias in footprinting algorithms.
Finally, in chapter 4, I employ a statistical approach. I use Latent Dirichlet Allocation, a topic model from natural language processing, to identify the biological programs common to subsets of non-coding variants and phenotypes. Using data from the United Kingdom Biobank, I generate 15 clusters and annotate them biologically using cell-type-specific enrichment of nearby genes. I present our preliminary findings, with 4 biologically meaningful clusters, and discuss improvements and challenges ahead in comprehensively characterizing biological programs.
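For readers unfamiliar with Latent Dirichlet Allocation, the following is a minimal sketch of the general technique, not the dissertation's pipeline. It uses a collapsed Gibbs sampler written in plain Python on a tiny made-up dataset. The function name `lda_gibbs`, the hyperparameters, the toy data, and the mapping of variants to "documents" and associated phenotypes to "words" are all assumptions made for illustration; the record above does not state how the input was constructed.

```python
import random


def lda_gibbs(docs, n_topics, n_words, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative, unoptimized).

    docs: list of token lists; each token is a word id in [0, n_words).
    Returns (doc_topic, topic_word) count matrices.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Random initial topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


# Hypothetical toy input: 6 variants ("documents"), 4 phenotypes ("words").
# Each token is the id of a phenotype the variant is associated with.
docs = [
    [0, 0, 1], [0, 1, 1], [0, 1],   # variants linked to phenotypes 0 and 1
    [2, 3, 3], [2, 2, 3], [2, 3],   # variants linked to phenotypes 2 and 3
]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2, n_words=4, n_iter=50, seed=1)

# Assign each variant to the topic ("cluster") holding most of its tokens.
dominant = [max(range(2), key=row.__getitem__) for row in doc_topic]
print(dominant)
```

In this sketch each topic plays the role of a cluster. A real analysis would use many more variants and phenotypes, a larger number of topics (the abstract reports 15 clusters), and an optimized library implementation rather than this hand-written sampler.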
Keywords
Biological programs, Complex disease, Gene regulation, Genomics, Regulatory networks, Transcription factor binding, Genetics, Biophysics, Bioinformatics
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service