Publication: Statistical and computational methods for spatial and regulatory genomics
Date
2023-06-01
Authors
Zou, Luli S.
Citation
Zou, Luli S. 2023. Statistical and computational methods for spatial and regulatory genomics. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
Abstract
The complexity and scale of high-throughput genomics experiments have grown significantly over the past few years. Recently developed sequencing technologies now enable the profiling of transcriptomes at different 2D spatial locations across tissues; the reconstruction of protein- and RNA-associated DNA folding patterns from 3D genome data; and the measurement of both the transcriptome and the epigenome from the same cell. While some biological questions of interest can be answered from these data using standard computational tools, others require methodological innovation to deal with the high dimensionality and sparsity that come with the increased resolution of these new technologies. In this dissertation, I present three novel statistical and computational methods motivated by fundamental biological questions about spatial and regulatory genomic data.
Chapter 1 presents a method for detecting allele-specific expression in 2D spatial transcriptomics. Spatial transcriptomics data are highly sparse and challenging to analyze because each observation can contain a mixture of cell types. Our method uses a generalized linear model framework to account for cell type mixtures and detect spatial allele-specific expression within each cell type. We demonstrate the utility of the method through simulations and on Slide-seq data generated from the mouse hippocampus. The findings enabled by our method provide new insight into the previously uncharacterized landscape of spatial and cell type-specific allele-specific expression in the mouse hippocampus.
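To make the idea of a cell type-aware GLM concrete, here is a minimal Python sketch under stated assumptions; it is not the dissertation's model. It assumes each spot's cell type proportions are already known (for example, from a separate deconvolution step), treats the alternative-allele read count at a spot as binomial, and models its log-odds as the proportion-weighted sum of hypothetical per-cell-type coefficients, fit by Newton-Raphson. The names `fit_celltype_ase`, `solve`, and `expit` are illustrative.

```python
import math


def expit(x):
    """Logistic function: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_celltype_ase(weights, alt_counts, total_counts, n_iter=50, tol=1e-10):
    """Hypothetical binomial GLM: logit(p_i) = sum_k weights[i][k] * beta[k].

    weights[i][k]  : assumed-known proportion of cell type k in spot i
    alt_counts[i]  : reads from the alternative allele in spot i
    total_counts[i]: reads from both alleles in spot i
    Returns beta, the per-cell-type log-odds of the alternative allele.
    """
    K = len(weights[0])
    beta = [0.0] * K
    for _ in range(n_iter):
        score = [0.0] * K                      # gradient of the log-likelihood
        info = [[0.0] * K for _ in range(K)]   # Fisher information matrix
        for w, y, n in zip(weights, alt_counts, total_counts):
            p = expit(sum(wk * bk for wk, bk in zip(w, beta)))
            for a in range(K):
                score[a] += w[a] * (y - n * p)
                for c in range(K):
                    info[a][c] += n * p * (1.0 - p) * w[a] * w[c]
        step = solve(info, score)              # Newton-Raphson update
        beta = [bk + s for bk, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Toy example: two cell types, four spots (two pure, two mixed).
true_beta = [-0.8, 0.9]
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
totals = [100, 80, 120, 60]
# Noiseless "counts" set to their expected values, so the MLE equals true_beta.
alts = [n * expit(sum(w * b for w, b in zip(ws, true_beta))) for ws, n in zip(weights, totals)]
beta_hat = fit_celltype_ase(weights, alts, totals)
```

A coefficient near 0 corresponds to balanced expression (allelic ratio 0.5) in that cell type. This sketch omits overdispersion, spatial terms, and any formal hypothesis test.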
Chapter 2 introduces a method for deconvolving chromatin binding signal in 3D genome data, such as HiChIP or RD-SPRITE. In these data, the goal is to identify the precise binding or interaction locations of proteins or RNAs on DNA and to estimate the strength of these associations. We use a probabilistic model in which the observed DNA-DNA contacts directly convolve the true underlying interaction signal. We show in RD-SPRITE data that our method accurately deconvolves 1D lncRNA signal into more specific locations consistent with prior biological knowledge. We further show in HiChIP data that our method increases power for downstream analyses such as differential loop detection.
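The deconvolution idea can be illustrated with a generic, well-known technique. The sketch below is not the dissertation's model: it swaps in Richardson-Lucy deconvolution, the multiplicative EM update for a Poisson model in which the observed 1D signal is a known kernel applied to a nonnegative true signal. A Gaussian kernel stands in for the DNA-DNA contact profile; the kernel shape, its bandwidth, and the names `gaussian_kernel`, `convolve`, and `richardson_lucy` are assumptions for illustration.

```python
import math


def gaussian_kernel(n_bins, sigma):
    """Hypothetical contact kernel: C[j][i] = P(signal at bin i is observed at bin j).

    Each column is normalized to sum to 1 over the n_bins observed bins.
    """
    C = [[math.exp(-((j - i) ** 2) / (2.0 * sigma ** 2)) for i in range(n_bins)]
         for j in range(n_bins)]
    for i in range(n_bins):
        col_sum = sum(C[j][i] for j in range(n_bins))
        for j in range(n_bins):
            C[j][i] /= col_sum
    return C


def convolve(C, s):
    """Expected observed signal: (C s)_j = sum_i C[j][i] * s[i]."""
    return [sum(C[j][i] * s[i] for i in range(len(s))) for j in range(len(C))]


def richardson_lucy(C, observed, n_iter=500):
    """Richardson-Lucy (Poisson EM) estimate of a nonnegative true signal.

    Assumes every column of C sums to 1, so the usual normalizing term is 1.
    """
    n = len(C[0])
    s = [sum(observed) / n] * n                # flat, positive starting guess
    for _ in range(n_iter):
        predicted = convolve(C, s)
        ratio = [o / p if p > 0 else 0.0 for o, p in zip(observed, predicted)]
        # Multiplicative update: s_i <- s_i * sum_j C[j][i] * ratio_j
        s = [s[i] * sum(C[j][i] * ratio[j] for j in range(len(C))) for i in range(n)]
    return s


# Toy example: two sharp binding sites blurred by the kernel.
C = gaussian_kernel(30, sigma=2.0)
true_signal = [0.0] * 30
true_signal[12], true_signal[20] = 100.0, 50.0
observed = convolve(C, true_signal)
estimate = richardson_lucy(C, observed)
```

Because the update is multiplicative, the estimate stays nonnegative, and with column-normalized kernels it preserves the total observed signal while concentrating it back toward the sharp sites.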
Chapter 3 presents a method for estimating gene regulatory networks from paired single-cell RNA sequencing and single-cell ATAC sequencing data. Our method uses a variational autoencoder framework to jointly infer latent cell states present in the data, such as those resulting from perturbations, and to estimate the strength of connections between transcription factors and regulatory elements, and between regulatory elements and target genes. We demonstrate through simulations that our method accurately estimates the latent accessibilities of regulatory elements and accurately recovers the network weights. Furthermore, in single-cell multimodal data with ground-truth transcription factor overexpression, our method successfully identifies latent cell states and nominates network connections between transcription factors and downstream target genes that are consistent with known biology.
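One way to see how transcription factor-to-element and element-to-gene weights can combine into transcription factor-to-gene connections is a purely linear two-layer sketch. This is an assumption for illustration only, not the dissertation's variational autoencoder: under a linear model, the implied transcription factor-to-gene score is the matrix product of the two weight layers. The names `compose_network` and `top_target` and all weight values are hypothetical.

```python
def compose_network(tf_to_re, re_to_gene):
    """Combine two hypothetical weight layers into TF-by-gene scores.

    tf_to_re[t][r]  : weight from transcription factor t to regulatory element r
    re_to_gene[r][g]: weight from regulatory element r to target gene g
    Under a linear model, score[t][g] = sum_r tf_to_re[t][r] * re_to_gene[r][g].
    """
    n_re = len(re_to_gene)
    if any(len(row) != n_re for row in tf_to_re):
        raise ValueError("tf_to_re must have one column per regulatory element")
    n_gene = len(re_to_gene[0])
    return [[sum(row[r] * re_to_gene[r][g] for r in range(n_re)) for g in range(n_gene)]
            for row in tf_to_re]


def top_target(scores, tf_index):
    """Index of the gene with the largest composed score for a given TF."""
    row = scores[tf_index]
    return max(range(len(row)), key=lambda g: row[g])


# Toy example: 2 TFs, 3 regulatory elements, 2 genes.
tf_to_re = [[1.0, 0.5, 0.0],
            [0.0, 0.0, 2.0]]
re_to_gene = [[0.8, 0.0],
              [0.4, 0.0],
              [0.0, 1.5]]
scores = compose_network(tf_to_re, re_to_gene)  # approximately [[1.0, 0.0], [0.0, 3.0]]
```

A nonlinear model would not reduce to a single matrix product; the linear case only shows how two layers of weights can jointly nominate candidate transcription factor-target gene pairs.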
Keywords
epigenomics, genomics, single cell, transcriptomics, Biostatistics, Genetics, Biology
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service