Publication: Methods for Analyzing Sparse Genetic and Epigenetic Data: Single Cells to Population Levels
Date
2019-05-16
Authors
Published Version
Citation
Kangeyan, Divy S. 2019. Methods for Analyzing Sparse Genetic and Epigenetic Data: Single Cells to Population Levels. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
Abstract
This dissertation is motivated by the large influx of sequencing data, in terms of both the amount and the type of data, for which current statistical and computational methods are inadequate for handling the data and hence for addressing the corresponding scientific questions of interest.
In Chapter 1, we address the need for a data analysis platform for processing large amounts of next-generation sequencing (NGS)-based methylation data. Bisulfite sequencing measures DNA methylation at base-pair resolution and has recently been adapted for use in single cells. We present a set of preprocessing pipelines that give users 1) reproducibility, 2) scalability, 3) integration with publicly available data, and 4) access to best-practice methods. The workflows produce output for visualization and further downstream analysis. Optional use of cloud computing resources facilitates the analysis of large datasets and integration with existing methylation data.
In Chapter 2, we focus our attention on sparsity in single-cell DNA methylation data. Single-cell DNA methylation analysis has the potential to produce a high-resolution methylation landscape and elucidate heterogeneity in methylation, but it suffers from low coverage due to the low quantity of input DNA. We find that, on average, only about 5–10% of CpGs are observed in typical single-cell libraries. We show how missingness of methylation status can bias metrics such as mean methylation estimates and clustering analyses. We propose a joint analysis approach that leverages bulk sequencing data to infer bias-corrected single-cell methylation status.
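The effect of missingness on a mean methylation estimate can be illustrated with a small toy example. This is a minimal sketch, not the dissertation's method: the function names, the 100-CpG toy genome, and the assumption that unmethylated CpGs are over-represented among covered sites are all hypothetical and chosen only to make the bias visible.

```python
import numpy as np

def coverage_fraction(calls):
    """Fraction of CpGs with an observed call (NaN marks an uncovered CpG)."""
    calls = np.asarray(calls, dtype=float)
    return float(np.mean(~np.isnan(calls)))

def observed_mean_methylation(calls):
    """Mean methylation over observed CpGs only; NaN if nothing is covered."""
    calls = np.asarray(calls, dtype=float)
    if np.all(np.isnan(calls)):
        return float("nan")
    return float(np.nanmean(calls))

# Hypothetical toy genome: 20 unmethylated CpGs (0) followed by 80 methylated CpGs (1).
truth = np.concatenate([np.zeros(20), np.ones(80)])

# Hypothetical single-cell library covering 8 of 100 CpGs, with the
# unmethylated sites over-represented (an assumption made for illustration).
covered = [0, 1, 2, 3, 20, 21, 22, 23]
single_cell = np.full(truth.shape, np.nan)
single_cell[covered] = truth[covered]

print("coverage:", coverage_fraction(single_cell))
print("true mean:", float(truth.mean()))
print("observed mean:", observed_mean_methylation(single_cell))
```

In this toy setting the library covers 8% of CpGs, and the mean over observed sites is 0.5 while the true mean is 0.8. The gap arises because coverage is not independent of methylation status; if sites were missing completely at random, the observed mean would be noisy but not systematically shifted.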
In Chapter 3, we consider sparsity in rare variant data and how it can be used to infer population structure. Population substructure in genetic studies is often assessed by principal component analysis of genetic relatedness matrices (GRMs). With the general availability of whole-genome sequencing (WGS) platforms, rare variant data are now widely available. As such variants are genetically “younger” than common variants, they should enable a fine-scale assessment of the substructure. Here, using the 1000 Genomes Project data, we compare the features of Jaccard-based GRMs with standard approaches that utilize the genetic covariance matrix, with respect to their ability to examine and infer fine-scale population substructure.
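The two kinds of relatedness matrix compared in Chapter 3 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the dissertation's implementation: the function names and the toy genotype matrix are hypothetical, the Jaccard index is computed on carrier status (genotype greater than zero), and the covariance-based matrix uses one common standardization by allele frequency.

```python
import numpy as np

def jaccard_grm(genotypes):
    """Pairwise Jaccard similarity between individuals based on carrier status.

    genotypes: (individuals x variants) array of minor-allele counts (0, 1, 2).
    """
    carriers = (np.asarray(genotypes) > 0).astype(float)
    shared = carriers @ carriers.T                      # |A_i intersect A_j|
    counts = carriers.sum(axis=1)                       # |A_i|
    union = counts[:, None] + counts[None, :] - shared  # |A_i union A_j|
    # Assumption: similarity is set to 0 when neither individual carries a variant.
    return np.divide(shared, union, out=np.zeros_like(shared), where=union > 0)

def covariance_grm(genotypes):
    """Covariance-style GRM from genotypes standardized by allele frequency."""
    g = np.asarray(genotypes, dtype=float)
    freq = g.mean(axis=0) / 2.0
    keep = (freq > 0) & (freq < 1)                      # drop monomorphic variants
    g, freq = g[:, keep], freq[keep]
    z = (g - 2.0 * freq) / np.sqrt(2.0 * freq * (1.0 - freq))
    return z @ z.T / z.shape[1]

# Hypothetical toy data: 3 individuals, 4 variants.
toy = np.array([[1, 0, 1, 0],
                [1, 0, 0, 0],
                [0, 1, 0, 0]])

J = jaccard_grm(toy)
K = covariance_grm(toy)

# Principal components for substructure analysis come from the
# eigendecomposition of either symmetric matrix.
eigenvalues, eigenvectors = np.linalg.eigh(J)
```

For the toy data, individuals 0 and 1 share one of the two variants carried between them, so their Jaccard similarity is 0.5, while individual 2 shares no variants with either and has similarity 0 to both.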
Keywords
Sparsity, DNA methylation: Single-cell analysis, Population structure, Rare variant data
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service