dc.contributor.advisor: Aryee, Martin
dc.contributor.author: Kangeyan, Divy S.
dc.date.accessioned: 2019-12-12T09:19:07Z
dc.date.created: 2019-05
dc.date.issued: 2019-05-16
dc.date.submitted: 2019
dc.identifier.citation: Kangeyan, Divy S. 2019. Methods for Analyzing Sparse Genetic and Epigenetic Data: Single Cells to Population Levels. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
dc.identifier.uri: http://nrs.harvard.edu/urn-3:HUL.InstRepos:42029774
dc.description.abstract: This dissertation is motivated by the large influx of sequencing data, in both amount and type, for which current statistical and computational methods are inadequate for manipulating the data and hence for addressing the corresponding scientific questions of interest. In Chapter 1, we address the need for a data analysis platform for processing large amounts of next-generation sequencing (NGS) based methylation data. Bisulfite sequencing allows measurement of DNA methylation at base-pair resolution and has recently been adapted for use in single cells. We present a set of preprocessing pipelines that give users 1) reproducibility, 2) scalability, 3) integration with publicly available data, and 4) access to best-practice methods. The workflows produce output for visualization and further downstream analysis. Optional use of cloud computing resources facilitates analysis of large datasets and integration with existing methylation data. In Chapter 2, we focus on sparsity in single-cell DNA methylation data. Single-cell DNA methylation analysis has the potential to produce a high-resolution methylation landscape and elucidate heterogeneity in methylation, but it suffers from low coverage due to the low quantity of input DNA. We find that, on average, only about 5–10% of CpGs are observed in typical single-cell libraries. We show how missingness of methylation status can bias metrics such as mean methylation estimates and clustering analyses. We propose a joint analysis approach that leverages bulk sequencing data to infer bias-corrected single-cell methylation status. In Chapter 3, we consider sparsity in rare variant data and how it can be used to infer population structure. Population substructure in genetic studies is often assessed by principal component analysis of genetic relatedness matrices (GRMs). With the general availability of whole-genome sequencing (WGS) platforms, rare variant data are now widely available. Because such data are genetically “younger” than common variants, they should enable a fine-scale assessment of the substructure. Here, using the 1000 Genomes Project data, we compare the features of Jaccard-based GRMs with standard approaches that use the genetic covariance matrix, with respect to their ability to examine and infer fine-scale population substructure.
dc.description.sponsorship: Biostatistics
dc.format.mimetype: application/pdf
dc.language.iso: en
dash.license: LAA
dc.subject: Sparsity
dc.subject: DNA methylation: Single-cell analysis
dc.subject: Population structure
dc.subject: Rare variant data
dc.title: Methods for Analyzing Sparse Genetic and Epigenetic Data: Single Cells to Population Levels
dc.type: Thesis or Dissertation
dash.depositing.author: Kangeyan, Divy S.
dc.date.available: 2019-12-12T09:19:07Z
thesis.degree.date: 2019
thesis.degree.grantor: Graduate School of Arts & Sciences
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
dc.contributor.committeeMember: Yuan, Guocheng
dc.contributor.committeeMember: Miller, Jeffrey
dc.contributor.committeeMember: Lange, Christoph
dc.type.material: text
thesis.degree.department: Biostatistics
dash.identifier.vireo
dash.author.emailsdivyagash@yahoo.com

