Computational and Statistical Methods for Characterizing Single-Cell Heterogeneity
Date
2018-01-19
The Harvard community has made this article openly available.
Abstract
The emergence of single-cell technologies has highlighted the potential to discover novel cellular subpopulations and states within heterogeneous populations. These technologies are achieving ever higher throughput, generating high-dimensional measurements for hundreds of thousands of cells. Furthermore, single-cell measurements are often variable and sparse, owing both to the stochasticity of biological processes and to biologically uninformative technical noise. New computational and statistical methods are therefore needed to analyze these large, noisy single-cell datasets and extract meaningful biological insights.
Here, we describe several computational and statistical methods for characterizing genetic, transcriptomic, and epigenetic heterogeneity, as well as their interplay, at the single-cell level. First, focusing on transcriptional heterogeneity, we develop a method called pathway and gene set over-dispersion analysis (PAGODA). It leverages pathway-level information to identify and characterize transcriptional subpopulations robustly while accounting for technical artefacts in single-cell RNA-sequencing data. We apply this approach to generate pure in-silico mini-bulks, which we use to identify cell-type-specific alternative splicing events that govern cell fate in the developing brain. Likewise, we apply bulk and single-cell transcriptomics to characterize the effects of SF3B1 mutation on aberrant alternative splicing and gene expression in chronic lymphocytic leukemia (CLL). Then, beyond transcriptional heterogeneity, we examine patterns of intratumoral genetic and methylation heterogeneity in CLL to provide insights into its highly variable disease course. Finally, motivated by the high degree of genetic, transcriptomic, and epigenetic heterogeneity we observed, we characterize their interplay with two approaches. The first is an RT-qPCR-based assay that enables simultaneous targeted assessment of DNA- and RNA-level information in single cells. The second is HoneyBADGER, a Bayesian hierarchical approach integrated with a hidden Markov model, which infers megabase-scale copy number alterations directly from single-cell RNA-sequencing data. We also develop computational techniques to integrate single-nucleus transcriptomics and single-cell DNA-accessibility data to derive insights into cell-type-specific regulation in the adult human brain.
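The general idea of using a hidden Markov model to call copy number alterations from expression can be illustrated with a small sketch. This is not HoneyBADGER, and its Bayesian hierarchical model is not reproduced here. The example is a plain three-state Gaussian HMM decoded with the Viterbi algorithm, and it rests on several assumptions. The input is a hypothetical list of per-gene expression deviations for one cell, already normalized against a normal reference and ordered by genomic position. Deletions and amplifications are assumed to shift expression by a fixed amount. The parameter values (`shift`, `sd`, `switch_prob`) are illustrative choices, not values from the thesis. A small `switch_prob` makes state changes costly, so only runs of many consistently shifted neighbouring genes produce a call. This reflects the reasoning that megabase-scale alterations affect many adjacent genes at once.

```python
import math
from typing import List, Sequence

# Hypothetical copy-number states; index 1 is the neutral (reference) state.
STATES = ("deletion", "neutral", "amplification")


def gaussian_logpdf(x: float, mean: float, sd: float) -> float:
    """Log density of a normal distribution N(mean, sd^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi_cnv(
    deviations: Sequence[float],
    shift: float = 0.5,        # assumed mean expression shift for a deletion/amplification
    sd: float = 0.3,           # assumed per-gene noise standard deviation
    switch_prob: float = 1e-3, # assumed probability of changing state between adjacent genes
) -> List[str]:
    """Most likely copy-number state for each gene (genes in genomic order)."""
    if not deviations:
        return []
    means = (-shift, 0.0, shift)
    n_states = len(means)
    log_stay = math.log(1.0 - switch_prob)
    log_move = math.log(switch_prob / (n_states - 1))  # split evenly among other states
    log_start = math.log(1.0 / n_states)               # uniform initial distribution

    # score[s] = best log-probability of any state path ending in state s at the current gene
    score = [log_start + gaussian_logpdf(deviations[0], means[s], sd) for s in range(n_states)]
    backpointers: List[List[int]] = []
    for x in deviations[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            candidates = [score[p] + (log_stay if p == s else log_move) for p in range(n_states)]
            best_prev = max(range(n_states), key=lambda p: candidates[p])
            new_score.append(candidates[best_prev] + gaussian_logpdf(x, means[s], sd))
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state to recover the optimal path.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]
```

For example, a simulated cell with a 20-gene run at -0.5 and a later 20-gene run at +0.5, separated and flanked by unshifted genes, is segmented into neutral, deletion, neutral, amplification, and neutral regions. A run of only a few shifted genes would instead be absorbed into the neutral state, because the emission gain would not outweigh the two state switches.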
While we have focused on applying these methods to study the brain and CLL, we hope that the scientific community will be able to tailor and apply these methods and general approaches to diverse cellular systems.
Keywords
Biology, Bioinformatics; Biology, Genetics; Biology, Biostatistics
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.