Publication: Learning Representations of Multi-Condition Single-Cell Data
Date
2020-06-24
The Harvard community has made this article openly available.
Citation
Reshef, Yakir. 2020. Learning Representations of Multi-Condition Single-Cell Data. Doctoral dissertation, Harvard Medical School.
Abstract
The recent ability to measure gene expression, protein levels, and more at single-cell resolution is thought to have great potential for elucidating disease mechanisms. This is in part because of the prospect of comparing such data across healthy and disease states, different experimental conditions, and different genetic backgrounds. Existing approaches for comparing single-cell data across multiple conditions primarily proceed by first combining cells across all conditions, clustering the cells into condition-independent "cell type" clusters, and then performing downstream analyses to look for differential abundance of clusters across conditions or differential expression within a cluster across conditions. However, single-cell data are not always well modeled by clusters, and attempting to learn clusters in a condition-independent way may obscure important signal that could distinguish between the conditions. Here we introduce differential correlation analysis (DCA), a method for comparing single-cell data across multiple conditions that circumvents these difficulties by directly searching for differences in gene-gene co-expression across samples. The resulting gene-gene co-expression differences can then be summarized using principal components analysis to yield differential principal components (dPCs). We show in simulation that this more flexible approach is able to recover both changes in population abundance and population-specific expression changes. DCA is also able to account for batch effects as well as other confounders both at the individual level (e.g., patient age) and at the cellular level (e.g., mitochondrial read content in a given cell). We apply DCA to a single-cell dataset of approximately 500,000 memory T cells from 128 early tuberculosis progression cases and 131 controls. Our method powerfully identifies case-control differences, and we present evidence that these differences may point to coherent cell populations and programs with relevance to tuberculosis immunology; investigating these populations further is an important avenue of future work. Our work suggests that current methods may not realize the full potential of multi-condition single-cell data for elucidating disease biology.
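The abstract describes DCA only at a high level, so the following is a rough illustration of the general idea rather than the dissertation's implementation. It assumes that per-sample co-expression can be approximated by a plain Pearson correlation matrix and that the summary step is an ordinary PCA computed by SVD. The function names (`per_sample_coexpression`, `top_components`) and the toy data are hypothetical, and the confounder and batch adjustments mentioned in the abstract are not attempted.

```python
import numpy as np


def per_sample_coexpression(expr, sample_ids):
    """Flatten each sample's gene-gene correlation matrix into one row.

    expr: (n_cells, n_genes) expression matrix.
    sample_ids: (n_cells,) label of the sample each cell came from.
    Assumption: Pearson correlation stands in for "co-expression".
    """
    samples = np.unique(sample_ids)
    n_genes = expr.shape[1]
    upper = np.triu_indices(n_genes, k=1)  # each gene pair once, no diagonal
    rows = []
    for s in samples:
        cells = expr[sample_ids == s]
        corr = np.corrcoef(cells, rowvar=False)  # genes are columns
        rows.append(corr[upper])
    return samples, np.vstack(rows)


def top_components(features, n_components=2):
    """Ordinary PCA via SVD on mean-centered sample-level features."""
    centered = features - features.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    return scores, vt[:n_components]


# Toy data (entirely synthetic): 6 samples, 200 cells each, 5 genes.
# In the last 3 samples ("cases"), gene 1 is made to track gene 0.
rng = np.random.default_rng(0)
n_samples, cells_per_sample, n_genes = 6, 200, 5
blocks, ids = [], []
for s in range(n_samples):
    x = rng.normal(size=(cells_per_sample, n_genes))
    if s >= n_samples // 2:
        x[:, 1] = x[:, 0] + 0.3 * rng.normal(size=cells_per_sample)
    blocks.append(x)
    ids.extend([s] * cells_per_sample)
expr = np.vstack(blocks)
sample_ids = np.array(ids)

samples, coexp = per_sample_coexpression(expr, sample_ids)
scores, loadings = top_components(coexp, n_components=2)
```

In this toy setting, `scores[:, 0]` gives one value per sample along the leading component of co-expression variation, and `loadings[0]` indicates which gene pairs drive it. Real data would also need normalization and the individual-level and cell-level confounder adjustments that the abstract attributes to DCA.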
Keywords
Single-cell, association study, scRNA-seq
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service