Learning Representations of Multi-Condition Single-Cell Data

Date

2020-06-24

Citation

Reshef, Yakir. 2020. Learning Representations of Multi-Condition Single-Cell Data. Doctoral dissertation, Harvard Medical School.

Abstract

The recent ability to measure gene expression, protein levels, and more at single-cell resolution is thought to have great potential for elucidating disease mechanisms. This is in part because of the prospect of comparing such data across healthy and disease states, different experimental conditions, and different genetic backgrounds. Existing approaches for comparing single-cell data across multiple conditions primarily proceed by first combining cells across all conditions, clustering the cells into condition-independent "cell type" clusters, and then performing downstream analyses to look for differential abundance of clusters across conditions or differential expression within a cluster across conditions. However, single-cell data are not always well modeled by clusters, and attempting to learn clusters in a condition-independent way may obscure important signal that could distinguish between the conditions. Here we introduce differential correlation analysis (DCA), a method for comparing single-cell data across multiple conditions that circumvents these difficulties by directly searching for differences in gene-gene co-expression across samples. The resulting gene-gene co-expression differences can then be summarized using principal components analysis to yield differential principal components (dPCs). We show in simulation that this more flexible approach is able to recover both changes in population abundance and population-specific expression changes. DCA is also able to account for batch effects as well as other confounders, both at the individual level (e.g., patient age) and at the cellular level (e.g., mitochondrial read content in a given cell). We apply DCA to a single-cell dataset of approximately 500,000 memory T cells sampled from 128 early tuberculosis progression cases and 131 controls.
Our method powerfully identifies case-control differences, and we present evidence that these differences may point to coherent cell populations and programs with relevance to tuberculosis immunology; investigating these populations further is an important avenue of future work. Our work suggests that current methods may not realize the full potential of multi-condition single-cell data for elucidating disease biology.
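The abstract describes DCA only at a high level: find gene-gene co-expression differences across samples, then summarize them with principal components analysis. The sketch below illustrates that general idea under simplifying assumptions of our own and is not the dissertation's implementation. It computes a gene-gene correlation matrix per sample, takes the difference between the average case matrix and the average control matrix, and eigendecomposes that difference, treating the leading eigenvectors as "differential PC" loadings by analogy. It ignores batch effects, individual- and cell-level confounders, and significance testing; the function names `sample_correlations` and `differential_pcs` are hypothetical.

```python
import numpy as np


def sample_correlations(X, sample_ids):
    """Gene-gene Pearson correlation matrix for each sample.

    X: (n_cells, n_genes) expression matrix, assumed already normalized.
    sample_ids: length-n_cells array of integer sample labels.
    Returns a dict mapping each sample label to an (n_genes, n_genes) matrix.
    """
    corrs = {}
    for s in np.unique(sample_ids):
        cells = X[sample_ids == s]
        # rowvar=False: columns are variables (genes), rows are observations (cells)
        corrs[int(s)] = np.corrcoef(cells, rowvar=False)
    return corrs


def differential_pcs(X, sample_ids, is_case, n_components=2):
    """Hypothetical simplification: eigenvectors of (mean case corr - mean control corr).

    is_case: boolean array indexed by sample label (True = case).
    Returns (eigenvalues, loadings) ordered by decreasing |eigenvalue|;
    loadings has shape (n_genes, n_components).
    """
    corrs = sample_correlations(X, sample_ids)
    case_mats = [c for s, c in corrs.items() if is_case[s]]
    ctrl_mats = [c for s, c in corrs.items() if not is_case[s]]
    # Symmetric matrix of average co-expression differences between conditions
    D = np.mean(case_mats, axis=0) - np.mean(ctrl_mats, axis=0)
    evals, evecs = np.linalg.eigh(D)
    # Negative eigenvalues mean co-expression lost in cases, so rank by magnitude
    order = np.argsort(-np.abs(evals))[:n_components]
    return evals[order], evecs[:, order]
```

Using correlations rather than covariances keeps samples with different overall expression scales comparable, and ranking eigenvectors by absolute eigenvalue captures both gained and lost co-expression in cases relative to controls.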

Keywords

Single-cell, association study, scRNA-seq

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.