Learning Representations of Multi-Condition Single-Cell Data
Citation: Reshef, Yakir. 2020. Learning Representations of Multi-Condition Single-Cell Data. Doctoral dissertation, Harvard Medical School.
Abstract: The recent ability to measure gene expression, protein levels, and more at single-cell resolution is thought to have great potential for elucidating disease mechanisms. This is in part because of the prospect of comparing such data across healthy and disease states, different experimental conditions, and different genetic backgrounds. Existing approaches for comparing single-cell data across multiple conditions primarily proceed by first combining cells across all conditions, clustering the cells into condition-independent "cell type" clusters, and then performing downstream analyses to look for differential abundance of clusters across conditions or differential expression within a cluster across conditions. However, single-cell data are not always well modeled by clusters, and attempting to learn clusters in a condition-independent way may obscure important signal that could distinguish between the conditions. Here we introduce differential correlation analysis (DCA), a method for comparing single-cell data across multiple conditions that circumvents these difficulties by directly searching for differences in gene-gene co-expression across samples. The resulting gene-gene co-expression differences can then be summarized using principal components analysis to yield differential principal components (dPCs). We show in simulation that this more flexible approach is able to recover both changes in population abundance and population-specific expression changes. DCA is also able to account for batch effects as well as other confounders, both at the individual level (e.g., patient age) and at the cellular level (e.g., mitochondrial read content in a given cell). We apply DCA to a single-cell dataset of approximately 500,000 memory T cells sampled from 128 early tuberculosis progression cases and 131 controls.
Our method powerfully identifies case-control differences, and we present evidence that these differences may point to coherent cell populations and programs with relevance to tuberculosis immunology; investigating these populations further is an important avenue of future work. Our work suggests that current methods may not realize the full potential of multi-condition single-cell data for elucidating disease biology.
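To make the general idea in the abstract concrete, the following is a minimal illustrative sketch, not the dissertation's DCA algorithm. It assumes that each sample is summarized by its gene-gene Pearson correlation matrix and that ordinary PCA (via SVD) is run across samples on those correlations. The function names (`sample_coexpression`, `coexpression_pcs`, `simulate_sample`) and the toy simulation are hypothetical, and the sketch omits the confounder and batch adjustment that the abstract describes.

```python
import numpy as np

def sample_coexpression(X):
    """Flatten one sample's gene-gene Pearson correlations (X is cells x genes)."""
    corr = np.corrcoef(X, rowvar=False)
    upper = np.triu_indices(corr.shape[0], k=1)  # each gene pair once, no diagonal
    return corr[upper]

def coexpression_pcs(samples, n_components=2):
    """PCA across samples on their flattened co-expression vectors."""
    feats = np.vstack([sample_coexpression(X) for X in samples])
    centered = feats - feats.mean(axis=0)
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # one row per sample
    loadings = Vt[:n_components]                     # one column per gene pair
    return scores, loadings

def simulate_sample(rng, n_cells, n_genes, coupled):
    """Toy data (assumption): in 'coupled' samples, gene 1 tracks gene 0."""
    X = rng.normal(size=(n_cells, n_genes))
    if coupled:
        X[:, 1] = X[:, 0] + 0.3 * rng.normal(size=n_cells)
    return X

rng = np.random.default_rng(0)
labels = np.array([0] * 6 + [1] * 6)  # 0 = control, 1 = case (toy labels)
samples = [simulate_sample(rng, 300, 5, bool(l)) for l in labels]
scores, loadings = coexpression_pcs(samples)
```

In this toy setting the only systematic difference between the two groups is the co-expression of genes 0 and 1, so the leading component should load mainly on that gene pair and its scores should separate the two groups. Real data would additionally require normalization and the covariate handling discussed in the abstract.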
Citable link to this page: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37364786