Publication:
Significance analysis for clustering with single-cell RNA-sequencing data

Date

2023-07-10

Publisher

Springer Science and Business Media LLC

Citation

Grabski, Isabella N., Kelly Street, Rafael Irizarry. "Significance analysis for clustering with single-cell RNA-sequencing data." Nat Methods 20, no. 8 (2023): 1196-1202. DOI: 10.1038/s41592-023-01933-9

Abstract

Unsupervised clustering of single-cell RNA-sequencing data enables the identification and discovery of distinct cell populations. However, the most widely used clustering algorithms are heuristic and do not formally account for statistical uncertainty. Many popular pipelines use clustering stability methods to assess the algorithms’ output and decide on the number of clusters. However, we find that by not addressing known sources of variability in a statistically rigorous manner, these analyses lead to overconfidence in the discovery of novel cell types. We extend a previous method for Gaussian data, Significance of Hierarchical Clustering (SHC), to propose a model-based hypothesis testing approach that incorporates significance analysis into the clustering algorithm and permits statistical evaluation of clusters as distinct cell populations. We also adapt this approach to permit statistical assessment on the clusters reported by any algorithm. We benchmarked our approach on real-world datasets against popular clustering workflows, demonstrating improved performance. To show its practical utility, we applied it to the Human Lung Cell Atlas and an atlas of the mouse cerebellar cortex. We identified several cases of over-clustering, leading to false discoveries, as well as under-clustering, resulting in the failure to identify new subpopulations that our method was able to detect.
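
The general idea of testing whether a proposed split reflects two distinct populations can be illustrated with a toy Monte Carlo test. The sketch below is not the authors' method and does not model single-cell count data: it assumes one-dimensional Gaussian data, loosely in the spirit of the earlier Gaussian approach the abstract mentions, and the function names `two_means_cluster_index` and `split_p_value` are hypothetical. It fits a single Gaussian as the null model, simulates null datasets, and asks how often a null dataset splits as sharply as the observed one.

```python
import random
import statistics


def two_means_cluster_index(values):
    """Return within-cluster SS / total SS for the best 2-cluster split.

    Smaller values indicate a sharper split. In one dimension the optimal
    2-means partition is a threshold on the sorted values, so every split
    point is checked exactly. Assumes the values are not all identical.
    """
    xs = sorted(values)
    n = len(xs)
    mean = sum(xs) / n
    total_ss = sum((x - mean) ** 2 for x in xs)
    all_sum = sum(xs)
    all_sq = sum(x * x for x in xs)
    prefix_sum, prefix_sq = 0.0, 0.0
    best = total_ss
    for i in range(1, n):  # left cluster = xs[:i], right cluster = xs[i:]
        prefix_sum += xs[i - 1]
        prefix_sq += xs[i - 1] ** 2
        left_ss = prefix_sq - prefix_sum ** 2 / i
        right_sum = all_sum - prefix_sum
        right_ss = (all_sq - prefix_sq) - right_sum ** 2 / (n - i)
        best = min(best, left_ss + right_ss)
    return best / total_ss


def split_p_value(values, n_null=200, seed=0):
    """Monte Carlo p-value for 'one population' versus 'two clusters'.

    Null model (an assumption of this sketch): a single Gaussian with the
    sample mean and standard deviation of the observed data.
    """
    rng = random.Random(seed)
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)
    observed = two_means_cluster_index(values)
    hits = 0
    for _ in range(n_null):
        null_sample = [rng.gauss(mu, sd) for _ in values]
        # Count null datasets that split at least as sharply as the data.
        if two_means_cluster_index(null_sample) <= observed:
            hits += 1
    return (1 + hits) / (1 + n_null)


if __name__ == "__main__":
    data_rng = random.Random(1)
    one_population = [data_rng.gauss(0.0, 1.0) for _ in range(100)]
    two_populations = ([data_rng.gauss(0.0, 1.0) for _ in range(50)]
                       + [data_rng.gauss(8.0, 1.0) for _ in range(50)])
    print("one population p-value:", split_p_value(one_population))
    print("two populations p-value:", split_p_value(two_populations))
```

A small p-value suggests the split is sharper than a single Gaussian population would typically produce. The choice of null model is the key assumption here, and real count data would need a null suited to counts rather than the Gaussian used in this sketch.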

Keywords

Cell Biology, Molecular Biology, Biochemistry, Biotechnology

Terms of Use

Metadata Only
