Publication: Significance analysis for clustering with single-cell RNA-sequencing data
Date
2023-07-10
Publisher
Springer Science and Business Media LLC
Citation
Grabski, Isabella N., Kelly Street, Rafael Irizarry. "Significance analysis for clustering with single-cell RNA-sequencing data." Nat Methods 20, no. 8 (2023): 1196-1202. DOI: 10.1038/s41592-023-01933-9
Abstract
Unsupervised clustering of single-cell RNA-sequencing data enables the identification and discovery of distinct cell populations. However, the most widely used clustering algorithms are heuristic and do not formally account for statistical uncertainty. Many popular pipelines use clustering stability methods to assess the algorithms’ output and decide on the number of clusters. However, we find that by not addressing known sources of variability in a statistically rigorous manner, these analyses lead to overconfidence in the discovery of novel cell types. We extend a previous method for Gaussian data, Significance of Hierarchical Clustering (SHC), to propose a model-based hypothesis testing approach that incorporates significance analysis into the clustering algorithm and permits statistical evaluation of clusters as distinct cell populations. We also adapt this approach to permit statistical assessment on the clusters reported by any algorithm. We benchmarked our approach on real-world datasets against popular clustering workflows, demonstrating improved performance. To show its practical utility, we applied it to the Human Lung Cell Atlas and an atlas of the mouse cerebellar cortex. We identified several cases of over-clustering, leading to false discoveries, as well as under-clustering, resulting in the failure to identify new subpopulations that our method was able to detect.
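To make the idea of testing a cluster split for significance concrete, the following is a minimal sketch of a Monte Carlo test in the Gaussian setting that the abstract mentions as the starting point. It is not the authors' method and does not model single-cell count data. It assumes a null hypothesis of one multivariate Gaussian population, uses the 2-means cluster index (within-cluster sum of squares divided by total sum of squares) as the test statistic, and simulates the null distribution from a Gaussian fitted to the data. The function names `two_means`, `cluster_index`, and `split_p_value` are hypothetical and chosen for illustration only.

```python
import numpy as np


def two_means(X, n_iter=50):
    """Lloyd's algorithm with k=2, initialised by splitting along the first principal axis."""
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    labels = (centered @ vt[0] > 0).astype(int)
    for _ in range(n_iter):
        if labels.min() == labels.max():  # one cluster is empty, stop
            break
        centers = np.stack([X[labels == k].mean(axis=0) for k in (0, 1)])
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


def cluster_index(X, labels):
    """Within-cluster sum of squares over total sum of squares (smaller means a cleaner split)."""
    total = ((X - X.mean(axis=0)) ** 2).sum()
    within = sum(
        ((X[labels == k] - X[labels == k].mean(axis=0)) ** 2).sum()
        for k in (0, 1)
        if np.any(labels == k)
    )
    return within / total


def split_p_value(X, n_sim=200, seed=0):
    """Monte Carlo p-value for H0: the rows of X come from a single Gaussian population."""
    rng = np.random.default_rng(seed)
    observed = cluster_index(X, two_means(X))
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)  # assumes X has at least two columns
    null = np.empty(n_sim)
    for i in range(n_sim):
        sim = rng.multivariate_normal(mean, cov, size=X.shape[0])
        null[i] = cluster_index(sim, two_means(sim))
    # Add-one correction keeps the estimated p-value strictly positive.
    return (1 + np.sum(null <= observed)) / (n_sim + 1)


# Illustrative synthetic data: two well-separated groups of 50 points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(0, 1, (50, 2)) + [8, 0]])
p = split_p_value(X)
```

A small p-value suggests the split separates the data more cleanly than a single Gaussian population would typically allow. Applying such a test recursively at each split of a hierarchical clustering is one way to fold significance analysis into the clustering itself.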
Keywords
Cell Biology, Molecular Biology, Biochemistry, Biotechnology
Terms of Use
Metadata Only