Publication:

Massively parallel unsupervised single-particle cryo-EM data clustering via statistical manifold learning

Date

2017

Publisher

Public Library of Science

Citation

Wu, Jiayi, Yong-Bei Ma, Charles Congdon, Bevin Brett, Shuobing Chen, Yaofang Xu, Qi Ouyang, and Youdong Mao. 2017. “Massively parallel unsupervised single-particle cryo-EM data clustering via statistical manifold learning.” PLoS ONE 12 (8): e0182130. doi:10.1371/journal.pone.0182130. http://dx.doi.org/10.1371/journal.pone.0182130.

Abstract

Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM) data represents a major challenge for high-resolution structure determination. Unsupervised classification may serve as the first step in the assessment of structural heterogeneity. However, traditional algorithms for unsupervised classification, such as K-means clustering and maximum likelihood optimization, may misclassify images as the signal-to-noise ratio (SNR) of the image data decreases, while also demanding increased computational cost. Overcoming these limitations requires further development of clustering algorithms for high-performance cryo-EM data processing. Here we introduce an unsupervised single-particle clustering algorithm derived from a statistical manifold learning framework called generative topographic mapping (GTM). We show that unsupervised GTM clustering improves classification accuracy by about 40% in the absence of input references for data with lower SNRs. Applications to several experimental datasets suggest that our algorithm can detect subtle structural differences among classes via a hierarchical clustering strategy. After code optimization for a high-performance computing (HPC) environment, our software implementation was able to generate thousands of reference-free class averages within hours in a massively parallel fashion, which allows a significant improvement in ab initio 3D reconstruction and assists in the computational purification of homogeneous datasets for high-resolution visualization.
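For readers unfamiliar with GTM, the following is a minimal sketch of the textbook GTM expectation-maximization loop applied to clustering. It is not the authors' implementation and omits everything specific to cryo-EM (image alignment, hierarchical clustering, parallelization). The function name `gtm_cluster`, its parameters, the grid sizes, and the PCA initialization are illustrative assumptions; each row of `T` is assumed to be one flattened, pre-aligned image.

```python
import numpy as np

def _sqdist(Y, T):
    # Squared Euclidean distance between every node centre and every data point (K x N).
    return ((Y[:, None, :] - T[None, :, :]) ** 2).sum(axis=-1)

def _responsibilities(Y, T, beta):
    # E-step: posterior probability of each latent node for each data point.
    logp = -0.5 * beta * _sqdist(Y, T)
    logp -= logp.max(axis=0, keepdims=True)  # stabilise the exponentials
    R = np.exp(logp)
    return R / R.sum(axis=0, keepdims=True)

def gtm_cluster(T, grid=3, n_rbf=2, n_iter=30, reg=1e-3):
    """Hypothetical helper: fit a small GTM to T (N x D) and return (labels, R, Y)."""
    N, D = T.shape
    # Latent nodes X (K x 2) and RBF centres C (M x 2) on regular grids in [-1, 1]^2.
    g = np.linspace(-1.0, 1.0, grid)
    X = np.array([(a, b) for a in g for b in g])
    c = np.linspace(-1.0, 1.0, n_rbf)
    C = np.array([(a, b) for a in c for b in c])
    sigma = 2.0 / max(n_rbf - 1, 1)  # RBF width set to the centre spacing (assumption)
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
    Phi = np.hstack([np.exp(-d2 / (2.0 * sigma**2)), np.ones((len(X), 1))])  # bias column

    # PCA initialisation: place the latent grid on the plane of the two leading components.
    mean = T.mean(axis=0)
    _, s, Vt = np.linalg.svd(T - mean, full_matrices=False)
    target = mean + (X * (s[:2] / np.sqrt(N))) @ Vt[:2]
    W = np.linalg.lstsq(Phi, target, rcond=None)[0]
    beta = 1.0 / max(T.var(), 1e-12)  # initial noise precision

    I = np.eye(Phi.shape[1])
    for _ in range(n_iter):
        R = _responsibilities(Phi @ W, T, beta)
        # M-step: regularised weighted least squares for the mapping weights W.
        G = R.sum(axis=1)
        A = Phi.T @ (G[:, None] * Phi) + (reg / beta) * I
        W = np.linalg.solve(A, Phi.T @ (R @ T))
        # Update the noise precision from the responsibility-weighted residuals.
        beta = N * D / max((R * _sqdist(Phi @ W, T)).sum(), 1e-12)

    Y = Phi @ W
    R = _responsibilities(Y, T, beta)
    return R.argmax(axis=0), R, Y  # hard label = most responsible latent node
```

In this sketch each image is assigned to the latent node with the highest responsibility, so a class average would be the mean of the rows of `T` sharing a label. The grid of nine nodes is far smaller than anything useful for real data and is chosen only to keep the example short.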

Keywords

Biology and Life Sciences, Immunology, Immune System Proteins, Inflammasomes, Medicine and Health Sciences, Biochemistry, Proteins, Microscopy, Electron Microscopy, Electron Cryo-Microscopy, Computational Techniques, Split-Decomposition Method, Multiple Alignment Calculation, Physical Sciences, Mathematics, Applied Mathematics, Algorithms, Simulation and Modeling, Machine Learning Algorithms, Computer and Information Sciences, Artificial Intelligence, Machine Learning, Imaging Techniques, Optimization

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
