Massively parallel unsupervised single-particle cryo-EM data clustering via statistical manifold learning

Title: Massively parallel unsupervised single-particle cryo-EM data clustering via statistical manifold learning
Author: Wu, Jiayi; Ma, Yong-Bei; Congdon, Charles; Brett, Bevin; Chen, Shuobing; Xu, Yaofang; Ouyang, Qi; Mao, Youdong

Note: Order does not necessarily reflect citation order of authors.

Citation: Wu, Jiayi, Yong-Bei Ma, Charles Congdon, Bevin Brett, Shuobing Chen, Yaofang Xu, Qi Ouyang, and Youdong Mao. 2017. “Massively parallel unsupervised single-particle cryo-EM data clustering via statistical manifold learning.” PLoS ONE 12 (8): e0182130. doi:10.1371/journal.pone.0182130. http://dx.doi.org/10.1371/journal.pone.0182130.
Abstract: Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM) data represents a major challenge for high-resolution structure determination. Unsupervised classification may serve as the first step in the assessment of structural heterogeneity. However, traditional algorithms for unsupervised classification, such as K-means clustering and maximum likelihood optimization, may classify images into wrong classes with decreasing signal-to-noise-ratio (SNR) in the image data, yet demand increased computational costs. Overcoming these limitations requires further development of clustering algorithms for high-performance cryo-EM data processing. Here we introduce an unsupervised single-particle clustering algorithm derived from a statistical manifold learning framework called generative topographic mapping (GTM). We show that unsupervised GTM clustering improves classification accuracy by about 40% in the absence of input references for data with lower SNRs. Applications to several experimental datasets suggest that our algorithm can detect subtle structural differences among classes via a hierarchical clustering strategy. After code optimization over a high-performance computing (HPC) environment, our software implementation was able to generate thousands of reference-free class averages within hours in a massively parallel fashion, which allows a significant improvement on ab initio 3D reconstruction and assists in the computational purification of homogeneous datasets for high-resolution visualization.
Published Version: doi:10.1371/journal.pone.0182130
Other Sources: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5546606/pdf/
Terms of Use: This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:34375025