Publication: PSSP-RFE: Accurate Prediction of Protein Structural Class by Recursive Feature Extraction from PSI-BLAST Profile, Physical-Chemical Property and Functional Annotations
Date
2014
Publisher
Public Library of Science
The Harvard community has made this article openly available.
Citation
Li, Liqi, Xiang Cui, Sanjiu Yu, Yuan Zhang, Zhong Luo, Hua Yang, Yue Zhou, and Xiaoqi Zheng. 2014. “PSSP-RFE: Accurate Prediction of Protein Structural Class by Recursive Feature Extraction from PSI-BLAST Profile, Physical-Chemical Property and Functional Annotations.” PLoS ONE 9 (3): e92863. doi:10.1371/journal.pone.0092863. http://dx.doi.org/10.1371/journal.pone.0092863.
Abstract
Protein structure prediction is critical to the functional annotation of the rapidly accumulating volume of biological sequences, which creates a pressing need for high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction has become an increasingly challenging task. For most homology-based approaches, the accuracy of protein structural class prediction is sufficiently high on high-similarity datasets but still far from satisfactory on low-similarity datasets, i.e., those with pairwise sequence similarity below 40%. Therefore, we present a novel method for accurate and reliable protein structural class prediction on both high- and low-similarity datasets. The method is based on a Support Vector Machine (SVM) combined with integrated features from the position-specific scoring matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is used to rank the integrated feature vectors by recursively removing the feature with the lowest ranking score. The top features selected by SVM-RFE are then input into the SVM engines to predict the structural class of a query protein. To validate our method, jackknife tests were applied to seven widely used benchmark datasets, reaching overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods for protein structural classification, especially for low-similarity datasets.
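The feature-ranking and validation loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a binary toy problem (structural class prediction is a multi-class task), synthetic data in place of PSSM, PROFEAT and GO features, and a small Pegasos-style subgradient trainer swapped in for a full SVM engine. The function names `train_linear_svm`, `svm_rfe` and `jackknife_accuracy` are hypothetical, and the squared weight is used as the ranking score, which is the usual choice for linear SVM-RFE.

```python
import random


def train_linear_svm(X, y, epochs=50, lam=0.01, seed=0):
    """Pegasos-style subgradient descent for a linear SVM without bias.
    Labels must be -1 or +1. A stand-in for a real SVM engine."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - 1.0 / t) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_rfe(X, y, n_keep):
    """Recursively drop the feature with the smallest squared weight."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return remaining, eliminated


def jackknife_accuracy(X, y, features):
    """Leave-one-out test: train on all samples but one, predict that one.
    Features are fixed beforehand here; re-selecting them inside each
    fold would be the stricter protocol."""
    correct = 0
    for i in range(len(X)):
        X_train = [[r[j] for j in features] for k, r in enumerate(X) if k != i]
        y_train = [lab for k, lab in enumerate(y) if k != i]
        w = train_linear_svm(X_train, y_train)
        score = sum(wj * X[i][j] for wj, j in zip(w, features))
        correct += (1 if score >= 0 else -1) == y[i]
    return correct / len(X)


# Synthetic toy data (an assumption): columns 0-1 carry the class signal,
# columns 2-5 are bounded noise.
rng = random.Random(42)
y = [1 if i % 2 == 0 else -1 for i in range(40)]
X = [
    [lab * rng.uniform(2.0, 4.0) for _ in range(2)]
    + [rng.uniform(-1.0, 1.0) for _ in range(4)]
    for lab in y
]

selected, eliminated = svm_rfe(X, y, n_keep=2)
accuracy = jackknife_accuracy(X, y, selected)
print("selected features:", selected)
print("elimination order:", eliminated)
print("jackknife accuracy:", accuracy)
```

On this toy data the two informative columns survive elimination because every noise value is smaller in magnitude than every signal value. Real feature sets are far less cleanly separated, so the ranking there depends on the data and on the SVM settings.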
Keywords
Biology and Life Sciences, Biochemistry, Proteins, Theoretical Biology, Computer and Information Sciences, Computing Methods, Mathematical Computing, Physical Sciences, Mathematics
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service