Recursive SVM Feature Selection and Sample Classification for Mass-Spectrometry and Microarray Data

Title: Recursive SVM Feature Selection and Sample Classification for Mass-Spectrometry and Microarray Data
Author: Zhang, Xuegong; Shi, Qian; Xu, Xiu-qin; Leung, Hon-chiu E; Harris, Lyndsay N; Lu, Xin; Iglehart, James Dirk; Miron, Alexander; Liu, Jun; Wong, Wing H.

Note: Order does not necessarily reflect citation order of authors.

Citation: Zhang, Xuegong, Xin Lu, Qian Shi, Xiu-qin Xu, Hon-chiu E. Leung, Lyndsay N. Harris, James D. Iglehart, Alexander Miron, Jun S. Liu, and Wing H. Wong. 2006. Recursive SVM feature selection and sample classification for mass-spectrometry and microarray data. BMC Bioinformatics 7:197.
Abstract:
Background: Like microarray-based investigations, high-throughput proteomics techniques require machine learning algorithms to identify biomarkers that are informative for biological classification problems. Feature selection and classification algorithms need to be robust to noise and outliers in the data.
Results: We developed a recursive support vector machine (R-SVM) algorithm to select important genes/biomarkers for the classification of noisy data. We compared its performance to a similar, state-of-the-art method (SVM recursive feature elimination, or SVM-RFE), paying special attention to the ability to recover the truly informative genes/biomarkers and to robustness against outliers in the data. Simulation experiments show that R-SVM can achieve a 5% to 20% improvement over SVM-RFE with respect to these properties. The SVM-based methods are also compared with a conventional univariate method, and their respective strengths and weaknesses are discussed. R-SVM was applied to two sets of SELDI-TOF-MS proteomics data, one from a human breast cancer study and the other from a study on rat liver cirrhosis. Important biomarkers found by the algorithm were validated by follow-up biological experiments.
Conclusion: The proposed R-SVM method is suitable for analyzing noisy high-throughput proteomics and microarray data, and it outperforms SVM-RFE in robustness to noise and in the ability to recover informative features. The multivariate SVM-based methods outperform the univariate method in classification performance, but univariate methods can reveal more of the differentially expressed features, especially when the features are correlated.
Published Version: doi:10.1186/1471-2105-7-197
Other Sources: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1456993/pdf/
Terms of Use: This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:4454002
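The abstract describes recursive SVM feature selection only at a high level, so the following is a minimal sketch of generic recursive SVM-based feature elimination, not the authors' R-SVM implementation. Assumptions: a linear L2-loss (squared-hinge) SVM fitted by plain gradient descent stands in for a proper SVM solver; half of the remaining features are dropped each round; the `rfe` criterion ranks features by squared weight, as SVM-RFE is usually described; the `contrib` criterion (weight times class-mean difference) is a hypothetical R-SVM-style score not taken from this page. All function names, parameters, and the toy data are illustrative.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.1, lr=0.05, n_iter=2000):
    """Linear L2-loss (squared-hinge) SVM fit by full-batch gradient descent.

    Minimizes 0.5 * lam * ||w||^2 + mean(max(0, 1 - y * (X @ w + b)) ** 2).
    Labels y must be coded as -1.0 / +1.0. Returns (w, b).
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        # Margin violation per sample; zero for points safely beyond the margin.
        slack = np.maximum(0.0, 1.0 - y * (X @ w + b))
        grad_w = lam * w - (2.0 / n) * (X.T @ (y * slack))
        grad_b = -(2.0 / n) * np.sum(y * slack)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def recursive_svm_select(X, y, n_keep, drop_frac=0.5, criterion="rfe"):
    """Recursively refit a linear SVM and discard the lowest-ranked features.

    criterion="rfe":     rank by w_j ** 2 (the usual SVM-RFE ranking).
    criterion="contrib": rank by w_j * (mean_j[+1] - mean_j[-1]); a hypothetical
                         R-SVM-style contribution score, for illustration only.
    Returns the sorted indices of the surviving features.
    """
    if not 0 < drop_frac <= 1:
        raise ValueError("drop_frac must be in (0, 1]")
    active = np.arange(X.shape[1])  # feature indices still in play
    while active.size > n_keep:
        w, _ = train_linear_svm(X[:, active], y)
        if criterion == "rfe":
            score = w ** 2
        elif criterion == "contrib":
            mean_pos = X[y == 1][:, active].mean(axis=0)
            mean_neg = X[y == -1][:, active].mean(axis=0)
            score = w * (mean_pos - mean_neg)
        else:
            raise ValueError(f"unknown criterion: {criterion!r}")
        # Shrink the feature set geometrically, but never below n_keep.
        n_next = max(n_keep, int(active.size * (1 - drop_frac)))
        top = np.argsort(score)[::-1][:n_next]  # highest scores survive
        active = np.sort(active[top])
    return active


def make_toy_data(n=200, d=20, shift=1.5, seed=0):
    """Hypothetical toy two-class data: only features 0 and 1 differ by class."""
    rng = np.random.default_rng(seed)
    y = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
    X = rng.normal(size=(n, d))
    X[:, :2] += shift * y[:, None]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("rfe:    ", recursive_svm_select(X, y, n_keep=2, criterion="rfe"))
    print("contrib:", recursive_svm_select(X, y, n_keep=2, criterion="contrib"))
```

Halving the active set each round keeps the number of SVM fits roughly logarithmic in the feature count, which is one reason recursive schemes of this kind stay practical for thousands of m/z peaks or genes. This sketch does not attempt to reproduce the robustness-to-outliers comparison reported above.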

This item appears in the following Collection(s)

  • FAS Scholarly Articles [7103]
    Peer reviewed scholarly articles from the Faculty of Arts and Sciences of Harvard University