dc.contributor.author: Zhang, Xuegong
dc.contributor.author: Lu, Xin
dc.contributor.author: Shi, Qian
dc.contributor.author: Xu, Xiu-qin
dc.contributor.author: Leung, Hon-chiu E
dc.contributor.author: Harris, Lyndsay N
dc.contributor.author: Iglehart, James Dirk
dc.contributor.author: Miron, Alexander
dc.contributor.author: Liu, Jun
dc.contributor.author: Wong, Wing H.
dc.date.accessioned: 2010-09-29T20:38:33Z
dc.date.issued: 2006
dc.identifier.citation: Zhang, Xuegong, Xin Lu, Qian Shi, Xiu-qin Xu, Hon-chiu E. Leung, Lyndsay N. Harris, James D. Iglehart, Alexander Miron, Jun S. Liu, and Wing H. Wong. 2006. Recursive SVM feature selection and sample classification for mass-spectrometry and microarray data. BMC Bioinformatics 7:197. [en_US]
dc.identifier.issn: 1471-2105 [en_US]
dc.identifier.uri: http://nrs.harvard.edu/urn-3:HUL.InstRepos:4454002
dc.description.abstract: Background: Like microarray-based investigations, high-throughput proteomics techniques require machine learning algorithms to identify biomarkers that are informative for biological classification problems. Feature selection and classification algorithms need to be robust to noise and outliers in the data. Results: We developed a recursive support vector machine (R-SVM) algorithm to select important genes/biomarkers for the classification of noisy data. We compared its performance to a similar, state-of-the-art method (SVM recursive feature elimination, or SVM-RFE), paying special attention to the ability to recover the true informative genes/biomarkers and the robustness to outliers in the data. Simulation experiments show that a 5% to 20% improvement over SVM-RFE can be achieved with regard to these properties. The SVM-based methods are also compared with a conventional univariate method, and their respective strengths and weaknesses are discussed. R-SVM was applied to two sets of SELDI-TOF-MS proteomics data, one from a human breast cancer study and the other from a study on rat liver cirrhosis. Important biomarkers found by the algorithm were validated by follow-up biological experiments. Conclusion: The proposed R-SVM method is suitable for analyzing noisy high-throughput proteomics and microarray data, and it outperforms SVM-RFE in robustness to noise and in the ability to recover informative features. The multivariate SVM-based method outperforms the univariate method in classification performance, but univariate methods can reveal more of the differentially expressed features, especially when there are correlations between the features. [en_US]
dc.description.sponsorship: Statistics [en_US]
dc.language.iso: en_US [en_US]
dc.publisher: BioMed Central [en_US]
dc.relation.isversionof: doi:10.1186/1471-2105-7-197 [en_US]
dc.relation.hasversion: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1456993/pdf/ [en_US]
dash.license: LAA
dc.title: Recursive SVM Feature Selection and Sample Classification for Mass-Spectrometry and Microarray Data [en_US]
dc.type: Journal Article [en_US]
dc.description.version: Version of Record [en_US]
dc.relation.journal: BMC Bioinformatics [en_US]
dash.depositing.author: Liu, Jun
dc.date.available: 2010-09-29T20:38:33Z
dc.identifier.doi: 10.1186/1471-2105-7-197 [*]
dash.contributor.affiliated: Iglehart, James
dash.contributor.affiliated: Miron, A
dash.contributor.affiliated: Liu, Jun