Publication: Text Mining Studies Applied to Sarcoma PubMed Database
Date
2020-03-03
Citation
Gu, Yan. 2020. Text Mining Studies Applied to Sarcoma PubMed Database. Master's thesis, Harvard Extension School.
Abstract
Text mining has been a powerful tool for information extraction, classification and prediction in multiple fields. This thesis prototyped several text mining tools and applied them to the sarcoma research field. Specifically, text data on the topic of sarcoma were fetched from PubMed. Visualization tools were used to present the text data, keywords and co-occurrence information. Text regression and classification techniques were implemented to classify the abstract and title data. Both unsupervised and supervised machine learning models were evaluated and compared.
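As a rough illustration of the keyword and co-occurrence information described above, the following Python sketch counts how many titles contain each keyword and how often pairs of keywords appear in the same title. It is not the implementation used in the thesis: the tokenizer, the stop-word list, the document-frequency threshold and the sample titles are all hypothetical assumptions chosen for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def keyword_counts(docs):
    """Document frequency: the number of documents each keyword appears in."""
    counts = Counter()
    for doc in docs:
        counts.update(set(tokenize(doc)))
    return counts


def cooccurrence_counts(docs, vocabulary):
    """Count, per unordered keyword pair, the documents containing both keywords."""
    pairs = Counter()
    for doc in docs:
        present = sorted(set(tokenize(doc)) & vocabulary)
        pairs.update(combinations(present, 2))
    return pairs


if __name__ == "__main__":
    # Made-up example titles, not data from the thesis.
    titles = [
        "Chemotherapy outcomes in Ewing sarcoma",
        "Radiotherapy and chemotherapy for soft tissue sarcoma",
        "Genomic profiling of soft tissue sarcoma",
    ]
    df = keyword_counts(titles)
    # Hypothetical threshold: keep keywords found in at least two titles.
    vocabulary = {word for word, n in df.items() if n >= 2}
    for pair, n in sorted(cooccurrence_counts(titles, vocabulary).items()):
        print(pair, n)
```

Counts like these are a common input to a co-occurrence network or heat map; a fuller analysis would typically use a larger stop-word list and apply stemming or lemmatization before counting.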
Compared to more common cancer types, sarcoma is a rare cancer for which relatively little clinical and research data is publicly available and searchable. Therefore, the utility of text mining was evaluated for retrieving useful information from the limited body of published work and data.
The prototyping work described in this thesis illustrates the results that can be obtained by applying text visualization and machine learning models to sarcoma text data. It may help inform more in-depth text mining research in the sarcoma field. Furthermore, the same approach could potentially be applied in other fields and used to analyze data stored in other publication databases.
Keywords
text mining, sarcoma, machine learning, classifications
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service