Text Mining Studies Applied to Sarcoma PubMed Database
Citation: Gu, Yan. 2020. Text Mining Studies Applied to Sarcoma PubMed Database. Master's thesis, Harvard Extension School.
Abstract: Text mining is a powerful tool for information extraction, classification, and prediction in multiple fields. This thesis prototyped several text mining tools and applied them to the sarcoma research field. Specifically, text data were fetched from PubMed on the topic of sarcoma. Visualization tools were used to present the text data, keywords, and co-occurrence information. Text regression and classification techniques were implemented to classify the abstract and title data. Both unsupervised and supervised machine learning models were evaluated and compared.
Compared to common cancer types, sarcoma is a rare cancer for which little public information and few searchable reports of clinical and research data are available. Therefore, the utility of text mining was evaluated for retrieving useful information from the limited amount of available published work and data.
The prototyping work described in this thesis illustrates the results that can be obtained by applying text visualization and machine learning models to sarcoma text data. It may inform more in-depth text mining research in the sarcoma field. Furthermore, the same approach could potentially be applied in other fields and used to analyze data stored in other publication databases.
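As a rough illustration of the keyword co-occurrence information mentioned in the abstract, the following minimal sketch counts how many documents each pair of terms shares. It is not the thesis's implementation: the function names, the short stop-word list, and the toy texts are all hypothetical, and a real analysis would run on titles and abstracts retrieved from PubMed.

```python
from collections import Counter
from itertools import combinations
import re

# Hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "for", "with", "is", "to"}


def tokenize(text):
    """Lowercase the text and return its set of non-stop-word tokens."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOP_WORDS}


def cooccurrence_counts(documents):
    """Count how many documents each unordered pair of terms shares."""
    counts = Counter()
    for doc in documents:
        # Sorting gives each pair a canonical (alphabetical) order.
        terms = sorted(tokenize(doc))
        counts.update(combinations(terms, 2))
    return counts


# Made-up placeholder texts, not taken from PubMed or the thesis.
toy_documents = [
    "Sarcoma treatment with chemotherapy",
    "Chemotherapy response in sarcoma",
    "Imaging of sarcoma",
]
pair_counts = cooccurrence_counts(toy_documents)
```

Counting each pair once per document, rather than once per occurrence, keeps long texts from dominating the counts. The resulting pairs could then feed a network or heat-map visualization.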
Citable link to this page: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37364887