Publication:
Text Mining Studies Applied to Sarcoma PubMed Database

Date

2020-03-03

Authors

Gu, Yan

Citation

Gu, Yan. 2020. Text Mining Studies Applied to Sarcoma PubMed Database. Master's thesis, Harvard Extension School.

Abstract

Text mining has been a powerful tool for information extraction, classification, and prediction in multiple fields. This thesis prototyped several text mining tools and applied them to the sarcoma research field. Specifically, text data were fetched from PubMed on the topic of sarcoma. Visualization tools were used to present the text data, keywords, and co-occurrence information. Text regression and classification techniques were implemented to classify the abstract and title data. Both unsupervised and supervised machine learning models were evaluated and compared. Compared with common cancer types, sarcoma is a rare cancer with little public information and few searchable reports of clinical and research data. Therefore, the utility of text mining was evaluated for retrieving useful information from the limited amount of published work and data available. The prototyping work described in this thesis illustrates the results that can be obtained by applying text visualization and machine learning models to sarcoma text data. It may shed light on more in-depth text mining research in the sarcoma field. Furthermore, the same approach could potentially be applied in other fields and used to analyze data stored in other publication databases.

Keywords

text mining, sarcoma, machine learning, classifications

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
