
dc.contributor.author: Gu, Yan
dc.date.accessioned: 2020-09-11T11:51:46Z
dc.date.created: 2020-03
dc.date.issued: 2020-03-03
dc.date.submitted: 2020
dc.identifier.citation: Gu, Yan. 2020. Text Mining Studies Applied to Sarcoma PubMed Database. Master's thesis, Harvard Extension School.
dc.identifier.uri: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37364887
dc.description.abstract: Text mining has been a powerful tool for information extraction, classification and prediction in multiple fields. This thesis prototyped several text mining tools and their application in the sarcoma research field. Specifically, text data were fetched from PubMed on the topic of sarcoma. Visualization tools were used to present the text data, keywords and co-occurrence information. Text data regression and classification techniques were implemented to classify the abstract and title data. Both unsupervised and supervised machine learning models were evaluated and compared. Compared to common cancer types, sarcoma is a rare type of cancer with little public information and few searchable reports of clinical and research data. Therefore, the utility of text mining was evaluated for retrieving useful information from the limited amount of available published work and data. The prototyping work described in this thesis illustrates the results that can be obtained by applying text visualization and machine learning models to sarcoma text data analysis. It could shed light on more in-depth text mining research in the sarcoma field. Furthermore, the same approach could potentially be applied in other fields and used to analyze data stored in other publication databases.
dc.description.sponsorship: Software Engineering
dc.format.mimetype: application/pdf
dash.license: LAA
dc.subject: text mining, sarcoma, machine learning, classifications
dc.title: Text Mining Studies Applied to Sarcoma PubMed Database
dc.type: Thesis or Dissertation
dash.depositing.author: Gu, Yan
dc.date.available: 2020-09-11T11:51:46Z
thesis.degree.date: 2020
thesis.degree.grantor: Harvard Extension School
thesis.degree.level: Masters
thesis.degree.name: ALM
dc.contributor.committeeMember: Farutin, Victor
dc.contributor.committeeMember: Jaume, Sylvain
dc.contributor.committeeMember: Wang, Hongming
dc.type.material: text
thesis.degree.department: Software Engineering
dash.identifier.vireo:
dc.identifier.orcid: 0000-0003-0386-3330
dash.author.email: ygu2013@gmail.com

