Publication:
Extraction of Metastatic Diagnosis and Treatment-Switching Rationale From Oncology EMR Notes Using Natural Language Processing

Date

2020-09-11

Citation

Alkaitis, Matthew. 2020. Extraction of Metastatic Diagnosis and Treatment-Switching Rationale From Oncology EMR Notes Using Natural Language Processing. Doctoral dissertation, Harvard Medical School.

Abstract

Purpose: Metastatic recurrence, disease progression and treatment-limiting toxicity are not routinely encoded into structured electronic medical record (EMR) data. In this study we assessed the ability of natural language processing (NLP) algorithms to 1) identify patients with stage IV metastatic disease and 2) extract treatment-switching rationale from unstructured clinician notes.

Methods: We identified a retrospective cohort of breast cancer patients who received the majority of their care at Memorial Sloan Kettering Cancer Center (MSKCC) from 2014 to 2019. All patient records were de-identified with respect to names and dates. To predict metastatic status, we trained "bag-of-words" term frequency-based algorithms on individual notes and aggregated the note-level predictions into patient-level classifiers. Using these methods, we generated cohorts of early-stage, de novo metastatic and recurrent metastatic patients. We then used labels from MSKCC abstractors and unstructured clinical notes to train a model to predict the clinician rationale associated with treatment switches.

Results: A total of 17,315 breast cancer patients were included in the study. After removing patients with blank clinical notes, 539 patients were encoded as de novo M1 (stage IV) and 15,460 were encoded as presenting with M0 (stage 0-III) disease. Our best-performing metastatic predictor for individual notes demonstrated a validation AUC of 0.9925 ± 0.0069 [SD], and the accuracy of our best-performing patient classifier was 0.867. Using our metastasis prediction algorithm, we identified 3,229 patients with predicted recurrent metastatic disease and 11,610 with early-stage disease that did not recur. Among the metastatic cohort synthesized from de novo and recurrent patients, our best model predicted treatment-switching rationale due to "toxicity" or "progression" with an AUC of 0.8227 ± 0.0302 [SD] in the held-out data set. The best-performing model in early-stage (non-recurrent) patients demonstrated a validation AUC of 0.8889 ± 0.0890 [SD]. De novo metastatic patients demonstrated greater rates of both progression- and toxicity-related treatment failure compared to recurrent metastatic patients.

Conclusions: Our initial exploration suggests that simple frequency-based NLP models are capable of predicting metastatic recurrence and treatment-switching rationale. Models accounting for semantic structure may be able to improve on these baselines and enable automated extraction of critical outcomes data at scale.
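The two-step approach described in the Methods (score each note from its term frequencies, then aggregate note scores into a patient-level call) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the dissertation: the multinomial Naive Bayes scorer, the mean-probability aggregation rule, the 0.5 threshold, the function names, and the synthetic example notes.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a note and split it into alphabetic tokens (bag of words)."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(notes, labels):
    """Count term frequencies per class (1 = metastatic, 0 = not metastatic)."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in zip(notes, labels):
        counts[label].update(tokenize(text))
    vocab = set(counts[0]) | set(counts[1])
    return {"counts": counts, "n_docs": Counter(labels), "vocab": vocab}


def note_probability(model, text):
    """Return P(metastatic | note) from Laplace-smoothed term frequencies."""
    total_docs = sum(model["n_docs"].values())
    vocab_size = len(model["vocab"])
    log_score = {}
    for label in (0, 1):
        total_terms = sum(model["counts"][label].values())
        score = math.log(model["n_docs"][label] / total_docs)  # class prior
        for token in tokenize(text):
            if token in model["vocab"]:  # ignore terms never seen in training
                score += math.log(
                    (model["counts"][label][token] + 1) / (total_terms + vocab_size)
                )
        log_score[label] = score
    # Normalize the two log scores into a probability for class 1.
    top = max(log_score.values())
    exp0 = math.exp(log_score[0] - top)
    exp1 = math.exp(log_score[1] - top)
    return exp1 / (exp0 + exp1)


def classify_patient(model, patient_notes, threshold=0.5):
    """Aggregate note-level probabilities (assumed rule: mean) per patient."""
    probs = [note_probability(model, note) for note in patient_notes]
    mean_prob = sum(probs) / len(probs)
    return mean_prob >= threshold, mean_prob


# Synthetic notes written for this sketch; not real or de-identified data.
train_notes = [
    "bone metastases noted on scan, disease progression",
    "liver metastases, start palliative chemotherapy",
    "no evidence of disease, continue adjuvant therapy",
    "routine follow up, no evidence of recurrence",
]
train_labels = [1, 1, 0, 0]
model = train_naive_bayes(train_notes, train_labels)

patient_a = [
    "new bone metastases with progression",
    "palliative chemotherapy for liver metastases",
]
patient_b = ["no evidence of disease", "routine follow up"]

is_met_a, prob_a = classify_patient(model, patient_a)
is_met_b, prob_b = classify_patient(model, patient_b)
print(is_met_a, is_met_b)
```

Averaging note probabilities is only one possible aggregation rule; alternatives such as taking the maximum note score or requiring several positive notes would trade sensitivity against specificity differently.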

Keywords

Natural Language Processing, Machine Learning, Oncology, Electronic Medical Records

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
