Publication:

Using Electronic Medical Records to Study Lung Cancer Prognosis

Date

2020-05-07

Published Version

The Harvard community has made this article openly available.

Citation

Yuan, Qianyu. 2020. Using Electronic Medical Records to Study Lung Cancer Prognosis. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.

Abstract

Lung cancer is the most commonly diagnosed malignancy and a leading cause of cancer-related death worldwide. In the US, the current five-year survival rate is about 20.6%, which is substantially lower than that of most other leading cancers, such as prostate cancer (99%), breast cancer (91%), and colon cancer (66%). Survival among lung cancer patients is heterogeneous, even within the same stage group. Stable and reliable prognostic variables and prediction tools are needed to identify subgroups with better or worse outcomes. Electronic medical records (EMRs) provide a low-cost means of accessing rich longitudinal data on large populations for research. They allow us to evaluate multiple risk factors simultaneously, including clinical, demographic, treatment, molecular, and behavioral information as well as lung cancer progression, enabling the development of predictive models. In chapter 1, we assembled a lung cancer cohort using EMRs from a large healthcare system (Partners HealthCare). A phenotyping algorithm was applied to identify lung cancer patients. Extraction strategies combining structured and unstructured data were used to collect demographics, clinical outcomes, prognostic factors, and treatment information for these patients. Data completeness was evaluated, and data accuracy was assessed by comparison with the Boston Lung Cancer Study (BLCS) database and chart review results. In chapter 2, a prognostic model for 5-year overall survival (OS) was developed and validated for newly diagnosed non-small cell lung cancer (NSCLC) patients. We identified age, sex, smoking status, histological type, stage, BMI, albumin, ALP, creatinine, HGB, RDW, WBC, NLR, calcium, and sodium as significant predictors of 5-year OS. Our model achieved higher discrimination than a model based on sex, age, stage, and histological type alone.
A more accurate outcome prediction model that can be applied at the diagnosis of NSCLC would be essential for informed decision-making in clinical care and practice. Finally, in chapter 3, we aimed to identify advanced NSCLC patients who are likely to benefit from PD-1/L1 inhibitors. We proposed a prognostic score to stratify advanced NSCLC patients treated with PD-1/L1 inhibitors into poor, intermediate, and good groups for progression-free survival. In summary, we assembled a large lung cancer cohort, investigated how clinical factors influence the prognosis of non-small cell lung cancer, and developed integrative prediction algorithms for clinical outcomes.
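
The abstract does not say which measure of discrimination was used to compare the two models. One common choice for survival models is Harrell's concordance index (C-index), the share of comparable patient pairs in which the patient who died earlier was assigned the higher predicted risk. The sketch below is only an illustration of that idea, not the dissertation's method: the function name `concordance_index`, the simplified handling of tied times, and all toy values are assumptions made here.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C-index for right-censored survival data.

    times       -- follow-up time for each patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output; higher means worse predicted survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient a has the shorter follow-up.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # Simplification (assumption): pairs with tied times are skipped.
        # A pair is comparable only if the shorter follow-up ended in death.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier death had the higher risk score
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up toy data for illustration only; not taken from the dissertation.
times = [5, 12, 20, 30, 42, 60]    # months of follow-up
events = [1, 1, 0, 1, 0, 0]        # 1 = died, 0 = censored
baseline_risk = [0.9, 0.4, 0.5, 0.6, 0.2, 0.1]  # hypothetical simpler model
extended_risk = [0.9, 0.7, 0.5, 0.6, 0.2, 0.1]  # hypothetical richer model

c_baseline = concordance_index(times, events, baseline_risk)
c_extended = concordance_index(times, events, extended_risk)
print(f"baseline C-index: {c_baseline:.3f}")
print(f"extended C-index: {c_extended:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a model with "higher discrimination" in this sense is one with a larger C-index on the same patients.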

Keywords

Lung cancer, Electronic medical records, Prognosis, Prediction model

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
