Using Electronic Medical Records to Study Lung Cancer Prognosis
Citation: Yuan, Qianyu. 2020. Using Electronic Medical Records to Study Lung Cancer Prognosis. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
Abstract: Lung cancer is the most commonly diagnosed malignancy and a leading cause of cancer-related deaths worldwide. In the US, the current five-year survival is about 20.6%, which is substantially lower than that of most other leading cancers, such as prostate cancer (99%), breast cancer (91%), and colon cancer (66%). Survival of lung cancer patients is heterogeneous, even within the same stage group. Stable and reliable prognostic variables and prediction tools are needed to identify subgroups with better or worse outcomes. Electronic medical records (EMRs) provide a low-cost means of accessing rich longitudinal data on large populations for research. They allow us to evaluate multiple risk factors simultaneously, including clinical, demographic, treatment, molecular, and behavioral information as well as lung cancer progression, enabling the development of predictive models.
In chapter 1, we assembled a lung cancer cohort using EMRs from a large healthcare system (Partners HealthCare). A phenotyping algorithm was applied to identify lung cancer patients. Extraction strategies combining structured and unstructured data were used to collect demographics, clinical outcomes, prognostic factors, and treatment information for these patients. Data completeness was evaluated, and data accuracy was assessed by comparison with the Boston Lung Cancer Study (BLCS) database and chart review results.
In chapter 2, a prognostic model for 5-year overall survival (OS) was developed and validated for patients with newly diagnosed non-small cell lung cancer (NSCLC). We identified age, sex, smoking status, histological type, stage, BMI, albumin, ALP, creatinine, HGB, RDW, WBC, NLR, calcium, and sodium as significant predictors of 5-year OS. Our model achieved higher discrimination than a model based only on sex, age, stage, and histological type. A more accurate outcome prediction model that can be applied at the time of NSCLC diagnosis would be essential for informed decision-making regarding clinical care and practice.
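The abstract does not state which measure of discrimination was used. A common choice for survival models is Harrell's concordance index (C-index): the fraction of comparable patient pairs in which the patient with the higher predicted risk is also the one who died earlier. The following is a minimal sketch of that measure, not the dissertation's implementation. The function name, argument names, and the simplified handling of tied follow-up times (such pairs are skipped) are assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times:       follow-up time for each patient
    events:      1 if death was observed, 0 if the patient was censored
    risk_scores: model output, where a higher score means worse predicted survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the follow-up times differ and the
        # patient with the shorter time had an observed event.
        # Simplifying assumption: pairs with tied times are skipped.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher risk died earlier: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to predictions no better than chance and 1.0 to perfect ranking, so comparing the C-index of two models on the same patients is one way to express that one model "achieved higher discrimination" than another.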
Finally, in chapter 3, we aimed to identify advanced NSCLC patients who are likely to benefit from PD-1/L1 inhibitors. We proposed a prognostic score to stratify advanced NSCLC patients treated with PD-1/L1 inhibitors into poor, intermediate, and good groups for progression-free survival.
In summary, we assembled a large lung cancer cohort, investigated how clinical factors influence the prognosis of non-small cell lung cancer, and developed integrative prediction algorithms for clinical outcomes.
Citable link to this page: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37365861