Publication:
Evaluating Polygenic Risk Scores for Risk-Stratified Screening: Development and Validation of Risk Prediction Models for Breast Cancer and Venous Thromboembolism

Date

2022-09-09

Published Version

Citation

Kalia, Sarah S. 2022. Evaluating Polygenic Risk Scores for Risk-Stratified Screening: Development and Validation of Risk Prediction Models for Breast Cancer and Venous Thromboembolism. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Improving risk prediction is critically important for diseases such as breast cancer and venous thromboembolism (VTE). For breast cancer, current age-based screening guidelines lead to excess screening among older women and missed diagnoses among younger women; for VTE, the initial presentation is often unpredictable and potentially fatal. Emerging evidence suggests that polygenic risk scores (PRS) can be incorporated into risk prediction for a variety of conditions to stratify risk and identify high-risk individuals in the general population. In this dissertation, I present three studies in which we developed and validated risk models for breast cancer and VTE by integrating PRS with classical risk factors from research surveys and electronic health records (EHR). In Chapter 1, we evaluated, in the Mass General Brigham (MGB) Biobank, the discrimination and five-year absolute risk calibration of three breast cancer risk models: (i) a PRS model; (ii) a model of clinical, reproductive, and lifestyle risk factors; and (iii) a joint model combining the PRS with these risk factors. We also assessed reclassification of predicted lifetime risks across the high-risk threshold above which breast magnetic resonance imaging (MRI) screening is recommended. Discrimination of the PRS model in our clinical population was comparable to that reported from research cohorts in which the PRS was previously validated. Calibration analyses suggested that all models overestimated five-year absolute risks, but this appeared to be driven by overestimation in the highest decile. Our results provide estimates of the performance that may be expected for a rich prediction model applied in an EHR setting, and they illustrate the challenges of generating risk predictions that depend on EHR data. Our findings also suggest that substantial work is needed to improve the calibration of a joint model in the population(s) to which it would be applied.
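The five-year calibration comparison described for Chapter 1 can be illustrated with a simple decile-based check: individuals are ranked by predicted risk and split into ten groups, and within each group the mean predicted risk (expected, E) is compared with the observed event proportion (O). An E/O ratio above 1 indicates overestimation. The sketch below is a minimal illustration under a loud simplifying assumption: every individual has complete five-year follow-up, so censoring and competing risks are ignored. The function name `decile_calibration` and the toy data are hypothetical, and this is not the dissertation's analysis code.

```python
from statistics import fmean


def decile_calibration(predicted, observed, n_groups=10):
    """Compare mean predicted risk with observed event proportion by risk group.

    Simplifying assumption: everyone has complete five-year follow-up, so
    `observed` is a 0/1 event indicator and censoring/competing risks are ignored.
    Returns (expected, observed, E/O ratio) per group, lowest predicted risk first.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    # Rank individuals by predicted risk, then cut the ranking into n_groups
    # groups of (nearly) equal size; n_groups=10 gives deciles.
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    results = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not idx:
            continue  # possible when there are fewer people than groups
        expected = fmean(predicted[i] for i in idx)  # mean predicted risk
        obs = fmean(observed[i] for i in idx)  # observed event proportion
        # E/O > 1 means the model overestimates risk in this group.
        ratio = expected / obs if obs > 0 else float("inf")
        results.append((expected, obs, ratio))
    return results


# Hypothetical toy data: 10 lower-risk and 10 higher-risk individuals.
pred = [0.05] * 10 + [0.30] * 10
obs = [1] + [0] * 9 + [1, 1] + [0] * 8
print([round(r, 2) for _, _, r in decile_calibration(pred, obs, n_groups=2)])
# → [0.5, 1.5]
```

In this toy example the higher-risk group has an E/O ratio of about 1.5, the kind of pattern one would see if overestimation were concentrated among those with the highest predicted risk. A real five-year analysis would typically replace the raw event proportion with a Kaplan-Meier or cumulative incidence estimate to handle censoring.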
In Chapter 2, we extended a risk model that incorporates pathogenic variants (PV) in six breast cancer predisposition genes and a PRS to include an epidemiologic risk score (ERS) capturing the effects of clinical, reproductive, and lifestyle risk factors. This study was performed in a population-based sample from the Cancer Risk Estimates Related to Susceptibility (CARRIERS) Consortium. We assessed effect measure modification among the modeled factors, age, and family history of breast cancer. Our results illustrate that the ERS, alone and in combination with the PRS, can contribute to clinically meaningful risk stratification across the high-risk thresholds for recommending risk-reducing medications and breast MRI screening, especially for carriers of a PV in a moderate-penetrance gene such as ATM or CHEK2. Appropriately integrating monogenic, polygenic, and epidemiologic risk factors to improve breast cancer risk prediction models may inform personalized screening protocols and prevention efforts.
In Chapter 3, we developed a VTE risk prediction model for the general population, incorporating known clinical and environmental risk factors and a PRS, in a genotyped sample from three longitudinal Harvard cohorts: the Nurses’ Health Study I & II (NHS and NHS2) and the Health Professionals Follow-up Study (HPFS). We used inverse probability of sampling weights (IPW) to correct for selection bias in the training sample and to improve external validity. We validated our IPW model internally in an independent subset of the Harvard genotyped sample and externally in the MGB Biobank population. We compared the discrimination and relative risk calibration of our IPW model, a penalized model with the same covariates, and nested models that included design variables with and without the PRS. Models that included the PRS showed good discrimination in both internal and external validation analyses.
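Inverse probability of sampling weights, as used in Chapter 3, up-weight sampled individuals from groups that were less likely to be selected, so that weighted summaries resemble the full source cohort rather than the selected sample. A minimal sketch follows, assuming (as a simplification) that selection depended only on a discrete stratum with known counts in the full cohort and in the sample; in practice the selection probabilities might instead be estimated with a model such as logistic regression. The strata, counts, and function names (`sampling_weights`, `weighted_mean`) are hypothetical, and this is not the weighting model used in the dissertation.

```python
from collections import Counter


def sampling_weights(sample_strata, cohort_counts):
    """Return one inverse-probability-of-sampling weight per sampled person.

    Assumes selection depends only on a discrete stratum, so the sampling
    probability in stratum s is n_sampled[s] / N_cohort[s] and the weight is
    its reciprocal, N_cohort[s] / n_sampled[s].
    """
    sampled_counts = Counter(sample_strata)
    return [cohort_counts[s] / sampled_counts[s] for s in sample_strata]


def weighted_mean(values, weights):
    """Weighted average, e.g. the prevalence of a binary risk factor."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical cohort: 4 cases (all genotyped) and 20 non-cases (4 genotyped),
# so non-cases are under-represented in the genotyped sample.
cohort_counts = {"case": 4, "noncase": 20}
strata = ["case"] * 4 + ["noncase"] * 4
exposed = [1, 1, 1, 0] + [1, 0, 0, 0]  # binary risk factor in the sample

w = sampling_weights(strata, cohort_counts)
print(sum(exposed) / len(exposed))  # unweighted prevalence: 0.5
# If the 4 sampled non-cases represent all 20 non-cases, the cohort prevalence
# is (3 + 5) / 24 = 1/3, which the weighted estimate recovers.
print(round(weighted_mean(exposed, w), 3))  # weighted prevalence: 0.333
```

The same per-person weights would be supplied as case weights when fitting a weighted regression (for example, a Cox model for time to VTE), usually together with a robust variance estimator, because the weights are not frequency counts.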
The PRS-only model was well calibrated in the MGB Biobank; however, calibration was poor for models that included clinical/environmental risk factors. Our results suggest that risk factor misclassification and differences in risk factor distributions across study populations are largely responsible for this poor calibration. Work remains to improve the performance of our VTE risk prediction model in a clinical setting with a large amount of missing data.
Together, the projects presented in this dissertation provide a proof of principle, and initial estimates of calibration in a clinical population, for risk models that integrate a PRS with classical risk factors. We hope this work helps lay a foundation for exploring a more personalized approach to breast cancer screening and VTE prophylaxis.

Keywords

Epidemiology

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.
