Publication:

Leveraging genetic data in observational studies: methods in Mendelian randomization and applications in risk prediction modeling

Date

2021-03-05

Citation

Shi, Joy. 2020. Leveraging genetic data in observational studies: methods in Mendelian randomization and applications in risk prediction modeling. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Given the rapid advancements in DNA sequencing technologies over the past decade and a half, the amount of genetic data available has grown exponentially. Large-scale genome studies of complex human traits and diseases have become increasingly prevalent, and discoveries from such studies are increasingly being applied to studies of individual or population health to accomplish two major tasks of data science: causal inference and prediction. My dissertation focuses on two specific applications of genetic data: Mendelian randomization and risk prediction modeling for endometrial cancer.

Mendelian randomization (MR) is a popular application of instrumental variable (IV) estimation that relies on genetic variation as an instrument to estimate the effect of an exposure on an outcome. MR studies generally rely on IV methods that were developed for time-fixed exposures. That is, they can typically handle only a single measurement of the exposure. However, many MR studies are concerned with the effects of time-varying exposures, such as alcohol intake or blood lipids. This mismatch makes estimates from such studies difficult to interpret and potentially biased.
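As a rough illustration of the time-fixed IV estimation described above, the following sketch computes the standard Wald ratio estimator, cov(Z, Y) / cov(Z, A), on simulated data. It is not taken from the dissertation: the simulation design, the effect sizes, and the names `covariance`, `wald_ratio`, and `simulate` are all assumptions chosen for illustration, and the exposure is measured only once, which is exactly the setting the abstract describes as limited.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def wald_ratio(z, a, y):
    """Wald ratio IV estimate: cov(Z, Y) / cov(Z, A)."""
    return covariance(z, y) / covariance(z, a)


def simulate(n=20000, true_effect=0.5, seed=0):
    """Simulate a hypothetical MR data set (all parameters are assumptions).

    z: allele count (0, 1, or 2) at a single variant, used as the instrument
    a: exposure measured once, affected by z and an unmeasured confounder u
    y: outcome, affected by a and u but not directly by z
    """
    rng = random.Random(seed)
    z, a, y = [], [], []
    for _ in range(n):
        zi = sum(rng.random() < 0.3 for _ in range(2))  # two alleles, frequency 0.3
        u = rng.gauss(0, 1)                             # unmeasured confounder
        ai = 0.8 * zi + u + rng.gauss(0, 1)
        yi = true_effect * ai + 2.0 * u + rng.gauss(0, 1)
        z.append(zi)
        a.append(ai)
        y.append(yi)
    return z, a, y


z, a, y = simulate()
iv_estimate = wald_ratio(z, a, y)                      # should be near true_effect
naive_estimate = covariance(a, y) / covariance(a, a)   # confounded regression slope

print(f"IV (Wald ratio) estimate: {iv_estimate:.3f}")
print(f"Naive regression slope:   {naive_estimate:.3f}")
```

In this simulated setting the instrument is independent of the confounder and affects the outcome only through the exposure, so the Wald ratio recovers the simulated effect while the naive slope does not. With a time-varying exposure, a single measurement no longer captures the full exposure history, and this simple ratio loses a clear causal interpretation.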

In this dissertation, I describe the instrumental conditions required for IV estimation with a time-varying exposure. I discuss three possible causal interpretations of MR estimates (the point effect, the period effect, and the lifetime effect) and the assumptions required for MR estimates to carry each interpretation. Next, I discuss methods to incorporate time-varying exposures in MR analyses: g-estimation of structural mean models for linear outcomes, and g-estimation of structural nested cumulative failure time models for time-to-event outcomes. These extensions of IV methods overcome the limitations of classical IV methods for time-varying exposures but have rarely been implemented in practice. I demonstrate applications of these models using data from the Framingham Heart Study (FHS) and the Nurses' Health Study (NHS).

Last, I consider the role of genetic data in prediction by developing two risk prediction models for endometrial cancer: a clinical-only model, which included factors collected via questionnaire (e.g., body mass index, hormone therapy use), and a clinical plus genetic model, which additionally included germline genetic polymorphisms. The models were developed using pooled data from the Epidemiology of Endometrial Cancer Consortium and were externally validated in three large cohorts. This analysis demonstrates the potential utility of such models in identifying high-risk women for risk-reducing interventions.

Keywords

Causal inference, Genetic epidemiology, Instrumental variables, Mendelian randomization, Risk prediction, Epidemiology

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
