Publication:

Robust Causal Inference Methods for Electronic Health Record-Based Studies with Missing Eligibility and Calendar Time-Varying Treatment Effects

Date

2026-01-05

Citation

Benz, Luke. 2026. Robust Causal Inference Methods for Electronic Health Record-Based Studies with Missing Eligibility and Calendar Time-Varying Treatment Effects. Doctoral Dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Electronic health records (EHR) are seen as useful alternatives to randomized controlled trials when the latter are infeasible due to financial, ethical, or logistical constraints. Unfortunately, EHR exist to record clinical activity and assist with billing, so information is not collected with research in mind. When using EHR to study comparative effectiveness, there are many factors that a researcher cannot control: treatments are not randomly assigned, information on certain patient covariates may be unavailable, when to begin follow-up is not always clear, and which patients receive treatment and why may change over time. As such, rigorous statistical methods that contend with these factors, often simultaneously, are necessary when conducting EHR-based studies.

In Chapter 1, we consider the problem of selection bias due to missingness in covariates which define study eligibility in target trial emulations. We illustrate the dangers of naively excluding patients missing certain eligibility-defining covariates and propose a solution based on a novel missing at random assumption using inverse probability weighting. Our solution integrates seamlessly within a larger framework for dealing with common sources of bias in sequential target trial emulations, such as confounding, non-adherence, and censoring.
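As a rough illustration of the weighting idea described above, the following minimal sketch reweights eligible complete cases by the inverse of their estimated probability of having the eligibility-defining covariate observed. It is not the dissertation's estimator: the simulated data, the variable names (`x`, `bmi`, `observed`), the eligibility cutoff, and the simple logistic missingness model are all assumptions made for illustration, and the sketch ignores confounding, non-adherence, and censoring.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(design, y, n_iter=25):
    """Logistic regression by Newton-Raphson; `design` includes an intercept column."""
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = expit(design @ beta)
        grad = design.T @ (y - p)
        hess = design.T @ (design * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def ipw_eligible_mean(y, eligible, observed, x):
    """Mean outcome among eligible patients, using only complete cases
    weighted by 1 / P(eligibility covariate observed | x)."""
    design = np.column_stack([np.ones_like(x), x])
    pi_hat = expit(design @ fit_logistic(design, observed.astype(float)))
    # Eligibility is only used where the covariate is observed.
    keep = observed & eligible
    weights = 1.0 / pi_hat[keep]
    return np.sum(weights * y[keep]) / np.sum(weights)

# Hypothetical simulated cohort (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)                       # fully observed covariate
bmi = 35 + 3 * x + 2 * rng.normal(size=n)    # eligibility-defining covariate
eligible = bmi >= 35                         # assumed eligibility rule
observed = rng.random(n) < expit(0.5 + x)    # missingness depends on x only
y = 1 + 2 * x + rng.normal(size=n)           # outcome

truth = y[eligible].mean()                   # known only in simulation
naive = y[observed & eligible].mean()        # complete-case analysis
ipw_est = ipw_eligible_mean(y, eligible, observed, x)
print(f"truth={truth:.3f} naive={naive:.3f} ipw={ipw_est:.3f}")
```

In this simulated setting, whether the covariate is observed depends only on `x`, so the weighted estimate should track the full-cohort eligible mean more closely than the naive complete-case mean does.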

Next, in Chapter 2, we extend the ideas of Chapter 1 and propose a robust and efficient estimator of the causal average treatment effect on the treated, defined in the study eligible population, in cohort studies where eligibility-defining covariates are missing at random. The approach facilitates the use of flexible machine-learning strategies for component nuisance functions while maintaining appropriate convergence rates for valid asymptotic inference, and displays robustness to various degrees of model misspecification in the component nuisance functions.

Finally, in Chapter 3, we formalize sequential target trial emulations for continuous outcomes and propose a statistical framework to describe both how and why causal effects vary over treatment initiation time in EHR-based studies. Our approach projects doubly robust, time-specific treatment effect estimates onto candidate marginal structural models and uses a principled model selection procedure to best describe how effects vary by treatment initiation time. We further introduce a novel summary metric, based on standardization analysis, to quantify the role of covariate shift in explaining observed effect changes and disentangle changes in treatment effects from changes in the patient population receiving treatment.
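The projection step described above can be sketched, under strong simplifying assumptions, as a weighted least-squares fit of time-specific effect estimates onto candidate polynomial models of treatment initiation time. The abstract does not specify the selection procedure, so the AIC-style criterion below is a stand-in, and the function name `project_effects`, the toy estimates, and the standard errors are hypothetical.

```python
import numpy as np

def project_effects(t, theta_hat, se, degree):
    """Inverse-variance weighted least-squares projection of time-specific
    effect estimates onto a polynomial of the given degree in time."""
    design = np.vander(t, degree + 1, increasing=True)  # columns 1, t, t^2, ...
    w = 1.0 / se**2
    xtw = design.T * w
    beta = np.linalg.solve(xtw @ design, xtw @ theta_hat)
    resid = theta_hat - design @ beta
    # Gaussian approximation with known variances: AIC = chi-square + 2k.
    aic = float(np.sum(w * resid**2)) + 2 * (degree + 1)
    return beta, aic

# Hypothetical inputs: ten time-specific estimates with a linear trend
# plus a fixed (non-random) illustrative perturbation.
t = np.arange(10, dtype=float)
se = np.full(10, 0.1)
perturb = 0.05 * np.where(np.arange(10) % 2 == 0, 1.0, -1.0)
theta_hat = -2.0 + 0.3 * t + perturb

# Candidate models: constant, linear, and quadratic in initiation time.
fits = {d: project_effects(t, theta_hat, se, d) for d in (0, 1, 2)}
best_degree = min(fits, key=lambda d: fits[d][1])
best_beta = fits[best_degree][0]
print(best_degree, best_beta)
```

Here the linear candidate is selected because the quadratic term does not reduce the weighted residual sum of squares enough to offset its penalty. A real analysis would also need to account for correlation between the time-specific estimates, which this sketch treats as independent.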

The statistical methods developed in this dissertation are motivated by real EHR-based studies of bariatric surgery at Kaiser Permanente. Throughout, we use these data to both illustrate and validate the methods introduced in this work.

Keywords

Bariatric Surgery, Causal Inference, Electronic Health Records, Influence Functions, Machine Learning, Missing Data, Biostatistics, Statistics, Epidemiology

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the repository's Terms of Service.
