Publication: Robust Causal Inference Methods for Electronic Health Record-Based Studies with Missing Eligibility and Calendar Time-Varying Treatment Effects
Abstract
Electronic health records (EHR) are seen as useful alternatives to randomized controlled trials when the latter are infeasible due to financial, ethical, or logistical constraints. Unfortunately, EHR exist to record clinical activity and assist with billing, so information is not collected with research in mind. When using EHR to study comparative effectiveness, there are many factors that a researcher cannot control: treatments are not randomly assigned, information on certain patient covariates may be unavailable, when to begin follow-up is not always clear, and which patients receive treatment and why may change over time. As such, rigorous statistical methods that contend with these factors, often simultaneously, are necessary when conducting EHR-based studies.
In Chapter 1, we consider the problem of selection bias due to missingness in the covariates that define study eligibility in target trial emulations. We illustrate the dangers of naively excluding patients who are missing eligibility-defining covariates and propose a solution that uses inverse probability weighting under a novel missing-at-random assumption. Our solution integrates seamlessly within a larger framework for dealing with common sources of bias in sequential target trial emulations, such as confounding, non-adherence, and censoring.
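To make the weighting idea concrete, the following is a minimal sketch of generic inverse probability weighting for a missing eligibility covariate. It is not the dissertation's estimator. It assumes, purely for illustration, that missingness depends only on a single fully observed categorical variable, and the record keys (`stratum`, `observed`, `eligible`, `outcome`) and the function name are hypothetical.

```python
from collections import defaultdict


def ipw_eligible_mean(records):
    """Weighted mean outcome among patients observed to be eligible.

    Each record is a dict with hypothetical keys:
      stratum  - fully observed covariate level
      observed - True if the eligibility-defining covariate was recorded
      eligible - True/False when observed, None otherwise
      outcome  - numeric outcome
    Assumption (illustrative only): missingness depends on `stratum` alone.
    """
    # Estimate P(observed | stratum) by the empirical proportion per stratum.
    totals = defaultdict(int)
    seen = defaultdict(int)
    for r in records:
        totals[r["stratum"]] += 1
        seen[r["stratum"]] += int(r["observed"])

    num = 0.0
    den = 0.0
    for r in records:
        if r["observed"] and r["eligible"]:
            # Weight is 1 / P(observed | stratum).
            w = totals[r["stratum"]] / seen[r["stratum"]]
            num += w * r["outcome"]
            den += w
    if den == 0:
        raise ValueError("no patients observed to be eligible")
    return num / den


# Toy data: stratum "A" has half of its eligibility values missing.
toy = [
    {"stratum": "A", "observed": True, "eligible": True, "outcome": 1.0},
    {"stratum": "A", "observed": True, "eligible": True, "outcome": 3.0},
    {"stratum": "A", "observed": False, "eligible": None, "outcome": 2.0},
    {"stratum": "A", "observed": False, "eligible": None, "outcome": 2.0},
    {"stratum": "B", "observed": True, "eligible": True, "outcome": 10.0},
    {"stratum": "B", "observed": True, "eligible": True, "outcome": 10.0},
]
weighted = ipw_eligible_mean(toy)
```

In the toy data, complete cases from stratum "A" receive weight 2 because only half of that stratum has eligibility recorded, so the weighted mean moves toward stratum "A" relative to the unweighted complete-case mean.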
Next, in Chapter 2, we extend the ideas of Chapter 1 and propose a robust and efficient estimator of the causal average treatment effect on the treated, defined in the study eligible population, in cohort studies where eligibility-defining covariates are missing at random. The approach facilitates the use of flexible machine-learning strategies for component nuisance functions while maintaining appropriate convergence rates for valid asymptotic inference, and displays robustness to various degrees of model misspecification in the component nuisance functions.
Finally, in Chapter 3, we formalize sequential target trial emulations for continuous outcomes and propose a statistical framework to describe both how and why causal effects vary over treatment initiation time in EHR-based studies. Our approach projects doubly robust, time-specific treatment effect estimates onto candidate marginal structural models and uses a principled model selection procedure to best describe how effects vary by treatment initiation time. We further introduce a novel summary metric, based on standardization analysis, to quantify the role of covariate shift in explaining observed effect changes and disentangle changes in treatment effects from changes in the patient population receiving treatment.
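As a rough illustration of projecting time-specific effect estimates onto a simple model, the sketch below fits a linear trend in treatment initiation time by weighted least squares. The linear form, the inverse-variance weights, and the function name `project_linear` are assumptions made here for illustration; the abstract does not specify the candidate models or the weighting used in the dissertation.

```python
def project_linear(times, effects, variances):
    """Fit effect = intercept + slope * time by weighted least squares.

    times     - treatment initiation times (e.g., calendar years)
    effects   - time-specific treatment effect estimates
    variances - their estimated variances (inverse-variance weights assumed)
    Returns (intercept, slope).
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    # Weighted means of time and effect.
    t_bar = sum(w * t for w, t in zip(weights, times)) / total
    e_bar = sum(w * e for w, e in zip(weights, effects)) / total
    # Weighted sums of squares and cross-products.
    sxx = sum(w * (t - t_bar) ** 2 for w, t in zip(weights, times))
    sxy = sum(w * (t - t_bar) * (e - e_bar)
              for w, t, e in zip(weights, times, effects))
    if sxx == 0:
        raise ValueError("need at least two distinct times")
    slope = sxy / sxx
    intercept = e_bar - slope * t_bar
    return intercept, slope


# Toy estimates lying exactly on the line 1 + 2 * time.
intercept, slope = project_linear(
    times=[0, 1, 2, 3],
    effects=[1.0, 3.0, 5.0, 7.0],
    variances=[0.5, 1.0, 2.0, 1.0],
)
```

A model selection procedure such as the one described above would compare this linear fit against other candidate forms (for example, a constant effect); that comparison is not shown here.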
The statistical methods developed in this dissertation are motivated by real EHR-based studies of bariatric surgery at Kaiser Permanente. Throughout, we use these data to both illustrate and validate the methods introduced in this work.