Robust Methods for Causal Inference and Missing Data in Electronic Health Record-Based Comparative Effectiveness Research

Date

2022-05-16

Citation

Levis, Alexander. 2022. Robust Methods for Causal Inference and Missing Data in Electronic Health Record-Based Comparative Effectiveness Research. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Missing data arise in most applied statistical settings, and dedicated methods are required to conduct valid statistical inference in such cases. This dissertation focuses on the development and validation of robust statistical methods, and accompanying study designs, for handling missing data in a general context. Special focus is given to problems arising from comparative effectiveness research using electronic health record data, where confounding and missingness must be acknowledged and dealt with simultaneously.

First, in Chapter 1, we consider causal average treatment effect (ATE) estimation from observational cohort data when baseline confounders are partially missing at random. Based on a novel identification assumption and ensuing likelihood factorization, we propose an influence function-based estimator that is valid for arbitrarily many partially observed confounders, multiply robust, and attains nominal convergence rates when using flexible models for nuisance functions appearing in the influence function.
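As a point of reference for the influence function-based approach described above, the following is a minimal sketch of the standard augmented inverse probability weighted (AIPW) estimator of the ATE in the simpler setting where all confounders are fully observed. It is not the estimator proposed in Chapter 1, which additionally handles partially missing confounders. The function name aipw_ate, the simulated data, and the use of the true nuisance functions in place of fitted models are all assumptions made for illustration.

```python
import numpy as np

def aipw_ate(y, a, pi_hat, mu1_hat, mu0_hat):
    """Standard AIPW estimate of the ATE with a Wald-type standard error.

    pi_hat  : estimated propensity score P(A = 1 | X)
    mu1_hat : estimated outcome regression E[Y | A = 1, X]
    mu0_hat : estimated outcome regression E[Y | A = 0, X]
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)
    # Uncentered influence-function values: model prediction plus weighted residual.
    phi1 = mu1_hat + a * (y - mu1_hat) / pi_hat
    phi0 = mu0_hat + (1 - a) * (y - mu0_hat) / (1 - pi_hat)
    phi = phi1 - phi0
    est = phi.mean()
    se = phi.std(ddof=1) / np.sqrt(len(phi))
    return est, se

# Hypothetical simulated cohort with one confounder x and a true ATE of 1.
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
pi = 1 / (1 + np.exp(-0.5 * x))          # true propensity score
a = rng.binomial(1, pi)
y = 1.0 * a + x + rng.normal(size=n)

# For simplicity the true nuisance functions are plugged in; in practice
# these would be estimated, possibly with flexible models.
est, se = aipw_ate(y, a, pi, 1.0 + x, x)
print(f"ATE estimate: {est:.3f} (SE {se:.3f})")
```

The estimate averages the estimated influence-function values, so the same values also supply the standard error. This is the basic mechanism behind the robustness properties that influence function-based estimators are known for.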

Second, in a general missing data context, we consider augmenting an initially observed sample with follow-up on a subsample in which complete data are obtained. In Chapter 2, we first consider estimation of the ATE from observational data with initially missing outcomes. We derive a nonparametric efficient estimator that is valid even when the usual missing at random assumption is violated, as well as a semiparametric efficient estimator that has lower variance but is valid only when the outcomes were initially missing at random. We then generalize the nonparametric estimation results to the case where the data are initially subject to arbitrary coarsening, and develop nonparametric efficient estimators of any smooth full data functional of interest. In Chapter 3, we extend these general results in two directions in an effort to improve efficiency. First, for an arbitrary smooth full data functional, we derive optimal second-phase subsample selection probabilities that minimize, under budget constraints, the asymptotic variance of the nonparametric efficient estimator developed in Chapter 2. Second, focusing on two-phase sampling of baseline covariates in a randomized trial, we derive a semiparametric efficient estimator of the ATE that leverages restrictions on the observed data distribution guaranteed by treatment randomization.
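To illustrate the two-phase idea in its simplest form, the sketch below estimates a mean outcome when the outcome is measured only on a second-phase subsample drawn with known, design-controlled probabilities. This is a generic textbook augmented inverse probability weighted estimator, not the estimators or the optimal selection probabilities derived in Chapters 2 and 3. The function name two_phase_mean, the selection rule, and the working model m_hat are hypothetical.

```python
import numpy as np

def two_phase_mean(y, s, p, m_hat):
    """Augmented IPW estimate of E[Y] under two-phase sampling.

    y     : outcome, only used where s == 1 (may be NaN elsewhere)
    s     : indicator of selection into the second-phase subsample
    p     : known selection probability given phase-one data
    m_hat : working prediction of Y from phase-one data
    """
    s = np.asarray(s, dtype=float)
    # Outcomes for unselected units are never used; zero them to avoid NaNs.
    y_filled = np.where(s == 1, y, 0.0)
    phi = m_hat + s * (y_filled - m_hat) / p
    est = phi.mean()
    se = phi.std(ddof=1) / np.sqrt(len(phi))
    return est, se

# Hypothetical simulation: phase-one variable w, true outcome mean of 2.
rng = np.random.default_rng(1)
n = 20000
w = rng.normal(size=n)
y_full = 2.0 + w + rng.normal(size=n)

# Assumed design: oversample units with extreme phase-one values.
p = np.where(np.abs(w) > 1, 0.8, 0.3)
s = rng.binomial(1, p)
y_obs = np.where(s == 1, y_full, np.nan)

# Deliberately misspecified working model; because the selection
# probabilities are known by design, the estimator remains consistent.
m_hat = 2.0 + 0.8 * w
est2, se2 = two_phase_mean(y_obs, s, p, m_hat)
print(f"Mean estimate: {est2:.3f} (SE {se2:.3f})")
```

Because the investigator controls the selection probabilities, they can be chosen to reduce the variance of such an estimator for a given follow-up budget, which is the kind of design question the abstract describes for Chapter 3.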

Keywords

Biostatistics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
