On Causal Inference in Real World Settings

Date

2023-05-12

Citation

Han, Larry. 2023. On Causal Inference in Real World Settings. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

In this dissertation, we consider three classical yet still current topics in causal inference: surrogate markers, multi-source federated learning, and sensitivity analysis. In each case, obstacles that arise in real-world settings make estimation of, and inference for, causal estimands challenging.

In Chapter 1, we address how to identify and validate surrogate markers using real-world data (RWD). As RWD have become increasingly common, statistical methods are needed to evaluate the proportion of treatment effect (PTE) explained by surrogates in such data. To address this gap, we propose inverse probability weighted (IPW) and doubly robust (DR) estimators of an optimal transformation of the surrogate and of the corresponding PTE measure. We demonstrate that the proposed estimators are consistent and asymptotically normal, and that the DR estimator is consistent when either the propensity score model or the outcome regression model is correctly specified. In two RWD settings, we show that our method can identify and validate surrogate markers for inflammatory bowel disease (IBD).
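
To illustrate the double robustness property mentioned above, the following is a minimal sketch of the textbook augmented IPW estimator for an average treatment effect. It is not the dissertation's PTE estimator or its optimal surrogate transformation; the function names, the simulated data, and the true effect of 2.0 are all assumptions made for illustration.

```python
import math
import random


def ipw_ate(y, a, ps):
    """Inverse probability weighted estimate of the average treatment effect."""
    n = len(y)
    return sum(ai * yi / pi - (1 - ai) * yi / (1 - pi)
               for yi, ai, pi in zip(y, a, ps)) / n


def aipw_ate(y, a, ps, mu1, mu0):
    """Augmented IPW (doubly robust) estimate of the average treatment effect.

    ps  : propensity scores P(A = 1 | X)
    mu1 : outcome-model predictions under treatment
    mu0 : outcome-model predictions under control
    """
    total = 0.0
    for yi, ai, pi, m1, m0 in zip(y, a, ps, mu1, mu0):
        # Outcome-model prediction plus a weighted residual correction
        treated = m1 + ai * (yi - m1) / pi
        control = m0 + (1 - ai) * (yi - m0) / (1 - pi)
        total += treated - control
    return total / len(y)


def simulate(n=20000, seed=0):
    """Hypothetical data: one confounder x, true treatment effect 2.0."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    ps = [1 / (1 + math.exp(-xi)) for xi in x]
    a = [1 if rng.random() < pi else 0 for pi in ps]
    y = [2.0 * ai + xi + rng.gauss(0, 1) for ai, xi in zip(a, x)]
    return x, ps, a, y


x, ps, a, y = simulate()
n = len(y)

# Correct propensity score, deliberately wrong outcome model (all zeros)
est_ps_only = aipw_ate(y, a, ps, [0.0] * n, [0.0] * n)

# Correct outcome model, deliberately wrong propensity score (constant 0.5)
est_or_only = aipw_ate(y, a, [0.5] * n, [2.0 + xi for xi in x], x)

print(est_ps_only, est_or_only)  # both should be close to the simulated effect
```

In this sketch, each estimate uses one correct and one deliberately wrong nuisance model, and both should land near the simulated effect. With the outcome model set to zero, the augmented estimator reduces to the plain IPW estimator.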

Chapter 2 focuses on federated learning of causal effects in multi-source settings. We develop a Federated Adaptive Causal Estimation (FACE) framework that incorporates heterogeneous data from multiple sites to provide treatment effect estimation and inference for a flexibly specified target population of interest. To safely incorporate source sites and avoid negative transfer, we introduce an adaptive weighting procedure based on a penalized regression, which achieves both consistency and optimal efficiency. Our strategy is communication-efficient and privacy-preserving: participating sites share only summary statistics, and share them only once. We evaluate FACE both theoretically and numerically, and apply it in a comparative effectiveness study of the BNT162b2 (Pfizer) and mRNA-1273 (Moderna) vaccines on COVID-19 outcomes in U.S. veterans, using electronic health records from five VA regional sites.
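
The idea of adaptively down-weighting source sites can be sketched with a toy penalized objective. This is a simplified stand-in, not the FACE procedure: the objective, the independence assumption across sites, the tuning parameter `lam`, and all function names are assumptions made for illustration. Each site contributes only a point estimate and a variance, and the penalty on a source grows with its squared discrepancy from the target estimate.

```python
def adaptive_weights(theta_t, var_t, theta_s, var_s, lam):
    """Weights eta_k >= 0 on source sites minimizing the hypothetical objective

        (1 - sum(eta))^2 * var_t + sum(eta_k^2 * var_s[k])
            + lam * sum(eta_k * (theta_s[k] - theta_t)^2)

    The stationarity conditions give eta_k = max(0, t - c_k) / var_s[k],
    with c_k = lam * (theta_s[k] - theta_t)^2 / 2 and t solving
    t / var_t + sum(eta_k) = 1, which is found here by bisection.
    """
    cost = [0.5 * lam * (th - theta_t) ** 2 for th in theta_s]

    def excess(t):
        return (t / var_t
                + sum(max(0.0, t - c) / v for c, v in zip(cost, var_s))
                - 1.0)

    lo, hi = 0.0, var_t  # excess(0) < 0 and excess(var_t) >= 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            hi = mid
        else:
            lo = mid
    t = 0.5 * (lo + hi)
    return [max(0.0, t - c) / v for c, v in zip(cost, var_s)]


def combine(theta_t, theta_s, eta):
    """Target estimate shifted toward the sources according to the weights."""
    return theta_t + sum(e * (th - theta_t) for e, th in zip(eta, theta_s))


# Two sources that agree with the target, no penalty: equal weighting
eta_equal = adaptive_weights(1.0, 1.0, [1.0, 1.0], [1.0, 1.0], lam=0.0)

# One source far from the target: the penalty sets its weight to zero
eta_biased = adaptive_weights(1.0, 1.0, [1.0, 3.0], [1.0, 1.0], lam=10.0)
estimate = combine(1.0, [1.0, 3.0], eta_biased)
```

With no penalty and equal variances, the weights reduce to inverse-variance weighting across the three sites. With a large penalty, the discrepant source is dropped entirely, which is the behavior an adaptive scheme needs in order to avoid negative transfer.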

In Chapter 3, we develop a novel framework for conducting sensitivity analysis at the design stage of complex clinical trials. Sensitivity analyses are useful for assessing how important operating characteristics of a design depend on various unknown parameters. Two crucial components of a sensitivity analysis are (i) the choice of a set of plausible simulation scenarios and (ii) the list of operating characteristics of interest. We propose a robust approach to choosing the set of scenarios to include in design sensitivity analyses. We define a utility criterion that formalizes whether a specific set of sensitivity scenarios adequately summarizes how the operating characteristics of the trial design vary across all plausible values of the unknown parameters. We then use optimization techniques to select the set of simulation scenarios that maximizes this criterion, as specified by the investigator, and so best exemplifies the operating characteristics of the trial design. We illustrate our proposal in three trial designs of increasing complexity.
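
A toy version of scenario selection can make the idea concrete. The sketch below is an assumption-laden illustration, not the dissertation's utility criterion or optimization method: the operating characteristic (approximate power of a two-arm z-test), the utility (how well a few scenarios represent power across a grid of plausible effect sizes), and the exhaustive search are all hypothetical choices.

```python
import math
from itertools import combinations


def power(effect, n_per_arm, sd=1.0, z_alpha=1.959964):
    """Approximate power of a two-sided, two-sample z-test at level 0.05,
    counting only rejections in the direction of the effect."""
    se = sd * math.sqrt(2.0 / n_per_arm)
    z = effect / se - z_alpha
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def utility(selected, grid, oc):
    """Hypothetical utility: negative mean error incurred when each plausible
    parameter value is represented by the selected scenario whose operating
    characteristic is closest to its own."""
    return -sum(min(abs(oc[g] - oc[s]) for s in selected)
                for g in grid) / len(grid)


def best_scenarios(grid, oc, k):
    """Exhaustive search for the k scenarios with the highest utility."""
    return max(combinations(grid, k), key=lambda s: utility(s, grid, oc))


# Plausible effect sizes (assumed) and the power of the design at each one
grid = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
oc = {effect: power(effect, n_per_arm=100) for effect in grid}

chosen = best_scenarios(grid, oc, k=3)
print(chosen, utility(chosen, grid, oc))
```

Exhaustive search is feasible only for small grids; a realistic design with several unknown parameters would need a more scalable optimizer, which is why the choice of optimization technique matters in practice.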

Keywords

Causal inference, Federated learning, Real world data, Sensitivity analysis, Surrogate markers, Transfer learning, Biostatistics, Statistics, Health sciences

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
