Spatio-Temporal Methods for Causal Inference in Quasi-Experimental Studies: Applications to Environmental Health
Abstract
Modern environmental health studies often rely on quasi-experimental designs, where policies or external shocks induce localized changes in exposure across geographic regions. When comprehensive panel data are available before and after an intervention, one can, in principle, leverage the observed data to reconstruct counterfactual trends. Yet rare outcomes (e.g., low counts of a disease) and unmeasured confounding that evolves dynamically across regions and periods can undermine standard approaches such as difference-in-differences or synthetic control methods. This dissertation develops a unified suite of Bayesian spatio-temporal methods (spatio-temporal matrix completion and Gaussian process models) that explicitly borrow strength across units and time points to stabilize inference and quantify uncertainty. Through extensive simulations and real-data applications, we demonstrate how these methods generalize popular causal tools, yield interpretable weighting schemes, and provide practical guidance on model implementation for environmental health research.
In Chapter 1, we introduce Bayesian spatio-temporal matrix completion models tailored for rare count outcomes in quasi-experimental panel data, through an application examining the impacts of traffic-related air pollution (TRAP) on childhood hematologic cancers. Although some pollutants emitted in vehicle exhaust, such as benzene, are known to cause leukemia in adults at high exposure levels, less is known about the relationship between TRAP and childhood hematologic cancer. In the 1990s, the US EPA implemented the reformulated gasoline program in select areas of the US, which drastically reduced ambient TRAP in affected areas. This created an ideal quasi-experiment to study the effects of TRAP on childhood hematologic cancers. However, existing methods for quasi-experimental analyses can perform poorly when outcomes are rare and unstable, as with childhood cancer incidence. We develop Bayesian spatio-temporal matrix completion methods to conduct causal inference in quasi-experimental settings with rare outcomes. Selective information sharing across space and time enables stable estimation, and the Bayesian approach facilitates uncertainty quantification. We evaluate the methods through simulations and apply them to estimate the causal effects of TRAP on childhood leukemia and lymphoma.
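The abstract does not spell out the model, but the core matrix-completion idea can be illustrated briefly. The treated unit-periods are treated as missing, and their untreated counterfactuals are imputed from a low-rank fit to the remaining cells. The sketch below makes several assumptions. It uses soft-impute (singular-value soft-thresholding) on a synthetic Gaussian panel. This is a swapped-in, non-Bayesian technique, not the dissertation's model: it has no count likelihood, no spatial structure, and no posterior uncertainty. The names `soft_impute` and `lam` are hypothetical, and so is the simulated data.

```python
import numpy as np

def soft_impute(Y, observed, lam=1.0, n_iter=1000, tol=1e-7):
    """Low-rank completion: fit Y on cells where `observed` is True, return a full estimate."""
    Z = np.zeros_like(Y, dtype=float)
    for _ in range(n_iter):
        # Keep observed cells; fill the rest with the current low-rank estimate
        filled = np.where(observed, Y, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # Soft-threshold the singular values (nuclear-norm shrinkage)
        Z_new = (U * np.maximum(s - lam, 0.0)) @ Vt
        converged = np.linalg.norm(Z_new - Z) <= tol * max(1.0, np.linalg.norm(Z))
        Z = Z_new
        if converged:
            break
    return Z

# Synthetic panel (hypothetical): 30 units x 20 periods with a rank-2 latent structure
rng = np.random.default_rng(0)
n_units, n_periods, t0 = 30, 20, 14
L = rng.normal(size=(n_units, 2)) @ rng.normal(size=(2, n_periods))
treated_post = np.zeros((n_units, n_periods), dtype=bool)
treated_post[:5, t0:] = True          # units 0-4 treated from period index 14 onward
true_effect = -2.0
Y = L + 0.1 * rng.normal(size=L.shape) + true_effect * treated_post

# Mask treated/post cells as missing and impute their untreated counterfactuals
Y0_hat = soft_impute(Y, observed=~treated_post, lam=0.5)
att = (Y - Y0_hat)[treated_post].mean()   # average effect on the treated cells
```

Donor units' post-period columns pin down the time factors. Each treated unit's pre-period row pins down its loadings. This is how information is shared across units and time. A Bayesian version would replace the penalty with priors and report a posterior for `att`.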
In Chapter 2, we expand on Chapter 1 to investigate the potential heterogeneous impacts of the reformulated gasoline program on the incidence of lymphoma among children, adolescents, and young adults (CYA), across disease type and demographic strata. We employ recently proposed Bayesian causal Gaussian process (GP) models, applied to population cancer registry data, to estimate effects of the program on CYA lymphoma incidence across strata defined by cancer type, sex, race, Hispanic ethnicity, and age group. Our analytic framework allows for stable estimation of stratum-specific effects via data-driven information sharing across space, time, and strata. Effects are reported on both the absolute and relative scales. We find evidence that the largest program-attributable reductions in lymphoma incidence rates occurred for Hodgkin lymphoma, and among individuals who are male, white, and/or aged 20-29. The finding of larger reductions in Hodgkin lymphoma is notable because prior TRAP studies have focused primarily on non-Hodgkin lymphoma.
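The abstract reports effects on both absolute and relative scales. A Bayesian counterfactual model can yield both in a common way: compute each effect draw by draw from posterior samples of the counterfactual rate, then summarize the draws. This is an assumption here, not a description of the chapter's exact procedure. In the sketch below, `effect_summaries` is hypothetical, the observed rate is treated as fixed, and the numbers are made up for illustration. They are not results.

```python
import numpy as np

def effect_summaries(observed_rate, counterfactual_draws, level=0.95):
    """Posterior summaries of absolute and relative effects for one stratum."""
    draws = np.asarray(counterfactual_draws, dtype=float)
    absolute = observed_rate - draws          # e.g. difference in cases per 100,000
    relative = observed_rate / draws - 1.0    # proportional change vs. the counterfactual
    alpha = (1.0 - level) / 2.0

    def summarize(x):
        lower, upper = np.quantile(x, [alpha, 1.0 - alpha])
        return {"mean": x.mean(), "lower": lower, "upper": upper}

    return {"absolute": summarize(absolute), "relative": summarize(relative)}

# Made-up illustrative inputs: one observed rate and four posterior counterfactual draws
summary = effect_summaries(2.0, [2.5, 2.4, 2.6, 2.5])
```

Taking the ratio within each draw, rather than dividing posterior means, gives a relative-scale interval that is consistent with the full posterior.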
In Chapter 3, we delve deeper into Gaussian process approaches for quasi-experiments, addressing diverse confounding structures that may not be fully accommodated by the model presented in Chapter 2. Estimating causal effects in quasi-experiments with spatio-temporal panel data often requires adjusting for unmeasured confounding that varies across space and time. Gaussian processes offer a flexible, nonparametric modeling approach that can account for such complex dependencies through carefully chosen covariance kernels. In this chapter, we provide a practical and interpretable framework for applying GPs to causal inference in panel data settings. We demonstrate how GPs generalize popular methods such as synthetic control and vertical regression, and we show that the GP posterior mean can be represented as a weighted average of observed outcomes, where the weights reflect spatial and temporal similarity. To support applied use, we explore how different kernel choices affect both estimation performance and interpretability, offering guidance for choosing between separable and nonseparable kernels. Through simulations and an application to Hurricane Katrina mortality data, we illustrate how GP models can be used to estimate counterfactual outcomes and quantify treatment effects. All code and materials are publicly available to support reproducibility and encourage adoption. Our results suggest that GPs are a promising and interpretable tool for addressing unmeasured spatio-temporal confounding in quasi-experimental studies.
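The weighted-average representation follows from the standard GP regression formula. Assume a zero prior mean and Gaussian noise with variance sigma^2. Let y be the training outcomes and K their kernel matrix. The posterior mean at a new unit-time x* is then k*^T (K + sigma^2 I)^(-1) y, which equals w^T y with w = (K + sigma^2 I)^(-1) k*. Here k* holds the kernel similarities between x* and each training cell. The sketch below assumes a separable squared-exponential space-time kernel; the chapter compares several kernels. The names `sq_exp`, `separable_kernel`, and `gp_weights` are hypothetical, as are the lengthscales and the placeholder data.

```python
import numpy as np

def sq_exp(a, b, lengthscale):
    """Squared-exponential kernel between rows of a (n, d) and rows of b (m, d)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def separable_kernel(S1, t1, S2, t2, ls_space=1.0, ls_time=2.0, variance=1.0):
    """Separable space-time kernel: variance * k_space(s, s') * k_time(t, t')."""
    return variance * sq_exp(S1, S2, ls_space) * sq_exp(t1[:, None], t2[:, None], ls_time)

def gp_weights(S_train, t_train, s_star, t_star, noise_var=0.1, **kern):
    """Weights w such that the zero-mean GP posterior mean at (s_star, t_star) is w @ y_train."""
    K = separable_kernel(S_train, t_train, S_train, t_train, **kern)
    k_star = separable_kernel(S_train, t_train, s_star[None, :],
                              np.array([t_star], dtype=float), **kern)[:, 0]
    # Solve (K + noise_var * I) w = k_star rather than forming an explicit inverse
    return np.linalg.solve(K + noise_var * np.eye(len(t_train)), k_star)

# Hypothetical panel: 5 units with 2-D coordinates, 8 periods; unit 0 treated from t = 5
rng = np.random.default_rng(1)
coords = rng.uniform(0.0, 5.0, size=(5, 2))
unit_idx, time_idx = np.meshgrid(np.arange(5), np.arange(8), indexing="ij")
unit_idx, time_idx = unit_idx.ravel(), time_idx.ravel().astype(float)
train = ~((unit_idx == 0) & (time_idx >= 5))   # exclude treated post-period cells
y = rng.normal(size=unit_idx.size)              # placeholder outcomes, not real data

w = gp_weights(coords[unit_idx[train]], time_idx[train], coords[0], 6.0)
counterfactual_u0_t6 = w @ y[train]             # weighted combination of observed outcomes
```

Under a product kernel, a training cell's similarity to the target is small whenever its spatial or temporal distance is large. A nonseparable kernel would let those two distances interact. Unlike classical synthetic control, these weights are not constrained to be nonnegative or to sum to one.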