Quantifying the Health Impacts of Air Pollution: Methods for Causal Exposure–Response Estimation and Policy-Relevant Evidence
Abstract
Ambient fine particulate matter (PM2.5) remains one of the most consequential environmental risks to human health worldwide, contributing to millions of premature deaths each year. Yet determining how mortality risk changes across the full PM2.5 exposure range, particularly at low concentrations where regulatory decisions are most sensitive, remains a fundamental scientific and methodological challenge. Observational air pollution data are high-dimensional and exhibit complex confounding structures and spatial heterogeneity, and the models fit to them are vulnerable to misspecification. Recovering credible causal exposure–response functions (ERFs) in this setting requires methods that flexibly accommodate heterogeneous effects, nonlinear relationships, and uncertainty in both exposure and outcome models. This dissertation develops, evaluates, and applies such methods to strengthen the evidence base for air-quality regulation and policy-relevant health-impact assessment in the United States.
The overarching theme of this dissertation is connecting rigorous statistical methods with meaningful public health and policy insight. First, I evaluate the performance of widely used ERF estimators and synthesize guidance for when different approaches are most appropriate. Second, I introduce a new causal inference method designed to address a pervasive but under-recognized source of bias—local confounding—that arises when the strength and type of confounding vary across the exposure distribution. Third, I demonstrate how these methodological advances can be applied to a real policy context through a transparent, reproducible health-impact assessment of proposed energy infrastructure. Across all three aims, I emphasize design-based workflows, principled diagnostics, and reproducibility to support robust inference in settings where regulatory stakes are high.
Chapter 1 addresses a longstanding gap in understanding which statistical methods reliably estimate ERFs under realistic confounding and exposure–outcome structures. I compare seven commonly used ERF estimators across a comprehensive set of simulation scenarios that vary the true ERF shape, the confounding mechanism, the degree of effect heterogeneity, and sample size. These include traditional regression models (linear, spline-based, and threshold) and design-based causal estimators that use entropy balancing or generalized propensity score matching. Two key insights emerge. First, regression-based ERFs can exhibit substantial bias when confounding is nonlinear or heterogeneous, even when the resulting curves appear smooth and precise. Second, design-based causal estimators that explicitly balance covariates across the exposure distribution tend to be more robust, particularly with large sample sizes. Applying all methods to a national cohort of more than 68 million Medicare beneficiaries reveals a distinctly nonlinear ERF: mortality risks rise steeply at lower PM2.5 concentrations and attenuate at higher levels. This chapter concludes with concrete methodological recommendations and fully reproducible code to support adoption.
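The first insight can be reproduced in miniature. The toy simulation below is an illustrative assumption, not one of the chapter's scenarios. The confounder `x` drives both exposure and outcome through `x**2`. A regression that adjusts for `x` only linearly returns a badly biased exposure slope with a tiny standard error, while adjusting for the correct functional form recovers the true effect of 1.0. The helper name `ols` is hypothetical.

```python
import numpy as np


def ols(design, y):
    """Ordinary least squares coefficients with classical standard errors."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (len(y) - design.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    return beta, se


# Assumed toy data-generating process with nonlinear confounding.
rng = np.random.default_rng(42)
n = 50_000
x = rng.normal(size=n)                          # confounder
t = x**2 + rng.normal(size=n)                   # exposure depends on x nonlinearly
y = 1.0 * t + 2.0 * x**2 + rng.normal(size=n)   # true exposure effect is 1.0

ones = np.ones(n)
b_lin, se_lin = ols(np.column_stack([ones, t, x]), y)    # misspecified: linear in x
b_sq, se_sq = ols(np.column_stack([ones, t, x**2]), y)   # correct functional form

print(f"linear adjustment:  slope = {b_lin[1]:.2f} (SE {se_lin[1]:.3f})")
print(f"correct adjustment: slope = {b_sq[1]:.2f} (SE {se_sq[1]:.3f})")
```

Because `x` is uncorrelated with `x**2` here, the linear adjustment does nothing to remove the confounding, and the slope converges to roughly 7/3 rather than 1. The small standard error is exactly the "smooth and precise" appearance that can mask the bias.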
Chapter 2 fills a critical methodological gap by providing a causal inference framework specifically tailored to settings where confounding varies across the exposure distribution. I introduce REBEL (Rolling Entropy Balancing for Exposure–response functions under Local confounding), a new design-and-analysis pipeline for continuous exposures. REBEL (i) constructs overlapping exposure windows, (ii) achieves covariate balance within each window through entropy balancing, (iii) calibrates each window to the full target population to recover population-level effects, and (iv) aggregates local estimates via an overlap-aware meta-estimator. I also develop diagnostics for detecting local confounding and propose a counterfactual cross-validation approach for tuning algorithmic parameters. In simulations, REBEL consistently outperforms existing ERF estimators when local confounding is present and remains competitive when it is not. Applied to 68.5 million Medicare beneficiaries, REBEL uncovers a steep, supralinear increase in all-cause mortality at low exposures to coal-derived PM2.5. Global models mask this pattern, which suggests that traditional approaches may materially understate coal’s health burden. This chapter also derives a coal-specific exposure–response function that explicitly adjusts for potential local confounding, providing one of the first flexible, population-based ERFs for coal-derived PM2.5 in the literature and demonstrating how source-specific toxicity can be estimated with improved causal validity.
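REBEL's four steps can be illustrated with a deliberately simplified sketch. Every detail below is an assumption rather than the dissertation's implementation:

- Windows are fixed-width exposure quantile bands.
- Steps (ii) and (iii) are folded into one entropy-balancing problem. Its weights make exposure and covariates uncorrelated inside the window and move the window's covariate means to the full-population means.
- The local estimand is a weighted least-squares slope.
- Step (iv) is replaced by a plain inverse-variance average. Unlike an overlap-aware meta-estimator, this average ignores the correlation between overlapping windows.

The function names `balance_window`, `weighted_slope`, and `windowed_slopes` are hypothetical. The diagnostics and counterfactual cross-validation are not shown.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp


def balance_window(t, x, pop_means):
    """Hypothetical entropy-balancing weights for one exposure window.

    Constraints: weighted covariate means equal the full-population means
    (calibration) and the weighted exposure-covariate covariance is zero.
    """
    dt = t - t.mean()
    dx = x - pop_means
    c = np.column_stack([dx, dt, dx * dt[:, None]])

    def dual(lam):
        return logsumexp(c @ lam)  # convex dual of the entropy objective

    def grad(lam):
        z = c @ lam
        return c.T @ np.exp(z - logsumexp(z))  # weighted moments; zero at optimum

    lam = minimize(dual, np.zeros(c.shape[1]), jac=grad, method="BFGS").x
    z = c @ lam
    return np.exp(z - logsumexp(z))  # normalized weights summing to 1


def weighted_slope(t, y, w):
    """Weighted least-squares slope of y on t with an HC0-style variance."""
    dt = t - w @ t
    dy = y - w @ y
    s_tt = w @ dt**2
    slope = (w @ (dt * dy)) / s_tt
    var = np.sum((w * dt * (dy - slope * dt)) ** 2) / s_tt**2
    return slope, var


def windowed_slopes(t, x, y, grid, width=0.3, n_windows=8):
    """Simplified steps (i)-(iv): local slopes pooled at each grid point."""
    pop_means = x.mean(axis=0)
    local = []
    for a in np.linspace(0.0, 1.0 - width, n_windows):
        lo, hi = np.quantile(t, np.clip([a, a + width], 0.0, 1.0))  # (i) window
        m = (t >= lo) & (t <= hi)
        w = balance_window(t[m], x[m], pop_means)                   # (ii) + (iii)
        local.append((lo, hi, *weighted_slope(t[m], y[m], w)))
    pooled = np.full(len(grid), np.nan)
    for k, g in enumerate(grid):
        cover = np.array([(s, v) for lo, hi, s, v in local if lo <= g <= hi])
        if cover.size:
            # (iv) naive inverse-variance pooling; ignores overlap correlation
            pooled[k] = np.sum(cover[:, 0] / cover[:, 1]) / np.sum(1 / cover[:, 1])
    return pooled


# Toy data (assumed): two confounders, linear ERF with true slope 2.
rng = np.random.default_rng(1)
n = 20_000
x = rng.normal(size=(n, 2))
t = 5 + x @ np.array([1.0, 0.5]) + 2 * rng.normal(size=n)
y = 2 * t + x @ np.array([3.0, -1.0]) + 0.5 * rng.normal(size=n)
grid = np.quantile(t, [0.1, 0.3, 0.5, 0.7, 0.9])
print("naive global slope:", round(np.polyfit(t, y, 1)[0], 2))
print("windowed slopes:", np.round(windowed_slopes(t, x, y, grid), 2))
```

On these toy data, the pooled windowed slopes should sit near the true value of 2. The unadjusted global slope, by contrast, absorbs the confounders' contribution. Window width and count are natural candidates for the kind of algorithmic tuning the chapter describes.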
Chapter 3 demonstrates how these methodological tools can be translated into actionable evidence through a rigorous, policy-relevant assessment of PM2.5 impacts from a proposed 2,200-MW natural-gas combined-cycle plant in Colleton County, South Carolina. The analysis integrates source-specific emissions estimation, reduced-complexity atmospheric dispersion modeling (InMAP), population-weighted exposure assessment, environmental justice profiling, and health and economic valuation using U.S. EPA tools. Under conservative assumptions, the plant would expose more than 2.09 million people across South Carolina and Georgia to measurable increases in annual PM2.5, with the highest burdens concentrated in nearby census tracts characterized by lower incomes, lower property values, and higher proportions of Black residents. Estimated health damages reach up to $27.9 million annually, rising further under higher-capacity operating scenarios. Sensitivity analyses reveal that operational decisions (e.g., capacity factor) have substantially greater influence on community exposure than modest variations in design parameters such as stack height. This chapter provides a transparent, reproducible template for early-stage health-impact assessment of energy facilities.
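The exposure and health steps of such an assessment can be sketched with two small functions. The first is a population-weighted mean of tract-level PM2.5 increments. The second is the standard log-linear health impact function, ΔY = Σ y₀ · P · (1 − exp(−β ΔC)) with β = ln(RR) / ΔC_ref, the form implemented in EPA's BenMAP-CE. Whether the chapter used exactly this form is an assumption. The function names, tract values, baseline mortality rate, and relative risk below are illustrative placeholders, not the chapter's inputs.

```python
import numpy as np


def population_weighted_increment(delta_c, pop):
    """Population-weighted mean PM2.5 increment across receptor areas (ug/m3)."""
    return np.sum(pop * delta_c) / np.sum(pop)


def excess_deaths(delta_c, pop, baseline_rate, rr, rr_increment=10.0):
    """Log-linear health impact function summed over areas.

    beta is derived from a relative risk `rr` per `rr_increment` ug/m3.
    """
    beta = np.log(rr) / rr_increment
    return np.sum(baseline_rate * pop * (1.0 - np.exp(-beta * delta_c)))


# Illustrative inputs for three hypothetical tracts (not the chapter's data).
delta_c = np.array([0.10, 0.02, 0.005])   # annual PM2.5 increment, ug/m3
pop = np.array([5_000, 20_000, 100_000])  # exposed adult population
baseline_rate = 0.008                     # hypothetical annual mortality rate
rr = 1.06                                 # hypothetical RR per 10 ug/m3

pw = population_weighted_increment(delta_c, pop)
deaths = excess_deaths(delta_c, pop, baseline_rate, rr)
print(f"population-weighted increment: {pw:.4f} ug/m3")
print(f"excess deaths per year: {deaths:.3f}")
```

Monetized damages would then follow by multiplying excess cases by a value per case, such as a value of a statistical life, which is left as an external input here. Because increments from a single plant are small, the result is nearly linear in ΔC. As a result, emissions-driven choices such as capacity factor propagate almost proportionally into estimated deaths.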
Taken together, these chapters (i) provide principled guidance for choosing among ERF estimators and diagnosing when causal designs are needed, (ii) introduce a new method that addresses locally varying confounding in continuous-exposure settings, and (iii) demonstrate how causal evidence can be scaled to evaluate community-level risks from proposed energy infrastructure. By bridging methodological rigor with policy relevance, this dissertation advances the tools needed to credibly quantify the health impacts of air pollution and to inform air-quality and energy decisions.