Advancing Data Science Methods for Environmental Health Policy Design and Evaluation

Date

2025-05-09

Abstract

To date, the field of environmental health has been primarily focused on characterizing the public health impacts of environmental exposures such as air pollution and temperature. However, beyond studying impacts, there is an increasingly recognized need for designing data-driven policies or strategies to (a) reduce both the overall health burden and health disparities associated with current environmental factors and (b) adapt to emergent threats such as climate change. Meanwhile, diverse and growing data sources, paired with modern data science methods, hold the potential for expanding the types of environmental health science and policy questions that can be answered.

This dissertation employs a wide range of data sources and novel analytic techniques to advance data-driven environmental health policy design, which often involves evaluating existing policies along the way. An overarching theme of this work is the bridging of different scientific domains and methodological areas, such as decision science with environmental justice, artificial intelligence with climate and health, and causal inference with remote sensing of both human activity and environmental factors.

Chapter 1 proposes a Monte Carlo-based methodology to compare realistic strategies for measuring and reporting daily air quality information. Specifically, we investigate the usefulness of low-cost air quality sensors, which are often lauded for their potential to fill spatiotemporal gaps in air quality information – especially for underserved populations, making them a central tool in environmental justice efforts. However, because many of these sensors are purchased and deployed by concerned citizens with financial means, the extra measurements are skewed towards more privileged areas. Also, low-cost sensors have lower accuracy compared to reference-grade monitors. To characterize these tradeoffs, we design and implement a simulation study based closely on real data to evaluate the accuracy and equity of information from individuals’ nearest air quality instrument (sensor or reference monitor) under both real and hypothetical low-cost sensor deployment scenarios. By varying the number of sensors deployed, the amount of sensor measurement error (noise), and the relative placement of the sensors (e.g. at schools, near major roads, and in communities with environmental justice concerns), we are able to analyze and make recommendations for how a local or regional government or organization might deploy the most effective and equitable network of low-cost air quality sensors, given their budget constraints. Further, the demonstrated simulation methodology can be adapted for other environmental monitoring objectives.
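The simulation idea in Chapter 1 can be illustrated with a minimal Monte Carlo sketch. This is not the dissertation's code: the pollution surface `true_pm`, the unit-square geography, the uniformly random placement, and all parameter names are assumptions made for illustration. It estimates the mean absolute error of the reading each simulated person gets from their nearest instrument, as the number of sensors and the sensor noise vary.

```python
import math
import random


def true_pm(x, y):
    # Hypothetical smooth pollution surface on the unit square (an assumption,
    # not taken from the dissertation's real data).
    return 10.0 + 5.0 * math.sin(3.0 * x) + 5.0 * math.cos(2.0 * y)


def simulate_mae(n_sensors, sensor_noise_sd, n_monitors=3,
                 n_people=200, n_trials=100, seed=0):
    """Monte Carlo estimate of the mean absolute error of the reading
    from each person's nearest instrument (sensor or reference monitor)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        instruments = []
        # Reference monitors: treated as error-free in this toy setup.
        for _ in range(n_monitors):
            x, y = rng.random(), rng.random()
            instruments.append((x, y, true_pm(x, y)))
        # Low-cost sensors: true value plus Gaussian measurement noise.
        for _ in range(n_sensors):
            x, y = rng.random(), rng.random()
            noisy = true_pm(x, y) + rng.gauss(0.0, sensor_noise_sd)
            instruments.append((x, y, noisy))
        err = 0.0
        for _ in range(n_people):
            px, py = rng.random(), rng.random()
            nearest = min(
                instruments,
                key=lambda s: (s[0] - px) ** 2 + (s[1] - py) ** 2,
            )
            err += abs(nearest[2] - true_pm(px, py))
        total += err / n_people
    return total / n_trials
```

Comparing, say, `simulate_mae(0, 0.0)`, `simulate_mae(50, 0.0)`, and `simulate_mae(50, 20.0)` shows the tradeoff the chapter describes: more instruments shorten the distance to the nearest reading, while sensor noise degrades it. Non-uniform placement (for example, near schools or roads) and equity metrics across population groups would need additional inputs not modeled here.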

Chapter 2 develops a framework with which reinforcement learning (RL) can be used to optimize the issuance of heat alerts. Heat alerts are a practical and low-cost intervention to mitigate the public health impacts of extreme heat. However, current practice for issuing heat alerts does not take advantage of modern data science methods to optimize local alert criteria. To fill this gap, we harness RL (a branch of artificial intelligence) to build a model for whether to issue a heat alert on a given day, accounting for sequential dependence – which in the heat alert setting is due to both alert fatigue and finite resources/ability (of individuals/communities) to take health-protective measures. To use RL, we have to overcome several major incompatibilities between standard RL methods and the heat alert setting, which extend to environmental health / climate & health more generally. First, the relatively small and easily confounded signal in heat-alert-health relationships challenges the ability of RL algorithms to identify relevant effects, much less to optimize heat alert issuance. Second, mainstream RL methods are not suitable for settings with significant spatial heterogeneity, which is a known feature of heat alert health impacts. To address these challenges, we use a combination of statistical modeling, cutting-edge RL techniques, and conceptually simple yet effective modifications such as restricting alerts to extremely hot days, to learn heat alert issuance policies that reduce the adverse health impacts of extreme heat compared to the current U.S. National Weather Service policy. We also prioritize interpretable characterization of the RL results, offering intuitive insights about which counties across the nation stand to benefit the most from implementing heat alert-RL. A major contribution of this project is establishing a connection between the environmental health and artificial intelligence communities.
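One of the modifications named in Chapter 2, restricting alerts to extremely hot days, can be sketched as an action mask applied to any candidate policy. This is an illustrative sketch only: the function names, the fixed `hot_threshold`, and the per-season `alert_budget` are assumptions, not details confirmed by the abstract.

```python
def allowed_actions(heat_index, hot_threshold, alerts_remaining):
    # Action 0 = no alert, action 1 = issue alert.
    # Alerts are only permitted on sufficiently hot days and while the
    # (hypothetical) seasonal budget has not been exhausted.
    if heat_index >= hot_threshold and alerts_remaining > 0:
        return [0, 1]
    return [0]


def run_episode(heat_series, policy, hot_threshold, alert_budget):
    """Roll a policy through one season, overriding disallowed actions."""
    remaining = alert_budget
    issued = []
    for day, heat_index in enumerate(heat_series):
        actions = allowed_actions(heat_index, hot_threshold, remaining)
        action = policy(day, heat_index, remaining)
        if action not in actions:
            action = 0  # fall back to "no alert" when the mask forbids it
        remaining -= action
        issued.append(action)
    return issued


def always_alert(day, heat_index, remaining):
    # Trivial stand-in for a learned RL policy.
    return 1
```

For example, `run_episode([80, 95, 100, 102, 85], always_alert, hot_threshold=95, alert_budget=2)` issues alerts only on the second and third days. In an RL setting, the same mask would shrink the action space the learner has to explore, which is one plausible reason such a restriction helps when the health signal is small.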

Chapter 3 combines spatiotemporal causal inference methods and remotely sensed data to quantify the air quality impacts of national-level plastic waste policies, via the mechanism of trash burning at open dump sites. Especially in low- and middle-income countries (LMICs) without adequate solid waste management infrastructure, trash burning is a major environmental public health problem that is difficult to quantify on large scales because it is distributed, intermittent, and often covert. This burden is compounded by many high-income countries exporting massive amounts of plastic waste, both with and without LMICs’ consent. Using Indonesia as a case study, we develop a strategy to quantify the local air quality impacts of China’s 2018 waste import ban (and subsequent diversion of waste to other Southeast Asian countries), using a collection of remotely sensed data products to overcome the lack of ground-level monitoring and to strengthen the causal argument. Methodologically, we combine two strains of causal inference, ultimately considering the proximity to ports (from which international plastic waste enters the country) as an induced continuous exposure at locations where open dumping has been detected, and using past years of data as controls, conditional on meteorologic variation. Additionally, we extend past work’s uncertainty quantification strategy to account for residual spatial dependence. This project not only reveals a statistically significant increase in Indonesian air pollution attributable to imported plastic waste, but also lays groundwork for future environmental policy evaluations in data-scarce settings.
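The core comparison in Chapter 3 (past years as controls, port proximity as a continuous exposure) can be illustrated with a heavily simplified sketch. This is not the dissertation's estimator: it swaps in a plain least-squares slope of each site's before/after change on a hypothetical proximity score, and it ignores the meteorological adjustment and spatially dependent uncertainty quantification that the chapter describes. All names and inputs are assumptions.

```python
def ols_slope(x, y):
    # Ordinary least-squares slope of y on x (single predictor).
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def exposure_response_slope(proximity, pre_policy, post_policy):
    """Slope of the per-site change in pollution (post minus pre) on a
    continuous exposure, here a hypothetical port-proximity score.

    A positive slope would mean sites closer to ports saw larger increases
    after the policy change than sites farther away."""
    change = [post - pre for pre, post in zip(pre_policy, post_policy)]
    return ols_slope(proximity, change)
```

With synthetic inputs such as `proximity = [0.0, 0.5, 1.0]`, `pre_policy = [10, 10, 10]`, and `post_policy = [10, 11, 12]`, the function returns a slope of 2, meaning the change grows by two units per unit of proximity. A real analysis would also need to condition on weather and quantify uncertainty under spatial correlation.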

Keywords

Air pollution, Climate change, Decision making, Environmental justice, Policy evaluation, Spatiotemporal, Biostatistics, Environmental health

Citation

Considine, Ellen. 2025. Advancing Data Science Methods for Environmental Health Policy Design and Evaluation. Doctoral Dissertation, Harvard University Graduate School of Arts and Sciences.
