Publication:

Statistical Methods for Improving Real-Time Outbreak Detection

Date

2025-05-16

Citation

Gopaluni, Anuraag. 2025. Statistical Methods for Improving Real-Time Outbreak Detection. Doctoral Dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Real-time outbreak detection in resource-constrained settings requires robust statistical methods that can account for aberrations in historical data, reporting delays in the most recent observations, and dynamic changes in transmission trends. This dissertation develops and evaluates outbreak detection frameworks that address these practical challenges using simulation-based evaluation, real-world data applications, and flexible statistical modeling.

Chapter 1 investigates the impact of historical anomalies, termed "aberrations," on the performance of rolling outbreak detection methods applied to health management information system (HMIS) data. Motivated by five years of acute respiratory infection (ARI) surveillance data from Liberia, we simulate outbreaks under seven distinct data-generating mechanisms varying in trend and seasonality. We assess five detection algorithms: EARS, Farrington, Holt-Winters, and two Weinberger-Fulcher (WF) models (negative binomial and quasi-Poisson). Detection accuracy is measured through sensitivity, specificity, and pseudo-ROC curves, under varied aberration timing and outbreak size. We find that the presence of recent aberrations in the baseline degrades performance across models, with context-specific tradeoffs: EARS and the WF models perform well in the absence of recent anomalies; the WF quasi-Poisson model and Holt-Winters maintain a better balance between sensitivity and specificity when recent aberrations are present; and Farrington achieves high sensitivity but lower specificity in these settings. These results offer practical guidance for selecting rolling detection models under imperfect baseline conditions common in low- and middle-income countries (LMICs).
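As an illustration of why a recent aberration in the baseline can hurt a rolling detector, the sketch below implements a simple EARS C1-style rule: flag a time point when it lies more than three baseline standard deviations above the mean of the preceding seven observations. This is not the dissertation's code; the function name `ears_c1_flags`, the window length, the threshold, and the toy counts are all assumptions chosen for the example.

```python
import statistics


def ears_c1_flags(counts, baseline=7, threshold=3.0):
    """Flag index t when counts[t] exceeds the mean of the previous
    `baseline` values by more than `threshold` sample standard deviations.
    Hypothetical helper, written for illustration only."""
    flags = []
    for t in range(len(counts)):
        if t < baseline:
            # Not enough history yet to form a baseline.
            flags.append(False)
            continue
        window = counts[t - baseline:t]
        mean = statistics.mean(window)
        sd = statistics.stdev(window)
        if sd == 0:
            # Degenerate baseline: flag any increase over the constant level.
            flags.append(counts[t] > mean)
        else:
            flags.append((counts[t] - mean) / sd > threshold)
    return flags


# Toy series (assumed values): the same jump to 40 in the final period,
# with and without an aberrant spike of 60 inside the baseline window.
clean = [20, 22, 19, 21, 20, 23, 21, 40]
aberrant = [20, 22, 19, 21, 60, 23, 21, 40]

clean_flags = ears_c1_flags(clean)
aberrant_flags = ears_c1_flags(aberrant)
print(clean_flags[-1], aberrant_flags[-1])
```

With the clean baseline the final jump is flagged; with the spike in the baseline, the inflated mean and standard deviation mask the same jump. This only shows the mechanism in a toy case and says nothing about the relative performance of the five algorithms compared in the chapter.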

Chapter 2 develops a novel frequentist framework for real-time nowcasting of all-cause mortality under reporting delays, using Massachusetts death registration data from 2017 to 2022. Reporting delays are modeled via a discrete-time survival model that incorporates covariates such as day of the week, lag, and snapshot date to flexibly capture evolving delay patterns. Using method-of-moments estimation, we correct underreported death counts, propagate delay uncertainty into variance estimates, and apply LOESS smoothing to stabilize predictions for the most recent days. Variance from both the delay model and smoothing step is incorporated into predictive intervals. Compared to leading Bayesian and spline-based nowcasting methods, including hierarchical Bayesian models, NobBS, EpiNowcast, and GAM approaches, our method achieves superior empirical coverage, lower bias, and narrower interval widths, particularly during the early pandemic phase when reporting delays exhibited sharp day-of-week effects. Explicit modeling of day-of-week reporting behavior substantially improved accuracy relative to approaches that omitted temporal covariates, and the method remained robust to shifts in the reporting distribution across time.
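The core idea of delay correction can be sketched in a few lines. The example below is a deliberately stripped-down stand-in, not the method developed in the chapter: it pools all days to estimate a reverse-time discrete hazard of reporting delay from a reporting triangle, converts that into the probability that an event has been reported by a given delay, and inflates the counts seen so far. It omits the covariates (day of week, lag, snapshot date), the variance propagation, and the LOESS smoothing described above. The function `nowcast`, the triangle layout, and the toy numbers are assumptions made for this illustration.

```python
def nowcast(triangle, max_delay):
    """Inflate partially reported counts using a pooled delay distribution.

    triangle[t][d] is the number of events that occurred on day t and were
    reported with a delay of d days. Rows for recent days are shorter
    because longer delays have not been observed yet. Hypothetical helper.
    """
    T = len(triangle) - 1  # index of the most recent day

    # Reverse-time hazard g[d] = P(delay == d | delay <= d), estimated from
    # days old enough that delay d could already have been observed.
    g = [0.0] * (max_delay + 1)
    for d in range(1, max_delay + 1):
        num = 0
        den = 0
        for t in range(0, T - d + 1):
            num += triangle[t][d]
            den += sum(triangle[t][:d + 1])
        g[d] = num / den if den else 0.0

    # F[d] = P(delay <= d) = product over k > d of (1 - g[k]).
    F = [1.0] * (max_delay + 1)
    for d in range(max_delay - 1, -1, -1):
        F[d] = F[d + 1] * (1.0 - g[d + 1])

    estimates = []
    for t in range(T + 1):
        d_obs = min(T - t, max_delay)      # longest delay observable so far
        seen = sum(triangle[t][:d_obs + 1])
        estimates.append(seen / F[d_obs])  # scale up by reported fraction
    return estimates


# Toy triangle (assumed): 100 events per day, reported 50% / 30% / 20%
# at delays of 0, 1, and 2 days; the last two days are still incomplete.
triangle = [
    [50, 30, 20],
    [50, 30, 20],
    [50, 30, 20],
    [50, 30],
    [50],
]
estimates = nowcast(triangle, max_delay=2)
print([round(e, 1) for e in estimates])
```

In this noise-free toy case the corrected counts for the two incomplete days come back to roughly 100 each. Real data would need the covariate-dependent hazards and the uncertainty quantification that the chapter describes.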

Chapter 3 extends this delay correction framework by integrating nowcasting with slope-based outbreak detection in a unified two-stage approach. Using molecularly confirmed COVID-19 case data from Puerto Rico, we estimate unreported cases via a discrete-time hazard model and then fit a slope-based detection model using generalized estimating equations (GEE), incorporating nowcast-derived variances as observation-level weights. We conduct a simulation study varying epidemic wave intensity, reporting delay speed, and baseline structure (including both stable and declining post-wave baselines) to evaluate time to detection, false positive rate, and calibration across models. We benchmark against the Farrington algorithm, $R_t$-based detection, and the Weinberger-Fulcher model. Our slope-based GEE approach consistently achieves faster and more reliable detection, particularly under low and medium wave scenarios with reporting delays, while maintaining strong calibration across a range of nominal alpha levels. The method also performs well when applied to real-world Puerto Rico data, issuing timely signals across three distinct epidemic waves.
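To make the idea of a variance-weighted slope test concrete, the sketch below swaps the GEE model for a much simpler technique: weighted least squares on log counts over a recent window, with inverse-variance weights and a one-sided normal test for a positive slope. It is an assumption-laden illustration rather than the dissertation's model; the function `slope_signal`, the delta-method variance on the log scale, the window, and the toy inputs are all hypothetical.

```python
import math


def slope_signal(counts, variances, alpha=0.05):
    """Weighted linear trend test on log counts.

    counts: nowcast point estimates over a recent window (all > 0).
    variances: variances of those estimates; noisier points get less weight.
    Returns (slope, one-sided p-value, signal flag). Hypothetical helper.
    """
    y = [math.log(c) for c in counts]
    # Delta method: Var(log N) is approximately Var(N) / N^2.
    w = [c * c / v for c, v in zip(counts, variances)]
    x = list(range(len(counts)))

    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))

    slope = sxy / sxx
    se = math.sqrt(1.0 / sxx)  # valid when weights are true inverse variances
    z = slope / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z > z), one-sided
    return slope, p_value, p_value < alpha


# Toy inputs (assumed): roughly 10% growth per step versus a flat series,
# with Poisson-like variances equal to the counts.
rising = [100, 110, 121, 133, 146, 161, 177]
flat = [100, 102, 99, 101, 100, 98, 101]

rising_slope, rising_p, rising_signal = slope_signal(rising, rising)
flat_slope, flat_p, flat_signal = slope_signal(flat, flat)
print(rising_signal, flat_signal)
```

The rising series yields a slope close to log(1.1) and triggers a signal at the 5% level, while the flat series does not. Unlike a GEE fit, this sketch ignores serial correlation between days, so its p-values should be treated as illustrative only.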

Together, these chapters provide a comprehensive statistical toolkit for outbreak detection under the operational constraints of incomplete, delayed, and aberration-prone surveillance data. The approaches developed are computationally efficient, modular, and applicable across diverse epidemiological contexts, with particular relevance for LMICs and subnational surveillance systems.

Keywords

COVID-19, Low- and Middle-Income Countries (LMICs), Nowcasting, Outbreak Detection, Real-Time Monitoring, Syndromic Surveillance, Biostatistics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
