Publication: Unrepresentative big surveys significantly overestimated US vaccine uptake
Date
2021-12-08
Publisher
Springer Science and Business Media LLC
Citation
Bradley, Valerie C., Shiro Kuriwaki, Michael Isakov, Dino Sejdinovic, Xiao-li Meng, Seth Flaxman. "Unrepresentative big surveys significantly overestimated US vaccine uptake." Nature 600, no. 7890 (2021): 695-700. DOI: 10.1038/s41586-021-04198-4
Abstract
Surveys are a crucial tool for understanding public opinion and behavior, and their accuracy depends on maintaining the statistical representativeness of their target populations by minimizing biases from all sources. Unlike sampling variability, the impact of survey bias on error does not diminish as data size increases, but instead is magnified: an instance of the Big Data Paradox [1]. Here we demonstrate the Big Data Paradox in estimates of first-dose COVID-19 vaccine uptake in US adults from two large surveys: Delphi-Facebook [2,3] (with about 250,000 responses per week) and Census Household Pulse [4] (about 75,000 per week). Both significantly overestimate uptake compared to a benchmark from the Centers for Disease Control and Prevention (CDC), by 17 and 14 percentage points respectively in May 2021. Yet their large sample sizes lead (incorrectly) to negligible error bars. In contrast, an online panel from Axios-Ipsos [5] (with about 1,000 responses) following survey research best practices [6] provides reliable estimates and error bars. We leverage a recent analytic framework [1] for separating defects in data quality from other contributors to estimation error in order to assess factors driving error in three COVID-19 surveys, and then conduct a scenario analysis for resulting implications on vaccine willingness and hesitancy. We show how a survey of 250,000 respondents can produce an estimate of the population mean that is no more accurate than an estimate from a simple random sample of size 10. Our central message is that data quality matters far more than data quantity, and compensating for the former with the latter is a mathematically provable losing proposition, at least for assessing population averages such as vaccination rates.
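The abstract's claim that 250,000 responses can be worth about 10 random ones can be illustrated with a minimal sketch. It assumes the error identity from the cited framework [1], in which the estimation error equals the data defect correlation (ddc) times sqrt((1 - f) / f) times the population standard deviation, where f = n / N is the sampling fraction. The abstract itself does not state this formula, and the function names, the adult population size, and the benchmark uptake used below are illustrative placeholders, not values taken from this record.

```python
import math


def effective_sample_size(n: int, N: int, ddc: float) -> float:
    """Approximate size of a simple random sample with the same mean squared error.

    Assumes n_eff ~= f / (1 - f) / ddc**2, where f = n / N is the sampling
    fraction and ddc is the data defect correlation.
    """
    if not 0 < n < N:
        raise ValueError("need 0 < n < N")
    if ddc == 0:
        return math.inf
    f = n / N
    return f / (1.0 - f) / ddc**2


def implied_ddc(error: float, n: int, N: int, sigma: float) -> float:
    """Invert error = ddc * sqrt((1 - f) / f) * sigma to recover the ddc."""
    if not 0 < n < N:
        raise ValueError("need 0 < n < N")
    f = n / N
    return error / (math.sqrt((1.0 - f) / f) * sigma)


# Placeholder inputs (assumptions for illustration only).
N_ADULTS = 255_000_000       # rough US adult population, assumed
N_SURVEY = 250_000           # weekly responses, as quoted in the abstract
ERROR = 0.17                 # 17 percentage points, as quoted in the abstract
BENCHMARK_UPTAKE = 0.60      # hypothetical benchmark uptake, not from the source

# Standard deviation of a binary outcome with mean p is sqrt(p * (1 - p)).
sigma = math.sqrt(BENCHMARK_UPTAKE * (1.0 - BENCHMARK_UPTAKE))

ddc = implied_ddc(ERROR, N_SURVEY, N_ADULTS, sigma)
n_eff = effective_sample_size(N_SURVEY, N_ADULTS, ddc)

print(f"implied ddc: {ddc:.4f}")
print(f"effective sample size: {n_eff:.1f}")
```

Under these placeholder inputs, a ddc of only about 0.01 is enough to shrink the effective sample size to roughly 10, because the tiny sampling fraction f multiplies the effect of even a small correlation between responding and the outcome.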
Keywords
Multidisciplinary
Terms of Use
Metadata Only