Publication:
Unrepresentative big surveys significantly overestimated US vaccine uptake

Date

2021-12-08

Journal Title

Nature

Publisher

Springer Science and Business Media LLC

Citation

Bradley, Valerie C., Shiro Kuriwaki, Michael Isakov, Dino Sejdinovic, Xiao-Li Meng, and Seth Flaxman. "Unrepresentative big surveys significantly overestimated US vaccine uptake." Nature 600, no. 7890 (2021): 695-700. DOI: 10.1038/s41586-021-04198-4

Abstract

Surveys are a crucial tool for understanding public opinion and behavior, and their accuracy depends on maintaining statistical representativeness of their target populations by minimizing biases from all sources. Unlike sampling variability, the impact of survey bias on error does not diminish as data size increases, but is instead magnified, an instance of the Big Data Paradox [1]. Here we demonstrate the Big Data Paradox in estimates of first-dose COVID-19 vaccine uptake among US adults from two large surveys: Delphi-Facebook [2,3] (about 250,000 responses per week) and Census Household Pulse [4] (about 75,000 per week). Both significantly overestimate uptake compared with a benchmark from the Centers for Disease Control and Prevention (CDC): by 17 and 14 percentage points, respectively, in May 2021. Yet their large sample sizes yield misleadingly negligible error bars. In contrast, an online panel from Axios-Ipsos [5] (about 1,000 responses) that follows survey research best practices [6] provides reliable estimates and error bars. We leverage a recent analytic framework [1] that separates defects in data quality from other contributors to estimation error to assess the factors driving error in the three COVID-19 surveys, and then conduct a scenario analysis of the resulting implications for vaccine willingness and hesitancy. We show how a survey of 250,000 respondents can produce an estimate of the population mean that is no more accurate than an estimate from a simple random sample of size 10. Our central message is that data quality matters far more than data quantity, and that compensating for the former with the latter is a mathematically provable losing proposition, at least for assessing population averages such as vaccination rates.

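The abstract's claim that 250,000 responses can be worth no more than a simple random sample of about 10 follows from the error decomposition in the cited analytic framework [1], which factors estimation error into data quality (the data defect correlation), data quantity, and problem difficulty. The sketch below is not the authors' code; it only illustrates that arithmetic with placeholder inputs (an assumed population of 255 million US adults, a hypothetical CDC benchmark of 53% uptake, and the 17-point overestimate quoted in the abstract) to back out the data defect correlation and the implied effective sample size.

# Minimal sketch (not the paper's code) of the error decomposition in
# the framework cited as [1]:
#   estimate - truth = rho * sqrt((N - n) / n) * sigma
# where rho is the data defect correlation (data quality), the square-root
# term captures data quantity, and sigma is the population standard
# deviation of the outcome (problem difficulty). All inputs below are
# illustrative assumptions, not the paper's estimates.

import math

def data_defect_correlation(estimate, benchmark, sigma, n, N):
    # Back out rho from an observed error for a sample of size n drawn
    # from a population of size N.
    return (estimate - benchmark) / (math.sqrt((N - n) / n) * sigma)

def effective_sample_size(rho, n, N):
    # Size of a simple random sample whose mean squared error matches
    # that of the biased sample (small sampling fraction assumed).
    return n / ((N - n) * rho ** 2)

N = 255_000_000              # assumed number of US adults
n = 250_000                  # weekly responses at Delphi-Facebook scale
benchmark = 0.53             # hypothetical CDC first-dose uptake
estimate = benchmark + 0.17  # the 17-point overestimate from the abstract
sigma = math.sqrt(benchmark * (1 - benchmark))  # SD of a binary outcome

rho = data_defect_correlation(estimate, benchmark, sigma, n, N)
print(f"data defect correlation: {rho:.4f}")                              # ~0.011
print(f"effective sample size:   {effective_sample_size(rho, n, N):.1f}") # ~9

With these placeholder inputs, a seemingly tiny data defect correlation of about 0.01 shrinks the effective sample size from 250,000 to roughly 9, which is the sense in which data quality dominates data quantity.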

Keywords

Multidisciplinary

Terms of Use

Metadata Only
