Publication:
Logistic Regression in Rare Events Data

Date

2001

Publisher

Oxford University Press

Citation

King, Gary, and Langche Zeng. 2001. Logistic regression in rare events data. Political Analysis 9(2): 137-163.

Abstract

We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros (“nonevents”). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory variables. We provide methods that link these two results, enabling both types of corrections to work simultaneously, and software that implements the methods developed.
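
The sampling design described in the abstract (keep every event, keep only a small fraction of nonevents) can be illustrated with a short simulation. The sketch below is not the authors' software. It is a minimal illustration, assuming the standard prior correction for this kind of sampling: the fitted intercept is reduced by ln[((1 - tau) / tau) * (ybar / (1 - ybar))], where tau is the population fraction of events and ybar is the fraction of events in the sample. The simulated population, the coefficient values, and the names `fit_logit` and `prior_correct_intercept` are all hypothetical. Here tau is read off the simulated population; in applied work it would have to come from outside knowledge. The sketch addresses only the sampling design, not the small-sample underestimation of event probabilities that the abstract also mentions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(xs, ys, n_iter=25):
    """Maximum-likelihood logit with an intercept and one covariate (Newton-Raphson)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient, intercept
            g1 += (y - p) * x      # gradient, slope
            h00 += w               # entries of X'WX
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def prior_correct_intercept(b0, ybar, tau):
    """Adjust the intercept for sampling on the outcome (assumed standard formula)."""
    return b0 - math.log(((1.0 - tau) / tau) * (ybar / (1.0 - ybar)))


# Hypothetical population in which events are rare.
random.seed(0)
TRUE_B0, TRUE_B1 = -6.0, 1.0
population = []
for _ in range(200_000):
    x = random.gauss(0.0, 1.0)
    y = 1 if random.random() < sigmoid(TRUE_B0 + TRUE_B1 * x) else 0
    population.append((x, y))

tau = sum(y for _, y in population) / len(population)  # assumed known in practice

# Keep all events and only five nonevents per event.
events = [obs for obs in population if obs[1] == 1]
nonevents = [obs for obs in population if obs[1] == 0]
sample = events + random.sample(nonevents, 5 * len(events))

xs = [x for x, _ in sample]
ys = [y for _, y in sample]
ybar = sum(ys) / len(ys)

b0_hat, b1_hat = fit_logit(xs, ys)
b0_corrected = prior_correct_intercept(b0_hat, ybar, tau)

print(f"sample size: {len(sample)} of {len(population)}")
print(f"slope: {b1_hat:.3f}  raw intercept: {b0_hat:.3f}  corrected: {b0_corrected:.3f}")
```

In this setup the slope estimated from the small sample should land near the value used to generate the data, while the raw intercept is far too high until the correction is applied. That is the sense in which a design that discards most nonevents can still support valid inference.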

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service

Related Stories

Story
Logistic Regression in Rare Events Data… : DASH Story 2013-04-08
I solve large logistic regression models for a large insurance company. We rely on publications from the open access library to keep up to date with current methodology as well as to share best practices with others in our company. It's often easier to point a colleague to an Open Access paper than to try to personally convince them to adopt a new method.
Story
Logistic Regression in Rare Events Data… : DASH Story 2013-10-26
Next week I'm teaching binary dependent variables in an applied graduate econometrics course and I wanted to include a segment on rare events logit (King and Zeng). Easy access to this article is very helpful to my students and me. Thank you.
Story
Logistic Regression in Rare Events Data… : DASH Story 2014-05-26
As part of my master's degree at Stockholm University I'm currently writing a short analysis of the scientific methods used for a doctoral thesis I've read. I realized that the literature I have was insufficient in describing the methods used, and a Google search led me to this article. I've found it very helpful.
Story
Logistic Regression in Rare Events Data… : DASH Story 2015-04-28
I'm a research assistant conducting an independent project looking at an outcome variable heavily weighted with 0's, so I'm reading everything I can find on the subject. Being able to access and read this paper means that when I go meet with the PI supervising my project, I'll be better able to contribute meaningfully to the discussion of what statistical methods we're going to use and to understand the relevant considerations. As someone about to start the grad school application process, with recommendation letters very much on my mind, being able to meaningfully contribute to that discussion is a huge deal. In addition, it makes me feel much more capable in attempting this project. Thank you so much for making this article available to me.