Publication:
A Method of Automated Nonparametric Content Analysis for Social Science

Date

2010

Publisher

Wiley-Blackwell
The Harvard community has made this article openly available.

Citation

Hopkins, Daniel J. and Gary King. 2010. A method of automated nonparametric content analysis for social science. American Journal of Political Science 54(1): 229-247.

Abstract

The increasing availability of digitized text presents enormous opportunities for social scientists. Yet hand coding many blogs, speeches, government records, newspapers, or other sources of unstructured text is infeasible. Although computer scientists have methods for automated content analysis, most are optimized to classify individual documents, whereas social scientists instead want generalizations about the population of documents, such as the proportion in a given category. Unfortunately, even a method with a high percent of individual documents correctly classified can be hugely biased when estimating category proportions. By directly optimizing for this social science goal, we develop a method that gives approximately unbiased estimates of category proportions even when the optimal classifier performs poorly. We illustrate with diverse data sets, including the daily expressed opinions of thousands of people about the U.S. presidency. We also make available software that implements our methods and large corpora of text for further analysis.
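
The abstract's claim that an accurate classifier can still give biased category proportions can be checked with a little arithmetic. The sketch below is not the authors' method or software. It is a minimal two-category illustration with assumed values (10% true prevalence, 90% sensitivity, 90% specificity) and hypothetical function names. It computes the share a "classify and count" approach would be expected to report, then applies a standard misclassification-rate correction that assumes the error rates are known.

```python
def accuracy(true_prop, sensitivity, specificity):
    """Expected fraction of individual documents classified correctly."""
    return true_prop * sensitivity + (1 - true_prop) * specificity


def classify_and_count(true_prop, sensitivity, specificity):
    """Expected share of documents the classifier labels as in-category.

    True members are caught at the sensitivity rate; non-members leak in
    at the false-positive rate (1 - specificity).
    """
    return true_prop * sensitivity + (1 - true_prop) * (1 - specificity)


def adjusted_count(observed_prop, sensitivity, specificity):
    """Invert the relation above to recover the true share.

    Assumes sensitivity and specificity are known (for example, estimated
    from a hand-coded sample) and that sensitivity + specificity != 1.
    """
    false_positive_rate = 1 - specificity
    return (observed_prop - false_positive_rate) / (sensitivity - false_positive_rate)


# Assumed illustrative values, not taken from the article.
true_prop, sens, spec = 0.10, 0.90, 0.90

acc = accuracy(true_prop, sens, spec)
naive = classify_and_count(true_prop, sens, spec)
corrected = adjusted_count(naive, sens, spec)

print(f"individual-document accuracy: {acc:.2f}")
print(f"classify-and-count estimate:  {naive:.2f}")
print(f"corrected estimate:           {corrected:.2f}")
```

Under these assumed values the classifier is right on 90% of individual documents, yet the naive category share is 0.18 rather than 0.10, because false positives from the large out-of-category group outnumber the missed true members.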

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service

Related Stories

Story
A Method of Automated Nonparametric Content… : DASH Story 2013-04-23
I am starting a research project at the frontier between the ecological and social sciences. Access to this article through DASH lets me read research from another discipline that is not covered by the journal database package subscribed to by the laboratory where I am posted.