A Method of Automated Nonparametric Content Analysis for Social Science

Title: A Method of Automated Nonparametric Content Analysis for Social Science
Author: Hopkins, Daniel J.; King, Gary (ORCID: 0000-0002-5327-7631)

Note: Order does not necessarily reflect citation order of authors.

Citation: Hopkins, Daniel J. and Gary King. 2010. A method of automated nonparametric content analysis for social science. American Journal of Political Science 54(1): 229-247.
Research Data: http://hdl.handle.net/1902.1/12898
Abstract: The increasing availability of digitized text presents enormous opportunities for social scientists. Yet hand coding many blogs, speeches, government records, newspapers, or other sources of unstructured text is infeasible. Although computer scientists have methods for automated content analysis, most are optimized to classify individual documents, whereas social scientists instead want generalizations about the population of documents, such as the proportion in a given category. Unfortunately, even a method with a high percent of individual documents correctly classified can be hugely biased when estimating category proportions. By directly optimizing for this social science goal, we develop a method that gives approximately unbiased estimates of category proportions even when the optimal classifier performs poorly. We illustrate with diverse data sets, including the daily expressed opinions of thousands of people about the U.S. presidency. We also make available software that implements our methods and large corpora of text for further analysis.
Published Version: doi:10.1111/j.1540-5907.2009.00428.x
Other Sources: http://j.mp/1M2zFGN
Terms of Use: This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:5125261
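Illustration: The abstract notes that a classifier with a high share of correctly classified documents can still give biased estimates of category proportions. The sketch below works through the arithmetic for a two-category case. All numbers (a true share of 0.20, and true- and false-positive rates of 0.90 and 0.10) are assumptions chosen for illustration and are not taken from the article. The correction shown is the standard misclassification adjustment for a binary classifier, not the nonparametric method the article develops.

```python
def classify_and_count(true_share, tpr, fpr):
    """Expected share of documents a classifier labels as the category.

    true_share: actual proportion of documents in the category
    tpr: probability a document in the category is labeled as such
    fpr: probability a document outside the category is labeled as in it
    """
    return true_share * tpr + (1.0 - true_share) * fpr


def corrected_share(observed_share, tpr, fpr):
    """Invert the misclassification equation to recover the true share.

    Assumes tpr != fpr and that both rates are known exactly.
    """
    return (observed_share - fpr) / (tpr - fpr)


# Hypothetical values, for illustration only.
true_share = 0.20
tpr, fpr = 0.90, 0.10

# Share of individual documents classified correctly.
accuracy = true_share * tpr + (1.0 - true_share) * (1.0 - fpr)

# Proportion estimated by counting the classifier's labels.
observed = classify_and_count(true_share, tpr, fpr)

# Proportion after adjusting for the known error rates.
recovered = corrected_share(observed, tpr, fpr)

print(f"accuracy:           {accuracy:.2f}")
print(f"classify-and-count: {observed:.2f}")
print(f"corrected estimate: {recovered:.2f}")
```

With these assumed values, 90 percent of documents are classified correctly, yet counting the classifier's labels puts the category share at 0.26 rather than 0.20. The size of this gap depends on the true share itself, so it does not cancel out when comparing proportions across time periods or sources.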