Publication:
Essays on the Applications of Machine Learning and Causal Inference in American Politics

Date

2019-05-13

Citation

Kaufman, Aaron Russell. 2019. Essays on the Applications of Machine Learning and Causal Inference in American Politics. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.

Abstract

Political science as a field prioritizes causal statements: the effects of governmental policies on societal outcomes, the effects of individual treatments on individual behaviors. Underlying these causal research questions, however, are important measurement problems. Many of the most theoretically important concepts in the field, such as representation, corruption, ideology, and democracy, elude straightforward definition, let alone measurement. In response, political scientists have developed sophisticated computational tools to produce cheap and objective measures, many of which leverage machine learning and big data. Today's political scientists measure Congressional ideology by scaling votes and bills and then interpreting the resulting scores. We measure rural development by examining satellite images of lights at night, and we measure political power or policy significance through newspaper mentions. We score the quality of nations' democracy through aggregated indices of features of their governments, and national corruption through an aggregation of expert perceptions. But the computational measures on which political scientists rely, which we can calculate and include in our descriptive or causal analyses, are only proxies for the underlying quantities of interest. Like all proxies, these critical measurements are constructed with error: a fundamental discrepancy between the estimated value and the underlying concept. This dissertation focuses on a set of computational tools for measurement in political science. In the introductory chapter, I outline best practices for this machine learning-based approach to measurement, show through a meta-analysis that most related papers in political science fail to adhere to these standards, and then illustrate these best practices using examples from my own work and that of others.
These best practices consider the relative advantages of computers and humans in measurement, and leverage each in turn to produce automated, scalable, accurate, and valid measures of key variables in political science. The remaining chapters present detailed examples of this research model. In Chapter 2, I build a text-based model to estimate the partisan bias of survey questions, and show through a series of human evaluation studies that my method outperforms political scientists and public opinion researchers at estimating that bias. Chapter 3, coauthored with Gary King and Mayya Komisarchik, builds an ensemble model to measure and predict human evaluations of legislative district compactness, a key quantity in redistricting litigation; this model successfully predicts the judgments of redistricting consultants, law professors, and federal judges. Chapter 4, coauthored with Jon Rogowski, introduces a text model to measure the policy significance of presidential unilateral actions, trained on a data set of newspaper mentions and validated by trained human coders, and finds that the scope and breadth of these unilateral actions are constrained largely by public opinion and the judiciary rather than by the legislature.
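The abstract describes validating automated measures against human judgment: human evaluation studies in Chapter 2, predictions of human compactness rankings in Chapter 3, and trained human coders in Chapter 4. As a hedged illustration only, one common agreement check between a model's scores and human scores is Spearman's rank correlation. The sketch below implements it in plain Python; the function names (`average_ranks`, `spearman`) and the example data are hypothetical, and the dissertation's own validation procedures may differ.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined when a sequence is constant")
    return cov / (sx * sy)


def spearman(model_scores, human_scores):
    """Spearman rank correlation between automated and human measurements."""
    if len(model_scores) != len(human_scores):
        raise ValueError("score lists must have equal length")
    return pearson(average_ranks(model_scores), average_ranks(human_scores))


if __name__ == "__main__":
    # Hypothetical data: model scores and human coder ratings for five items.
    model = [0.91, 0.35, 0.62, 0.10, 0.77]
    human = [5, 2, 3, 1, 4]
    print(round(spearman(model, human), 6))
```

Rank correlation is used here because human coders often produce ordinal judgments (rankings or ratings on a coarse scale), so agreement in ordering is a natural target; a value near 1 indicates the automated measure orders items much as the humans do.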

Keywords

machine learning, causal inference, measurement, computational social science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.