Computational and Bayesian Methods for Geographic Data in the Social Sciences

Date

2023-05-12

The Harvard community has made this article openly available.

Citation

McCartan, Cory. 2023. Computational and Bayesian Methods for Geographic Data in the Social Sciences. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Research in the social sciences often involves the analysis of geographic data. These data may be explicitly spatial, with each observation tied to a specific point on or region of the globe, or they may include variables which are heavily influenced by geographic patterns of human activity and settlement. This dissertation advances computational and Bayesian modeling methods for analyzing geographic data in three important social scientific settings.

In Part I, we study legislative redistricting, where voters are geographically assigned to legislative districts. Analyzing redistricting plans is significantly complicated by the confounding role that geography plays: two plans in different states cannot be compared directly, because the political and social geography of each state may differ. We develop a sequential Monte Carlo sampling algorithm that allows researchers to generate counterfactual redistricting plans from a specific probability distribution. We then apply this tool within a larger probabilistic framework of individual and differential harm, which we develop to measure the impact of redistricting plans on different groups of voters.

In Part II, we develop a Bayesian model and accompanying online survey tool to record and study individuals' perceptions of their neighborhoods. While neighborhoods are recognized as important mediators and determinants of many social scientific outcomes, their inherent subjectivity, and the difficulties of modeling spatial regions, have posed a challenge for their quantitative analysis. The proposed Bayesian model is both flexible and computationally efficient. In a sample of 2,527 voters in three metropolitan areas, we find that white voters, and party-identifying voters, are more likely to include areas in their neighborhood which have a higher share of co-racial and co-partisan residents, respectively.

In Part III, we tackle the problem of estimating racial disparities from individual-level data that contain no racial information. In this setting, researchers often probabilistically impute race using individual surnames and addresses, a method known as Bayesian Improved Surname Geocoding (BISG). However, these probabilistic predictions alone are insufficient for unbiased estimation of racial disparities. We provide a highly plausible identifying assumption, as well as a class of Bayesian models, which we call Bayesian Instrumental Regression for Disparity Estimation (BIRDiE), that solve this problem. The models admit a highly efficient expectation-maximization algorithm for inference, and they greatly reduce estimation error in a validation example from the North Carolina voter file.
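The BISG imputation step mentioned above combines a surname-based racial distribution with a geographic one via Bayes' rule, under the standard assumption that surname and location are conditionally independent given race: P(R = r | S = s, G = g) ∝ P(R = r | S = s) · P(G = g | R = r). The following is a minimal Python sketch of that single step, assuming both probability tables are already available. The function name `bisg_posterior`, the race labels, and the example probabilities are hypothetical illustrations, not taken from the dissertation or any particular software package. The sketch covers only BISG, which the abstract notes is insufficient on its own for unbiased disparity estimation; it does not implement BIRDiE.

```python
from typing import Dict


def bisg_posterior(p_race_given_surname: Dict[str, float],
                   p_geo_given_race: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical sketch of the BISG update for one individual.

    p_race_given_surname: P(R = r | S = s) for the individual's surname.
    p_geo_given_race:     P(G = g | R = r) for the individual's location g,
                          i.e. the share of group r's population living in g.
    Assumes surname and geography are conditionally independent given race.
    """
    # Unnormalized posterior: P(R = r | S = s) * P(G = g | R = r)
    unnorm = {r: p_race_given_surname[r] * p_geo_given_race.get(r, 0.0)
              for r in p_race_given_surname}
    total = sum(unnorm.values())
    if total == 0:
        raise ValueError("surname and geography tables give zero joint probability")
    # Normalize so the posterior probabilities sum to one
    return {r: v / total for r, v in unnorm.items()}


if __name__ == "__main__":
    # Hypothetical, illustrative inputs only
    surname = {"white": 0.5, "black": 0.4, "hispanic": 0.1}
    tract = {"white": 0.0001, "black": 0.0004, "hispanic": 0.0001}
    post = bisg_posterior(surname, tract)
    print({r: round(p, 3) for r, p in post.items()})
    # prints {'white': 0.227, 'black': 0.727, 'hispanic': 0.045}
```

In this toy example, the location shifts probability toward the group that is relatively concentrated in the tract, even though the surname alone favored another group. Downstream analyses that simply weight outcomes by these posterior probabilities are the kind of approach the abstract describes as insufficient on their own.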

Keywords

Bayesian Statistics, Computational Methods, Geographic Data, Social Science, Statistics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.