Publication: Computational and Bayesian Methods for Geographic Data in the Social Sciences
Abstract
Research in the social sciences often involves the analysis of geographic data. These data may be explicitly spatial, with each observation tied to a specific point or region on the globe, or they may include variables that are heavily influenced by geographic patterns of human activity and settlement. This dissertation advances computational and Bayesian modeling methods for analyzing geographic data in three important social scientific settings.
In Part I, we study legislative redistricting, where voters are geographically assigned to legislative districts. Analyzing redistricting plans is significantly complicated by the confounding role that geography plays: two plans in different states cannot be compared directly, because the political and social geography of each state may differ. We develop a sequential Monte Carlo sampling algorithm that allows researchers to generate counterfactual redistricting plans from a specific probability distribution. We then apply this tool as part of a larger probabilistic framework of individual and differential harm, which we advance to measure the impact of redistricting plans on different groups of voters.
In Part II, we develop a Bayesian model and accompanying online survey tool to record and study individuals' perceptions of their neighborhoods. While neighborhoods are recognized as important mediators and determinants of many social scientific outcomes, their inherent subjectivity and the difficulties of modeling spatial regions have posed a challenge for their quantitative analysis. The proposed Bayesian model is both flexible and computationally efficient. In a sample of 2,527 voters in three metropolitan areas, we find that white voters and party-identifying voters are more likely to include in their neighborhoods areas with a higher share of co-racial and co-partisan residents, respectively.
In Part III, we tackle the problem of estimating racial disparities from individual-level data that contain no racial information. In this setting, researchers will often probabilistically impute race using individual surnames and addresses, a method known as Bayesian Improved Surname Geocoding (BISG). However, these probabilistic predictions alone are insufficient for unbiased estimation of racial disparities. We provide a highly plausible identifying assumption, as well as a class of Bayesian models, which we call Bayesian Instrumental Regression for Disparity Estimation (BIRDiE), that solve this problem. The models admit a highly efficient expectation-maximization algorithm for inference, and greatly reduce estimation error in a validation example from the North Carolina voter file.
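As a rough illustration of the BISG imputation step mentioned in Part III, the sketch below applies Bayes' rule under the standard BISG assumption that surname and location are conditionally independent given race. It is not taken from the dissertation: the function name `bisg_posterior`, the race categories, and all probabilities are hypothetical values chosen for illustration, and the BIRDiE models themselves are not shown.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine surname and location evidence with Bayes' rule.

    Assumes surname and location are conditionally independent given race,
    so P(race | surname, geo) is proportional to
    P(race | surname) * P(geo | race).
    """
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    # Normalize so the probabilities over race categories sum to one.
    return {race: value / total for race, value in unnormalized.items()}


# Hypothetical inputs for one individual (illustration only).
p_race_given_surname = {"white": 0.60, "black": 0.30, "other": 0.10}
p_geo_given_race = {"white": 0.001, "black": 0.004, "other": 0.002}

posterior = bisg_posterior(p_race_given_surname, p_geo_given_race)
for race, prob in posterior.items():
    print(f"{race}: {prob:.2f}")
```

In this toy case the location evidence shifts probability toward the category that is relatively more common in the individual's area. The output is still only a probability for each individual, which is why, as the abstract notes, such predictions alone do not yield unbiased disparity estimates.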