Geographic Clustering for Neighborhood Boundaries: A Spatial Analysis of Chicago Using Public Data
Author
Kuppersmith, Joshua Benjamin
Citation
Kuppersmith, Joshua Benjamin. 2019. Geographic Clustering for Neighborhood Boundaries: A Spatial Analysis of Chicago Using Public Data. Bachelor's thesis, Harvard College.
Abstract
Open data initiatives in cities around the world have enabled new efforts to understand and improve urban areas through data analysis. In order to develop actionable insights to improve cities, it is important to isolate differences between geographic areas throughout the city. Neighborhoods are typically used as a unit for spatial separation, where each neighborhood is internally similar, and different from outside areas. As such, neighborhood analysis is key to developing an understanding of complex urban dynamics, yet current neighborhood boundaries do not always adequately reflect similar areas of cities. This thesis proposes a new clustering algorithm to automatically generate neighborhoods with highly similar internal data profiles. Using a grid-model of a city, this new method of clustering, called Geographic K-Means, incorporates data accumulated within grid cells and builds clumps of neighboring cells with similar data trends. This method is optimized using hyper-parameter tuning to improve an Earth Mover’s Distance-based measure of within-neighborhood homogeneity. The optimization uses regularization to enforce smooth neighborhood boundaries, helping us find an optimal balance between data similarity and realistic contiguous neighborhoods. In order to build and test this algorithm, we used Chicago as a case study due to its abundance of data. By generating new Chicago neighborhood boundaries, and increasing within-neighborhood crime homogeneity, we are able to see the relationship between crime and neighborhoods, and better detect sharp boundaries between areas of the city.
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Citable link to this page
https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37364628
Collections
- FAS Theses and Dissertations [6097]