Re-Identification of Home Addresses from Spatial Locations Anonymized by Gaussian Skew

Title: Re-Identification of Home Addresses from Spatial Locations Anonymized by Gaussian Skew
Author: Cassa, Christopher Anthony; Wieland, Shannon C.; Mandl, Kenneth David

Note: Order does not necessarily reflect citation order of authors.

Citation: Cassa, Christopher A., Shannon C. Wieland, and Kenneth D. Mandl. 2008. Re-identification of home addresses from spatial locations anonymized by Gaussian skew. International Journal of Health Geographics 7:45.
Abstract: Background: Knowledge of the geographical locations of individuals is fundamental to the practice of spatial epidemiology. One approach to preserving the privacy of individual-level addresses in a data set is to de-identify the data using a non-deterministic blurring algorithm that shifts the geocoded values. We investigate a vulnerability in this approach which enables an adversary to re-identify individuals using multiple anonymized versions of the original data set. If several such versions are available, each can be used to incrementally refine estimates of the original geocoded location. Results: We produce multiple anonymized data sets using a single set of addresses and then progressively average the anonymized results related to each address, characterizing the steep decline in distance from the re-identified point to the original location (and the corresponding reduction in privacy). With ten anonymized copies of an original data set, we find a substantial decrease in average distance from 0.7 km to 0.2 km between the estimated, re-identified address and the original address. With fifty anonymized copies of an original data set, we find a decrease in average distance from 0.7 km to 0.1 km. Conclusion: We demonstrate that multiple versions of the same data, each anonymized by non-deterministic Gaussian skew, can be used to ascertain original geographic locations. We explore solutions to this problem that include infrastructure to support the safe disclosure of anonymized medical data to prevent inference or re-identification of original address data, and the use of a Markov-process-based algorithm to mitigate this risk.
Published Version: doi://10.1186/1476-072X-7-45
Other Sources: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2526988/pdf/
http://www.ij-healthgeographics.com/content/7/1/45
Terms of Use: This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:10140311
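The averaging attack described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the flat kilometre coordinate grid, the fixed per-axis standard deviation of 0.56 km (chosen only so that a single anonymized copy lands about 0.7 km from the true point on average), and the number of simulated addresses are all hypothetical.

```python
import math
import random


def gaussian_skew(point, sigma_km, rng):
    """Return one anonymized copy of `point`, shifted by independent
    Gaussian noise on each axis (assumed isotropic, fixed sigma)."""
    x, y = point
    return (x + rng.gauss(0.0, sigma_km), y + rng.gauss(0.0, sigma_km))


def estimate_from_copies(copies):
    """Adversary's estimate: the plain average of all anonymized copies."""
    n = len(copies)
    return (sum(x for x, _ in copies) / n, sum(y for _, y in copies) / n)


def mean_error_km(n_copies, sigma_km=0.56, n_addresses=2000, seed=0):
    """Average distance (km) between the averaged estimate and the true
    location, over `n_addresses` simulated addresses."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_addresses):
        # Hypothetical true address on a flat 20 km x 20 km grid.
        true_point = (rng.uniform(-10.0, 10.0), rng.uniform(-10.0, 10.0))
        copies = [gaussian_skew(true_point, sigma_km, rng)
                  for _ in range(n_copies)]
        total += math.dist(estimate_from_copies(copies), true_point)
    return total / n_addresses


if __name__ == "__main__":
    for k in (1, 10, 50):
        print(f"{k:>2} copies: mean error {mean_error_km(k):.2f} km")
```

Under these assumptions the noise in the averaged estimate shrinks in proportion to 1/sqrt(k) for k independent copies, which is in line with the reported drop from 0.7 km to roughly 0.2 km at ten copies and 0.1 km at fifty.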