Publication: The effect of quasi-identifier characteristics on statistical bias introduced by k-anonymization
Date
2015-04-08
Authors
Angiuli, Olivia Marie
Citation
Angiuli, Olivia Marie. 2015. The effect of quasi-identifier characteristics on statistical bias introduced by k-anonymization. Bachelor's thesis, Harvard College.
Abstract
De-identifying publicly released datasets that contain personal information is necessary to preserve personal privacy. One de-identification technique, k-anonymization, reduces the risk of re-identification by requiring that each combination of information-revealing traits (quasi-identifiers) be represented by at least k different records in the dataset. However, this requirement may skew the resulting dataset by preferentially deleting records that contain rarer information-revealing traits. This paper investigates the amount of bias and loss of utility that the k-anonymization process introduces into an online education dataset, and suggests future directions that may decrease the amount of bias introduced during de-identification procedures.
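The mechanism described in the abstract can be illustrated with a minimal sketch. It assumes a suppression-only form of k-anonymization (records in groups smaller than k are simply deleted), which may differ from the procedure used in the thesis. The function name, field names, and toy records are all hypothetical and are not taken from the thesis dataset.

```python
from collections import Counter


def k_anonymize_by_suppression(records, quasi_identifiers, k):
    """Keep only records whose quasi-identifier combination occurs at least k times.

    Assumption: suppression only; no generalization of values is attempted.
    """
    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] >= k]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Hypothetical toy data: one learner has a rare (country, gender) combination.
records = [
    {"country": "US", "gender": "F", "grade": 80},
    {"country": "US", "gender": "F", "grade": 70},
    {"country": "US", "gender": "M", "grade": 60},
    {"country": "US", "gender": "M", "grade": 90},
    {"country": "IS", "gender": "F", "grade": 100},
]

kept = k_anonymize_by_suppression(records, ["country", "gender"], k=2)

# Comparing a statistic before and after shows the shift caused by suppression.
mean_before = mean(r["grade"] for r in records)
mean_after = mean(r["grade"] for r in kept)
print(len(records), len(kept), mean_before, mean_after)
```

In this toy example the only record removed is the one with the rare trait combination, so any statistic correlated with that trait (here, the hypothetical grade) shifts after anonymization. This is the kind of bias the abstract refers to.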
Keywords
Statistics, Computer Science
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service