The effect of quasi-identifier characteristics on statistical bias introduced by k-anonymization
Citation: Angiuli, Olivia Marie. 2015. The effect of quasi-identifier characteristics on statistical bias introduced by k-anonymization. Bachelor's thesis, Harvard College.
Abstract: The de-identification of publicly released datasets that contain personal information is necessary to preserve personal privacy. One de-identification algorithm, k-anonymization, reduces the risk of re-identification by requiring that each combination of information-revealing traits be represented by at least k different records in the dataset. However, this requirement may skew the resulting dataset by preferentially deleting records that contain rarer information-revealing traits. This paper investigates the amount of bias and loss of utility that the k-anonymization process introduces into an online education dataset, and suggests future directions that may decrease the bias introduced during de-identification procedures.
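The mechanism described in the abstract can be illustrated with a minimal sketch. It assumes a suppression-only variant of k-anonymization, in which any record whose combination of quasi-identifier values appears fewer than k times is deleted; the thesis itself may also use generalization, which is not shown here. The function names, field names, and toy records are hypothetical and are not taken from the thesis or its dataset.

```python
from collections import Counter


def k_anonymize_by_suppression(records, quasi_identifiers, k):
    """Keep only records whose quasi-identifier combination occurs at least k times.

    Assumption: suppression only (no generalization of values).
    """
    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] >= k]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Hypothetical toy records; "country" and "education" act as quasi-identifiers.
records = [
    {"country": "US", "education": "Bachelor", "grade": 0.60},
    {"country": "US", "education": "Bachelor", "grade": 0.70},
    {"country": "US", "education": "Bachelor", "grade": 0.80},
    {"country": "IN", "education": "Master", "grade": 0.90},
    {"country": "IN", "education": "Master", "grade": 0.95},
    {"country": "BR", "education": "Doctorate", "grade": 0.20},  # unique combination
]

released = k_anonymize_by_suppression(records, ["country", "education"], k=2)

# Bias in one statistic: mean grade after suppression minus mean grade before.
bias = mean(r["grade"] for r in released) - mean(r["grade"] for r in records)
print(len(records), len(released), round(bias, 4))
```

In this toy data the only deleted record is the one with a unique trait combination, and because its grade differs from the rest, the mean grade of the released data shifts away from the mean of the original data. This is the kind of skew the abstract refers to, shown here for a single statistic only.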
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:14398529
- FAS Theses and Dissertations