New Separations in the Complexity of Differential Privacy
Citation: Bun, Mark. March 2016. New Separations in the Complexity of Differential Privacy. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
Abstract: In this thesis, we study when algorithmic tasks can be performed on sensitive data while protecting the privacy of individuals whose information is collected. We focus on the notion of differential privacy, which gives a strong formal guarantee that no individual's data has a significant impact on the outcome of a computation. Intense study over the last decade has shown that a rich variety of statistical analyses can be performed with differential privacy. However, the corresponding private algorithms generally require more computational resources than their non-private counterparts. The goal of this thesis is to improve our understanding of two basic measures of the inherent complexity of differential privacy: sample complexity and computational complexity.
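For intuition about the guarantee being studied (an illustrative sketch, not material from the thesis), the classic Laplace mechanism shows how a single counting query can be answered with epsilon-differential privacy: because adding or removing one individual's record changes a count by at most 1, noise of scale 1/epsilon masks any single record's influence.

```python
import math
import random

def laplace_noise(scale):
    # Inverse-CDF sampling from the Laplace distribution with the given scale.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(data, predicate, epsilon):
    """Answer a counting query with epsilon-differential privacy.

    A counting query has sensitivity 1: adding or removing one record
    changes the true count by at most 1, so Laplace noise of scale
    1/epsilon suffices for epsilon-differential privacy.
    """
    true_count = sum(1 for record in data if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon)
```

The tension the thesis studies is visible even here: smaller epsilon (stronger privacy) means larger noise, so more samples are needed before the noisy answer is accurate relative to the population.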
In the first part of this thesis, we study the sample complexity -- the minimum amount of data needed to obtain accurate results -- of basic query release tasks. We show, for the first time, that approximate differential privacy can demand higher sample complexity than what is needed to ensure statistical accuracy alone. In particular:
* We establish tight lower bounds on the sample complexity of answering marginal queries with differential privacy. Our results are based on a new connection between privacy lower bounds and cryptographic fingerprinting codes.
* We show the first lower bounds against privately releasing the simple class of threshold functions. This reveals a price of privacy even for low-dimensional query families.
* We study how the sample complexity of private query release changes depending on whether queries are given to an algorithm offline, online, or adaptively, and we expose significant gaps between these three models.
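To make the query-release task concrete (an illustration, not code from the thesis), consider the family of threshold queries over a finite domain: the query q_t asks what fraction of records are at most t. A query-release algorithm must approximate every q_t simultaneously; the non-private answers are trivial to compute, which is what makes the thesis's privacy lower bounds for this simple class striking.

```python
def threshold_answers(data, domain_size):
    """True (non-private) answers to all threshold queries over the
    domain {0, ..., domain_size - 1}.

    The threshold query q_t asks: what fraction of records x satisfy
    x <= t?  A histogram followed by prefix sums yields every answer
    in O(domain_size + n) time.
    """
    n = len(data)
    hist = [0] * domain_size
    for x in data:
        hist[x] += 1
    answers, running = [], 0
    for t in range(domain_size):
        running += hist[t]
        answers.append(running / n)
    return answers
```

A private release algorithm must output approximations to this entire vector of answers while satisfying differential privacy; the lower bounds above concern how large n must be for that to be possible.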
Next, we examine the sample complexity of differentially private PAC learning. Again, we exhibit separations between what is statistically feasible and what is feasible with differential privacy.
* We exhibit a lower bound for properly learning threshold functions with differential privacy. This separates differentially private from non-private proper PAC learning.
* We initiate the study of learning multiple concepts simultaneously under differential privacy. In contrast with the non-private case, the sample complexity of private learning grows significantly with the number of concepts being learned.
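For contrast with the private lower bound on proper learning, a hedged sketch (not from the thesis) of the standard non-private proper learner for thresholds: output the smallest positively labeled point as the hypothesis threshold. This empirical-risk-minimization rule is consistent with every example and needs only O(1/alpha) samples, whereas the thesis shows any differentially private proper learner for thresholds requires strictly more samples.

```python
def learn_threshold(examples):
    """Non-private proper PAC learner for threshold functions.

    examples: list of (x, label) pairs, where label = 1 iff x lies at
    or above a hidden threshold t*.  Returning the smallest positive
    point gives a hypothesis h(x) = [x >= t] consistent with every
    example: positives are >= t* <= t, and negatives are < t* <= t.
    """
    positives = [x for x, y in examples if y == 1]
    # With no positive examples, a threshold above all data is consistent.
    return min(positives) if positives else float("inf")
```

The point of the separation is that this simple consistent-hypothesis strategy leaks information about the boundary examples, and no differentially private substitute can match its sample complexity for proper learning.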
Finally, we address the computational complexity of PAC learning with differential privacy.
* Under cryptographic assumptions, we give the first example of a concept class that is efficiently PAC learnable, but not efficiently learnable with differential privacy.
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:33840647
Collection: FAS Theses and Dissertations