Publication: New Separations in the Complexity of Differential Privacy
Date
2016-08-03
Authors
Bun, Mark
The Harvard community has made this article openly available.
Citation
Bun, Mark. 2016. New Separations in the Complexity of Differential Privacy. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
Abstract
In this thesis, we study when algorithmic tasks can be performed on sensitive data while protecting the privacy of individuals whose information is collected. We focus on the notion of differential privacy, which gives a strong formal guarantee that no individual's data has a significant impact on the outcome of a computation. Intense study over the last decade has shown that a rich variety of statistical analyses can be performed with differential privacy. However, the corresponding private algorithms generally require more computational resources than their non-private counterparts. The goal of this thesis is to improve our understanding of two basic measures of the inherent complexity of differential privacy: sample complexity and computational complexity.
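The differential privacy guarantee described above is commonly achieved by adding calibrated noise to a query's answer. As a minimal illustration (not an algorithm from the thesis), the following sketch shows the standard Laplace mechanism applied to a counting query; the function names and parameters here are illustrative choices:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample from Laplace(0, scale) via the inverse-CDF transform."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def private_count(data, predicate, epsilon: float) -> float:
    """Answer a counting query with (epsilon, 0)-differential privacy.

    A counting query has sensitivity 1: adding or removing one
    individual's record changes the true count by at most 1, so
    Laplace noise with scale 1/epsilon suffices.
    """
    true_count = sum(1 for row in data if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon)
```

For example, `private_count(ages, lambda a: a >= 65, epsilon=0.5)` returns the noisy number of records with age at least 65. The tension the thesis studies is visible even here: smaller epsilon (stronger privacy) means larger noise, so more samples are needed for the same accuracy.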
In the first part of this thesis, we study the sample complexity -- the minimum amount of data needed to obtain accurate results -- of basic query release tasks. We show, for the first time, that approximate differential privacy can demand higher sample complexity than is needed to ensure statistical accuracy alone. In particular:
* We establish tight lower bounds on the sample complexity of answering marginal queries with differential privacy. Our results are based on a new connection between privacy lower bounds and cryptographic fingerprinting codes.
* We show the first lower bounds against privately releasing the simple class of threshold functions. This reveals a price of privacy even for low-dimensional query families.
* We study how the sample complexity of private query release changes depending on whether queries are given to the algorithm offline, online, or adaptively, and expose significant gaps between these three models.
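To make the query release setting above concrete, here is a hedged sketch of the naive offline baseline for marginal queries: split the privacy budget across the queries by basic composition and add independent Laplace noise to each empirical answer. This is a textbook baseline, not one of the algorithms or lower-bound constructions from the thesis, and the names are illustrative:

```python
import math
import random

def laplace(scale: float) -> float:
    """Sample from Laplace(0, scale) via the inverse-CDF transform."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def answer_marginals(records, queries, epsilon: float):
    """Answer k marginal queries offline with independent Laplace noise.

    Each record is a tuple of d bits; a marginal query is a set of
    coordinates asking what fraction of records have a 1 in all of
    them. Each query has sensitivity 1/n, and basic composition
    assigns budget epsilon/k per query, giving per-query noise of
    scale k/(epsilon * n).
    """
    n = len(records)
    k = len(queries)
    answers = []
    for coords in queries:
        frac = sum(all(r[i] == 1 for i in coords) for r in records) / n
        answers.append(frac + laplace(k / (epsilon * n)))
    return answers
```

Note how the noise scale grows linearly in the number of queries k; characterizing how far below this naive scaling private algorithms can go is exactly the kind of sample-complexity question the bullets above address.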
Next, we examine the sample complexity of differentially private PAC learning. Again, we exhibit separations between what is statistically feasible and what is feasible with differential privacy.
* We exhibit a lower bound for properly learning threshold functions with differential privacy. This separates differentially private from non-private proper PAC learning.
* We initiate the study of learning multiple concepts simultaneously under differential privacy. In contrast with the non-private case, the sample complexity of private learning grows significantly with the number of concepts being learned.
Finally, we address the computational complexity of PAC learning with differential privacy.
* Under cryptographic assumptions, we give the first example of a concept class that is efficiently PAC learnable, but not efficiently learnable with differential privacy.
Keywords
Computer Science
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service