Privacy and the Complexity of Simple Queries

DSpace/Manakin Repository

Privacy and the Complexity of Simple Queries

Citable link to this page

 

 
Title: Privacy and the Complexity of Simple Queries
Author: Ullman, Jonathan Robert
Citation: Ullman, Jonathan Robert. 2013. Privacy and the Complexity of Simple Queries. Doctoral dissertation, Harvard University.
Full Text & Related Files:
Abstract: As both the scope and scale of data collection increases, an increasingly large amount of sensitive personal information is being analyzed. In this thesis, we study the feasibility of effectively carrying out such analyses while respecting the privacy concerns of all parties involved. In particular, we consider algorithms that satisfy differential privacy [30], a stringent notion of privacy that guarantees no individual’s data has a significant influence on the information released about the database. Over the past decade, there has been tremendous progress in understanding when accurate data analysis is compatible with differential privacy, with both elegant algorithms and striking impossibility results. However, if we ask further when accurate and computationally efficient data analysis is compatible with differential privacy then our understanding lags far behind. In this thesis, we make several contributions to understanding the complexity of differentially private data analysis: We show a sharp upper bound on the number of linear queries that can be accurately answered while satisfying differential privacy by an efficient algorithm, assuming the existence of cryptographic traitor-tracing schemes. We show even stronger computational barriers for algorithms that generate private synthetic data—a new database that consists of “fake” records but preserves certain statistical properties of the original database. Under cryptographic assumptions, any efficient differentially private algorithm that generates synthetic data cannot preserve even extremely simple properties of the database, even the pairwise correlations between the attributes. On the positive side, we design new algorithms for the widely-used class of marginal queries that are faster and require less data. Computational inefficiency is not the only barrier to effective privacy-preserving data analysis. Another potential obstacle is that many of the existing differentially private algorithms do not guarantee privacy for the data analyst, which would lead researchers with sensitive or proprietary queries to seek other means of access to the database. We also contribute to our understanding of privacy for the analyst: We design new algorithms for answering large sets of queries that guarantee differential privacy for the database and ensure differential privacy for the analysts, even if all other analysts collude.
Terms of Use: This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:11041647
Downloads of this work:

Show full Dublin Core record

This item appears in the following Collection(s)

 
 

Search DASH


Advanced Search
 
 

Submitters