Show simple item record

dc.contributor.advisor  Vadhan, Salil  en_US
dc.contributor.author  Steinke, Thomas Alexander  en_US
dc.date.accessioned  2017-09-08T14:42:26Z
dc.date.created  2016-11  en_US
dc.date.issued  2016-08-03  en_US
dc.date.submitted  2016  en_US
dc.identifier.citation  Steinke, Thomas Alexander. 2016. Upper and Lower Bounds for Privacy and Adaptivity in Algorithmic Data Analysis. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.  en_US
dc.identifier.uri  http://nrs.harvard.edu/urn-3:HUL.InstRepos:33840662
dc.description.abstract  The increasing collection and use of sensitive personal data raises important privacy concerns. A second concern, arising from the use of data in the empirical sciences, is the danger of producing results that are not statistically valid because the influence of previous exploration of the data has not been accounted for. This thesis studies formalisations of these issues and the relationship between them.

(i) We give an alternative definition of differential privacy, a formal privacy standard for protecting sensitive data. Our definition strikes a balance between the mathematical elegance of so-called pure differential privacy and the power of approximate differential privacy.

(ii) We prove tighter upper and lower bounds for differential privacy. Namely, we bound the minimum size of a dataset that permits accurately answering simple one-way marginal queries subject to differential privacy. Our bounds are tight up to constant or log log factors.

(iii) We show fundamental limits on privacy by exhibiting a privacy attack that, given the aggregate statistics of a suitable dataset and the data of an individual, can determine whether or not the individual's data is part of the dataset. Our attack is particularly applicable to genetic data and is robust to errors in the aggregate statistics. The attack closely parallels our lower bounds for differential privacy and demonstrates that differential privacy achieves the fundamental limit of privacy in this setting.

(iv) We simplify, tighten, and extend the connection between differential privacy and generalisation established by Dwork et al. (STOC 2015). In particular, when data is analysed adaptively -- that is, when multiple analyses are performed and each may depend on the outcomes of previous analyses -- differentially private algorithms produce statistically valid results in the setting where the data is a sample from some larger population. Our results establish a tight connection between differential privacy and generalisation.

(v) We prove lower bounds in the adaptive data analysis setting that nearly match the upper bounds given by differential privacy. Namely, we show that, given $n$ samples from an unknown distribution, we cannot answer more than $O(n^2)$ adaptive statistical queries about that distribution while guaranteeing statistical accuracy.

(vi) We show that adaptivity also poses a problem within differential privacy itself: for certain classes of queries, it is much harder to answer queries in a differentially private manner when they are posed adaptively than when they are provided all at once.

All of our results rely on understanding the information-theoretic relationship between the input and output of a randomised algorithm. The criterion for protecting privacy or ensuring generalisation is that changing a single input point of the data analysis algorithm does not affect the output too much in a probabilistic sense.  en_US
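The probabilistic stability criterion in the abstract's closing sentence can be made precise; as a sketch, the standard $(\varepsilon, \delta)$-differential privacy definition of Dwork et al. (not spelled out in this record) states that a randomised algorithm $M$ satisfies it if, for all datasets $x, x'$ differing in a single entry and every event $S$,

```latex
\Pr[M(x) \in S] \;\le\; e^{\varepsilon} \Pr[M(x') \in S] + \delta .
```

Pure differential privacy is the special case $\delta = 0$; approximate differential privacy allows $\delta > 0$.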
dc.description.sponsorship  Engineering and Applied Sciences - Computer Science  en_US
dc.format.mimetype  application/pdf  en_US
dc.language.iso  en_NZ  en_US
dash.license  LAA  en_US
dc.subject  Computer Science  en_US
dc.title  Upper and Lower Bounds for Privacy and Adaptivity in Algorithmic Data Analysis  en_US
dc.type  Thesis or Dissertation  en_US
dash.depositing.author  Steinke, Thomas Alexander  en_US
dc.date.available  2017-09-08T14:42:26Z
thesis.degree.date  2016  en_US
thesis.degree.grantor  Graduate School of Arts & Sciences  en_US
thesis.degree.level  Doctoral  en_US
thesis.degree.name  Doctor of Philosophy  en_US
dc.contributor.committeeMember  Nelson, Jelani  en_US
dc.contributor.committeeMember  Sudan, Madhu  en_US
dc.contributor.committeeMember  Valiant, Leslie  en_US
dc.contributor.committeeMember  Barak, Boaz  en_US
dc.type.material  text  en_US
thesis.degree.department  Engineering and Applied Sciences - Computer Science  en_US
dash.identifier.vireo  http://etds.lib.harvard.edu/gsas/admin/view/1172  en_US
dc.description.keywords  differential privacy; generalization; adaptive data analysis; fingerprinting codes  en_US
dash.author.email  tasmail@gmail.com  en_US
dash.contributor.affiliated  Steinke, Thomas Alexander

