Person: Steinke, Thomas Alexander
Last Name: Steinke
First Name: Thomas Alexander
Search Results: 4 publications (showing 1-4 of 4)
Publication: Upper and Lower Bounds for Privacy and Adaptivity in Algorithmic Data Analysis (2016-08-03)
Steinke, Thomas Alexander; Vadhan, Salil; Nelson, Jelani; Sudan, Madhu; Valiant, Leslie; Barak, Boaz

The increasing collection and use of sensitive personal data raises important privacy concerns. Another concern arising from the use of data in empirical sciences is the danger of producing results that are not statistically valid due to failing to account for the influence of previous exploration of the data. This thesis studies formalisations of these issues and the relationship between them.

(i) We give an alternative definition of differential privacy, which is a formal privacy standard for protecting sensitive data. Our definition strikes a balance between the mathematical elegance of so-called pure differential privacy and the power of approximate differential privacy.

(ii) We prove tighter upper and lower bounds for differential privacy. Namely, we bound the minimum size of a dataset that permits accurately answering simple one-way marginal queries subject to differential privacy. Our bounds are tight up to constant or log log factors.

(iii) We show fundamental limits for privacy by exhibiting a privacy attack that, given the aggregate statistics of a suitable dataset and the data of an individual, can determine whether or not the individual's data is part of the dataset. Our attack is particularly applicable to genetic data and is robust to errors in the aggregate statistics. This attack is very similar to our lower bounds for differential privacy and demonstrates that differential privacy achieves the fundamental limit of privacy in this setting.

(iv) We simplify, tighten, and extend the connection between differential privacy and generalisation established by Dwork et al. (STOC 2015). In particular, when data is analysed adaptively -- that is, multiple analyses are performed and each may depend on the outcome of previous analyses -- differentially private algorithms produce statistically valid results in the setting where the data is a sample from some larger population. Our results establish the tight connection between differential privacy and generalisation.

(v) We prove lower bounds in the adaptive data analysis setting that nearly match the upper bounds given by differential privacy. Namely, we show that, given $n$ samples from an unknown distribution, we cannot answer more than $O(n^2)$ adaptive statistical queries about that distribution while guaranteeing statistical accuracy.

(vi) We show that adaptivity also poses a problem in differential privacy: for certain classes of queries, it is much harder to answer queries in a differentially private manner if the queries are posed adaptively than if they are provided all at once.

All of our results rely on understanding the information-theoretic relationship between the input and output of a randomised algorithm. The criterion for protecting privacy or ensuring generalisation is that changing a single input point of the data analysis algorithm does not affect the output too much in a probabilistic sense.
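As background for result (i), the two standard notions the abstract contrasts can be stated explicitly; this is the textbook formulation, not text quoted from the thesis itself:

```latex
% Standard definition of differential privacy.
% A randomised algorithm $M$ is $(\epsilon, \delta)$-differentially private
% if, for all pairs of datasets $x, x'$ differing in a single record and
% all measurable events $S$,
\[
  \Pr[M(x) \in S] \;\le\; e^{\epsilon} \Pr[M(x') \in S] + \delta .
\]
% Pure differential privacy is the special case $\delta = 0$; approximate
% differential privacy allows $\delta > 0$. The thesis's alternative
% definition is designed to sit between these two regimes.
```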
Publication: Pseudorandomness for Regular Branching Programs via Fourier Analysis (Springer Berlin Heidelberg, 2013)
Reingold, Omer; Steinke, Thomas Alexander; Vadhan, Salil

We present an explicit pseudorandom generator for oblivious, read-once, permutation branching programs of constant width that can read their input bits in any order. The seed length is \(O(\log^2 n)\), where \(n\) is the length of the branching program. The previous best seed length known for this model was \(n^{1/2 + o(1)}\), which follows as a special case of a generator due to Impagliazzo, Meka, and Zuckerman (FOCS 2012) (which gives a seed length of \(s^{1/2 + o(1)}\) for arbitrary branching programs of size \(s\)). Our techniques also give seed length \(n^{1/2 + o(1)}\) for general oblivious, read-once branching programs of width \(2^{n^{o(1)}}\), which is incomparable to the results of Impagliazzo et al. Our pseudorandom generator is similar to the one used by Gopalan et al. (FOCS 2012) for read-once CNFs, but the analysis is quite different; ours is based on Fourier analysis of branching programs. In particular, we show that an oblivious, read-once, regular branching program of width \(w\) has Fourier mass at most \((2w^2)^k\) at level \(k\), independent of the length of the program.

Publication: Pseudorandomness and Fourier Growth Bounds for Width-3 Branching Programs (Dagstuhl Publishing, 2014)
Steinke, Thomas Alexander; Vadhan, Salil; Wan, Andrew

We present an explicit pseudorandom generator for oblivious, read-once, width-3 branching programs, which can read their input bits in any order. The generator has seed length \(\tilde{O}(\log^3 n)\). The previously best known seed length for this model is \(n^{1/2+o(1)}\) due to Impagliazzo, Meka, and Zuckerman (FOCS 2012). Our work generalizes a recent result of Reingold, Steinke, and Vadhan (RANDOM 2013) for permutation branching programs. The main technical novelty underlying our generator is a new bound on the Fourier growth of width-3, oblivious, read-once branching programs. Specifically, we show that for any \(f : \{0,1\}^n \to \{0,1\}\) computed by such a branching program, and any \(k \in [n]\),
\[
  \sum_{|s|=k} |\hat{f}(s)| < n^2 \cdot (O(\log n))^k ,
\]
where \(f(x) = \sum_s \hat{f}(s) (-1)^{\langle s, x \rangle}\) is the standard Fourier transform over \(\mathbb{Z}_2^n\). The base \(O(\log n)\) of the Fourier growth is tight up to a factor of \(\log \log n\).
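To make the quantity in these bounds concrete, here is a brute-force sketch (my own illustration, not code from either paper) that computes the Fourier coefficients of a small Boolean function and its level-\(k\) mass \(\sum_{|s|=k} |\hat{f}(s)|\); the example function (AND of 3 bits) is an arbitrary choice:

```python
# Illustrative sketch: brute-force Fourier coefficients
# hat{f}(s) = E_x[ f(x) * (-1)^{<s,x>} ] of f : {0,1}^n -> {0,1},
# and the level-k Fourier mass bounded in both papers.
from itertools import product

def fourier_coefficients(f, n):
    """Return {s: hat{f}(s)}, where s ranges over subset masks of [n]."""
    coeffs = {}
    points = list(product((0, 1), repeat=n))
    for s in points:  # s is the characteristic vector of a subset of [n]
        total = 0
        for x in points:
            inner = sum(si * xi for si, xi in zip(s, x))
            total += f(x) * (-1) ** inner
        coeffs[s] = total / 2 ** n
    return coeffs

def level_mass(coeffs, k):
    """Level-k Fourier mass: sum of |hat{f}(s)| over |s| = k."""
    return sum(abs(c) for s, c in coeffs.items() if sum(s) == k)

# Example: f = AND of 3 bits, computable by a width-2 read-once
# branching program. Cost is 4^n, so this is only for tiny n.
n = 3
coeffs = fourier_coefficients(lambda x: int(all(x)), n)
for k in range(n + 1):
    print(k, level_mass(coeffs, k))
```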
Publication: Robust Traceability from Trace Amounts (2015)
Dwork, Cynthia; Smith, Adam; Steinke, Thomas Alexander; Ullman, Jonathan; Vadhan, Salil

The privacy risks inherent in the release of a large number of summary statistics were illustrated by Homer et al. (PLoS Genetics, 2008), who considered the case of 1-way marginals of SNP allele frequencies obtained in a genome-wide association study: given a large number of minor allele frequencies from a case group of individuals diagnosed with a particular disease, together with the genomic data of a single target individual and statistics from a sizable reference dataset independently drawn from the same population, an attacker can determine with high confidence whether or not the target is in the case group. In this work we describe and analyze a simple attack that succeeds even if the summary statistics are significantly distorted, whether due to measurement error or noise intentionally introduced to protect privacy. Our attack only requires that the vector of distorted summary statistics is close to the vector of true marginals in \(\ell_1\) norm. Moreover, the reference pool required by previous attacks can be replaced by a single sample drawn from the underlying population. The new attack, which is not specific to genomics and which handles Gaussian as well as Bernoulli data, significantly generalizes recent lower bounds on the noise needed to ensure differential privacy (Bun, Ullman, and Vadhan, STOC 2014; Steinke and Ullman, 2015), obviating the need for the attacker to control the exact distribution of the data.
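A minimal sketch of the flavour of such a tracing attack, assuming Bernoulli data, a synthetic case group, and an inner-product statistic with a threshold of my own choosing; the paper's exact statistic, threshold, and analysis differ:

```python
# Illustrative tracing-attack sketch (assumptions: Bernoulli data,
# inner-product statistic, sqrt(d) threshold). NOT the paper's exact
# attack; it only conveys the shape of the test.
import numpy as np

rng = np.random.default_rng(0)

n, d = 10, 20000                     # case-group size, number of attributes
p = rng.uniform(0.1, 0.9, d)         # population allele frequencies

case = rng.binomial(1, p, size=(n, d))           # case group's data
q = case.mean(axis=0) + rng.normal(0, 0.05, d)   # distorted 1-way marginals
z = rng.binomial(1, p)               # a single fresh reference sample

def trace_statistic(y, q, z):
    """Large when y correlates with the released marginals beyond
    what a fresh population sample would explain."""
    return float(np.dot(q, y - z))

tau = np.sqrt(d)                     # illustrative threshold

member = case[0]                     # target who is in the case group
outsider = rng.binomial(1, p)        # target who is not

for name, y in [("member", member), ("outsider", outsider)]:
    s = trace_statistic(y, q, z)
    print(f"{name}: statistic={s:.1f}, flagged IN={s > tau}")
```

Roughly, the member's statistic concentrates around \(d/n\) times the average per-attribute variance, while the outsider's concentrates around 0 with standard deviation on the order of \(\sqrt{d}\), which is why this style of tracing becomes feasible once \(d \gg n^2\).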