Person: Ullman, Jonathan
Email Address
AA Acceptance Date
Birth Date
Research Projects
Organizational Units
Job Title
Last Name
First Name
Name
Search Results
Publication Privacy and the Complexity of Simple Queries
(2013-09-16) Ullman, Jonathan; Vadhan, Salil P.; Mitzenmacher, Michael; Parkes, David; Nissim, KobbiAs both the scope and scale of data collection increases, an increasingly large amount of sensitive personal information is being analyzed. In this thesis, we study the feasibility of effectively carrying out such analyses while respecting the privacy concerns of all parties involved. In particular, we consider algorithms that satisfy differential privacy [30], a stringent notion of privacy that guarantees no individual’s data has a significant influence on the information released about the database. Over the past decade, there has been tremendous progress in understanding when accurate data analysis is compatible with differential privacy, with both elegant algorithms and striking impossibility results. However, if we ask further when accurate and computationally efficient data analysis is compatible with differential privacy then our understanding lags far behind. In this thesis, we make several contributions to understanding the complexity of differentially private data analysis: We show a sharp upper bound on the number of linear queries that can be accurately answered while satisfying differential privacy by an efficient algorithm, assuming the existence of cryptographic traitor-tracing schemes. We show even stronger computational barriers for algorithms that generate private synthetic data—a new database that consists of “fake” records but preserves certain statistical properties of the original database. Under cryptographic assumptions, any efficient differentially private algorithm that generates synthetic data cannot preserve even extremely simple properties of the database, even the pairwise correlations between the attributes. On the positive side, we design new algorithms for the widely-used class of marginal queries that are faster and require less data. Computational inefficiency is not the only barrier to effective privacy-preserving data analysis. Another potential obstacle is that many of the existing differentially private algorithms do not guarantee privacy for the data analyst, which would lead researchers with sensitive or proprietary queries to seek other means of access to the database. We also contribute to our understanding of privacy for the analyst: We design new algorithms for answering large sets of queries that guarantee differential privacy for the database and ensure differential privacy for the analysts, even if all other analysts collude.
Publication Faster Algorithms for Privately Releasing Marginals
(Springer Verlag, 2012) Thaler, Justin; Ullman, Jonathan; Vadhan, SalilWe study the problem of releasing k-way marginals of a database D ∈ {0,1}d)n, while preserving differential privacy. The an- swer to a k-way marginal query is the fraction of D’s records x ∈ {0, 1}d with a given value in each of a given set of up to k columns. Marginal queries enable a rich class of statistical analyses of a dataset, and de- signing efficient algorithms for privately releasing marginal queries has been identified as an important open problem in private data analysis (cf. Barak et. al., PODS ’07). We give an algorithm that runs in time dO( k) and releases a private summary capable of answering any k-way marginal query with at most ±.01 error on every query as long as n ≥ dO( k). To our knowledge, ours is the first algorithm capable of privately releasing marginal queries with non-trivial worst-case accuracy guarantees in time substantially smaller than the number of k-way marginal queries, which is dΘ(k) (for k ≪ d).
Publication Fingerprinting codes and the price of approximate differential privacy
(Association of Computing Machinery, 2014) Bun, Mark Mar; Ullman, Jonathan; Vadhan, SalilWe show new lower bounds on the sample complexity of (ε, δ)-differentially private algorithms that accurately answer large sets of counting queries. A counting query on a database D ∈ ({0, 1}d)n has the form "What fraction of the individual records in the database satisfy the property q?" We show that in order to answer an arbitrary set Q of » nd counting queries on D to within error ±α it is necessary that [EQUATION] This bound is optimal up to poly-logarithmic factors, as demonstrated by the Private Multiplicative Weights algorithm (Hardt and Rothblum, FOCS'10). It is also the first to show that the sample complexity required for (ε, δ)-differential privacy is asymptotically larger than what is required merely for accuracy, which is O(log |Q|/α2). In addition, we show that our lower bound holds for the specific case of k-way marginal queries (where |Q| = 2k(d/k)) when α is a constant. Our results rely on the existence of short fingerprinting codes (Boneh and Shaw, CRYPTO'95; Tardos, STOC'03), which we show are closely connected to the sample complexity of differentially private data release. We also give a new method for combining certain types of sample complexity lower bounds into stronger lower bounds.
Publication Integrating Approaches to Privacy Across the Research Lifecycle: When Is Information Purely Public?
(Berkman Center for Internet & Society, 2015) Gasser, Urs; O'Brien, David R.; Ullman, Jonathan; Altman, Micah; Bar-sinai, Michael; Nissim, Kobbi; Vadhan, Salil; Wojcik, Michael John; Wood, AlexandraOn September 24-25, 2013, the Privacy Tools for Sharing Research Data project at Harvard University held a workshop titled "Integrating Approaches to Privacy across the Research Data Lifecycle." Over forty leading experts in computer science, statistics, law, policy, and social science research convened to discuss the state of the art in data privacy research. The resulting conversations centered on the emerging tools and approaches from the participants’ various disciplines and how they should be integrated in the context of real-world use cases that involve the management of confidential research data.
Researchers are increasingly obtaining data from social networking websites, publicly-placed sensors, government records and other public sources. Much of this information appears public, at least to first impressions, and it is capable of being used in research for a wide variety of purposes with seemingly minimal legal restrictions. The insights about human behaviors we may gain from research that uses this data are promising. However, members of the research community are questioning the ethics of these practices, and at the heart of the matter are some difficult questions about the boundaries between public and private information. This workshop report, the second in a series, identifies selected questions and explores issues around the meaning of “public” in the context of using data about individuals for research purposes.