Person:

Ullman, Jonathan

Loading...
Profile Picture

Email Address

AA Acceptance Date

Birth Date

Research Projects

Organizational Units

Job Title

Last Name

Ullman

First Name

Jonathan

Name

Ullman, Jonathan

Search Results

Now showing 1 - 4 of 4
  • Publication

    Privacy and the Complexity of Simple Queries

    (2013-09-16) Ullman, Jonathan; Vadhan, Salil P.; Mitzenmacher, Michael; Parkes, David; Nissim, Kobbi

    As both the scope and scale of data collection increases, an increasingly large amount of sensitive personal information is being analyzed. In this thesis, we study the feasibility of effectively carrying out such analyses while respecting the privacy concerns of all parties involved. In particular, we consider algorithms that satisfy differential privacy [30], a stringent notion of privacy that guarantees no individual’s data has a significant influence on the information released about the database. Over the past decade, there has been tremendous progress in understanding when accurate data analysis is compatible with differential privacy, with both elegant algorithms and striking impossibility results. However, if we ask further when accurate and computationally efficient data analysis is compatible with differential privacy then our understanding lags far behind. In this thesis, we make several contributions to understanding the complexity of differentially private data analysis: We show a sharp upper bound on the number of linear queries that can be accurately answered while satisfying differential privacy by an efficient algorithm, assuming the existence of cryptographic traitor-tracing schemes. We show even stronger computational barriers for algorithms that generate private synthetic data—a new database that consists of “fake” records but preserves certain statistical properties of the original database. Under cryptographic assumptions, any efficient differentially private algorithm that generates synthetic data cannot preserve even extremely simple properties of the database, even the pairwise correlations between the attributes. On the positive side, we design new algorithms for the widely-used class of marginal queries that are faster and require less data. Computational inefficiency is not the only barrier to effective privacy-preserving data analysis. Another potential obstacle is that many of the existing differentially private algorithms do not guarantee privacy for the data analyst, which would lead researchers with sensitive or proprietary queries to seek other means of access to the database. We also contribute to our understanding of privacy for the analyst: We design new algorithms for answering large sets of queries that guarantee differential privacy for the database and ensure differential privacy for the analysts, even if all other analysts collude.

  • Publication

    Faster Algorithms for Privately Releasing Marginals

    (Springer Verlag, 2012) Thaler, Justin; Ullman, Jonathan; Vadhan, Salil

    We study the problem of releasing k-way marginals of a database D ∈ {0,1}d)n, while preserving differential privacy. The an- swer to a k-way marginal query is the fraction of D’s records x ∈ {0, 1}d with a given value in each of a given set of up to k columns. Marginal queries enable a rich class of statistical analyses of a dataset, and de- signing efficient algorithms for privately releasing marginal queries has been identified as an important open problem in private data analysis (cf. Barak et. al., PODS ’07). We give an algorithm that runs in time dO( k) and releases a private summary capable of answering any k-way marginal query with at most ±.01 error on every query as long as n ≥ dO( k). To our knowledge, ours is the first algorithm capable of privately releasing marginal queries with non-trivial worst-case accuracy guarantees in time substantially smaller than the number of k-way marginal queries, which is dΘ(k) (for k ≪ d).

  • Publication

    Fingerprinting codes and the price of approximate differential privacy

    (Association of Computing Machinery, 2014) Bun, Mark Mar; Ullman, Jonathan; Vadhan, Salil

    We show new lower bounds on the sample complexity of (ε, δ)-differentially private algorithms that accurately answer large sets of counting queries. A counting query on a database D ∈ ({0, 1}d)n has the form "What fraction of the individual records in the database satisfy the property q?" We show that in order to answer an arbitrary set Q of » nd counting queries on D to within error ±α it is necessary that [EQUATION] This bound is optimal up to poly-logarithmic factors, as demonstrated by the Private Multiplicative Weights algorithm (Hardt and Rothblum, FOCS'10). It is also the first to show that the sample complexity required for (ε, δ)-differential privacy is asymptotically larger than what is required merely for accuracy, which is O(log |Q|/α2). In addition, we show that our lower bound holds for the specific case of k-way marginal queries (where |Q| = 2k(d/k)) when α is a constant. Our results rely on the existence of short fingerprinting codes (Boneh and Shaw, CRYPTO'95; Tardos, STOC'03), which we show are closely connected to the sample complexity of differentially private data release. We also give a new method for combining certain types of sample complexity lower bounds into stronger lower bounds.

  • Publication

    Integrating Approaches to Privacy Across the Research Lifecycle: When Is Information Purely Public?

    (Berkman Center for Internet & Society, 2015) Gasser, Urs; O'Brien, David R.; Ullman, Jonathan; Altman, Micah; Bar-sinai, Michael; Nissim, Kobbi; Vadhan, Salil; Wojcik, Michael John; Wood, Alexandra

    On September 24-25, 2013, the Privacy Tools for Sharing Research Data project at Harvard University held a workshop titled "Integrating Approaches to Privacy across the Research Data Lifecycle." Over forty leading experts in computer science, statistics, law, policy, and social science research convened to discuss the state of the art in data privacy research. The resulting conversations centered on the emerging tools and approaches from the participants’ various disciplines and how they should be integrated in the context of real-world use cases that involve the management of confidential research data.

    Researchers are increasingly obtaining data from social networking websites, publicly-placed sensors, government records and other public sources. Much of this information appears public, at least to first impressions, and it is capable of being used in research for a wide variety of purposes with seemingly minimal legal restrictions. The insights about human behaviors we may gain from research that uses this data are promising. However, members of the research community are questioning the ethics of these practices, and at the heart of the matter are some difficult questions about the boundaries between public and private information. This workshop report, the second in a series, identifies selected questions and explores issues around the meaning of “public” in the context of using data about individuals for research purposes.