Separating Computational and Statistical Differential Privacy in the Client-Server Model

Differential privacy is a mathematical definition of privacy for statistical data analysis. It guarantees that any (possibly adversarial) data analyst is unable to learn too much information that is specific to an individual. Mironov et al. (CRYPTO 2009) proposed several computational relaxations of differential privacy (CDP), which relax this guarantee to hold only against computationally bounded adversaries. Their work and subsequent work showed that CDP can yield substantial accuracy improvements in various multiparty privacy problems. However, these works left open whether such improvements are possible in the traditional client-server model of data analysis. In fact, Groce, Katz and Yerukhimovich (TCC 2011) showed that, in this setting, it is impossible to take advantage of CDP for many natural statistical tasks. Our main result shows that, assuming the existence of sub-exponentially secure one-way functions and 2-message witness indistinguishable proofs (zaps) for NP, there is in fact a computational task in the client-server model that can be efficiently performed with CDP, but is infeasible to perform with information-theoretic differential privacy.


Introduction
Differential privacy is a formal mathematical definition of privacy for the analysis of statistical datasets. It promises that a data analyst (treated as an adversary) cannot learn too much individual-level information from the outcome of an analysis. The traditional definition of differential privacy makes this promise information-theoretically: even a computationally unbounded adversary is limited in the amount of information she can learn that is specific to an individual. On one hand, there are now numerous techniques that actually achieve this strong guarantee of privacy for a rich body of computational tasks. On the other hand, the information-theoretic definition of differential privacy does not itself permit the use of basic cryptographic primitives that naturally arise in the practice of differential privacy (such as the use of cryptographically secure pseudorandom generators in place of perfect randomness). More importantly, computationally secure relaxations of differential privacy open the door to designing improved mechanisms: ones that achieve either better utility (accuracy) or better computational efficiency than their information-theoretically secure counterparts.
Motivated by these observations, and building on ideas suggested in [BNO08], Mironov et al. [MPRV09] proposed several definitions of computational differential privacy (CDP). All of these definitions formalize what it means for the output of a mechanism to "look" differentially private to a computationally bounded (i.e., probabilistic polynomial-time) adversary. The sequence of works [DKM+06, BNO08, MPRV09] introduced a paradigm that enables two or more parties to take advantage of CDP, either to achieve better utility or reduced round complexity, when computing a joint function of their private inputs: the parties use a secure multi-party computation protocol to simulate having a trusted third party perform a differentially private computation on the union of their inputs. Subsequent work [MMP+10] showed that such a CDP protocol for approximating the Hamming distance between two private bit vectors is in fact more accurate than any (information-theoretically secure) differentially private protocol for the same task. A number of works [CSS12, GMPS13, HOZ13, KMS14, GKM+16] have since sought to characterize the extent to which CDP yields accuracy improvements for two-party privacy problems.
Despite the success of CDP in the design of improved algorithms in the multiparty setting, much less is known about what can be achieved in the traditional client-server model, in which a trusted curator holds all of the sensitive data and mediates access to it. Beyond just the absence of any techniques for taking advantage of CDP in this setting, results of Groce, Katz, and Yerukhimovich [GKY11] (discussed in more detail below) show that CDP yields no additional power in the client-server model for many basic statistical tasks. An additional barrier stems from the fact that all known lower bounds against computationally efficient differentially private algorithms [DNR+09, UV11, Ull13, BZ14, BZ16] in the client-server model are proved by exhibiting computationally efficient adversaries. Thus, these lower bounds rule out the existence of CDP mechanisms just as well as they rule out differentially private ones.
In this work, we give the first example of a computational problem in the client-server model which can be solved in polynomial time with CDP, but (under plausible assumptions) is computationally infeasible to solve with (information-theoretic) differential privacy. Our problem is specified by an efficiently computable utility function $u$, which takes as input a dataset $D \in \mathcal{X}^n$ and an answer $r \in \mathcal{R}$, and outputs 1 if the answer $r$ is "good" for the dataset $D$, and 0 otherwise.
Note that the theorem provides a task where achieving differential privacy is infeasible, not impossible. This is inherent because the CDP mechanism we exhibit (for item 1) satisfies a simulation-based form of CDP ("SIM-CDP"), which implies the existence of a (possibly inefficient) differentially private mechanism, provided the utility function $u$ is efficiently computable as we require. It remains an intriguing open problem to exhibit a task that can be achieved with a weaker indistinguishability-based notion of CDP ("IND-CDP") but is impossible to achieve (even inefficiently) with differential privacy. Such a task would also separate IND-CDP and SIM-CDP, which is an interesting open problem in its own right.
Circumventing the impossibility results of [GKY11]. Groce et al. showed that in many natural circumstances, computational differential privacy cannot yield any additional power over differential privacy in the client-server model. In particular, they showed two impossibility results:
1. If a CDP mechanism accesses a one-way function (or more generally, any cryptographic primitive that can be instantiated with a random function) in a black-box way, then it can be simulated just as well (in terms of both utility and computational efficiency) by a differentially private mechanism.
2. If the output of a CDP mechanism is in $\mathbb{R}^d$ (for some constant $d$) and its utility is measured via an $L_p$-norm, then the mechanism can be simulated by a differentially private one, again without significant loss of utility or efficiency.
(In Section 4, we revisit the techniques of [GKY11] to strengthen the second result in some circumstances. In general, we show that when error is measured in any metric with doubling dimension $O(\log k)$, CDP cannot improve utility by more than a constant factor. Specifically, with respect to $L_p$-error, CDP cannot do much better than DP mechanisms even when $d$ is logarithmic in the security parameter.)

We get around both of these impossibility results by 1) making non-black-box use of one-way functions via the machinery of zap proofs and 2) relying on a utility function that is far from the form to which the second result of [GKY11] applies. Indeed, our utility function is cryptographic and unnatural from a data analysis point of view. Roughly speaking, it asks whether the answer $r$ is a valid zap proof of the statement "there exists a row of the dataset $D$ that is a valid message-signature pair" for a secure digital signature scheme. It remains an intriguing problem for future work whether a separation can be obtained from a more natural task (such as answering a polynomial number of counting queries with differential privacy).
Our construction and techniques. Our construction is based on the existence of two cryptographic primitives: an existentially unforgeable digital signature scheme (Gen, Sign, Ver), and a 2-message witness indistinguishable proof system (zap) (P, V) for NP. We make use of complexity leveraging [CGGM00] and thus require a complexity gap between the two primitives: namely, a sub-exponential time algorithm should be able to break the security of the zap proof system, but should not be able to forge a valid message-signature pair for the digital signature scheme.
We now describe (eliding technical complications) the computational task which allows us to separate computational and information-theoretic differential privacy in the client-server model. Inspired by prior differential privacy lower bounds [DNR+09, UV11], we consider a dataset $D$ that consists of many valid message-signature pairs $(m_1, \sigma_1), \ldots, (m_n, \sigma_n)$ for the digital signature scheme. We say that a mechanism $M$ gives a useful answer on $D$, i.e., the utility function $u(D, M(D))$ evaluates to 1, if it produces a proof $\pi$ in the zap proof system that there exists a message-signature pair $(m, \sigma)$ for which $\mathrm{Ver}(m, \sigma) = 1$.
First, let us see how the above task can be performed inefficiently with differential privacy. Consider the mechanism $M^{\mathrm{unb}}$ that first confirms (in a standard differentially private way) that its input dataset indeed contains "many" valid message-signature pairs. Then $M^{\mathrm{unb}}$ uses its unbounded computational resources to forge a canonical valid message-signature pair $(m, \sigma)$ and uses the zap prover on witness $(m, \sigma)$ to produce a proof $\pi$. Since the choice of the forged pair does not depend on the input dataset at all, the procedure as a whole is differentially private.

Now let us see how a CDP mechanism can perform the same task efficiently. Our mechanism $M^{\mathrm{CDP}}$ again first checks that it possesses many valid message-signature pairs, but this time it simply outputs a proof $\pi$ using an arbitrary valid pair $(m_i, \sigma_i) \in D$ as its witness. Since the proof system is witness indistinguishable, a computationally bounded observer cannot distinguish $\pi$ from the canonical proof output by the differentially private mechanism $M^{\mathrm{unb}}$. Thus, the mechanism $M^{\mathrm{CDP}}$ is in fact CDP in the strongest (simulation-based) sense.
Despite the existence of the inefficient differentially private mechanism $M^{\mathrm{unb}}$, we show that the existence of an efficient differentially private mechanism $M$ for this task would violate the sub-exponential security of the digital signature scheme. Suppose there were such a mechanism $M$. Now consider a sub-exponential time adversary $A$ that completely breaks the security of the zap proof system, in the sense that given a valid proof $\pi$, it is always able to recover a corresponding witness $(m, \sigma)$. Since $M$ is differentially private, the $(m, \sigma)$ extracted by $A$ cannot be in the dataset $D$ given to $M$. Thus, $(m, \sigma)$ constitutes a forgery of a valid message-signature pair, and hence the composed algorithm $A \circ M$ violates the security of the signature scheme.

(Computational) Differential Privacy
We first set notation that will be used throughout this paper, and recall the notions of (ε, δ)-differential privacy and computational differential privacy. The abbreviation "PPT" stands for "probabilistic polynomial-time Turing machine."

Security parameter $k$. Let $k \in \mathbb{N}$ denote a security parameter. In this work, datasets, privacy-preserving mechanisms, and privacy parameters ε, δ will all be sequences parameterized in terms of $k$. Adversaries will also have their computational power parameterized by $k$; in particular, efficient adversaries have circuit size polynomial in $k$. A function is said to be negligible if it vanishes faster than any inverse polynomial in $k$.
Dataset $D$. A dataset $D$ is an ordered tuple of $n$ elements from some data universe $\mathcal{X}$. Two datasets $D, D'$ are said to be adjacent (written $D \sim D'$) if they differ in at most one row. We use $\{D_k\}_{k \in \mathbb{N}}$ to denote a sequence of datasets, each over a data universe $\mathcal{X}_k$, with sizes growing with the parameter $k$. The size in bits of a dataset $D_k$, and in particular the number of rows $n$, will always be poly(k).
Mechanism $M$. A mechanism $M : \mathcal{X}^* \to \mathcal{R}$ is a randomized function taking a dataset $D \in \mathcal{X}^*$ to an output in a range space $\mathcal{R}$. We will be especially interested in ensembles of efficient mechanisms $\{M_k\}_{k \in \mathbb{N}}$ where each $M_k : \mathcal{X}_k^* \to \mathcal{R}_k$, when run on an input dataset $D \in \mathcal{X}_k^n$, runs in time poly(k, n).

Equivalently, for all adjacent datasets $D \sim D'$ and every (computationally unbounded) algorithm $A$, we have
$$\Pr[A(M(D)) = 1] \le e^{\varepsilon} \cdot \Pr[A(M(D')) = 1] + \delta.$$
For consistency with the definition of SIM-CDP, we also make the following definitions for sequences of mechanisms:

The above definitions are completely information-theoretic.
Mironov et al. [MPRV09] also proposed a stronger "simulation-based" definition of computational differential privacy. A mechanism is said to be ε-SIM-CDP if its output is computationally indistinguishable from that of an ε-differentially private mechanism:

Definition 3 (SIM-CDP). A sequence of mechanisms $\{M_k\}_{k \in \mathbb{N}}$ is $\varepsilon_k$-SIM-CDP if there exist a negligible function negl(·) and a family of mechanisms $\{M'_k\}_{k \in \mathbb{N}}$ that is $\varepsilon_k$-differentially private such that for all poly(k)-size datasets $D$, and all non-uniform polynomial-time adversaries $A$,
$$\left| \Pr[A(M_k(D)) = 1] - \Pr[A(M'_k(D)) = 1] \right| \le \mathrm{negl}(k).$$

We write $A \Rightarrow B$ to denote that a mechanism satisfying definition A also satisfies definition B (that is, A is a stricter privacy definition than B). We have the following relationships between the various notions of (computational) differential privacy: DP $\Rightarrow$ SIM-CDP $\Rightarrow$ IND-CDP.
We will state and prove our separation between CDP and differential privacy for the simulation-based definition SIM-CDP. Since SIM-CDP is a stronger privacy notion than IND-CDP, this implies a separation between IND-CDP and differential privacy as well.
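As a concrete, purely information-theoretic instance of the (ε, 0) guarantee recalled above, the following minimal sketch computes the exact output distribution of one-bit randomized response and checks the defining inequality on the two adjacent one-row datasets. Randomized response is a standard textbook mechanism that this paper does not discuss; it and the function names below are chosen only for illustration.

```python
import math

def randomized_response_dist(bit, eps):
    """Exact output distribution of one-bit randomized response.

    Reports the true bit with probability e^eps / (1 + e^eps), else flips it.
    """
    p_truth = math.exp(eps) / (1.0 + math.exp(eps))
    return {bit: p_truth, 1 - bit: 1.0 - p_truth}

def max_privacy_ratio(dist_a, dist_b):
    """Largest ratio Pr[M(D) = r] / Pr[M(D') = r] over outputs r, both directions."""
    outputs = set(dist_a) | set(dist_b)
    return max(
        max(dist_a[r] / dist_b[r], dist_b[r] / dist_a[r]) for r in outputs
    )

eps = 0.5
ratio = max_privacy_ratio(
    randomized_response_dist(0, eps),  # dataset D  = (0)
    randomized_response_dist(1, eps),  # dataset D' = (1), adjacent to D
)
print(ratio, math.exp(eps))
```

Because the output space is finite and δ = 0, bounding the ratio on every single output by e^ε is enough to bound it on every set of outputs.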

Utility
We describe an abstract notion of what it means for a mechanism to "succeed" at performing a computational task. We define a computational task implicitly in terms of an efficiently computable utility function, which takes as input a dataset $D \in \mathcal{X}^*$ and an answer $r \in \mathcal{R}$ and outputs a score describing how well $r$ solves a given problem on instance $D$. For our purposes, it suffices to consider binary-valued utility functions $u$, which output 1 iff the answer $r$ is "good" for the dataset $D$.
Definition 4 (Utility). A utility function is an efficiently computable (deterministic) function $u : \mathcal{X}^* \times \mathcal{R} \to \{0, 1\}$.

Restricting our attention to efficiently computable utility functions is necessary to rule out pathological separations between computational and statistical notions of differential privacy. For instance, let $\{G_k\}_{k \in \mathbb{N}}$ be a pseudorandom generator with $G_k : \{0, 1\}^k \to \{0, 1\}^{2k}$, and consider the (hard-to-compute) function $u$ with $u(0, y) = 1$ iff $y$ is in the image of $G_k$, and $u(1, y) = 1$ iff $y$ is not in the image of $G_k$. The mechanism $M$ that, on input a bit $b$, outputs $G_k(s)$ for a random seed $s$ if $b = 0$ and samples a random string if $b = 1$ is useful with overwhelming probability. Moreover, $M$ is computationally indistinguishable from the mechanism that always outputs a random string, and hence SIM-CDP. On the other hand, the supports of $u(0, \cdot)$ and $u(1, \cdot)$ are disjoint, so no differentially private mechanism can achieve high utility with respect to $u$.

Zaps (2-Message WI Proofs)
The first cryptographic tool we need in our construction is 2-message witness indistinguishable proofs for NP ("zaps") [FS90, DN07] in the plain model (with no common reference string). Consider a language $L \in \mathrm{NP}$. A witness relation for $L$ is a polynomial-time decidable binary relation $R_L = \{(x, w)\}$ such that $|w| \le \mathrm{poly}(|x|)$ whenever $(x, w) \in R_L$, and $x \in L$ if and only if there exists $w$ with $(x, w) \in R_L$.

Definition 5 (Zap). Let $R_L = \{(x, w)\}$ be a witness relation corresponding to a language $L \in \mathrm{NP}$. A zap proof system for $R_L$ consists of a pair of algorithms (P, V) where:
- In the first round, the verifier sends a message $\rho \leftarrow \{0, 1\}^{\ell(k,|x|)}$ ("public coins"), where $\ell(\cdot, \cdot)$ is a fixed polynomial.
- In the second round, the prover runs a PPT $P$ that takes as input a pair $(x, w)$ and the verifier's first message $\rho$ and outputs a proof $\pi$.
- The verifier runs an efficient, deterministic algorithm $V$ that takes as input an instance $x$, a first-round message $\rho$, and a proof $\pi$, and outputs a bit in $\{0, 1\}$.
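The security requirements of the proof system are:
1. Perfect completeness. An honest prover who possesses a valid witness can always convince an honest verifier. Formally, for all $x \in \{0, 1\}^{\mathrm{poly}(k)}$, $(x, w) \in R_L$, and $\rho \in \{0, 1\}^{\ell(k,|x|)}$,
$$\Pr[V(1^k, x, \rho, P(1^k, x, w, \rho)) = 1] = 1.$$
2. Statistical soundness. With overwhelming probability over the choice of $\rho$, it is impossible to convince an honest verifier of the validity of a false statement. Formally, there exists a negligible function negl(·) such that for all sufficiently large $k$ and $t = \mathrm{poly}(k)$, we have
$$\Pr_{\rho \leftarrow \{0,1\}^{\ell(k,t)}}\left[\exists\, x \in \{0,1\}^t \setminus L,\ \pi \in \{0,1\}^* : V(1^k, x, \rho, \pi) = 1\right] \le \mathrm{negl}(k).$$
3. Witness indistinguishability. For every sequence of statements $\{x_k\}$ with $|x_k| = \mathrm{poly}(k)$, every pair of witness sequences $\{w_k^1\}, \{w_k^2\}$ with $(x_k, w_k^1), (x_k, w_k^2) \in R_L$, and every choice of the verifier's first message $\rho$, the proofs $P(1^k, x_k, w_k^1, \rho)$ and $P(1^k, x_k, w_k^2, \rho)$ are computationally indistinguishable. Namely, for every such pair of sequences, there exists a negligible function negl(·) such that for all polynomial-time adversaries $A$ and all sufficiently large $k$, we have
$$\left| \Pr[A(1^k, P(1^k, x_k, w_k^1, \rho)) = 1] - \Pr[A(1^k, P(1^k, x_k, w_k^2, \rho)) = 1] \right| \le \mathrm{negl}(k).$$

In our construction, we will need more fine-grained control over the security of our zap proof system. In particular, we need the proof system to be extractable by an adversary running in time $2^{O(k)}$, in that such an adversary can always reverse-engineer a valid proof $\pi$ to find a witness $w$ such that $(x, w) \in R_L$. It is important to note that we require the running time of the adversary to be exponential in the security parameter $k$, but otherwise independent of the statement size $|x|$.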
Definition 6 (Extractable Zap). The algorithm triple (P, V, E) is an extractable zap proof system if (P, V) is a zap proof system and there exists an algorithm $E$ running in time $2^{O(k)}$ with the following property:
4. (Exponential Statistical) Extractability. There exists a negligible function negl(·) such that for all $x \in \{0, 1\}^{\mathrm{poly}(k)}$:
$$\Pr_{\rho \leftarrow \{0,1\}^{\ell(k,|x|)}}\left[\exists\, \pi \in \{0,1\}^* : V(1^k, x, \rho, \pi) = 1 \ \wedge\ (x, E(1^k, x, \rho, \pi)) \notin R_L\right] \le \mathrm{negl}(k).$$

While we do not know whether extractability is a generic property of zaps, it is preserved under Dwork and Naor's reduction to NIZKs in the common random string model. Namely, if we plug an extractable NIZK into Dwork and Naor's construction, we obtain an extractable zap.
Theorem 2. Every language in NP has an extractable zap proof system (P, V, E), as defined in Definition 6, if there exist non-interactive zero-knowledge proofs of knowledge for NP [DN07].
For completeness, we sketch Dwork and Naor's construction in Appendix B and argue its extractability.

Digital Signatures
The other ingredient we need in our construction is a sub-exponentially strongly unforgeable digital signature scheme. Here "strong unforgeability" [ADR02] means that the adversary in the existential unforgeability game is allowed to forge a signature for a message it has queried before, as long as the signature is different from the one it received.

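Existential unforgeability. There exists a negligible function negl(·) such that for all adversaries $A$ running in time $2^{k^c}$,
$$\Pr\left[(sk, vk) \leftarrow \mathrm{Gen}(1^k),\ (m, \sigma) \leftarrow A^{\mathrm{Sign}(sk, \cdot)}(vk) : \mathrm{Ver}(vk, m, \sigma) = 1 \text{ and } (m, \sigma) \notin Q\right] < \mathrm{negl}(k),$$
where $Q$ is the set of message-signature pairs obtained through $A$'s use of the signing oracle.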
Theorem 3. If sub-exponentially secure one-way functions exist, then there is a constant c ∈ (0, 1) such that a c-strongly unforgeable digital signature scheme exists.
The reduction from a one-way function to a digital signature scheme [NY89, Rom90, KK05, Gol04] can be applied when both schemes are secure against sub-exponential time adversaries.

Separating CDP and Differential Privacy
In this section, we define a computational problem in the client-server model that can be efficiently solved with CDP, but not with statistical differential privacy. That is, we define a utility function $u$ for which there exists a CDP mechanism achieving high utility. On the other hand, any efficient differentially private algorithm can only have negligible utility.
Theorem 4 (Main). Assume the existence of sub-exponentially secure one-way functions and extractable zaps for NP. Then there exists a sequence of data universes $\{\mathcal{X}_k\}_{k \in \mathbb{N}}$, range spaces $\{\mathcal{R}_k\}_{k \in \mathbb{N}}$ and an (efficiently computable) utility function

Remark 1. We can only hope to separate SIM-CDP and differential privacy by designing a task that is infeasible with differential privacy but not impossible.
By the definition of (PURE-)SIM-CDP for a mechanism {M k } k∈N , there exists an

Construction
Let (Gen, Sign, Ver) be a $c$-strongly unforgeable secure digital signature scheme with parameter $c > 0$ as in Definition 7. After fixing $c$, we define for each $k \in \mathbb{N}$ a reduced security parameter $k_c = k^{c/2}$. We will use $k_c$ as the security parameter for an extractable zap proof system (P, V, E). Since $k$ and $k_c$ are polynomially related, a negligible function in $k$ is negligible in $k_c$ and vice versa.
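The choice $k_c = k^{c/2}$ can be sanity-checked with a short calculation. The sketch below assumes that the extractor $E$ runs in time at most $2^{a k_c}$ for some constant $a$ hidden in the $O(\cdot)$ notation, and that $c$-strong unforgeability refers to adversaries running in time $2^{k^c}$; both readings are assumptions made for illustration.

```latex
\[
  2^{a\,k_c} \;=\; 2^{a\,k^{c/2}} \;\le\; 2^{k^{c}}
  \qquad\Longleftrightarrow\qquad
  a \;\le\; k^{c/2}
  \qquad\Longleftrightarrow\qquad
  k \;\ge\; a^{2/c}.
\]
```

Under these assumptions, for all sufficiently large $k$ the zap extractor run at security parameter $k_c$ fits within the time bound against which the signature scheme remains unforgeable, which is the complexity gap the construction relies on.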
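Given a security parameter $k \in \mathbb{N}$, define the following sets of bit strings:
- Verification Key Space, Message Space, and Signature Space: the sets $\mathcal{K}_k$, $\mathcal{M}_k$, and $\mathcal{S}_k$ of verification keys, messages, and signatures of the digital signature scheme under security parameter $k$,
- Public Coins Space: the set $\mathcal{P}_k$ of bit strings whose length is the length of first-round zap messages used to prove statements from $\mathcal{K}_k$ under security parameter $k_c$,
- Data Universe: $\mathcal{X}_k = \mathcal{K}_k \times \mathcal{M}_k \times \mathcal{S}_k \times \mathcal{P}_k$.

That is, similarly to one of the hardness results of [DNR+09], we consider datasets $D$ that contain $n$ rows of the form $x_1 = (vk_1, m_1, \sigma_1, \rho_1), \ldots, x_n = (vk_n, m_n, \sigma_n, \rho_n)$, each corresponding to a verification key, message, and signature from the digital signature scheme, and to a zap verifier's public coin tosses.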
Let $L \in \mathrm{NP}$ be the language which has the natural witness relation
$$R_L = \{(vk, (m, \sigma)) : \mathrm{Ver}(vk, m, \sigma) = 1\}.$$

Define
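- Proof Space: $\Pi_k = \{0, 1\}^{\ell_4}$ where $\ell_4 = |\pi|$ for $\pi \leftarrow P(1^{k_c}, vk, (m, \sigma), \rho)$ for $vk \in (L \cap \mathcal{K}_k)$ with witness $(m, \sigma) \in \mathcal{M}_k \times \mathcal{S}_k$ and public coins $\rho \in \mathcal{P}_k$, and
- Output Space: $\mathcal{R}_k = \mathcal{K}_k \times \mathcal{P}_k \times \Pi_k$.

Definition of Utility Function $u$. We now specify our computational task of interest via a utility function $u : \mathcal{X}_k^n \times \mathcal{R}_k \to \{0, 1\}$. For a dataset $D = ((vk_1, m_1, \sigma_1, \rho_1), \ldots, (vk_n, m_n, \sigma_n, \rho_n))$, a verification key $vk$, and a public coin string $\rho$, let
$$f_{vk,\rho}(D) = \#\{i \in [n] : vk_i = vk \ \wedge\ \rho_i = \rho \ \wedge\ \mathrm{Ver}(vk, m_i, \sigma_i) = 1\}.$$
That is, $f_{vk,\rho}$ is the number of elements of the dataset $D$ with verification key equal to $vk$ and public coin string equal to $\rho$ for which $(m_i, \sigma_i)$ is a valid message-signature pair under $vk$. We now define $u(D, (vk, \rho, \pi)) = 1$ iff
$$\left(f_{vk,\rho}(D) \ge 9n/10 \ \wedge\ V(1^{k_c}, vk, \rho, \pi) = 1\right) \quad\text{or}\quad f_{vk',\rho'}(D) < 9n/10 \ \text{ for all } vk' \in \mathcal{K}_k,\ \rho' \in \mathcal{P}_k.$$
That is, the utility function $u$ is satisfied if either 1) many entries of $D$ contain valid message-signature pairs under the same verification key $vk$ with the same public coin string $\rho$, and $\pi$ is a valid proof for statement $vk$ using $\rho$, or 2) it is not the case that many entries of $D$ contain valid message-signature pairs under the same verification key with the same public coin string (in which case any response $(vk, \rho, \pi)$ is acceptable).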
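The two cases of the utility function can be sketched in code as follows. This is only an illustrative sketch: `ver` and `zap_verify` are hypothetical callables standing in for the signature verifier Ver and the zap verifier V (no real cryptographic library is used), and the threshold 9n/10 is the one that appears in the utility analysis of the inefficient mechanism.

```python
from collections import Counter

def utility(dataset, answer, ver, zap_verify):
    """Return 1 iff `answer` = (vk, rho, pi) is "good" for `dataset`.

    dataset    : list of rows (vk_i, m_i, sigma_i, rho_i)
    ver        : hypothetical stand-in for Ver(vk, m, sigma) -> bool
    zap_verify : hypothetical stand-in for V(vk, rho, pi) -> bool
    """
    n = len(dataset)
    # f[(vk, rho)] = number of rows with that key and coin string whose
    # (m_i, sigma_i) is a valid message-signature pair under vk.
    f = Counter(
        (vk_i, rho_i)
        for (vk_i, m_i, sigma_i, rho_i) in dataset
        if ver(vk_i, m_i, sigma_i)
    )
    vk, rho, pi = answer
    threshold = 9 * n / 10
    # Case 2: no (vk', rho') reaches the threshold, so any answer is acceptable.
    if all(count < threshold for count in f.values()):
        return 1
    # Case 1: the named (vk, rho) is heavy and pi verifies for statement vk.
    return int(f[(vk, rho)] >= threshold and zap_verify(vk, rho, pi))
```

Only pairs (vk, ρ) that actually occur in the dataset can have a nonzero count, so iterating over the observed pairs suffices to decide case 2.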

An Inefficient Differentially Private Algorithm
We begin by showing that there is an inefficient differentially private mechanism that achieves high utility under u.
Proposition 1. Let $k \in \mathbb{N}$. For every ε > 0, there exists an (ε, 0)-differentially private algorithm $M^{\mathrm{unb}}_k : \mathcal{X}_k^n \to \mathcal{R}_k$ such that, for every β > 0, every n ≥

While the mechanism $M^{\mathrm{unb}}$ considered here is only accurate for $n \ge \Omega(\log |\mathcal{P}_k|)$, it is also possible to use "stability techniques" [DL09, TS13] to design an (ε, δ)-differentially private mechanism that achieves high utility for $n \ge O(\log(1/\delta)/\varepsilon)$ for δ > 0. We choose to provide a "pure" ε-differentially private algorithm here to make our separation more dramatic: both the inefficient differentially private mechanism and the efficient SIM-CDP mechanism achieve pure (ε, 0)-privacy, whereas no efficient mechanism can even achieve (ε, δ)-differential privacy with δ > 0.
Our algorithm relies on standard differentially private techniques for identifying frequently occurring elements in a dataset.
Report Noisy Max. Consider a data universe $\mathcal{X}$. A predicate $q : \mathcal{X} \to \{0, 1\}$ defines a counting query over the set of datasets $\mathcal{X}^n$ as follows: for $D = (x_1, \ldots, x_n) \in \mathcal{X}^n$, we abuse notation by defining $q(D) = \sum_{i=1}^{n} q(x_i)$. We further say that a collection of counting queries $Q$ is disjoint if, whenever $q(x) = 1$ for some $q \in Q$ and $x \in \mathcal{X}$, we have $q'(x) = 0$ for every other $q' \ne q$ in $Q$. (Thus, disjoint counting queries slightly generalize point functions, which are each supported on exactly one element of the domain $\mathcal{X}$.) The "Report Noisy Max" algorithm [DR14], combined with observations of [BV16], can efficiently and privately identify which of a set of disjoint counting queries is (approximately) the largest on a dataset $D$, and release its identity along with the corresponding noisy count. We sketch the proof of the following proposition in Appendix A.
Proposition 2 (Report Noisy Max). Let $Q$ be a set of efficiently computable and sampleable disjoint counting queries over a domain $\mathcal{X}$. Further suppose that for every $x \in \mathcal{X}$, the query $q \in Q$ for which $q(x) = 1$ (if one exists) can be identified efficiently. For every $n \in \mathbb{N}$ and ε > 0 there is a mechanism F : 3. For every dataset $D \in \mathcal{X}^n$, let $q_{\mathrm{OPT}} = \operatorname{argmax}_{q \in Q} q(D)$ and $\mathrm{OPT} = q_{\mathrm{OPT}}(D)$.
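For intuition, here is a minimal sketch of the textbook Report Noisy Max idea. It is not the efficient mechanism of Proposition 2: it enumerates an explicit list of queries (so it is only practical when $Q$ is small), it releases only the index of the winning query and not a noisy count, and the Laplace noise scale 2/ε is an assumption taken from the standard analysis for sensitivity-one queries under row replacement. All function names are illustrative.

```python
import random

def laplace(scale, rng):
    """Sample Laplace(0, scale) as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def report_noisy_max(dataset, queries, eps, rng=None):
    """Return the index of the (approximately) largest counting query.

    queries : explicit list of predicates q(x) -> 0/1, assumed disjoint.
    """
    rng = rng or random.Random()
    # True counts q(D) = sum_i q(x_i) for each query.
    counts = [sum(q(x) for x in dataset) for q in queries]
    # Perturb every count independently, then report only the argmax.
    noisy = [c + laplace(2.0 / eps, rng) for c in counts]
    return max(range(len(queries)), key=lambda j: noisy[j])
```

When one query's count exceeds all others by much more than 1/ε, as in the case where a single pair (vk, ρ) accounts for at least 9n/10 of the rows, the noise is very unlikely to change the winner.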
We are now ready to describe our unbounded algorithm $M^{\mathrm{unb}}_k$ as Algorithm 1. We prove Proposition 1 via the following two claims, capturing the privacy and utility guarantees of $M^{\mathrm{unb}}_k$, respectively.

Proof. If $f_{vk,\rho}(D) < 9n/10$ for every $vk$ and $\rho$, then the utility of the mechanism is always 1. Therefore, it suffices to consider the case when there exist $vk, \rho$ for which $f_{vk,\rho}(D) \ge 9n/10$. When such $vk$ and $\rho$ exist, observe that we have $f_{vk',\rho'}(D) \le n/10$ for every other pair $(vk', \rho') \ne (vk, \rho)$. Thus, as long as $n$ is large enough for the guarantee of Proposition 2 to apply, the Report Noisy Max algorithm successfully identifies the correct $vk, \rho$ in Step 1 with probability all but β. Moreover, the reported value $a$ is at least $7n/10$. By the perfect completeness of the zap proof system, the algorithm produces a useful triple $(vk, \rho, \pi)$ in Step 4. Thus, the mechanism as a whole is (1 − β)-useful.

A SIM-CDP Algorithm
We define a PPT algorithm $M^{\mathrm{CDP}}_k$ in Algorithm 2, which we argue is an efficient SIM-CDP algorithm achieving high utility with respect to $u$.

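The only difference between $M^{\mathrm{CDP}}_k$ and $M^{\mathrm{unb}}_k$ is the witness used to generate the proof $\pi$, so the computational indistinguishability of their outputs follows from the witness indistinguishability of the zap proof system.

The proof of Lemma 2 also shows that $M^{\mathrm{CDP}}_k$ is useful for $u$.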

Infeasibility of Differential Privacy
We now show that any efficient algorithm achieving high utility cannot be differentially private. In fact, like many prior hardness results, we provide an attack $A$ that does more than violate differential privacy. Specifically, we exhibit a distribution on datasets such that, given any useful answer produced by an efficient mechanism, $A$ can with high probability recover a row of the input dataset. Following [DNR+09], we work with the following notion of a re-identifiable dataset distribution.
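Definition 8 (Re-identifiable Dataset Distribution). Let $u : \mathcal{X}^n \times \mathcal{R} \to \{0, 1\}$ be a utility function. Let $\{\mathcal{D}_k\}_{k \in \mathbb{N}}$ be an ensemble of distributions over $(D_0, z) \in \mathcal{X}^{n(k)+1} \times \{0, 1\}^{\mathrm{poly}(k)}$ for $n(k) = \mathrm{poly}(k)$. (Think of $D_0$ as a dataset on $n + 1$ rows, and $z$ as a string of auxiliary information about $D_0$.) Let $(D, D', i, z) \leftarrow \tilde{\mathcal{D}}_k$ denote a sample from the following experiment: sample $(D_0 = (x_1, \ldots, x_{n+1}), z) \leftarrow \mathcal{D}_k$ and $i \in [n]$ uniformly at random. Let $D \in \mathcal{X}^n$ consist of the first $n$ rows of $D_0$, and let $D'$ be the dataset obtained by replacing $x_i$ in $D$ with $x_{n+1}$.

We say the ensemble $\{\mathcal{D}_k\}_{k \in \mathbb{N}}$ is a re-identifiable dataset distribution with respect to $u$ if there exists a (possibly inefficient) adversary $A$ and a negligible function negl(·) such that for all polynomial-time mechanisms $M_k$:
1. Whenever $M_k$ is useful, $A$ recovers a row of $D$ from $M_k(D)$. That is, for any PPT $M_k$:
$$\Pr_{(D, D', i, z) \leftarrow \tilde{\mathcal{D}}_k,\ r \leftarrow M_k(D)}\left[u(D, r) = 1 \ \wedge\ A(r, z) \notin D\right] \le \mathrm{negl}(k).$$
2. $A$ cannot recover the row $x_i$ not contained in $D'$ from $M_k(D')$. That is, for any algorithm $M_k$:
$$\Pr_{(D, D', i, z) \leftarrow \tilde{\mathcal{D}}_k,\ r \leftarrow M_k(D')}\left[A(r, z) = x_i\right] \le \mathrm{negl}(k),$$
where $x_i$ is the $i$-th row of $D$.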
Construction of a Re-identifiable Dataset Distribution. For $k \in \mathbb{N}$, recall that the digital signature scheme induces a choice of verification key space $\mathcal{K}_k$, message space $\mathcal{M}_k$, and signature space $\mathcal{S}_k$, each on poly(k)-bit strings. Let $n = \mathrm{poly}(k)$. Define a distribution $\{\mathcal{D}_k\}_{k \in \mathbb{N}}$ as follows. To sample $(D_0, z)$ from $\mathcal{D}_k$, first sample a key pair $(sk, vk) \leftarrow \mathrm{Gen}(1^k)$ and public coins $\rho \leftarrow \mathcal{P}_k$ uniformly at random. Sample messages $m_1, \ldots, m_{n+1} \leftarrow \mathcal{M}_k$ uniformly at random. Then let $\sigma_i \leftarrow \mathrm{Sign}(sk, m_i)$ for each $i = 1, \ldots, n + 1$. Let the dataset $D_0 = (x_1, \ldots, x_{n+1})$ where $x_i = (vk, m_i, \sigma_i, \rho)$, and set the auxiliary string $z = (vk, \rho)$.
Proposition 4. The distribution $\{\mathcal{D}_k\}_{k \in \mathbb{N}}$ defined above is re-identifiable with respect to the utility function $u$.
We break the proof of re-identifiability into two lemmas. First, we show that $A$ can successfully recover a row in $D$ from any useful answer:

Hence, by the extractability of the zap proof system, we have that $(m, \sigma) = E(1^{k_c}, vk, \rho, \pi)$ satisfies $(vk, (m, \sigma)) \in R_L$; namely, $\mathrm{Ver}(vk, m, \sigma) = 1$ with overwhelming probability over the choice of $\rho$.
Re-identifiability of the distribution $\mathcal{D}_k$ follows by combining Lemmas 5 and 6.

Limits of CDP in the Client-Server Model
We revisit the techniques of [GKY11] to exhibit a setting in which efficient CDP mechanisms cannot do much better than information-theoretically differentially private mechanisms. In particular, we consider computational tasks with output in some discrete space (or which can be reduced to some discrete space) $\mathcal{R}_k$, and with utility measured via functions of the form $g : \mathcal{R}_k \times \mathcal{R}_k \to \mathbb{R}$. We show that if $(\mathcal{R}_k, g)$ forms a metric space with $O(\log k)$ doubling dimension (and other properties described in detail later), then CDP mechanisms can be efficiently transformed into differentially private ones. In particular, when $\mathcal{R}_k = \mathbb{R}^d$ for $d = O(\log k)$ and utility is measured by an $L_p$-norm, we can transform a CDP mechanism into a differentially private one.
The result in this section is incomparable to that of [GKY11]. We incur a constant-factor blowup in error, rather than a negligible additive increase as in [GKY11]. However, in the case that utility is measured by an $L_p$-norm, our result applies to output spaces whose dimension grows logarithmically in the security parameter $k$, whereas the result of [GKY11] only applies to outputs of constant dimension. In addition, we handle IND-CDP directly, while [GKY11] prove their results for SIM-CDP, and then extend them to IND-CDP by applying a reduction of [MPRV09].

Task and Utility
Consider a computational task with discrete output space $\mathcal{R}_k$. Let $g : \mathcal{R}_k \times \mathcal{R}_k \to \mathbb{R}$ be a metric on $\mathcal{R}_k$. We impose the following additional technical conditions on the metric space $(\mathcal{R}_k, g)$:

Definition 9 (Property L). A metric space formed by a discrete set $\mathcal{R}_k$ and a metric $g$ has property L if
1. The doubling dimension of $(\mathcal{R}_k, g)$ is $O(\log k)$. That is, for every $a \in \mathcal{R}_k$ and radius $r > 0$, the ball $B(a, r)$ centered at $a$ with radius $r$ is contained in a union of poly(k) balls of radius $r/2$.
2. The metric space is uniform. Namely, for any fixed radius $r$, the size of a ball of radius $r$ is independent of its center.
3. Given a center $a \in \mathcal{R}_k$ and a radius $r > 0$, membership in the ball $B(a, r)$ can be checked in time poly(k).
4. Given a center $a \in \mathcal{R}_k$ and a radius $r > 0$, a uniformly random point in $B(a, r)$ can be sampled in time poly(k).
Given a metric $g$, we can define a utility function measuring the accuracy of a mechanism with respect to $g$:

Definition 10 (α-accuracy). Consider a dataset space $\mathcal{X}_k$. Let $q_k : \mathcal{X}_k^n \to \mathcal{R}_k$ be any function on datasets of size $n$. Let $M_k : \mathcal{X}_k^n \to \mathcal{R}_k$ be a mechanism for approximating $q_k$. We say that $M_k$ is $\alpha_k$-accurate for $q_k$ with respect to $g$ if with overwhelming probability, the error of $M_k$ as measured by $g$ is at most $\alpha_k$. Namely, there exists a negligible function negl(·) such that for every dataset $D \in \mathcal{X}_k^n$,
$$\Pr[g(M_k(D), q_k(D)) \le \alpha_k] \ge 1 - \mathrm{negl}(k).$$

We take the failure probability here to be negligible primarily for aesthetic reasons. In general, taking the failure probability to be $\beta_k$ will yield in our result below a mechanism that is $(\varepsilon_k, \beta_k + \mathrm{negl}(k))$-differentially private.
Moreover, for reasonable queries $q_k$, taking the failure probability to be negligible is essentially without loss of generality. We can reduce the failure probability of a mechanism $M_k$ from constant to negligible by repeating the mechanism $O(\log^2 k)$ times and taking a median. By composition theorems for differential privacy, this incurs a cost of at most $O(\log^2 k)$ in the privacy parameters. But we can compensate for this loss in privacy by first increasing the sample size $n$ by a factor of $O(\log^2 k)$, and then applying a "secrecy-of-the-sample" argument [KLN+11]: running the original mechanism on a random subsample of the larger dataset. This step maintains accuracy as long as the query $q_k$ generalizes from random subsamples.
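The repetition-and-median step can be illustrated with a small sketch. It models only the accuracy side of the argument for a real-valued answer; the privacy accounting and the subsampling step are not modeled, and `flaky_mean` is a hypothetical toy mechanism (not a private one) that fails with constant probability.

```python
import random
import statistics

def amplify_by_median(mechanism, dataset, repetitions, rng):
    """Run `mechanism` independently `repetitions` times; return the median answer."""
    answers = [mechanism(dataset, rng) for _ in range(repetitions)]
    return statistics.median(answers)

def flaky_mean(dataset, rng):
    """Toy mechanism (hypothetical): returns the mean, but with probability
    1/4 returns an arbitrary wrong value instead."""
    if rng.random() < 0.25:
        return rng.uniform(-1000.0, 1000.0)
    return sum(dataset) / len(dataset)

rng = random.Random(1)
print(amplify_by_median(flaky_mean, [1.0, 2.0, 3.0], 101, rng))
```

The median is correct whenever more than half of the runs succeed, so by a Chernoff bound the failure probability drops exponentially in the number of repetitions.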

Result and Proof
Theorem 5. Let $(\mathcal{R}_k, g)$ be a metric space with property L. Suppose $M_k : \mathcal{X}_k^n \to \mathcal{R}_k$ is an efficient $\varepsilon_k$-IND-CDP mechanism that is $\alpha_k$-accurate for some function $q_k$ with respect to $g$. Then there exists an efficient $(\varepsilon_k, \mathrm{negl}(k))$-differentially private mechanism $\hat{M}_k$ that is $O(\alpha_k)$-accurate for $q_k$ with respect to $g$.
Proof. We denote a ball centered at $a$ with radius $r$ in the metric space $(\mathcal{R}_k, g)$ by $B(a, r) = \{x \in \mathcal{R}_k : g(a, x) \le r\}$.
We also let $V(r) \stackrel{\mathrm{def}}{=} |B(a, r)|$ for any $a \in \mathcal{R}_k$, which is well-defined due to the uniformity of the metric space. Now we define a mechanism $\hat{M}_k$ which, on input a dataset $D$, outputs a uniformly random point from $B(M_k(D), c_k)$, where $c_k > 0$ is a parameter to be determined later. Note that $\hat{M}_k$ can be implemented efficiently due to the efficient sampling condition of property L. Since $g$ satisfies the triangle inequality, $\hat{M}_k$ is $(\alpha_k + c_k)$-accurate. Thus it remains to prove that $\hat{M}_k$ is $(\varepsilon_k, \mathrm{negl}(k))$-DP.
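The smoothing step just defined can be sketched on a toy metric space. The sketch assumes, purely for illustration, that the output space is the integers with $g(a, b) = |a - b|$, which is uniform with $V(c) = 2c + 1$ for integer radius $c$. The helper `smoothed_distribution` computes the output distribution of the smoothed mechanism exactly; each output $s$ receives mass $\Pr[M_k(D) \in B(s, c_k)] / V(c_k)$, since $s \in B(a, c_k)$ exactly when $a \in B(s, c_k)$.

```python
import random
from fractions import Fraction

def ball(center, radius):
    """Ball B(center, radius) in the integers under g(a, b) = |a - b|."""
    return range(center - radius, center + radius + 1)

def smooth(mechanism_output, radius, rng):
    """One draw of the smoothed mechanism: a uniform point of B(M(D), radius)."""
    return rng.choice(ball(mechanism_output, radius))

def smoothed_distribution(base_dist, radius):
    """Exact output distribution of the smoothed mechanism.

    base_dist : dict mapping each output a of M(D) to Pr[M(D) = a].
    """
    volume = len(ball(0, radius))  # V(radius); the same for every center
    out = {}
    for a, p in base_dist.items():
        for s in ball(a, radius):
            out[s] = out.get(s, Fraction(0)) + Fraction(p) / volume
    return out
```

Using exact fractions makes it easy to confirm on small examples that the smoothed distribution still sums to one and spreads each atom of the base distribution evenly over its ball.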
The key observation is that, for every $D \in X_k^n$ and $s \in R_k$,
$$\Pr[\hat{M}_k(D) = s] = \frac{\Pr[M_k(D) \in B(s, c_k)]}{V(c_k)}.$$
For all sets $S \subseteq R_k$, we thus have (by the above observation and $\alpha_k$-accuracy of $M_k$)

By the bounded doubling dimension of $(R_k, g)$, we can set

$L_p$-norm case. Many natural tasks can be captured by outputs in $\mathbb{R}^d$ with utility measured by an $L_p$ norm (e.g. counting queries). Since we work with efficient mechanisms, we may assume that our mechanisms always have outputs represented by $\mathrm{poly}(k)$ bits of precision. The level of precision is unimportant, so we may assume an output space represented by $k$ bits of precision for simplicity. By rescaling, we may assume all responses are integers and take values in , the doubling dimension of the new discrete metric space induced by the $L_p$-norm on integral points is $O(\log k)$ ([GKL03] shows that the subspace of $\mathbb{R}^d$ equipped with the $L_p$ norm has doubling dimension $O(d)$). Now the metric space almost satisfies property L, with the exception of the uniformity condition. This is because the sizes of balls close to the boundary of $N_k$ are smaller than those in the interior. However, we can apply Theorem 5 to first construct a statistically DP mechanism with outputs in the larger uniform metric space $N^d$. Then we may construct the final statistical mechanism $\hat{M}_k$ by projecting answers that are not in $N_k^d$ to the closest point in $N_k^d$. By postprocessing, the modified mechanism $\hat{M}_k$ is still differentially private. Moreover, its utility is only improved, since $\hat{M}_k$ can only get closer to the true query answer in every coordinate. Therefore, we have the following corollary.
that is $\alpha_k$-accurate for some function $q_k$ when error is measured by an $L_p$-norm. Then there exists an efficient $(\varepsilon, \mathrm{negl}(k))$-differentially private mechanism $\hat{M}_k$ that is $O(\alpha_k)$-accurate for $q_k$.

The proof of Proposition 2 relies on the existence of an efficient sanitizer for the disjoint query class $Q$. Such a sanitizer appears in [Vad16], and is based on ideas of [BV16]. (There, it is stated for the specific class of point functions, but it immediately extends to disjoint counting queries.)
Proposition 5 ([Vad16, Theorem 7.1]). Let $Q$ be a set of efficiently computable and sampleable disjoint counting queries over a domain $X$. Suppose that for every element $x \in X$, the query $q \in Q$ for which $q(x) = 1$ (if one exists) can be identified in time $\mathrm{polylog}(|X|)$. Let $\beta > 0$. Then there exists an algorithm San running in time $\mathrm{poly}(n, \log |X|, 1/\varepsilon)$ for which the following holds. For any database $D \in X^n$, with probability at least $1 - \beta$, the algorithm San produces a "synthetic database" $\hat{D} \in X^m$ such that

for every $q \in Q$.

Proof (Proof of Proposition 2).
Consider the algorithm $F$ which first runs the algorithm San on its input dataset to obtain a synthetic dataset $\hat{D}$, and then outputs the pair $(\hat{q}, \frac{n}{m} \hat{q}(\hat{D}))$ where $\hat{q} = \arg\max_{q \in Q} q(\hat{D})$. The algorithm $F$ inherits efficiency and differential privacy from San. To see that it is useful, suppose San indeed produces a database $\hat{D} \in X^m$ for which

for every $q \in Q$. Let $q_{\mathrm{OPT}} = \arg\max_{q \in Q} q(D)$, and $\gamma = 8(\log |Q| + \log(1/\beta))/\varepsilon$. Then $\frac{n}{m} \hat{q}(\hat{D}) \ge \frac{n}{m} q_{\mathrm{OPT}}(\hat{D}) \ge q_{\mathrm{OPT}}(D) - \gamma/2$. Moreover, suppose $q_{\mathrm{OPT}}(D) - \gamma > \max_{q \ne q_{\mathrm{OPT}}} q(D)$. Then for any $q' \ne q_{\mathrm{OPT}}$, we have

Hence $q'(\hat{D}) < \hat{q}(\hat{D})$ for every $q' \ne q_{\mathrm{OPT}}$, and hence $\hat{q} = q_{\mathrm{OPT}}$.
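The post-processing performed by $F$ can be sketched in Python as below. This is an illustrative sketch under assumptions: the sanitizer is passed in as a hypothetical callable `sanitizer` (no implementation of San is given here), queries are modelled as 0/1 predicates on rows, and the names `count` and `release_argmax` are invented for the example.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# A counting query is modelled as a predicate mapping a row to 0 or 1.
Query = Callable[[Hashable], int]


def count(query: Query, database: Sequence[Hashable]) -> int:
    """Value of a counting query: the number of rows the predicate accepts."""
    return sum(query(row) for row in database)


def release_argmax(database: Sequence[Hashable],
                   queries: Dict[str, Query],
                   sanitizer: Callable[[Sequence[Hashable]], List[Hashable]]
                   ) -> Tuple[str, float]:
    """Return the query maximising the count on the synthetic data, together
    with that count rescaled by n/m.

    `sanitizer` is an assumed stand-in for San; any privacy guarantee of the
    output comes entirely from the sanitizer, since everything after the call
    touches only the synthetic database.
    """
    n = len(database)
    synthetic = sanitizer(database)
    m = len(synthetic)
    # Ties are broken by dictionary order; the analysis above does not depend on this.
    best = max(queries, key=lambda name: count(queries[name], synthetic))
    return best, (n / m) * count(queries[best], synthetic)
```

Because the original dataset is used only through the sanitizer's output (and its size $n$), the privacy of the released pair follows from that of the sanitizer by post-processing.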

B.1 Non-Interactive Zero Knowledge Proofs
Most known constructions of zaps, as defined in Definition 5, are based on constructions of non-interactive zero knowledge proofs or arguments in the common reference string model.We review the requirements of such proof systems below.
Definition 11 (NIZK Proofs and Arguments). Let $R_L = \{(x, w)\}$ be a witness relation corresponding to a language $L \in \mathrm{NP}$. A non-interactive zero-knowledge proof (or argument) system for $R_L$ consists of a triple of algorithms $(\mathrm{Gen}, P, V)$ where:
- The generator Gen is a PPT that takes as input a security parameter $k$ and statement length $t = \mathrm{poly}(k)$, and produces a common reference string crs. An important special case is where $\mathrm{Gen}(1^k, 1^t)$ outputs a uniformly random string, in which case we say the proof (or argument) system operates in the common random string model.
- The prover $P$ is a PPT that takes as input a crs and a pair $(x, w)$ and outputs a proof $\pi$.
- The verifier $V$ is an efficient, deterministic algorithm that takes as input a crs, an instance $x$ and proof $\pi$, and outputs a bit in $\{0, 1\}$.
Various security requirements we can impose on the proof system are:

Perfect completeness. An honest prover who possesses a valid witness can always convince an honest verifier. Formally, for all $(x, w) \in R_L$,

Statistical soundness. It is statistically impossible to convince an honest verifier of the validity of a false statement. There exists a negligible function $\mathrm{negl}(\cdot)$ such that for every sequence $\{x_k\}_{k \in \mathbb{N}}$ of $\mathrm{poly}(k)$-size statements

Computational zero-knowledge. Proofs do not reveal anything to the verifier beyond their validity. Formally, a proof system is computational zero-knowledge if there exists a PPT simulator $(S_1, S_2)$ where $S_1$ produces a simulated common reference string crs with associated trapdoor $\tau$. The pair $(\mathrm{crs}, \tau)$ allows $S_2$ to simulate accepting proofs without knowledge of a witness $w$. That is, there exists a negligible function negl such that for all (possibly cheating) PPT verifiers $V^*$ and sequences $\{(x_k, w_k)\}_{k \in \mathbb{N}}$ of $\mathrm{poly}(k)$-size statement-witness pairs

Statistical knowledge extraction. A proof system is additionally a proof of knowledge if a witness can be extracted from a valid proof. That is, there exists a polynomial-time knowledge extractor $E = (E_1, E_2)$ such that $E_1$ produces a simulated common reference string crs with associated extraction key $\xi$, which we assume to have length $O(k)$. The pair $(\mathrm{crs}, \xi)$ allows the deterministic algorithm $E_2$ to extract a witness from a proof. Formally, the first component of $(\mathrm{crs}, \xi) \leftarrow E_1(1^k, 1^{|x|})$ is identically distributed to $\mathrm{crs} \leftarrow \mathrm{Gen}(1^k, 1^{|x|})$. Moreover, there exists a negligible function negl such that for every $x \in \{0, 1\}^{\mathrm{poly}(k)}$,
$$\Pr_{\mathrm{crs} \leftarrow \mathrm{Gen}(1^k, 1^{|x|})}\Big[\exists\, \xi \in \{0,1\}^*,\ \pi \in \{0,1\}^*,\ w \in E_2(\mathrm{crs}, \xi, x, \pi) : (\mathrm{crs}, \xi) \in E_1(1^k, 1^{|x|}) \wedge (x, w) \notin R_L \wedge V(\mathrm{crs}, x, \pi) = 1\Big] \le \mathrm{negl}(k).$$
For technical reasons, we also require that the relation $\{(\mathrm{crs}, \xi) \in E_1(1^k, 1^{|x|})\}$ be recognizable in polynomial time, which will always be the case for our constructions.

B.2 Extractability of Zaps Based on Exponentially Extractable NIZKs
We next describe Dwork and Naor's original construction of zaps [DN07]. Here, we show that extractable zaps can be based on the existence of NIZK proofs of knowledge in the common random string model, which can in turn be built from various number-theoretic assumptions [DP92, DDP00, GOS12]. (Recall that in the common random string model for NIZK proofs, the crs generation algorithm simply outputs a uniformly random string.) The discussion in this section can be summarized by the following theorem.
Theorem 6. Let $R_L$ be a witness relation for a language $L \in \mathrm{NP}$. Then $R_L$ has an extractable zap proof system if there exists a non-interactive zero-knowledge proof of knowledge for $R_L$ (in the common random string model) with perfect completeness, statistical soundness, computational zero-knowledge, and statistical extractability.
The existence of such proofs of knowledge for NP can be based on any of the following assumptions:
1. The existence of NIZK proofs of membership for NP and "dense secure public-key encryption schemes" [DP92]. NIZK proofs of membership can in turn be constructed from trapdoor permutations [FLS99] or indistinguishability obfuscation and one-way functions [BP15]. Dense secure public-key encryption schemes can be constructed under the hardness of factoring Blum integers [DDP00] or the Diffie-Hellman assumption [DP92].
2. The decisional linear assumption for groups equipped with a bilinear map [GOS12].
The remainder of this section is devoted to the proof of Theorem 6. Let $R_L$ be a witness relation for a language $L \in \mathrm{NP}$. Let $(P_{\mathrm{NIZK}}, V_{\mathrm{NIZK}})$ be a NIZK proof system in the common random string model. We now describe Dwork and Naor's [DN07] zap proof system for $R_L$ based on $(P_{\mathrm{NIZK}}, V_{\mathrm{NIZK}})$.
For simplicity, assume we are interested in proving statements $x$ whose length is a fixed polynomial in $k$. Let $\ell = \ell(k)$ be a fixed polynomial. (This depends on the length of $x$ and on the soundness error of the NIZK proof system. We defer discussion of its value to the proof of Proposition 6, where it will also depend on the knowledge error of the NIZK knowledge extractor $E_2$.) The verifier's first message is a string $\rho \in \{0, 1\}^{\ell \cdot m}$, which should be interpreted as a sequence of random strings $\rho_1, \ldots, \rho_\ell$, each in $\{0, 1\}^m$. Here, $m = \mathrm{poly}(k)$ is the length of the crs used in the proof system $(P_{\mathrm{NIZK}}, V_{\mathrm{NIZK}})$. The prover and verifier algorithms appear as Algorithms 4 and 5 respectively.

Suppose $(P_{\mathrm{NIZK}}, V_{\mathrm{NIZK}})$ is a perfectly complete and statistically sound NIZK proof system for $R_L$ in the common random string model. Then $(P, V)$ is a perfectly complete, statistically sound zap proof system for $R_L$.
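The message flow of this zap can be sketched in Python as follows. Since Algorithms 4 and 5 are not reproduced in this section, the sketch follows the usual description of the Dwork-Naor construction, in which the prover picks one random mask $b$ and proves the statement under each derived string $\rho_i \oplus b$; treat that as an assumption about the omitted algorithms. The callables `nizk_prove` and `nizk_verify` are hypothetical stand-ins for $P_{\mathrm{NIZK}}$ and $V_{\mathrm{NIZK}}$, and lengths are measured in bytes rather than bits for convenience.

```python
import secrets
from typing import Callable, List, Tuple

Proof = Tuple[bytes, List[bytes]]


def split_blocks(rho: bytes, ell: int, m_bytes: int) -> List[bytes]:
    """Interpret the verifier's message rho as ell strings of m_bytes bytes each."""
    if len(rho) != ell * m_bytes:
        raise ValueError("rho must have length ell * m_bytes")
    return [rho[i * m_bytes:(i + 1) * m_bytes] for i in range(ell)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def zap_prove(rho: bytes, ell: int, m_bytes: int, x: bytes, w: bytes,
              nizk_prove: Callable[[bytes, bytes, bytes], bytes]) -> Proof:
    """Prover: choose a random mask b, then give one NIZK proof per string rho_i XOR b."""
    b = secrets.token_bytes(m_bytes)
    proofs = [nizk_prove(xor_bytes(r, b), x, w)
              for r in split_blocks(rho, ell, m_bytes)]
    return b, proofs


def zap_verify(rho: bytes, ell: int, m_bytes: int, x: bytes, proof: Proof,
               nizk_verify: Callable[[bytes, bytes, bytes], bool]) -> bool:
    """Verifier: accept only if every NIZK proof verifies under its derived string."""
    b, proofs = proof
    if len(b) != m_bytes or len(proofs) != ell:
        return False
    blocks = split_blocks(rho, ell, m_bytes)
    return all(nizk_verify(xor_bytes(r, b), x, p)
               for r, p in zip(blocks, proofs))
```

The verifier's message is chosen before the statement and is reusable, which is why the prover, not the verifier, contributes the mask: a single $b$ must work against all $\ell$ of the verifier's strings at once.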
Our goal now is to show that if $(P_{\mathrm{NIZK}}, V_{\mathrm{NIZK}})$ is also a statistically sound proof of knowledge, then the zap proof system $(P, V)$ is extractable in the sense of Definition 6.

Proposition 6. If, in addition, $(P_{\mathrm{NIZK}}, V_{\mathrm{NIZK}})$ is statistically knowledge extractable, then $(P, V)$ is also an extractable zap for $R_L$.
we model an adversary $\{A_k\}_{k \in \mathbb{N}}$ as a sequence of polynomial-size circuits $A_k : R_k \to \{0, 1\}$. Equivalently, $\{A_k\}_{k \in \mathbb{N}}$ can be thought of as a probabilistic polynomial-time Turing machine with non-uniform advice.

Definition 1 (Differential Privacy [DMNS06, DKM+06]). A mechanism $M$ is $(\varepsilon, \delta)$-differentially private if for all adjacent datasets $D \sim D'$ and every set $S \subseteq \mathrm{Range}(M)$,
$$\Pr[M(D) \in S] \le e^{\varepsilon} \Pr[M(D') \in S] + \delta.$$
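For mechanisms with finitely many outputs, the inequality in Definition 1 can be checked directly for a fixed pair of adjacent datasets. The Python sketch below is a small worked check, not a general verifier: the function names are hypothetical, and it uses the fact that the worst-case set $S$ consists of exactly those outcomes $o$ with $\Pr[M(D) = o] > e^{\varepsilon} \Pr[M(D') = o]$.

```python
import math
from typing import Dict, Hashable

Distribution = Dict[Hashable, float]


def smallest_delta(p: Distribution, q: Distribution, eps: float) -> float:
    """Smallest delta with p(S) <= e^eps * q(S) + delta for every set S.

    The supremum over S of p(S) - e^eps * q(S) is attained by collecting the
    outcomes where p exceeds e^eps * q, so it equals the sum of positive parts.
    """
    outcomes = set(p) | set(q)
    return sum(max(0.0, p.get(o, 0.0) - math.exp(eps) * q.get(o, 0.0))
               for o in outcomes)


def satisfies_dp(p: Distribution, q: Distribution,
                 eps: float, delta: float, tol: float = 1e-12) -> bool:
    """Check the (eps, delta) inequality in both directions for ONE adjacent pair.

    `p` and `q` are the output distributions on two adjacent datasets; a full
    privacy proof would have to cover every adjacent pair.
    """
    return (smallest_delta(p, q, eps) <= delta + tol
            and smallest_delta(q, p, eps) <= delta + tol)
```

For example, randomized response on a single bit that reports the true value with probability 3/4 has output distributions `{0: 0.75, 1: 0.25}` and `{0: 0.25, 1: 0.75}` on the two possible inputs; it passes the check at $\varepsilon = \ln 3$, $\delta = 0$ and fails it at $\varepsilon = 1$, $\delta = 0$.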

and the inefficient algorithm $M^{\mathrm{unb}}_k$ occurs in Step 3, where we have replaced the inefficient process of finding a canonical message-signature pair $(m^*, \sigma^*)$ with selecting a message-signature pair $(m_i, \sigma_i)$ in the dataset. Since all the other steps (Report Noisy Max and the zap prover's algorithm) are efficient, $M^{\mathrm{CDP}}_k$ runs in polynomial time. However, this change renders $M^{\mathrm{CDP}}_k$ statistically non-differentially private, since a (computationally unbounded) adversary could reverse engineer the proof $\pi$ produced in Step 4 to recover the pair $(m_i, \sigma_i)$ contained in the dataset. On the other hand, the witness indistinguishability of the proof system implies that $M^{\mathrm{CDP}}_k$ is nevertheless computationally differentially private:

Lemma 3. The algorithm $M^{\mathrm{CDP}}_k$ is $\varepsilon$-SIM-CDP provided that $n \ge (20/\varepsilon) \cdot (k + \log |K_k| + \log |P_k|) = \mathrm{poly}(k, 1/\varepsilon)$.

Proof. Indeed, we will show that $M'_k = M^{\mathrm{unb}}_k$ is secure as the simulator for $M_k = M^{\mathrm{CDP}}_k$. That is, we will show that for any $\mathrm{poly}(k)$-size adversary $A$,
$$\Pr[A(M^{\mathrm{CDP}}_k(D)) = 1] - \Pr[A(M^{\mathrm{unb}}_k(D)) = 1] \le \mathrm{negl}(k).$$
First observe that by definition, the first two steps of the mechanisms are identical. Now define, for either mechanism $M^{\mathrm{unb}}_k$ or $M^{\mathrm{CDP}}_k$, a "bad" event $B$ where the mechanism in Step 1 produces a pair $((vk, \rho), a)$ for which $f_{vk,\rho}(D) = 0$, but does not output $(\perp, \perp, \perp)$ in Step 2. For either mechanism, the probability of the bad event $B$ is $\mathrm{negl}(k)$, as long as $n \ge (20/\varepsilon) \cdot (k + \log(|K_k| \cdot |P_k|))$. This follows from the utility guarantee of the Report Noisy Max algorithm (Proposition 2), setting $\beta = 2^{-k}$. Thus, it suffices to show that for any fixing of the coins of both mechanisms in Steps 1 and 2 in which $B$ does not occur, the mechanisms $M^{\mathrm{CDP}}_k(D)$ and $M^{\mathrm{unb}}_k(D)$ are indistinguishable. There are now two cases to consider based on the coin tosses in Steps 1 and 2:

Case 1: Both mechanisms output $(\perp, \perp, \perp)$ in Step 2. In this case, $\Pr[A(M^{\mathrm{CDP}}_k(D)) = 1] = \Pr[A(\perp, \perp, \perp) = 1] = \Pr[A(M^{\mathrm{unb}}_k(D)) = 1]$, and the mechanisms are perfectly indistinguishable.