Almost Optimal Explicit Johnson-Lindenstrauss Families

Abstract. The Johnson-Lindenstrauss lemma is a fundamental result in probability with several applications in the design and analysis of algorithms. Constructions of linear embeddings satisfying the Johnson-Lindenstrauss property necessarily involve randomness, and much attention has been given to obtaining explicit constructions that minimize the number of random bits used. In this work we give explicit constructions with an almost optimal use of randomness: for 0 < ε, δ < 1/2, we obtain explicit generators G : {0,1}^r → R^{s×d} for s = O(log(1/δ)/ε^2) such that for all d-dimensional vectors w of Euclidean norm 1,

Pr_{y ∈_u {0,1}^r} [ |‖G(y)w‖_2 − 1| > ε ] ≤ δ,

with seed-length r = O(log d + log(1/δ) · log(log(1/δ)/ε)). In particular, for δ = 1/poly(d) and fixed ε > 0, we obtain seed-length O((log d)(log log d)). Previous constructions required Ω(log^2 d) random bits to obtain polynomially small error. We also give a new elementary proof of the optimality of the JL lemma, showing a lower bound of Ω(log(1/δ)/ε^2) on the embedding dimension. Previously, Jayram and Woodruff [9] used communication complexity techniques to show a similar bound.

We say a family of random matrices has the JL property (or is a JL family) if the above condition holds. In typical applications of the JLL, the error δ is taken to be 1/poly(d), and the goal is to embed a given set of poly(d) points in d dimensions into O(log d) dimensions with distortion at most 1 + ε for a fixed constant ε. This is the setting we concern ourselves with. Linear embeddings of Euclidean space as above necessarily require randomness, as otherwise one can take the vector w to be in the kernel of the fixed transformation. To formalize this we use the following definition.
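As a concrete numerical illustration of the JL property (this is a plain random sign matrix in the spirit of Theorem 1, not the derandomized generator of this paper; the parameters d, s, ε are illustrative, with s standing in for O(log(1/δ)/ε^2)):

```python
import numpy as np

# A dense random sign matrix scaled by 1/sqrt(s) is a standard JL family.
rng = np.random.default_rng(0)
d, s, eps = 4096, 512, 0.25

A = rng.choice([-1.0, 1.0], size=(s, d)) / np.sqrt(s)

w = rng.standard_normal(d)
w /= np.linalg.norm(w)          # unit vector, as in the JL statement

# |  ||Aw||_2 - 1  | should be below eps with high probability.
distortion = abs(np.linalg.norm(A @ w) - 1.0)
```

With these parameters the distortion concentrates around 1/sqrt(s) ≈ 0.04, comfortably below ε; of course this uses s·d truly random bits, which is exactly what the constructions below avoid.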

Derandomizing JLL
A simple probabilistic argument shows that there exists a (d, O(log(1/δ)/ε^2), δ, ε)-JL generator with seed-length r = O(log d + log(1/δ)). On the other hand, despite much attention, the best known explicit generators have seed-length at least min(Ω(log(1/δ) log d), Ω(log d + log^2(1/δ))) [5], [11]. Besides being a natural problem in geometry as well as derandomization, an explicit JL generator with minimal randomness would likely help derandomize other geometric algorithms and metric embedding constructions. Further, having an explicit construction is of fundamental importance for streaming algorithms, as storing the entire matrix (as opposed to the randomness required to generate the matrix) is often too expensive in the streaming context.
Our main result is an explicit generator that takes roughly O((log d)(log log d)) random bits and outputs a matrix A ∈ R s×d satisfying the JL property for constant ε and δ = 1/ poly(d).
Theorem 2 (Main). For every 0 < ε, δ < 1/2, there exists an explicit (d, O(log(1/δ)/ε^2), δ, ε)-JL generator with seed-length r = O(log d + log(1/δ) · log(log(1/δ)/ε)).

We give two different constructions. Our constructions are elementary in nature, using only standard tools in derandomization such as k-wise independence and oblivious samplers [15]. Our first construction is simpler and gives a generic template for derandomizing most known JL families. The second construction has the advantage of allowing fast matrix-vector multiplications: the matrix-vector product G(y)w can be computed efficiently in time O(d log d) + poly(log(1/δ)/ε).
Further, as one of the motivations for derandomizing the JLL is its potential applications in streaming, it is important that the entries of the generated matrices be computable in small space. We observe that for any i ∈ [s], j ∈ [d], y ∈ {0,1}^r, the entry G(y)_{ij} can be computed in space O(log d · poly(log log d)) and time O(d^{1+o(1)}) (for fixed ε, and δ > 1/poly(d)). (See the proof of Theorem 8 for the exact bound.)

Optimality of JLL
We also give a new proof of the optimality of the JL lemma, showing a lower bound of s_opt = Ω(log(1/δ)/ε^2) for the target dimension. Previously, Jayram and Woodruff [9] used communication complexity techniques to show a similar bound in the case s_opt < d^{1−γ} for some fixed constant γ > 0. In contrast, our argument is more direct in nature, is based on linear algebra and elementary properties of the uniform distribution on the sphere, and only requires the assumption s_opt < d/2. Note the JLL is only interesting for s_opt < d.

Theorem 3. There exists a universal constant c > 0 such that the following holds: for any 0 < ε, δ < 1/2 and any distribution A over linear transformations from R^d to R^s with s < d/2 and s ≤ c log(1/δ)/ε^2, there exists a vector w ∈ S^{d−1} such that Pr_{A ∈ A}[ |‖Aw‖_2 − 1| > ε ] ≥ δ.
We also note that there are efficient non-black-box derandomizations of the JLL [7], [14]. These works take as input n points in R^d and deterministically compute an embedding (depending on the input set) into R^{O(log n)/ε^2} which preserves all pairwise distances between the given n points.

Outline of Constructions
For intuition, suppose that δ ≥ 1/d^c is polynomially small and ε is a constant. Our constructions are based on a simple iterative scheme: we reduce the dimension from d to roughly sqrt(d) in each step and iterate for O(log log d) steps.
Generic Construction. Our first construction gives a generic template for reducing the randomness required in standard JL families and is based on the following simple observation. Starting with any JL family, such as the random sign matrix construction of Theorem 1, there is a trade-off that we can make between the amount of independence required to generate the matrix and the final embedding dimension. For instance, if we only desire to embed into a dimension of O(sqrt(d)) (as opposed to O(log d)), it suffices for the entries of the random sign matrix to be O(1)-wise independent. We exploit this idea by iteratively decreasing the dimension from d to O(sqrt(d)) and so on, using a random sign matrix with an increasing amount of independence at each iteration.
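The k-wise independent signs underlying this template can be generated from a short seed by a standard construction: evaluate a random degree-(k−1) polynomial over a prime field and take the low bit. This sketch is illustrative (it is not the paper's exact generator, and taking the low bit of a uniform element of GF(P) introduces an O(1/P) bias per sign, which we gloss over):

```python
import numpy as np

P = 2_147_483_647  # Mersenne prime 2^31 - 1

def kwise_signs(k, n, rng):
    """Approximately k-wise independent signs in {-1, +1}.

    Evaluations of a random degree-(k-1) polynomial over GF(P) at
    distinct points are exactly k-wise independent over GF(P); the
    low bit of each evaluation gives a nearly unbiased sign.
    Seed length: k field elements, i.e. O(k log P) bits.
    """
    coeffs = rng.integers(0, P, size=k)
    x = np.array(range(1, n + 1), dtype=object)  # exact big-int arithmetic
    vals = np.zeros(n, dtype=object)
    for c in reversed(coeffs):                   # Horner evaluation mod P
        vals = (vals * x + int(c)) % P
    return np.where(np.array([v & 1 for v in vals]) == 1, 1, -1)

rng = np.random.default_rng(1)
signs = kwise_signs(k=4, n=1000, rng=rng)
```

Filling an s×d sign matrix this way uses O(k log P) random bits instead of s·d, which is exactly the independence/seed-length trade-off exploited above.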
Fast JL Construction. Fix a vector w ∈ R^d with ‖w‖_2 = 1 and suppose δ = 1/poly(d). We first use an idea of Ailon and Chazelle [2], who give a family of unitary transformations R from R^d to R^d such that for every w ∈ R^d and V ∈_u R, the vector Vw is regular, in the sense that ‖Vw‖_∞ = O(sqrt((log d)/d)), with high probability. We derandomize their construction using limited independence to get a family of rotations R such that ‖Vw‖_∞ ≤ d^{−1/2+α} holds with high probability, for a sufficiently small constant α > 0.
We next observe that for a regular vector, projecting onto a random set of coordinates preserves the ℓ_2 norm with distortion at most ε with high probability. We then note that the random set of coordinates can be chosen using oblivious samplers as in [15]. The idea of using samplers is due to Karnin et al. [11], who use samplers for a similar purpose.
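The regularize-then-subsample step can be sketched numerically. This is a simplification for illustration only: fully independent signs stand in for limited independence, and a uniformly random coordinate subset stands in for an oblivious sampler.

```python
import numpy as np

def hadamard_transform(v):
    """Fast Walsh-Hadamard transform, normalized to be an isometry.
    len(v) must be a power of 2; runs in O(d log d) time."""
    v = v.copy()
    n, h = len(v), 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = v[i:i + h].copy()
            b = v[i + h:i + 2 * h].copy()
            v[i:i + h] = a + b
            v[i + h:i + 2 * h] = a - b
        h *= 2
    return v / np.sqrt(n)

rng = np.random.default_rng(2)
d, s = 1 << 12, 256
w = rng.standard_normal(d)
w /= np.linalg.norm(w)

x = rng.choice([-1.0, 1.0], size=d)       # i.i.d. signs stand in for k-wise
v = hadamard_transform(x * w)             # v = H D(x) w, a rotation of w
S = rng.choice(d, size=s, replace=False)  # stands in for an oblivious sampler
u = np.sqrt(d / s) * v[S]                 # scaled coordinate projection P_S
```

Since HD(x) is a rotation, ‖v‖_2 = 1 exactly; the regularity ‖v‖_∞ = O(sqrt((log d)/d)) is what makes the scaled projection u concentrate around norm 1.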
Finally, iterating the above scheme O(log log d) times, we obtain an embedding of R^d into R^{poly(log d)} using O(log d log log d) random bits. We then apply the result of Clarkson and Woodruff [5] and perform the final embedding into O(log(1/δ)/ε^2) dimensions by using a random scaled sign matrix with O(log(1/δ))-wise independent entries.
As all of the matrices involved in the construction are either Hadamard matrices or projection operators, the final embedding can actually be computed in time O(d log d) + poly(log(1/δ)/ε).

Outline of Lower Bound. To show a lower bound on the embedding dimension s, we use Yao's min-max principle to first transform the problem into that of finding a hard distribution on R^d such that no single linear transformation can embed a random vector drawn from the distribution well with very high probability. We then show that the uniform distribution over the d-dimensional sphere is one such hard distribution. The proof of the last fact involves elementary linear algebra and some direct calculations.
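The hard distribution can be probed empirically. The sketch below fixes one natural linear map (the scaled projection onto the first s coordinates; by rotation invariance of the sphere, any fixed rank-s projection behaves the same) and estimates how often a uniformly random unit vector is distorted by more than ε when s is small. The parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
d, s, eps, trials = 512, 8, 0.1, 2000

# Uniform vectors on S^{d-1} via normalized Gaussians.
W = rng.standard_normal((trials, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Fixed embedding: sqrt(d/s) * P_S with S = first s coordinates.
norms = np.sqrt(d / s) * np.linalg.norm(W[:, :s], axis=1)
fail_rate = np.mean(np.abs(norms - 1.0) > eps)
```

For s = 8 and ε = 0.1 the failure rate is a large constant, matching the theorem's message: no fixed map achieves failure probability δ unless s = Ω(log(1/δ)/ε^2).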

Preliminaries
We first state the classical Khintchine-Kahane inequalities (cf. [12]), which give tight moment bounds for linear forms.
We use randomness-efficient oblivious samplers due to Zuckerman [15] (see Theorem 3.17 and the remark following it in [15]).
Theorem 4 (Zuckerman [15]). There exists a constant C such that for every ε, δ > 0 there exists an explicit collection S of subsets of [d], and an NC algorithm that generates a random element of S using O(log d + log(1/δ)) random bits.
Corollary 1. There exists a constant C such that for every ε, δ, B > 0 there exists an explicit collection S of subsets of [d], and an NC algorithm that generates a random element of S using O(log d + log(1/δ)) random bits.
Proof. Apply the above theorem to the function f scaled down by B.

The following definitions will be useful in giving an abstract description of our constructions. Theorem 1 shows that the conditions for being a strong (d, s)-JL distribution are met by random Bernoulli matrices when 0 < ε ≤ 1, though in fact the conditions are also met for all ε > 0 (see the proof in [5], for example). Sometimes we omit the d, s terms in the notation above if these quantities are clear from context, or if it is not important to specify them.
Throughout, logarithms are base 2, and we often assume various quantities, like 1/ε or 1/δ, are powers of 2; this is without loss of generality.

Strong JL Distributions
It is not hard to show that having the strong JL moment property and being a strong JL distribution are equivalent.We use the following standard fact.

The claim follows by setting
Since D is a strong JL distribution, the right tail of Z is within a constant factor of that of |Y|, where Y is the sum of a Gaussian with mean 0 and variance O(1/s) and an exponential random variable with parameter s. Now apply Fact 5.
Remark 1. Theorem 6 implies that any strong JL distribution can be derandomized using 2 log(1/δ)-wise independence, giving an alternate proof of the derandomized JL result of Clarkson and Woodruff (Theorem 2.2 in [5]). This is because, by Markov's inequality with ℓ even, and for ε < 1,

Pr[ |‖Sw‖_2^2 − 1| > ε ] ≤ E[(‖Sw‖_2^2 − 1)^ℓ] / ε^ℓ ≤ (Cℓ/(ε^2 s))^{ℓ/2}. (3.1)

Setting ℓ = log(1/δ) and s = Cℓ/ε^2 for C > 0 sufficiently large makes the above probability at most δ. Now, note the ℓ-th moment is determined by 2ℓ-wise independence of the entries of S.
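As a sanity check on the parameter setting in Remark 1, assume the tail bound Pr[|‖Sw‖_2^2 − 1| > ε] ≤ (Cℓ/(ε^2 s))^{ℓ/2} from Eq. (3.1) (C the constant from the moment bound). Then, hiding no constants:

```latex
\left(\frac{C\ell}{\varepsilon^{2}s}\right)^{\ell/2}
\;=\;\left(\frac{C}{C'}\right)^{\ell/2}
\;\le\; 4^{-\ell/2} \;=\; 2^{-\ell} \;=\; \delta,
\qquad\text{for } s = \frac{C'\ell}{\varepsilon^{2}},\ C' \ge 4C,\ \ell = \log(1/\delta).
```

So 2ℓ = 2 log(1/δ)-wise independence of the entries suffices to control the ℓ-th moment and hence the failure probability.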

A Generic JL Derandomization Template
Theorem 6 and Remark 1 provide the key insight for our construction. If we use ℓ = 2 log(1/δ)-wise independent Bernoulli entries as suggested in Remark 1, the seed length would be O(ℓ log d) = O(log(1/δ) log d) for s = Θ(ε^{−2} log(1/δ)). However, note that in Eq. (3.1) a trade-off can be made between the amount of independence needed and the final embedding dimension without changing the error probability. In particular, it suffices to use 4-wise independence if we embed into s = Ω(ε^{−2}δ^{−1}) dimensions. In general, if s = Cε^{−2}q for log^2(1/δ) ≤ q ≤ 1/δ, it suffices to set ℓ = O(log_q(1/δ)) to make the right-hand side of Eq. (3.1) at most δ. By gradually reducing the dimension over the course of several iterations, using higher independence in each iteration, we obtain a shorter seed length.
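To see the trade-off quantitatively, assume the reconstructed form of Eq. (3.1), Pr[·] ≤ (Cℓ/(ε^2 s))^{ℓ/2}. Substituting s = Cε^{−2}q gives:

```latex
\left(\frac{C\ell}{\varepsilon^{2}s}\right)^{\ell/2}
= \left(\frac{\ell}{q}\right)^{\ell/2} \le \delta
\quad\Longleftrightarrow\quad
\frac{\ell}{2}\log\frac{q}{\ell} \;\ge\; \log(1/\delta).
```

At one extreme, q = 1/δ allows ℓ = O(1) (4-wise independence); at the other, q = log^2(1/δ) requires ℓ = Θ(log(1/δ)/log log(1/δ)), still saving a log log(1/δ) factor over ℓ = 2 log(1/δ).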
Our main construction is described in Figure 1. We first embed into O(ε^{−2}δ^{−1}) dimensions using 4-wise independence. We then iteratively reduce the dimension, increasing the amount of independence at each step, until we have finally embedded into O(ε^{−2} log^2(1/δ)) dimensions. In our final step, we embed into the optimal target dimension using 2 log(1/δ)-wise independence. Note the Bernoulli distribution is not special here; we could use any family of strong JL distributions.

Proof. For a fixed vector w, let w_i = S_i · · · S_0 w, and let w_{−1} denote w. Then the claim follows by our choice of s_i and a Markov bound on the ℓ_i-th moment. As a corollary, we obtain our main theorem, Theorem 2.
Proof (of Theorem 2). We let the distributions in Steps 3 and 4 of Figure 1 be strong JL distributions as in Remark 1. The seed length required to generate S_0 is O(log d); for S_i with i > 0 it is O(ℓ_i log d_i). Thus, the total seed length is dominated by that needed to generate S_0 and S_final, giving the claim. The distortion and error probabilities can be bounded by a union bound.

Explicit JL Families via Samplers
We now give an alternate construction of an explicit JL family. The construction is similar in spirit to that of the previous section and has the additional property that matrix-vector products for matrices output by the generator can be computed in time roughly O(d log d + s^3), as it is based on the Fast Johnson-Lindenstrauss Transform (FJLT) of [2]. For clarity, we concentrate on the case of δ = Θ(1/d^c) polynomially small. The case of general δ can be handled similarly with some minor technical issues that we skip in this extended abstract. Further, we assume that log(1/δ)/ε^2 < d, as otherwise the JLL is not interesting.
As outlined in the introduction, we first give a family of rotations to regularize vectors in R^d. For a vector x ∈ R^d, let D(x) ∈ R^{d×d} be the diagonal matrix with D(x)_{ii} = x_i. By Markov's inequality and the Khintchine-Kahane inequality (Lemma 1), each coordinate is small with high probability; the claim now follows from a union bound over i ∈ [d].
We now give a family of transformations for reducing d dimensions to Õ(d^{1/2}) · poly(s_opt) dimensions using oblivious samplers. For S ⊆ [d], let P_S : R^d → R^{|S|} be the projection onto the coordinates in S. In the following, let C be the universal constant from Corollary 1.
Proof. Let v = HD(x)w. Then ‖v‖_2 = 1, and the claim follows by Lemma 2 applied with α = 1/4C.
We now recursively apply the above lemma. Let k_0 = 8C(c + 1) (recall that δ = 1/d^c) and k_{i+1} = 2k_i, so that k_i = 2^i k_0. The parameters d_i, k_i are chosen so that 1/d_i^{k_i} is always polynomially small. Fix t > 0, to be chosen later, so that k_i < d_i for all i ≤ t.

Proof. The proof is by induction on i = 1, . . . , t. For i = 1, the claim is the same as Lemma 3. Suppose the statement is true for i − 1 and let v = A(d_{i−1}, k_{i−1}) · · · A(d_0, k_0)w. Then v ∈ R^{d_i}, and the lemma follows by Lemma 3 applied to A(d_i, k_i) and v.
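The recursion d_{i+1} = s(d_i) ≈ c_1 d_i^{1/2} (log d_i/ε)^C shrinks doubly exponentially, which is why O(log log d) iterations suffice. A small sketch with illustrative constants (c_1, C, ε and the poly(log d) stopping threshold are all assumptions, not the paper's exact values):

```python
import math

def iterations_to_polylog(d, C=2.0, c1=1.0, eps=0.5):
    """Iterate d_{i+1} = c1 * sqrt(d_i) * (log2(d_i)/eps)**C and count
    steps until d_i <= (log2 d)**(2*C + 2). Constants are illustrative."""
    target = math.log2(d) ** (2 * C + 2)
    di, steps = float(d), 0
    while di > target and steps < 100:
        di = c1 * math.sqrt(di) * (math.log2(di) / eps) ** C
        steps += 1
    return steps, di

steps, dt = iterations_to_polylog(2 ** 60)
```

Even for d = 2^60 the loop terminates after a handful of steps, consistent with the O(log log d) iteration count claimed above.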
What follows is a series of elementary calculations to bound the seed length and error from the above lemma. Assuming log log d > 2c + 2, the error in Lemma 4 remains polynomially small, and the randomness needed after t = O(log log d) iterations is O(log d log log d). Combining the above arguments (applied with δ = δ/log log d and ε = ε/log log d) and simplifying the resulting expression for the seed length, we obtain our fast derandomized JL family.

Proof. We suppose that δ = Θ(1/d^c); the analysis for the general case is similar. From the above arguments there is an explicit generator that takes O(log(d/δ) · log(log(d/δ)/ε)) random bits and outputs a linear transformation A : R^d → R^m for m = poly(log(d/δ), 1/ε), satisfying the JL property with error at most δ and distortion at most ε. The theorem now follows by composing the transformations of the above theorem with a sign matrix having 2 log(1/δ)-wise independent entries. The additional randomness required is O(log(1/δ) log m) = O(log(1/δ)(log log(d/δ) + log(1/ε))).
We next bound the time needed to compute matrix-vector products for the matrices we output. Note that for i < t, the matrices A_i of Lemma 4 are of the form P_S · H_{d_i} D(x) for a k-wise independent string x ∈ {1, −1}^{d_i}. Thus, for any vector w_i ∈ R^{d_i}, the product A_i w_i can be computed in time O(d_i log d_i) using the discrete Fourier transform. Therefore, for any w = w_0 ∈ R^{d_0}, the product A_{t−1} · · · A_1 A_0 w_0 can be computed in time

Theorem 6.
A distribution D is a strong (d, s)-JL distribution if and only if it has the strong (d, s)-JL moment property.

Proof. First assume D has the strong JL moment property. Then, for arbitrary w
be strong JL distributions. Then Steps 3 and 4 are satisfied by Remark 1.

Lemma 3.
Let S ≡ S(d, d^{1/2C}, ε, δ), s = O(d^{1/2} log^C(1/δ)/ε^C) be as in Corollary 1, and let D be a k-wise independent distribution over {1, −1}^d. For S ∈_u S, x ← D, define the random linear transformation A_{S,x} : R^d → R^s by A_{S,x} = sqrt(d/s) · P_S · HD(x). Then, for every w ∈ R^d with ‖w‖_2 = 1,

Fix ε, δ > 0. Let A(d, k) : R^d → R^{s(d)} be the collection of transformations {A_{S,x} : S ∈_u S, x ← D} as in the above lemma, for s(d) = s(d, d^{1/2C}, ε, δ) = c_1 d^{1/2}(log d/ε)^C, for a constant c_1. Note that we can sample from A(d, k) using r(d, k) = k log d + O(log d + log(1/δ)) = O(k log d) random bits. Let d_0 = d, and let d_{i+1} = s(d_i).
Σ_i O(d_i log d_i) ≤ O(d log d) + log d · Σ_{i=1}^{t−1} O(d_i^{1/2} (log(1/δ)/ε^2)^2) = O(d log d + sqrt(d) log d log^2(1/δ)/ε^4) (Equation 5.1). The above bound dominates the time required to perform the final embedding. A similar calculation shows that for indices i ∈ [s], j ∈ [d], the entry G(y)_{ij} of the generated matrix can be computed in space O(Σ_i log d_i) = O(log d + log(1/ε) · log log d) by expanding the product of matrices and enumerating over all intermediary indices. The time required to perform the calculation is O(s · d_t · d_{t−1} · · · d_1) = d · (log d/ε)^{O(log log d)}.
Let H = H_d ∈ {−1/sqrt(d), 1/sqrt(d)}^{d×d} be the normalized Hadamard matrix, satisfying H_d^T H_d = I_d (we drop the suffix d when the dimension is clear from context). While the Hadamard matrix is only known to exist for powers of 2, for clarity we ignore this technicality and assume that it exists for all d. Finally, let S^{d−1} denote the Euclidean sphere {w ∈ R^d : ‖w‖_2 = 1}.
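For completeness, a minimal sketch of the Sylvester construction of the normalized Hadamard matrix (powers of 2 only, as noted above):

```python
import numpy as np

def normalized_hadamard(d):
    """Sylvester construction: H_{2n} = [[H, H], [H, -H]].
    Requires d to be a power of 2; normalized so that H.T @ H = I."""
    assert d > 0 and d & (d - 1) == 0, "d must be a power of 2"
    H = np.array([[1.0]])
    while H.shape[0] < d:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(d)

H = normalized_hadamard(8)
```

In the constructions above one never materializes H explicitly; Hw is applied in O(d log d) time via the fast Walsh-Hadamard butterfly.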