Characterizing pseudoentropy and simplifying pseudorandom generator constructions

We provide a characterization of pseudoentropy in terms of hardness of sampling: Let (X,B) be jointly distributed random variables such that B takes values in a polynomial-sized set. We show that B is computationally indistinguishable from a random variable of higher Shannon entropy given X if and only if there is no probabilistic polynomial-time S such that (X,S(X)) has small KL divergence from (X,B). This can be viewed as an analogue of the Impagliazzo Hardcore Theorem (FOCS '95) for Shannon entropy (rather than min-entropy).
Using this characterization, we show that if f is a one-way function, then (f(U_n), U_n) has "next-bit pseudoentropy" at least n + log n, establishing a conjecture of Haitner, Reingold, and Vadhan (STOC '10). Plugging this into the construction of Haitner et al., this yields a simpler construction of pseudorandom generators from one-way functions. In particular, the construction only performs hashing once, and it only needs hash functions that are randomness extractors (e.g. universal hash functions) rather than needing them to support "local list-decoding" (as in the Goldreich--Levin hardcore predicate, STOC '89).
With an additional idea, we also show how to improve the seed length of the pseudorandom generator to Õ(n^3), compared to O(n^4) in the construction of Haitner et al.


INTRODUCTION
Computational analogues of information-theoretic notions have given rise to some of the most interesting phenomena in complexity and cryptography. For example, computational indistinguishability [GM2], which is the computational analogue of statistical distance, enabled bypassing Shannon's impossibility results on perfectly secure encryption [Sha], and provided the basis for the computational theory of pseudorandomness [BM,Yao1].
Computational analogues of entropy were introduced by Yao [Yao1] and Håstad, Impagliazzo, Levin, and Luby [HILL]. The Håstad et al. notion, known as pseudoentropy, was key to their fundamental result establishing the equivalence of pseudorandom generators and one-way functions, and has also now become a basic concept in complexity theory and cryptography.
A more relaxed notion, called next-bit pseudoentropy, was recently introduced by Haitner, Reingold, and Vadhan [HRV], who used it to give a simpler and more efficient construction of pseudorandom generators from one-way functions. From a one-way function on n-bit strings, they construct a pseudorandom generator with seed length Õ(n^4), improving the bound of Õ(n^8) from [HILL,Hol2].
In this work, we provide new characterizations of pseudoentropy and next-bit pseudoentropy, and use these to further simplify the construction of pseudorandom generators from one-way functions. In addition, we show how to save another factor of n in the seed length, yielding a pseudorandom generator with seed length Õ(n^3) from a one-way function on n bits.

Characterizing Pseudoentropy
The Håstad et al. notion of pseudoentropy is the following:

Definition 1.1 (pseudoentropy [HILL], informal). A random variable X has pseudoentropy at least k if there exists a random variable Y such that:
1. X is computationally indistinguishable from Y.
2. H(Y) ≥ k, where H(·) denotes Shannon entropy.

Pseudoentropy is interesting because a random variable can have much higher pseudoentropy than its Shannon entropy. Indeed, if G : {0,1}^n → {0,1}^m is a pseudorandom generator, then G(U_n) has Shannon entropy at most n, but is indistinguishable from U_m (by definition) and hence has pseudoentropy m > n. (Here and throughout, U_n denotes a random variable uniformly distributed over {0,1}^n.) A useful generalization is the notion of conditional pseudoentropy, analogous to the notion of conditional pseudo-min-entropy studied by Hsiao, Lu, and Reyzin [HLR]:

Definition 1.2 (conditional pseudoentropy, informal). Let (X, B) be jointly distributed random variables. We say that B has (conditional) pseudoentropy at least k given X if there exists a random variable C, jointly distributed with X, such that:
1. (X, B) is computationally indistinguishable from (X, C).
2. H(C|X) ≥ k, where H(·|·) denotes conditional Shannon entropy.
Note that if B has pseudoentropy at least k given X, then (X, B) has pseudoentropy at least H(X)+k, but the converse is false (consider X that has pseudoentropy H(X) + k on its own, with a B that has no pseudoentropy).
Intuitively, a random variable B should have high pseudoentropy given X iff B is hard to predict from X, and indeed results of this type are known in special cases involving pseudo-min-entropy (to be discussed later). Our main result is such a characterization for pseudoentropy (i.e. pseudo-Shannon-entropy).
Before getting to the formal statement, note that both pseudoentropy and unpredictability may occur for information-theoretic reasons, as H(B|X) may be larger than 0. For example, suppose that B is a uniform random bit, independent of X. Then B has 1 bit of pseudoentropy given X and cannot be predicted better than random guessing from X, but these are not for computational reasons (i.e. they also hold for computationally unbounded algorithms). We would like to focus on the computational randomness in B. For pseudoentropy we can do this by simply subtracting H(B|X). For unpredictability, we do this by considering the feasibility of sampling the distribution B|_{X=x} given a sample x ∼ X. Thus, in the example where B is a random bit independent of X, this sampling is easy to do (in contrast to the task of predicting B from X).
With these choices, we can indeed prove that pseudoentropy and hardness of sampling are equivalent:

Theorem 1.3 (characterizing pseudoentropy, informal). Let (X, B) be jointly distributed random variables where B takes values in a polynomial-sized set. Then B has pseudoentropy at least H(B|X) + δ given X if and only if there is no probabilistic polynomial-time algorithm S such that the KL divergence from (X, B) to (X, S(X)) is at most δ.

KL divergence is a common information-theoretic measure of "distance" between random variables (though it is not a metric).
The constraint that B takes values in a polynomial-sized set is essential for this theorem: If f is a one-way permutation and X is a uniformly random output, then it is very hard to sample f^{-1}(X) given X, but the pseudoentropy of f^{-1}(X) given X is negligible (since we can efficiently recognize f^{-1}(X) given X). However, we do have an alternative version of our result that holds for B taking values in an exponentially large range (when considering nonuniform complexity). In that version, we replace the task of sampling a distribution S(X) from X with that of computing a "measure" that, when normalized to be a distribution, has small KL divergence from (X, B). In particular, this alternative formulation is interesting even when X is empty and gives a characterization of the pseudoentropy of an arbitrary random variable B (with respect to nonuniform complexity).
To provide some more intuition for our theorem and the proof techniques, we compare it to previous results relating forms of pseudoentropy and unpredictability.
1. Yao [Yao2] showed that if B is a single bit, then (X, B) is indistinguishable from (X, U_1) (i.e. B has pseudoentropy at least 1 given X) iff B cannot be predicted from X with probability noticeably more than 1/2. This can be generalized to B taking values in a polynomial-sized alphabet Σ: B ∈ Σ has pseudoentropy log |Σ| given X iff B cannot be predicted with probability noticeably more than 1/|Σ|. Thus, in the extreme case of maximal pseudoentropy (equal to log |Σ|), we have an equivalence with unpredictability.
2. For B that takes values in larger (say exponentially large) alphabets, Goldreich and Levin [GL] showed that if B is very hard to predict from X (i.e. cannot be predicted with nonnegligible probability), then we can choose a random hash function H whose range is a polynomial-sized set Σ and it will hold that H(B) ∈ Σ has pseudoentropy log |Σ| given X and H. While this is very useful and has many applications, it does not characterize the pseudoentropy of B itself (but rather a hash of it), requires a hash function that supports "local list-decoding," and again only talks about maximal pseudoentropy (log |Σ|).
3. As noted in [STV], the Hardcore Theorem of Impagliazzo [Imp] (and subsequent strengthenings [KS,Hol1,BHK]) can be interpreted as relating unpredictability and a kind of pseudoentropy. Specifically, when B is a single bit, the Hardcore Theorem tells us that B cannot be predicted from X with probability greater than 1 − δ iff "B is indistinguishable from a random bit on a 2δ fraction of the probability space (X, B)" (this fraction of the probability space is typically called the "hardcore measure"). One formalization of the latter condition is to say that (X, B) is indistinguishable from (X, C) where C has average min-entropy [DORS] at least log(1/δ) given X. This result is of the same spirit as Theorem 1.3, but refers to average min-entropy rather than Shannon entropy, and does not distinguish between information-theoretic hardness and computational hardness.
In light of the above similarities, it is natural that the proof of Theorem 1.3 follows the same overall structure as existing proofs of the Hardcore Theorem when showing that the hardness of sampling B given X implies the pseudoentropy of B given X. Specifically, our proof for the case of nonuniform complexity (i.e. circuit size) has the same structure as Nisan's proof of the Hardcore Theorem [Imp]. We assume for contradiction that B does not have pseudoentropy H(B|X) + δ given X, i.e. B is distinguishable from every C such that H(C|X) ≥ H(B|X) + δ. Using the Min-Max Theorem, we deduce that there is a convex combination D of small circuits that is a universal distinguisher, i.e. Pr[D(X, B) = 1] − Pr[D(X, C) = 1] > ε for every C such that H(C|X) ≥ H(B|X) + δ. Next we show how to use such a D to sample a distribution S(X) at small KL divergence from B (given X). It turns out that we can do this by exponentiating D: we take S(X) to be such that Pr[S(x) = b] ∝ 2^{k·D(x,b)}, where k ∈ R is the largest number such that H(S(X)|X) ≥ H(B|X) + δ. In statistical physics, C = S(X) is known as a Boltzmann distribution associated with D, and can be shown to minimize Pr[D(X, C) = 1] among all high-entropy C [LL]. Thus it is the "hardest" high-entropy distribution for D to distinguish from B. The proof that S(X) has small KL divergence from B uses a new information-theoretic lemma saying that if C is a random variable obtained from exponentiating D in this way, then the KL divergence from (X, B) to (X, C) can be expressed exactly in terms of D's advantage in distinguishing (X, B) and (X, C).
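To make the exponentiation step concrete, here is a small, hedged Python sketch of this Boltzmann-style sampler: given a (combined) distinguisher D(x, b) with values in [0, 1] and an entropy target, it searches a grid for the largest k such that the conditional entropy stays above the target, and then samples proportionally to 2^{k·D(x,b)}. The grid search, the estimation of conditional entropy from samples of X, and all names are illustrative assumptions, not the procedure analyzed in the proof.

```python
import math
import random

def boltzmann_pmf(D, x, k, alphabet):
    """Pmf proportional to 2^(k * D(x, b)) over the alphabet, for a fixed x."""
    weights = [2.0 ** (k * D(x, b)) for b in alphabet]
    total = sum(weights)
    return [w / total for w in weights]

def est_cond_entropy(D, xs, k, alphabet):
    """Estimate H(S(X)|X) by averaging the entropy of the pmf over samples xs of X."""
    acc = 0.0
    for x in xs:
        p = boltzmann_pmf(D, x, k, alphabet)
        acc += -sum(pi * math.log2(pi) for pi in p if pi > 0)
    return acc / len(xs)

def largest_k(D, xs, alphabet, entropy_target, k_max=64.0, steps=200):
    """H(S(X)|X) is decreasing in k, so scan a grid and keep the largest feasible k."""
    best = 0.0
    for i in range(steps + 1):
        k = k_max * i / steps
        if est_cond_entropy(D, xs, k, alphabet) >= entropy_target:
            best = k
    return best

def sample(D, x, k, alphabet):
    """One draw from S(x), the Boltzmann distribution associated with k * D."""
    return random.choices(alphabet, weights=boltzmann_pmf(D, x, k, alphabet), k=1)[0]
```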
For the case of uniform complexity (namely, probabilistic polynomial-time algorithms), we replace the use of the Min-Max Theorem with a new Uniform Min-Max Theorem, which constructively builds a near-optimal strategy of the second player in a 2-player game from several best-responses of the second player to strategies of the first player. We defer a detailed discussion of the Uniform Min-Max Theorem and its other applications to a forthcoming paper [VZ1], but we include the proof of the Uniform Min-Max Theorem in our technical report [VZ2] for reference. We note that the proof of the Uniform Min-Max Theorem also uses ideas from the proof of the Uniform Hardcore Theorem due to Barak, Hardt, and Kale [BHK].

Next-Bit Pseudoentropy from One-Way Functions
The Håstad, Impagliazzo, Levin, and Luby [HILL] construction of pseudorandom generators from one-way functions begins by showing how to use a one-way function to construct an efficiently samplable distribution X whose pseudoentropy is noticeably larger than its Shannon entropy. This approach was refined by Haitner et al. [HRV] using the following variant of pseudoentropy:

Definition 1.4 (next-block pseudoentropy [HRV], informal). A sequence of jointly distributed random variables (X_1, . . ., X_m) has next-block pseudoentropy at least k iff there exist random variables (Y_1, . . ., Y_m), jointly distributed with (X_1, . . ., X_m), such that:
1. For every i, (X_1, . . ., X_{i-1}, X_i) is computationally indistinguishable from (X_1, . . ., X_{i-1}, Y_i).
2. Σ_i H(Y_i | X_1, . . ., X_{i-1}) ≥ k.

We say that a random variable X taking values in {0,1}^m has next-bit pseudoentropy at least k iff, when we break X into 1-bit blocks, X = (X_1, . . ., X_m) has next-block pseudoentropy at least k.
Intuitively, next-bit pseudoentropy captures the pseudoentropy from the perspective of an adversary who gets the bits one at a time (from left to right), instead of all at once. Thus, the next-bit pseudoentropy of a random variable can be much larger than its pseudoentropy. For example, if G : {0, 1} n → {0, 1} m is a pseudorandom generator, then (G(Un), Un) has next-bit pseudoentropy at least m > n, but does not have pseudoentropy larger than n.
Haitner, Reingold, and Vadhan [HRV] showed that if f : {0,1}^n → {0,1}^m is a one-way function, X ∈_R {0,1}^n, and H : {0,1}^n → {0,1}^n is a random hash function from an appropriate family, then (f(X), H, H(X)) has next-bit pseudoentropy n + r + log n, where r is the number of random bits used to describe the hash function H. The intuition for this is as follows: Condition on f(X) = y for a value y in the image of f. Given that f(X) = y, X is uniformly distributed in a set of size |f^{-1}(y)|. Thus, by the Leftover Hash Lemma [HILL], the first ≈ log |f^{-1}(y)| bits of H(X) are statistically close to uniform given the bits preceding them. In addition, it is still difficult to invert f and predict X given these bits (since a uniform random string can't help in inverting). Thus, by the Goldreich-Levin Theorem [GL], the next ≈ log n bits of H(X) are computationally indistinguishable from uniform given the preceding bits. Therefore the next-bit pseudoentropy of (f(X), H, H(X)) is at least H(f(X)) + r + E_y[log |f^{-1}(y)|] + log n = H(f(X)) + r + H(X|f(X)) + log n = n + r + log n.
Haitner, Reingold, and Vadhan [HRV] conjectured that the hashing in the above construction is not necessary, and the hardness of inverting a one-way function directly provides (next-bit) pseudoentropy. We prove their conjecture:

Theorem 1.5 (one-way functions ⇒ next-bit pseudoentropy). If f : {0,1}^n → {0,1}^m is a one-way function and X ∈_R {0,1}^n, then (f(X), X) has next-bit pseudoentropy at least n + log n.

The proof of this theorem starts by showing that the one-wayness of f implies that for every probabilistic polynomial-time algorithm A, the KL divergence from (f(X), X) to (f(X), A(f(X))) is at least log n; otherwise A would invert f with nonnegligible probability. Then we show that the same holds also in a "next-bit" sense: if we break X into bits X = X_1 · · · X_n and choose I ∈_R [n], then for every probabilistic polynomial-time S, the KL divergence from (f(X), X_1, . . ., X_I) to (f(X), X_1, . . ., X_{I-1}, S(f(X), X_1, . . ., X_{I-1})) is at least (log n)/n. (Otherwise, by iteratively applying S n times, we could obtain a probabilistic polynomial-time A such that (f(X), A(f(X))) has KL divergence less than log n from (f(X), X).) By Theorem 1.3, we deduce that X_I has pseudoentropy at least H(X_I | f(X), X_1, . . ., X_{I-1}) + (log n)/n given f(X), X_1, . . ., X_{I-1}. That is, on average, the individual bits of X have (log n)/n extra bits of pseudoentropy (beyond their Shannon entropy) given f(X) and the previous bits of X. Summing over all n bits of X, the next-bit pseudoentropy is at least log n bits larger than the Shannon entropy of (f(X), X), which is n.
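For concreteness, the summation step can be written out as follows. This is only a sketch of the bookkeeping (suppressing the uniformity and oracle issues handled in Section 4), using the chain rule for KL divergence and the fact that X determines f(X):

\[
\mathrm{KL}\big(f(X), X \,\big\|\, f(X), A(f(X))\big)
= \sum_{i=1}^{n} \mathop{\mathbb{E}}\Big[\mathrm{KL}\big(X_i \,\big\|\, S(f(X), X_1, \ldots, X_{i-1})\ \big|\ f(X), X_1, \ldots, X_{i-1}\big)\Big],
\]

so if every bit could be sampled with conditional KL divergence below \((\log n)/n\), the full string could be sampled with KL divergence below \(\log n\). In the other direction, once each bit \(X_i\) has conditional pseudoentropy \(H(X_i \mid f(X), X_{<i}) + (\log n)/n\), summing over \(i = 1, \ldots, n\) (and adding the bits of \(f(X)\), which contribute at least their total Shannon entropy \(H(f(X))\)) gives next-bit pseudoentropy at least

\[
H(f(X)) + \sum_{i=1}^{n} H(X_i \mid f(X), X_{<i}) + \log n
= H(f(X), X) + \log n = n + \log n .
\]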

Pseudorandom Generators
Given the next-bit pseudoentropy generator (f(X), X) ∈ {0,1}^{m+n} of Theorem 1.5, we can apply the construction of Haitner et al. [HRV] to obtain a pseudorandom generator through the following three steps:
• Entropy Equalization: To spread the pseudoentropy out evenly among the bits, we concatenate u = Θ(n) independent random evaluations of (f(X), X), then drop the first I bits and the last m + n − I bits of the u · (n + m)-bit long result, for I ∈_R [m + n].
• Converting Shannon Entropy to Min-Entropy and Amplifying the Gap: Next, we take t = Θ(n^2) copies of the above next-bit pseudoentropy generator (after entropy equalization), but concatenate them "vertically" to obtain blocks, each of which consists of t bits. It can be shown that each of the blocks is indistinguishable from having high min-entropy conditioned on the previous ones.
• Randomness Extraction: Finally, we use a single random universal hash function to extract the pseudomin-entropy from each of the blocks, and concatenate the results to produce our output.
Thus, to obtain a pseudorandom generator from a one-way function f, we simply need to evaluate f on u · t = Õ(n^3) random inputs, arrange the input and output bits into a matrix consisting of (u − 1) · (m + n) columns and t rows, and apply a universal hash function to each column. (The seed of the pseudorandom generator consists of the u · t inputs to f, the t random shifts used for entropy equalization, and the description of the universal hash function.) The construction is illustrated in Figure 1. Note that we only need to hash once in the construction, and the only property we need of our hash function is randomness extraction (e.g. via the Leftover Hash Lemma). In contrast, all previous constructions of pseudorandom generators from one-way functions (even from one-way permutations) required hash functions with "local list-decoding" properties (e.g. the Goldreich-Levin hardcore predicate) in addition to randomness extraction. As pointed out to us by Yuval Ishai, an advantage of using only universal hash functions is that they can be implemented by linear-size boolean circuits [IKOS], and thus we can obtain PRGs computable by circuits of size linear in their stretch (from one-way functions that are computable by linear-size circuits but exponentially hard to invert). Such PRGs have applications to "cryptography with constant computational overhead".
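As an illustration only, here is a hedged toy Python sketch of the shape of this construction (generate rows, stack them, hash each column once with a universal hash). The stand-in function f, the toy parameters u and t, and the per-column output length are assumptions chosen for readability; they are not the parameters, security analysis, or seed accounting of the actual construction.

```python
import random

n = 8
def f(x_bits):
    # Placeholder standing in for a one-way function (it is NOT one-way).
    val = int("".join(map(str, x_bits)), 2)
    y = (val * val + 1) % (2 ** n)
    return [int(b) for b in format(y, "0{}b".format(n))]

def fx_x():
    """One evaluation of the next-bit pseudoentropy generator (f(X), X)."""
    x = [random.randint(0, 1) for _ in range(n)]
    return f(x) + x

u, t = 4, 6                 # toy values; the text uses u = Theta(n), t = Theta(n^2)
block = 2 * n               # length of one (f(X), X) sample, i.e. m + n with m = n here

def equalized_row():
    """Entropy equalization: u samples concatenated, then a random shift is dropped."""
    bits = sum((fx_x() for _ in range(u)), [])
    i = random.randrange(block)
    return bits[i : i + (u - 1) * block]

rows = [equalized_row() for _ in range(t)]   # "vertical" stacking into t-bit columns
cols = (u - 1) * block
k_out = t // 2                               # assumed per-column extraction length

# Universal hash h_A(col) = A * col over GF(2) with a random 0/1 matrix A, applied once per column.
A = [[random.randint(0, 1) for _ in range(t)] for _ in range(k_out)]
def universal_hash(col):
    return [sum(a * c for a, c in zip(row, col)) % 2 for row in A]

output = []
for j in range(cols):
    output += universal_hash([rows[i][j] for i in range(t)])
print(len(output), "output bits (toy parameters, no security claimed)")
```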
While simpler, the aforementioned construction achieves essentially the same parameters as [HRV]. Using an additional idea, we show how to save a factor of roughly u = Θ(n) in the seed length. The idea is that to extract the randomness from a column of the aforementioned matrix, we do not need to construct the entire matrix. We can use just enough seed to fill a single column, and then we can use randomness extracted from that column to help generate more columns, and iterate. (This idea is independent of our simplifications above, and can also be applied to the construction based on the [HRV] pseudoentropy generator.) Thus we show:

Theorem 1.6 (informal). If f : {0,1}^n → {0,1}^m is a one-way function, then there is a pseudorandom generator with seed length Õ(n^3).

This theorem improves the seed length of O(u · t · n) = O(n^4) from Haitner et al. [HRV]. We note that Haitner et al. gave a nonuniform construction of seed length Õ(n^3), requiring poly(n) bits of nonuniform advice to compute the pseudorandom generator. (Entropy equalization can be avoided by nonuniformly hardwiring the amount of entropy contributed by each bit.) Also, our construction still requires evaluating the one-way function at least u · t = Θ(n^3) times; we just no longer need these evaluations to be independent. Finally, like [HRV], the construction obtains Θ(log n) bits of additive stretch per invocation of the one-way function, which is optimal by [GGKT].
With Theorem 1.6, now the only blow-up in seed length in constructing pseudorandom generators from one-way functions is due to converting Shannon entropy to min-entropy. It is an intriguing open problem whether that blow-up can be avoided or shown to be necessary.

Relation to Inaccessible Entropy
A variety of computational notions of entropy have been studied in the cryptography and complexity literature, e.g. [Yao2,HILL,BSW,HLR,HRVW,HRV,HHR + ,FR,Rey]. In addition to the notions discussed in Sections 1.1 and 1.2, our work was also inspired by the works on inaccessible entropy [HRVW, HHR + ].
Like our characterization of conditional pseudoentropy, inaccessible entropy refers to a difficulty of sampling a random variable B from a jointly distributed random variable X. However, there are important differences. In our characterization (Theorem 1.3), the sample of X is generated externally and fed to the adversary, who then tries to sample the conditional distribution B|X. In the [HHR+] notion of inaccessible entropy, the adversary is also given the random coins used to generate X, and we compare its output distribution conditioned on those coins to B|X. And in the original notion of inaccessible entropy, from [HRVW], the adversary is the one who generates X (or some approximation to it). These three notions are analogous to the security conditions for one-way functions, target collision-resistant hash functions (i.e. UOWHFs), and collision-resistant hash functions, respectively (thinking of X = f(B) for B ∈_R {0,1}^n). We note that the hardness of sampling we consider also differs from inaccessible entropy in the way it measures how well an adversary approximates the conditional distribution B|X. Roughly speaking, in our notion (measuring the KL divergence from B|X to the adversary's output), the adversary's goal is to produce an output distribution that contains B|X as tightly as possible. In the notions of inaccessible entropy, the adversary's goal is to produce an output distribution that is contained within B|X as tightly as possible.
There is also significant similarity between our construction and those involving inaccessible entropy. In [HRVW], it is shown that if f is a one-way function, then (f (Un), Un) is a next-bit inaccessible entropy generator, just like we show that it is a next-bit pseudoentropy generator (Theorem 1.5). However, for inaccessible entropy, it is only necessary to break f (Un) into bits (Un can be treated as a single block), and for pseudoentropy it is only necessary to break Un into bits (f (Un) can be treated as a single block). Nevertheless, there are enough similarities to suggest that there may be a deeper connection between inaccessible entropy and pseudoentropy; trying to formalize this connection is an interesting question for future work.

Paper Organization
Basic notions of information theory and computational randomness are defined in Section 2. In Section 3 we describe and prove our characterization of pseudoentropy. In Section 4 we show how to generate next-bit pseudoentropy from any one-way function. In Section 5 we describe the PRG construction and how to reduce its seed length.

Entropy
Shannon entropy plays a central role in this paper. For more background on entropy and proofs of the lemmas stated here, see [CT].
For jointly distributed random variables X and B, the conditional (Shannon) entropy of B given X is the expected entropy of B conditioned on a sample of X. The notion of KL divergence from random variable A to random variable B is closely related to Shannon entropy; intuitively it measures how dense A is within B, on average (with 0 divergence representing maximum density, i.e. A = B, and large divergence meaning that A is concentrated in a small portion of B).
Conditional KL divergence captures the expected KL divergence from A|_{X=x} to B|_{Y=x}, over x ∼ X. Like Shannon entropy, KL divergence has a chain rule, and, like other distance measures between distributions, applying any (deterministic) function never increases it. Note, however, that KL divergence is not a metric; it is not symmetric and does not satisfy the triangle inequality.
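For reference, the standard formulas behind these notions (following [CT]) are, in one common notation:

\[
H(B \mid X) = \mathop{\mathbb{E}}_{x \sim X}\big[H(B|_{X=x})\big], \qquad
\mathrm{KL}(A \,\|\, B) = \sum_{a} \Pr[A = a]\,\log\frac{\Pr[A = a]}{\Pr[B = a]},
\]
\[
\mathrm{KL}\big(A|_X \,\big\|\, B|_Y\big) = \mathop{\mathbb{E}}_{x \sim X}\Big[\mathrm{KL}\big(A|_{X=x} \,\big\|\, B|_{Y=x}\big)\Big],
\]
\[
\text{chain rule:}\quad \mathrm{KL}(X, A \,\|\, Y, B) = \mathrm{KL}(X \,\|\, Y) + \mathrm{KL}\big(A|_X \,\big\|\, B|_Y\big),
\qquad
\text{data processing:}\quad \mathrm{KL}\big(g(A) \,\|\, g(B)\big) \le \mathrm{KL}(A \,\|\, B).
\]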

Pseudorandom Generators
First, we define the computational analogue of two random variables being statistically close: random variables X = X(n) and Y = Y(n) are (T, ε) indistinguishable if every probabilistic algorithm A running in time T = T(n) distinguishes them with advantage at most ε = ε(n), i.e. |Pr[A(X) = 1] − Pr[A(Y) = 1]| ≤ ε for all sufficiently large n. A pseudorandom generator is an algorithm that stretches a short uniformly random string to a longer pseudorandom string, one which looks random even to algorithms more powerful than the generator itself: a polynomial-time computable G : {0,1}^d → {0,1}^ℓ with ℓ > d is a (T, ε) pseudorandom generator if G(U_d) and U_ℓ are (T, ε) indistinguishable. We say G is a pseudorandom generator if G is an (n^c, 1/n^c) pseudorandom generator for every constant c. The input to a pseudorandom generator is called the seed. The number of extra bits, ℓ − d, is called the stretch.
While the notions of indistinguishability and pseudorandom generators here are defined for uniform algorithms, nonuniform indistinguishability and nonuniform pseudorandomness can be defined by replacing time T algorithms with size T boolean circuits.

Pseudoentropy and Next-Bit Pseudoentropy
The computational analogue of entropy, pseudoentropy, was first introduced by Håstad et al. [HILL]. We begin with the nonuniform definition because it is simpler:

Definition 2.8 (pseudoentropy, nonuniform setting). Let X be a random variable. We say X has (T, ε) nonuniform pseudoentropy at least k if there exists a random variable Y with H(Y) ≥ k such that X and Y are (T, ε) nonuniformly indistinguishable.
If X = X(n) for a security parameter n, we say X has nonuniform pseudoentropy at least k = k(n) if for every constant c, X(n) has (n^c, 1/n^c) nonuniform pseudoentropy at least k(n) − 1/n^c for all sufficiently large n.

A natural generalization of pseudoentropy is the notion of conditional pseudoentropy.

Definition 2.9 (conditional pseudoentropy, nonuniform setting). Let B be a random variable jointly distributed with X. We say B has (T, ε) nonuniform (conditional) pseudoentropy at least k (or pseudoentropy gap at least k − H(B|X)) given X if there exists a random variable C jointly distributed with X such that H(C|X) ≥ k and (X, B) and (X, C) are (T, ε) nonuniformly indistinguishable.
If B = B(n) for a security parameter n, we say B has nonuniform (conditional) pseudoentropy at least k = k(n) given X if for every constant c, B(n) has (n^c, 1/n^c) nonuniform (conditional) pseudoentropy at least k(n) − 1/n^c given X(n) for all sufficiently large n.
In the uniform setting (i.e. randomized algorithms instead of circuits), the right definitions are more subtle to come by. It turns out that we must require indistinguishability even against algorithms equipped with a sampling oracle. (See the remark below for more discussion.)

Notation. For a distribution Z, let O_Z denote the oracle that gives a fresh random sample from Z when queried.

Definition 2.10 (pseudoentropy, uniform setting). Let n be a security parameter, T = T(n), ε = ε(n), k = k(n), q = q(n). Let X be a [q]-valued random variable. We say X has (T, ε) uniform pseudoentropy at least k if for every time T randomized oracle algorithm A there exists a random variable Y jointly distributed with X such that the following holds for all sufficiently large n: H(Y) ≥ k, and |Pr[A^{O_{X,Y}}(X) = 1] − Pr[A^{O_{X,Y}}(Y) = 1]| ≤ ε.
We say X has uniform pseudoentropy at least k = k(n) if for every constant c, X(n) has (n^c, 1/n^c) uniform pseudoentropy at least k(n) − 1/n^c.
The reason to give the distinguishers oracle access to OX,Y is to ensure that the definition composes: if X1 and X2 are iid copies of X, we'd like to say that (X1, X2) has pseudoentropy at least 2k. Indeed we'd want to say that (X1, X2) is indistinguishable from (Y1, Y2) where Y1, Y2 are iid copies of Y . However, indistinguishability against uniform algorithms is not preserved under taking multiple independent samples in general [GM1]. Requiring indistinguishability against distinguishers with oracle access to OX,Y ensures that indistinguishability will be preserved under taking multiple independent samples.
Definition 2.11 (conditional pseudoentropy, uniform setting). Let n be a security parameter, T = T(n), ε = ε(n), k = k(n), q = q(n), and let B be a [q]-valued random variable jointly distributed with X. We say B has (T, ε) uniform (conditional) pseudoentropy at least k given X if for every randomized oracle algorithm A computable in time T, there is a random variable C jointly distributed with X, B such that the following holds for all sufficiently large n: H(C|X) ≥ k, and |Pr[A^{O_{X,B,C}}(X, B) = 1] − Pr[A^{O_{X,B,C}}(X, C) = 1]| ≤ ε. We say B has uniform (conditional) pseudoentropy at least k = k(n) given X if for every constant c, B(n) has (n^c, 1/n^c) uniform (conditional) pseudoentropy at least k(n) − 1/n^c given X(n).

We give the distinguishers oracle access to O_{X,B,C} for the same reason as we give oracle access to O_{X,Y} in Definition 2.10. However, a consequence of our results is that the definition with oracle O_{X,B,C} is equivalent to the definition with oracle O_{X,B}, provided B comes from a polynomial-sized alphabet. In particular, if (X, B) is also polynomial-time samplable (which will be the case in our applications), the definition is equivalent to one without oracle O_{X,B,C}. (See Corollary 3.23.)

Finally, it is useful to talk about the total conditional pseudoentropy of a sequence of random variables, called the next-block pseudoentropy:

Definition 2.12 (next-block pseudoentropy). Let n be a security parameter, k = k(n), and let B^(i) be a random variable for each i = 1, . . ., m = m(n). We say (B^(1), . . ., B^(m)) has (next-block) pseudoentropy at least k if there exist k_1, . . ., k_m with Σ_i k_i ≥ k such that, for each i, B^(i) has (conditional) pseudoentropy at least k_i given (B^(1), . . ., B^(i−1)). When each B^(i) is a single bit, we refer to this as next-bit pseudoentropy.

It is easy to see that next-bit pseudoentropy is a weaker notion than pseudoentropy. Therefore we would like "blocks" to be small, ideally bits, to increase the next-block pseudoentropy. Note that the next-bit pseudoentropy is sensitive to the order of the bits; for example, for any one-way function f, (U_n, f(U_n)) does not have next-bit pseudoentropy n + 1, but (f(U_n), U_n) has next-bit pseudoentropy at least n + Ω(log n), as we show in Section 4.

CHARACTERIZING PSEUDOENTROPY
In this section, we show that a random variable B having pseudoentropy given X, is equivalent to B being KL-hard given X, which roughly captures the hardness of generating B from X in terms of KL divergence. We prove the equivalence in both nonuniform and uniform models of computation.
To state the main results precisely, we begin with basic conventions and definitions. We will work with random variables taking values in [q], jointly distributed with a {0,1}^n-valued random variable X. For any [q]-valued random variable C jointly distributed with X, we write C(a|x) = Pr[C = a | X = x]. We will drop "jointly distributed with X" when it is clear from the context. Such a jointly distributed r.v. C can be algorithmically represented in two ways: (i) by a randomized algorithm S that samples C from X, i.e. C = S(X); (ii) by an algorithm P that computes the (conditional) probability mass function (pmf) of C, i.e. P(x, a) = Pr[C = a | X = x]. In general, having an efficient algorithm for one representation does not imply having an efficient algorithm for the other (under some complexity assumptions) [KMR+, Nao]. But when the alphabet size q is small, approximating the pmf of C given X (say to within ±ε) is equivalent to approximately sampling C given X (say to within statistical distance ε), up to a factor of poly(q, 1/ε) in running time. (See Lemmas 3.6, 3.7 below.) A drawback of the pmf representation is that it can be infeasible to maintain the normalization Σ_a P(x, a) = 1 when manipulating the random variable if the alphabet size q is large. Thus it is convenient to work with measures instead of pmfs. A function P : {0,1}^n × [q] → (0, +∞) is called a (conditional) measure of the random variable C_P defined by C_P(a|x) = P(x, a) / Σ_b P(x, b).
Thus a measure is just some scalar multiple of the pmf. In this section, we generalize the pmf representation so that P only has to compute some (conditional) measure of C.
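As a hedged illustration of this pmf/sampler equivalence for small q (the content of Lemmas 3.6 and 3.7, up to the poly(q, 1/ε) overhead), here is a toy Python sketch; the sample-count constant and the clipping threshold are arbitrary choices for readability, not the values used in the proofs.

```python
import random

def sampler_from_pmf(P, x, q):
    """Given a measure/pmf oracle P(x, a), sample a ~ C_P|_{X=x} by normalizing."""
    weights = [P(x, a) for a in range(q)]
    return random.choices(range(q), weights=weights, k=1)[0]

def pmf_from_sampler(S, x, q, eps):
    """Estimate Pr[S(x) = a] for each a in [q] from repeated samples, clipped away
    from zero so that log-probabilities (and hence KL divergences) stay bounded."""
    m = int(16 * q / eps ** 2) + 1
    counts = [0] * q
    for _ in range(m):
        counts[S(x)] += 1
    return [max(c / m, eps / (4 * q)) for c in counts]
```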
Definition 3.1 (KL predictor). Let (X, B) be a {0,1}^n × [q]-valued random variable. We say a function P : {0,1}^n × [q] → (0, +∞) is a δ-KL predictor of B given X if KL(X, B ‖ X, C_P) ≤ δ. For a randomized P, which we view as a distribution over functions p : {0,1}^n × [q] → (0, +∞), we require E_{p∼P}[KL(X, B ‖ X, C_p)] ≤ δ.

Definition 3.2 (KL-hard, nonuniform setting). We say B is nonuniformly (t, δ) KL-hard given X if no (randomized) circuit P of size t is a δ-KL predictor of B given X. We say B is nonuniformly δ KL-hard given X if for every constant c, B is nonuniformly (n^c, δ − 1/n^c) KL-hard given X for all sufficiently large n.
Analogously to pseudoentropy, the nonuniform and uniform definitions differ in whether we need to give a sampling oracle to the adversary. Definition 3.3 (KL-hard, uniform setting). Let n be a security parameter, δ = δ(n) > 0, t = t(n) ∈ N, q = q(n).
Let (X, B) be a {0,1}^n × [q]-valued random variable. We say B is uniformly (t, δ) KL-hard given X if for all time t randomized oracle algorithms P : {0,1}^n × [q] → (0, +∞) and all sufficiently large n, P^{O_{X,B}} is not a δ-KL predictor of B given X (where the randomness of P^{O_{X,B}} consists both of its internal coin tosses and the samples it gets from the oracle O_{X,B}).
We say B is uniformly δ KL-hard given X if for every constant c, B is uniformly (n c , δ − 1/n c ) KL-hard given X.
Note that by letting P(x, a) ≡ 1, we already get C_P = U_[q], i.e. KL(X, B ‖ X, C_P) = log q − H(B|X) ≤ log q. Thus it only makes sense to talk about KL-hardness for δ ≤ log q.
The following related definition may be more natural, as a closer parallel to the familiar notion of average-case hardness. In the nonuniform setting, we say B is nonuniformly (t, δ) KL-hard for sampling given X if for every size t randomized circuit S, KL(X, B ‖ X, S(X)) > δ.

Definition 3.5 (KL-hard for sampling, uniform setting). Let n be a security parameter, t = t(n), δ = δ(n). We say B is uniformly (t, δ) KL-hard for sampling given X if for every time t randomized oracle algorithm S and all sufficiently large n, KL(X, B ‖ X, S^{O_{X,B}}(X)) > δ.

These two notions (KL-hardness and KL-hardness for sampling) are equivalent up to a polynomial factor in t, provided that the size of the alphabet q is polynomial (Lemmas 3.6 and 3.7). We sketch the nonuniform case:

Proof. Suppose B is not nonuniformly (t', δ) KL-hard given X. That is, there exists a size t' circuit P such that KL(X, B ‖ X, C_P) ≤ δ. Then we can sample S(x) = a with probability C_P(a|x), so that KL(X, B ‖ X, S(X)) ≤ δ; S has circuit size O(q · t'). This contradicts the fact that B is nonuniformly (t, δ) KL-hard for sampling, for t' = Ω(t/q).

Conversely, suppose KL(X, B ‖ X, S(X)) ≤ δ − ε for some size t' circuit S. We will construct a size t randomized δ-KL predictor P (so that it will be useful for the uniform setting, Lemma 3.7, as well) as follows. We compute an empirical estimate E(x, a) of Pr[S(x) = a], where c is a large enough constant. This is done by taking m = O((n + log q + log(1/γ)) · q^2 / ε^4) samples of the randomness of S. We then output P(x, a) = max{E(x, a), ε/cq} ∈ (ε/cq, 1].

We view P as a distribution over functions p : {0,1}^n × [q] → (0, +∞), determined by the samples used to form the estimates E. One can show that E_{p∼P}[KL(X, B ‖ X, C_p)] ≤ KL(X, B ‖ X, S(X)) + ε ≤ δ, for an appropriate choice of γ = O(ε/(log q + log(1/ε))). Furthermore, P has circuit size O(t' · m) = t. Thus B is not nonuniformly (t, δ) KL-hard given X.
Proof. The proof for the second part is identical to Lemma 3.6. For the first part, suppose B is not uniformly (t', δ) KL-hard given X. That is, there is a time t' oracle algorithm P such that, when P^{O_{X,B}} is viewed as a distribution over functions p : {0,1}^n × [q] → (0, +∞), we have E_{p∼P^{O_{X,B}}}[KL(X, B ‖ X, C_p)] ≤ δ for infinitely many n, where we first pick p ∼ P^{O_{X,B}} by fixing the internal coin tosses of P and the samples from oracle O_{X,B}. By convexity of KL(X, B ‖ X, ·), the sampler that first draws p and then samples from C_p(·|x) satisfies KL(X, B ‖ X, S^{O_{X,B}}(X)) ≤ δ. This contradicts the fact that B is uniformly (t, δ) KL-hard for sampling, for t' = Ω(t/(q + n)).
In this section, it is more convenient to work with the first version of KL-hardness (i.e. not for sampling). We show the following main results, which establish the equivalence between (conditional) pseudoentropy and KL-hardness in both the nonuniform and uniform settings. By dropping X, the polylog(q) dependence gives us a characterization of the nonuniform pseudoentropy of an n-bit random variable: (Note that without conditioning on X, the definition of KL-hard still makes sense, expressing the hardness of computing a measure that approximates the distribution B.)

Corollary 3.10. An n-bit random variable B has nonuniform pseudoentropy at least H(B) + δ if and only if B is nonuniformly δ KL-hard.
We now state the uniform versions of our results, which are analogous to the nonuniform versions but have a polynomial dependence on q (we do not know whether it can be made polylogarithmic as in Theorem 3.8, so we do not have a uniform analogue of Corollary 3.10).

Theorem 3.11 (Main Theorem, uniform setting). Let n be a security parameter, δ = δ(n) > 0, t = t(n) ∈ N, ε = ε(n) > 0, q = q(n), σ = σ(n), all computable in time poly(n). Let (X, B) be a {0,1}^n × [q]-valued random variable.
1. If B is uniformly (t, δ) KL-hard given X, then B has uniform (t', ε) pseudoentropy at least H(B|X) + δ − ε given X, for t' = t^{Ω(1)}/poly(n, q, 1/ε).
2. Conversely, uniform pseudoentropy of B given X implies a corresponding uniform KL-hardness of B given X (see Theorem 3.22 below).
Note that we do not make any samplability assumption on X (in either the nonuniform or the uniform setting).
Distinguishers are a central object in studying pseudoentropy. A distinguisher D is a {0,1}-valued randomized function, and D(x) denotes the probability that the function outputs 1 on input x ∈ {0,1}^*. A generalized distinguisher D is an R^+-valued randomized function, and D(x) denotes the expectation of the output on input x. For generalized distinguishers D_1 and D_2, the scalar multiple kD_1 (k ≥ 0) and the sum D_1 + D_2 are also generalized distinguishers.
A generalized distinguisher D is said to have distinguishing advantage Adv_D(X, Y) = E[D(X)] − E[D(Y)] between random variables X, Y. Thus for random variables (X, B), (X, C), Adv_D((X, B), (X, C)) = E[D(X, B)] − E[D(X, C)]. A key idea in our argument is to analyze the random variable 2^D for a generalized distinguisher D, defined as the [q]-valued random variable, jointly distributed with X, with pmf 2^D(a|x) = 2^{D(x,a)} / Σ_b 2^{D(x,b)}. This is a conditional version of the Boltzmann distribution (or Gibbs distribution; canonical ensemble) in statistical physics [LL], which is the unique distribution that achieves maximum entropy under a linear constraint on the pmf. We consider the conditional Boltzmann distribution in our context for a similar reason: for any distinguisher D, it turns out that C = 2^{kD} (k ≥ 0) minimizes Adv_D((X, B), (X, C)) among all C with H(C|X) ≥ r = H(2^{kD}|X). (The unconditional version is well known in statistical physics [LL]. We give a simple proof for the conditional version in Lemma 3.18.) Thus a lower bound on Adv_D((X, B), (X, C)) for all C with H(C|X) ≥ r is equivalent to a lower bound for C = 2^{kD}.
In particular, we are able to relate Adv_D((X, B), (X, 2^D)) to the KL divergence from (X, B) to (X, 2^D) and the entropies of these random variables by the following key lemma:

Lemma 3.13. For every generalized distinguisher D and every [q]-valued random variable A jointly distributed with X,
KL(X, A ‖ X, 2^D) = (H(2^D|X) − H(A|X)) − Adv_D((X, A), (X, 2^D)).

Note that with D(x, a) ≡ 0, this becomes the familiar KL(X, A ‖ X, U_[q]) = log q − H(A|X). To quickly see why this lemma is useful: if D distinguishes 2^D from B well, then we can use 2^D to predict B within small KL divergence; this is essentially the idea behind why KL-hardness implies pseudoentropy, at least in the nonuniform setting (Part 1 of Theorem 3.8).
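For intuition, the identity can be verified by a direct calculation from the definition of 2^D, writing Z(x) = Σ_b 2^{D(x,b)} for the normalization (this is only a sketch of the computation for finite alphabets):

\[
\mathrm{KL}(X, A \,\|\, X, 2^D)
= \mathop{\mathbb{E}}_{(x,a) \sim (X,A)}\left[\log \frac{A(a|x)}{2^{D(x,a)}/Z(x)}\right]
= -H(A|X) - \mathop{\mathbb{E}}[D(X,A)] + \mathop{\mathbb{E}}[\log Z(X)],
\]
\[
H(2^D|X) = -\mathop{\mathbb{E}}_{(x,a) \sim (X,2^D)}\left[\log \frac{2^{D(x,a)}}{Z(x)}\right]
= -\mathop{\mathbb{E}}[D(X, 2^D)] + \mathop{\mathbb{E}}[\log Z(X)].
\]

Subtracting the second equation from the first eliminates the \(\mathbb{E}[\log Z(X)]\) term and yields

\[
\mathrm{KL}(X, A \,\|\, X, 2^D) = \big(H(2^D|X) - H(A|X)\big) - \mathrm{Adv}_D\big((X,A),(X,2^D)\big).
\]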

KL-hardness Implies Pseudoentropy, Nonuniform Setting
We begin with the main technical ingredient in the proof that KL-hardness implies pseudoentropy. This lemma says that a universal distinguisher D, one that distinguishes B from all high-entropy C's, can be used to approximate B to within small KL divergence.
To prove Part 1 of Theorem 3.8, we use the Min-Max Theorem to get a universal distinguisher from the assumption that B has low conditional pseudoentropy, and then apply Lemma 3.14.
Proof. Suppose for contradiction that B does not have nonuniform (t', ε) conditional pseudoentropy at least H(B|X) + δ − ε. By definition, for any [q]-valued random variable C with H(C|X) ≥ H(B|X) + δ − ε, there is a size t' distinguisher D between (X, B) and (X, C) with Adv_D((X, B), (X, C)) > ε.
Consider the following two-player zero-sum game. Player 1 picks a [q]-valued random variable C with H(C|X) ≥ H(B|X) + δ − ε. Player 2 picks a size t' distinguisher D. The payoff for Player 2 is Adv_D((X, B), (X, C)).
Player 1 has no mixed strategy that forces Player 2 to achieve payoff at most ε, because a convex combination of random variables with conditional entropy at least H(B|X) + δ − ε also has conditional entropy at least H(B|X) + δ − ε. So, by the Min-Max Theorem, Player 2 has a mixed strategy that achieves expected payoff greater than ε regardless of Player 1's move. Rephrasing, there is a convex combination D of size t' distinguishers that is a universal distinguisher, in the sense that Adv_D((X, B), (X, C)) > ε for all C with H(C|X) ≥ H(B|X) + δ − ε. By Lemma 3.14, there exists k ∈ [0, (log q)/ε] such that KL(X, B ‖ X, 2^{kD}) ≤ δ − ε. In other words, P(x, a) = 2^{kD(x,a)} satisfies KL(X, B ‖ X, C_P) ≤ δ − ε. Since kD is rational-valued, we can use Newton's method to construct a circuit P̃ approximating 2^{kD}. This can be done in such a way that KL(X, B ‖ X, C_{P̃}) ≤ KL(X, B ‖ X, 2^{kD}) + ε ≤ δ and P̃ has size t = poly(t', n, 1/ε, log q). See Lemma A.3 for details. This contradicts the hypothesis that B is nonuniformly (t, δ) KL-hard given X.

KL-hardness Implies Pseudoentropy, Uniform Setting
To prove the uniform complexity version of Theorem 3.15, we replace the use of the Min-Max Theorem in the proof of Theorem 3.15 with a Uniform Min-Max Theorem from our forthcoming paper [VZ1]. The Uniform Min-Max Theorem constructively builds a near-optimal mixed strategy for one of the players in a 2-player zero-sum game from several of that player's best responses to a sequence of strategies of the other player (in our application, the player choosing distinguishers). Given a set C of distributions and a distribution N, a distribution M* ∈ C minimizing KL(M ‖ N) over M ∈ C is called the KL projection of N on C.
A nice property of KL projection is the following geometric structure (see [CT], Chapter 11, Section 6): if C is convex and M* is the KL projection of N on C, then for every M ∈ C, KL(M ‖ N) ≥ KL(M ‖ M*) + KL(M* ‖ N). Assuming KL(M* ‖ N) is finite, this Pythagorean theorem implies that the KL projection M* is unique: for any M ∈ C which is also a KL projection, the theorem implies KL(M ‖ M*) = 0, which holds only when M = M*.
Finding the exact KL projection is often computationally infeasible, so we consider approximate KL projection. We say M* is a σ-approximate KL projection of N on C if M* ∈ C and for all M ∈ C, KL(M ‖ M*) ≤ KL(M ‖ N) + σ. In our context, let C_r denote the set of distributions (X, C) over {0,1}^n × [q] for all C with H(C|X) ≥ r. We use the Uniform Min-Max Theorem (Theorem 3.17) specialized to the case where the strategies for Player 2 are distinguishers; the associated procedure, Algorithm 1 (Finding Universal Distinguisher), runs for S iterations, in each iteration i obtaining a distinguisher D^(i) between (X, B) and the current strategy (X, C^(i)), updating the strategy using D^(i), and letting (X, C^(i+1)) be an arbitrary σ-approximate KL projection of the result on C_r; at the end, D* computes the average of D^(1), . . ., D^(S). The proof of Theorem 3.17 can be found in our technical report [VZ2]. To implement Algorithm Finding Universal Distinguisher, in particular, we need to compute σ-approximate KL projections on the conditional entropy ball C_r.

Approximate KL Projection on the Conditional Entropy Ball
In this section we describe how to efficiently find (X, C) that is a σ-approximate KL projection of (X, C') on C_r. We first describe the exact KL projection of a random variable (X, C) on a conditional entropy ball C_r, then show how to approximate it.
Recall that for a generalized distinguisher D : {0,1}^n × [q] → R^+, k ∈ R, and a {0,1}^n-valued random variable X, we define a [q]-valued random variable 2^{kD} (jointly distributed with X) by 2^{kD}(a|x) = 2^{kD(x,a)} / Σ_b 2^{kD(x,b)}. We begin by showing that C = 2^{kD} (k ≥ 0) minimizes Adv_D((X, B), (X, C)) among all C with H(C|X) ≥ H(2^{kD}|X). As mentioned above, 2^{kD} is a conditional version of the Boltzmann distribution in statistical physics [LL], for which a similar property is well known. While this was our motivation to consider the random variable 2^{kD}, we did not explicitly need it for the nonuniform theorem (Theorem 3.15). But why are distinguishers relevant at all, when all we want is to KL-project an arbitrary (X, C) on some entropy ball? The reason is that when viewing C as 2^D for some generalized distinguisher D, Lemma 3.13 says we can minimize KL by maximizing the distinguishing advantage, assuming that the entropy difference is fixed. This will become clear in the proof of Lemma 3.19 below.

Lemma 3.18. For every generalized distinguisher D, every k ≥ 0, and every [q]-valued random variable C jointly distributed with X such that H(C|X) ≥ H(2^{kD}|X), we have Adv_D((X, B), (X, 2^{kD})) ≤ Adv_D((X, B), (X, C)).

Proof. Consider any C with H(C|X) ≥ H(2^{kD}|X). If k = 0, then H(2^{kD}|X) = log q, so C and 2^{kD} must both be uniform on [q] given X and the result holds vacuously. Thus assume k > 0. By Lemma 3.13 (applied to C and the generalized distinguisher kD),
0 ≤ KL(X, C ‖ X, 2^{kD}) = (H(2^{kD}|X) − H(C|X)) − k · (E[D(X, C)] − E[D(X, 2^{kD})]),
where we use nonnegativity of KL divergence. Since H(C|X) ≥ H(2^{kD}|X) and k > 0, this gives E[D(X, C)] − E[D(X, 2^{kD})] ≤ 0. Thus Adv_D((X, B), (X, 2^{kD})) − Adv_D((X, B), (X, C)) = E[D(X, C)] − E[D(X, 2^{kD})] ≤ 0, as desired.
Proof. One can readily verify that D is a generalized distinguisher and C = 2^D. Moreover, if (X, C) ∈ C_r then the KL projection is (X, C) = (X, 2^D) itself, i.e. α = 1.
Hence WLOG assume we have found such β. Closeness of H_β to both r and H(2^{βD}|X) ensures that r ≤ H(2^{βD}|X) ≤ r + σ.

Putting it Together
We now have all the tools ready to prove Theorem 3.11 (KL-hardness implies pseudoentropy, uniform setting). We will simply replace the use of the Min-Max Theorem in the proof of Theorem 3.15 with the Uniform Min-Max Theorem for distinguishers (Theorem 3.17), using Lemma 3.20 to implement the approximate KL projection. However, notice that H(B|X), and hence the "radius" of the conditional entropy ball C_r, is unknown. We will simply try all radii (with quantization) and pick the distinguisher that results in the best KL predictor, which can be tested by sampling (X, B).
Proof. Suppose for contradiction that B does not have uniform (t', ε) conditional pseudoentropy at least H(B|X) + δ − ε. By definition, there is a time t' randomized oracle algorithm D such that for infinitely many n and every C with H(C|X) ≥ H(B|X) + δ − ε, D^{O_{X,B,C}} ε-distinguishes (X, B) and (X, C).
Let C_r denote the entropy ball {(X, C) : H(C|X) ≥ r}. Let γ > 0 be an error parameter to be fixed later. Assume that given any r ≥ H(B|X) + δ − ε/2, we can implement Algorithm Finding Universal Distinguisher on C = C_r using oracle O_{X,B}, to output a circuit D* of size poly(t', n, log q, 1/ε, log(1/γ)) w.p. at least 1 − γ, in time poly(t', n, q, 1/ε, log(1/γ)). We show how to do this at the end.
Let c be a large enough constant. We show that the following time t oracle algorithm P (which tries all quantized radii r, runs Algorithm Finding Universal Distinguisher for each, and outputs the measure corresponding to the best resulting KL predictor) violates the hypothesis that B is uniformly (t, δ) KL-hard given X, for an appropriate choice of γ = Ω(ε/(log q + 1/ε)).
Let γ' > 0 be an error parameter to be fixed later. For each iteration j ∈ [S], we will implement C^(j) in Algorithm Finding Universal Distinguisher by constructing a generalized distinguisher D_j as a circuit of size poly(t', n, log q, 1/ε, log(1/γ')) such that C^(j) = 2^{D_j}. We do this for j = 1 by setting D_1 = 0. Assuming we have constructed D_j, we can construct D_{j+1} in time poly(t', n, q, 1/ε) as follows:
1. We can obtain a size poly(t', n, log q, 1/ε, log(1/γ')) distinguisher D^(j) from D_j, in time poly(t', n, q, 1/ε) and w.p. at least 1 − 2γ', where c is the constant in Algorithm Finding Universal Distinguisher. By using Newton's method to approximate 2^{D_j}, we can construct a circuit P̃ such that the random variable C̃(a|x) = P̃(x, a)/Σ_b P̃(x, b) satisfies (i) H(C̃|X) ≥ H(C^(j)|X) − ε/2; (ii) for any distinguisher D', Adv_{D'}((X, B), (X, C^(j))) ≥ Adv_{D'}((X, B), (X, C̃)) − ε/3. This can be done in time poly(t', n, log q, 1/ε, log(1/γ')) w.p. at least 1 − γ' (see Lemma A.3). We then generate m = O((log(1/γ') + n + log q)/ε^2) random samples of (X, B, C̃)^{t'} and U_{t'}, where C̃ is samplable from X in time poly(t', n, q, 1/ε, log(1/γ')). Finally, let D^(j) be the distinguisher that, given (x, a), chooses I ∈_R [m] and outputs D^{O_{X,B,C̃}}(x, a), using the I-th copy of (X, B, C̃)^{t'} to answer oracle queries and the I-th copy of U_{t'} as the internal randomness of D.
Note that the size of D^(j) does not depend on the size of D_j (but the size of D_{j+1} will additively depend on the size of D^(j)). By a Chernoff bound and a union bound, w.p. at least 1 − γ', for every (x, a), D^(j)(x, a) is a good additive approximation of the acceptance probability of D^{O_{X,B,C̃}}(x, a).
In the uniform case, this implication holds even for a weaker definition of conditional pseudoentropy where we only require indistinguishability against distinguishers with oracle access to OX,B.
Proof. We shall prove the nonuniform version; once we do so, it will be clear that the uniform version follows.
We claim that the following distinguisher D is a desired universal distinguisher: D(x, a) := (t' + log P(x, a))/(2t'). Note that D is a distinguisher, i.e. D(x, a) ∈ [0, 1], because 2^{−t'} ≤ P(x, a) ≤ 2^{t'}. Moreover, one can verify that 2^{2t'·D} = C_P. Now consider any C with H(C|X) ≥ H(B|X) + δ. Applying Lemma 3.13 twice (to B and to C, with the generalized distinguisher 2t'·D), we obtain
KL(X, B ‖ X, C_P) = (H(C_P|X) − H(B|X)) − 2t' · Adv_D((X, B), (X, C_P)) ≤ λ,
where the inequality is by the definition of a λ-KL predictor, as well as
0 ≤ KL(X, C ‖ X, C_P) = (H(C_P|X) − H(C|X)) − 2t' · Adv_D((X, C), (X, C_P)),
where the inequality is by nonnegativity of KL divergence. Taking the difference yields
2t' · Adv_D((X, B), (X, C)) ≥ H(C|X) − H(B|X) − λ ≥ δ − λ,
i.e. Adv_D((X, B), (X, C)) ≥ (δ − λ)/(2t').

Efficiency.
We approximate D by D̃, where log P(x, a) is computed to precision σ/2. Since P(x, a) is represented as a rational p_1/p_2 with p_1, p_2 ≤ 2^{t'}, the logarithm can be approximated to that precision in time poly(t', log(1/σ)) using Taylor series. Thus D̃ has circuit size poly(t', log(1/σ)) ≤ t. Moreover, for any C with H(C|X) ≥ H(B|X) + δ, we have Adv_{D̃}((X, B), (X, C)) ≥ (δ − λ − σ/2)/(2t'). This completes the proof for the nonuniform case. At this point, the uniform version also follows quite naturally: given P such that, when P^{O_{X,B}} is viewed as a distribution over functions p : {0,1}^n × [q] → (0, +∞), it is a λ-KL predictor of B given X, we let D be the randomized oracle algorithm such that D^{O_{X,B}} performs the above conversion from a λ-KL predictor to a universal (δ − λ − σ/2)/(2t')-distinguisher, replacing the P(x, a) there with the output of simulating P^{O_{X,B}} on (x, a) (using random coin tosses and O_{X,B}). Thus for every C with H(C|X) ≥ H(B|X) + δ, D^{O_{X,B}} distinguishes (X, B) and (X, C) with advantage at least (δ − λ − σ/2)/(2t'). Furthermore, D runs in time poly(n, t', log(1/σ)) ≤ t.
Since Theorem 3.22 only requires a weaker version of conditional pseudoentropy, we obtain the following equivalence:

Corollary 3.23. Let n be a security parameter, δ = δ(n) > 0, q = q(n) computable in time poly(n). Let (X, B) be a {0,1}^n × [q]-valued random variable that is polynomial-time samplable. Then the following are equivalent:
1. B is uniformly δ KL-hard given X;
2. B has uniform pseudoentropy at least H(B|X) + δ given X;
3. B has "weak" uniform pseudoentropy at least H(B|X) + δ given X: for every probabilistic polynomial-time algorithm A and every constant c, there is a random variable C jointly distributed with X, B such that the following holds for all sufficiently large n:
• H(C|X) ≥ H(B|X) + δ − 1/n^c;
• (X, B) and (X, C) are indistinguishable by A: |Pr[A(X, B) = 1] − Pr[A(X, C) = 1]| ≤ 1/n^c.

Proof. 1 ⇒ 2 by Theorem 3.21. 2 ⇒ 3 by definition. 3 ⇒ 1 by Theorem 3.22 and the fact that (X, B) is polynomial-time samplable.

FROM ONE-WAY FUNCTIONS TO NEXT-BIT PSEUDOENTROPY
In this section, we show how to obtain a next-bit pseudoentropy generator from an arbitrary one-way function f. One-way functions are functions that are easy to compute but hard to invert: a polynomial-time computable f : {0,1}^n → {0,1}^m is (t, γ) one-way if for every time t randomized algorithm A and all sufficiently large n, Pr[A(f(U_n)) ∈ f^{-1}(f(U_n))] ≤ γ, and f is one-way if it is (n^c, 1/n^c) one-way for every constant c. This section is structured as follows. Given a one-way function f, we first show that U_n is KL-hard for sampling given f(U_n). By a chain rule for KL-hardness, we then argue that it is KL-hard to sample the next bit of U_n given f(U_n) and all previous bits of U_n. Finally, we use the equivalences between KL-hardness for sampling, KL-hardness, and conditional pseudoentropy (for small q) to derive that (f(U_n), U_n) has a lot of total next-bit pseudoentropy.
Proof. Suppose for contradiction that U_n is not uniformly (t', log(1/γ)) KL-hard for sampling given f(U_n), i.e. there exists a time t' randomized oracle algorithm S such that KL(f(U_n), U_n ‖ f(U_n), S^{O_{f(U_n),U_n}}(f(U_n))) ≤ log(1/γ). Let g(y, x) be the indicator function that f(x) = y. Since applying a (deterministic) function does not increase KL divergence (Lemma 2.5),
KL(g(f(U_n), U_n) ‖ g(f(U_n), S^{O_{f(U_n),U_n}}(f(U_n)))) ≤ log(1/γ),
where g(f(U_n), U_n) ≡ 1, and g(f(U_n), S^{O_{f(U_n),U_n}}(f(U_n))) equals 1 with probability p = Pr[S^{O_{f(U_n),U_n}}(f(U_n)) ∈ f^{-1}(f(U_n))]. Since the KL divergence from Bernoulli(1) to Bernoulli(p) is log(1/p), we must have p ≥ γ. That is, S inverts f with probability at least γ. Since O_{f(U_n),U_n} can be simulated in time poly(n), this violates the fact that f is (t, γ) one-way, for t = t' · poly(n).
The "next-bit" version follows from a chain rule for KL-hardness: if Y = (Y_1, . . ., Y_n) is uniformly (t, δ) KL-hard for sampling given Z, then, choosing I ∈_R [n], sampling Y_I given (Z, Y_1, . . ., Y_{I−1}) must incur KL divergence at least δ/n on average; otherwise, iterating the sampler over all n bits would give a sampler for Y from Z with KL divergence less than δ.
This violates Y being uniformly (t, δ) KL-hard for sampling given Z.
Remark 4.5. The argument in this section says (f (Un), Un) has a lot of next-bit pseudoentropy as long as Un is KL-hard to sample from f (Un). The KL-hardness of sampling Un from f (Un) is similar to the notion of a distributional oneway function [IL] which amounts to replacing KL divergence with statistical distance.
There are functions f that are clearly not one-way but for which U_n is still KL-hard to sample from f(U_n). Thus, our construction of next-bit pseudoentropy generators (and later on, pseudorandom generators) can be based on a larger class of functions.

FROM NEXT-BIT PSEUDOENTROPY TO PSEUDORANDOMNESS
In this section, for brevity, we always assume the uniform setting whenever referring to one-way functions and computational notions of (conditional) entropy. Nonetheless, these results hold in the nonuniform setting too, with little or no change in the argument.

The Construction
Haitner et al. show a construction of a pseudorandom generator from any next-bit pseudoentropy generator G^nb. Their result can be stated as follows:

Theorem 5.1 (pseudorandomness from next-bit pseudoentropy [HRV]). Let n be a security parameter. Let ∆ = ∆(n) ∈ [1/poly(n), n], m = m(n), κ = κ(n) ∈ [n/2] be polynomial-time computable. For every polynomial-time computable G^nb : {0,1}^n → {0,1}^m such that G^nb(U_n) has (T, ε) next-bit pseudoentropy at least n + ∆, there exists a pseudorandom generator built from G^nb via three steps; the first two (entropy equalization and converting Shannon entropy to min-entropy) are as outlined in the introduction, and the third is:

3. Randomness extraction. This step is essentially a computational version of block-source extraction. After the previous step, the amount of next-bit pseudo-min-entropy in each block is known, so we may choose hash functions of fixed output length to make the output pseudorandom.
Lemma 5.5 ([HRV]). Let n be a security parameter, and let h be a random hash function from an appropriate universal family; applying h to each block extracts the pseudo-min-entropy, so that the concatenation of the hashed blocks is pseudorandom.
We refer to [HRV] for the proofs and a detailed explanation of the intuition behind these steps. The seed-length blow-up in [HRV] comes from Step 1 (Entropy Equalization) and Step 2 (Converting to conditional min-entropy), as each involves repeating the current generator on many independent seeds. We show how to avoid the blow-up due to Entropy Equalization, by showing how randomness from a "few" copies of G^nb can be used to generate more copies of G^nb, iteratively.
Specifically, we show that the [HRV] construction above, but taking only u = 2 copies in Entropy Equalization, gives rise to a "Z-seeded" PRG, one that given input distribution Z outputs some (Z, Ũ_σ) indistinguishable from (Z, U_σ). (If Z were uniformly distributed in {0,1}^d, this would be a standard PRG.) Then we apply iterative composition (just like iterative composition for standard PRGs [Gol]) to increase the number of pseudorandom bits (without changing the seed distribution Z).
We begin by describing the iterative composition of Z-seeded PRGs, illustrated in Figure 2. Proof. Consider the following algorithm G_ℓ(z): If ℓ = 0, output the empty string. If ℓ ≥ 1, let (z', ũ) = G(z) and output G_{ℓ−1}(z') • ũ.
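In pseudocode, the composition has the following shape; this is a hedged illustration only, with a dummy Z-seeded generator standing in for the construction of Theorem 5.7, and σ and the number of iterations chosen arbitrarily.

```python
import random

SIGMA = 4  # number of output bits produced per invocation (illustrative)

def g(z):
    """Dummy Z-seeded PRG: maps a seed z to (new seed z', SIGMA output bits).
    A real instantiation would be the generator built from G^nb in Theorem 5.7."""
    rnd = random.Random(z)
    z_next = rnd.getrandbits(len(bin(z)) - 2) ^ z  # deterministic in z, roughly same length
    out = [rnd.randint(0, 1) for _ in range(SIGMA)]
    return z_next, out

def g_iter(z, ell):
    """Iterative composition G_ell: ell invocations of g, concatenating the output bits."""
    if ell == 0:
        return []
    z_next, out = g(z)
    return g_iter(z_next, ell - 1) + out   # G_{ell-1}(z') followed by the fresh block

seed = random.getrandbits(32)
print(g_iter(seed, 5))   # 5 * SIGMA pseudorandom bits from one seed (toy example)
```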
We claim that G_ℓ(Z) is pseudorandom, so we obtain the desired PRG by composing G_ℓ with an algorithm that samples Z from d random bits. Clearly G_ℓ runs in poly(n) time. We show the pseudorandomness of G_ℓ(Z) by a hybrid argument.
Suppose for contradiction that G_ℓ(Z) is not (T', ε')-pseudorandom, i.e. there exists a time T' ε'-distinguisher D between G_ℓ(Z) and U_{ℓσ}. For each 0 ≤ i ≤ ℓ define a hybrid distribution H_i = (G_i(Z), U_{(ℓ−i)σ}). Thus H_0 = U_{ℓσ} and H_ℓ = G_ℓ(Z). Let I ∈_R [ℓ]. Then D distinguishes H_I from H_{I−1} with advantage at least ε'/ℓ on average, and we use this to break the pseudorandomness property of the Z-seeded PRG G.
We now show how to construct a Z-seeded PRG G from any next-bit pseudoentropy generator G^nb, as demonstrated in Figure 3. By applying iterative composition, this gives rise to a seed-efficient construction of a PRG from a pseudoentropy generator G^nb, which should be compared to the original construction illustrated in Figure 1. [Figure 3: the seed Z arranged as rows built from shifted copies G^nb(U^(1)), . . ., G^nb(U^(2t)) of the next-bit pseudoentropy generator, using the shifts J^(1), . . ., J^(t).]
An arbitrary universal hash function H (with a proper output length) is then applied to all bits in the same column, producing pseudorandom bits (Ũ^(1), . . ., Ũ^(t), Ũ), where each Ũ^(i) is of length n. We then apply G^nb to each Ũ^(i). Together with the unused bits of Z, they form Z̃. We ignore H, J^(1), . . ., J^(t) in the figure since they are the same in the input and output of G.
In the following, we will show that G(Z) = (H • W • G^nb(Ũ^(1)) • . . . • G^nb(Ũ^(t)) • Ũ) is computationally indistinguishable from (H • W • G^nb(U^(1)) • . . . • G^nb(U^(t)) • U_σ), where the G^nb(U^(i))'s are iid copies of G^nb(U_n). The proof is essentially the same 3-step analysis as in Haitner et al., with the tweak that the conditional pseudoentropy and conditional pseudo-min-entropy are now additionally conditioned on W, and the final indistinguishability holds for W taking any value. In Step 1, we set Y^(i) = G^nb(U^(t+i))_{J^(i),...,m} • G^nb(U^(i))_{1,...,J^(i)−1}.
Moreover, G is computable with O(ℓ·d/n) (uniformly random) oracle calls to G^nb.
Proof. By Theorem 5.7, there is a Z-seeded PRG G where Z is samplable in polynomial time from U_d, and G(Z) is (T − n^{O(1)}, n^{O(1)} · (ε + 2^{−κ})) indistinguishable from (Z, U_σ). By Lemma 5.6, there exists a pseudorandom generator with the above parameters.