The Unified Theory of Pseudorandomness

Pseudorandomness is the theory of efficiently generating objects that "look random" despite being constructed with little or no randomness. One of the achievements of this research area has been the realization that a number of fundamental and widely studied "pseudorandom" objects are all almost equivalent when viewed appropriately. These objects include pseudorandom generators, expander graphs, list-decodable error-correcting codes, averaging samplers, and hardness amplifiers. In this survey, we describe the connections between all of these objects, showing how they can all be cast within a single "list-decoding framework" that brings out both their similarities and differences.


Introduction
Pseudorandomness is the theory of efficiently generating objects that "look random" despite being constructed with little or no randomness. Over the past 25 years, it has developed into a substantial area of study, with significant implications for complexity theory, cryptography, algorithm design, combinatorics, and communications theory. One of the achievements of this line of work has been the realization that a number of fundamental and widely studied "pseudorandom" objects are all almost equivalent when viewed appropriately. These objects include:
Pseudorandom Generators: These are procedures that stretch a short "seed" of truly random bits into a long string of "pseudorandom" bits that cannot be distinguished from truly random bits by any efficient algorithm. In this article, we focus on methods for constructing pseudorandom generators from boolean functions of high circuit complexity.
Expander Graphs: Expanders are graphs that are sparse but nevertheless highly connected. There are many variants of expander graphs, but here we focus on the classical notion of vertex expansion, where every subset of not-too-many vertices has many neighbors in the graph.
Error-Correcting Codes: These are methods for encoding messages so that even if many of the symbols are corrupted, the original message can still be recovered. Here we focus on list decoding, where there are so many corruptions that uniquely decoding the original message is impossible, but it may still be possible to produce a short list of possible candidates.
Randomness Extractors: These are procedures that extract almost uniformly distributed bits from sources of biased and correlated bits. Here we focus on extractors for general sources, where all we assume is a lower bound on the amount of "entropy" in the source and we only get a single sample from the source. Extractors for such sources necessarily use a small number of additional truly random bits as a "seed" for extraction.
Samplers: These are randomness-efficient methods for sampling elements of a large universe so that approximately the correct fraction of samples will land in any subset of the universe with high probability.
Hardness Amplifiers: These are methods for converting worst-case hard boolean functions into ones that are average-case hard.
These objects are all "pseudorandom" in the sense that a randomly chosen object can be shown to have the desired properties with high probability, and the main goal is typically to find explicit constructions (ones that are deterministic and computationally efficient) achieving similar parameters. Each of these objects was introduced with a different motivation and originally developed its own body of research. However, as mentioned above, research in the theory of pseudorandomness has uncovered intimate connections between all of them. In recent years, a great deal of progress has been made in understanding and constructing each of these objects by translating intuitions and techniques developed for one to the others.
The purpose of this survey is to present the connections between these objects in a single place, using a single language. Hopefully, this will make the connections more readily accessible and usable for non-experts and for those familiar with some but not all of the objects at hand. In addition, it is also meant to clarify the differences between the objects, and to explain why occasional claims of "optimal" constructions of one type of object do not always lead to improved constructions of the others.
Naturally, describing connections between six different notions in a short article makes it impossible to do justice to any of the objects on its own. Thus, for motivation, constructions, and applications, the reader is referred to existing surveys focused on the individual objects [CRT, Kab, HLW, Sud, Gur, NT, Sha, Gol1, Tre2] or to the broader treatments of pseudorandomness in [Mil, Tre3, Gol3, AB, Vad2]. In particular, the monograph [Vad2] develops the subject in a way that emphasizes the connections described here.
The framework used in this survey extends to a number of other pseudorandom objects, such as "randomness condensers," but we omit these extensions due to space constraints. (See [Vad2].)

The Framework
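As we will see, all of the objects we are discussing can be syntactically viewed as functions $\Gamma : [N] \times [D] \to [M]$. We will show how the defining properties of each of the objects can be cast in terms of the following notion.

Definition 1. For a function $\Gamma : [N] \times [D] \to [M]$, a set $T \subseteq [M]$, and an agreement parameter $\varepsilon \in [0, 1)$, we define
$$\mathrm{LIST}_\Gamma(T, \varepsilon) = \{x \in [N] : \Pr_y[\Gamma(x, y) \in T] > \varepsilon\},$$
where y is chosen uniformly from [D]. We also define $\mathrm{LIST}_\Gamma(T, 1) = \{x \in [N] : \Pr_y[\Gamma(x, y) \in T] = 1\}$.

In general, it will be possible to characterize each of the pseudorandom objects by a condition of the following form:
$$\forall T \in \mathcal{C}, \quad |\mathrm{LIST}_\Gamma(T, \varepsilon)| \le K.$$
Here ε ∈ [0, 1] and K ∈ [0, N] will be parameters corresponding to the "quality" of the object, and we usually wish to minimize both. $\mathcal{C}$ will be a class of subsets of [M], sometimes governed by an additional "quality" parameter. Sometimes the requirement will be that the size of $\mathrm{LIST}_\Gamma(T, \varepsilon)$ is strictly less than K, but this is just a matter of notation, amounting to replacing K in the above formulation by K − 1.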
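To make the list-decoding notion concrete, here is a brute-force sketch that computes the set of x ∈ [N] with Pr_y[Γ(x, y) ∈ T] > ε (or = 1 when ε = 1). It is exponential in the sizes and meant only for tiny toy parameters; the toy Γ(x, y) = (x + y) mod 4 and the helper name `list_gamma` are hypothetical illustrations, not constructions from the survey.

```python
from fractions import Fraction
from typing import Callable, Set


def list_gamma(gamma: Callable[[int, int], int], N: int, D: int,
               T: Set[int], eps: Fraction) -> Set[int]:
    """Brute-force LIST_Gamma(T, eps) for Gamma : [N] x [D] -> [M].

    [N] and [D] are represented as range(N) and range(D); y is uniform on [D].
    For eps < 1, keep x with Pr_y[Gamma(x, y) in T] > eps.
    For eps == 1, keep x with Pr_y[Gamma(x, y) in T] == 1.
    """
    result = set()
    for x in range(N):
        hits = sum(1 for y in range(D) if gamma(x, y) in T)
        frac = Fraction(hits, D)  # exact probability, no floating-point error
        if (eps == 1 and frac == 1) or (eps < 1 and frac > eps):
            result.add(x)
    return result


# Hypothetical toy Gamma : [8] x [2] -> [4].
def toy(x: int, y: int) -> int:
    return (x + y) % 4


print(sorted(list_gamma(toy, 8, 2, {0, 1}, Fraction(1, 2))))  # → [0, 4]
print(sorted(list_gamma(toy, 8, 2, {0, 1}, Fraction(0))))     # → [0, 1, 3, 4, 5, 7]
```

Raising the agreement parameter from 0 to 1/2 shrinks the list from six candidates to two. For this toy Γ, asking for probability exactly 1 (ε = 1) happens to return the same two.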
The notation $\mathrm{LIST}_\Gamma(\cdot, \cdot)$ comes from the interpretation of list-decodable error-correcting codes in this framework (detailed in the next section), where T corresponds to a corrupted codeword and $\mathrm{LIST}_\Gamma(T, \varepsilon)$ to the list of possible decodings. This list-decoding viewpoint turns out to be very useful for casting all of the objects in the same language. However, this is not the only way of looking at the objects, and indeed the power of the connections we describe in this survey comes from the variety of perspectives they provide. In particular, many of the connections were discovered through the study of randomness extractors, and extractors remain a powerful lens through which to view the area. The list-decoding view of extractors, and consequently of many of the other objects presented here, emerged through a sequence of works, and was crystallized in the paper of Ta-Shma and Zuckerman [TZ].
Our notation (e.g. the parameters N, M, D, K, ε) follows the literature on extractors, and thus is nonstandard for some of the objects. We also follow the convention from the extractor literature that n = log N, d = log D, m = log M, and k = log K. While it is not necessary for the definitions to make sense, in some cases it is more natural to think of N, D, and/or M as a power of 2, and thus the sets [N], [D], and [M] as corresponding to the set of bit-strings of length n, d, and m, respectively. In some cases (namely, list-decodable codes and hardness amplifiers), we will restrict to functions in which y is a prefix of Γ(x, y), and then it will be convenient to denote the range by [D] × [q] rather than [M]. This syntactic constraint actually leads to natural variants (sometimes referred to as "strong" or "seed-extending" variants) of the other objects, too, but we do not impose it here for the sake of generality and consistency with the most commonly used definitions.
Much of the work on the objects we are discussing is concerned with giving explicit constructions, which correspond to the function $\Gamma : [N] \times [D] \to [M]$ being deterministically and efficiently computable, e.g. in time poly(n, d). However, since our focus is on the connections between the objects rather than their constructions, we will generally not discuss explicitness except in passing.

List-Decodable Codes
We begin by describing how the standard notion of list-decodable codes can be cast in the framework, because it motivates the notation $\mathrm{LIST}_\Gamma(\cdot, \cdot)$ and provides a good basis for understanding the other objects.
A code is specified by an encoding function mapping n-bit messages to codewords consisting of D symbols over an alphabet of size q. More generally, it can be a function $\mathrm{Enc} : [N] \to [q]^D$. (In the coding literature, the message alphabet is usually taken to be the same as the codeword alphabet, which translates to a scaling of the message length by a factor of log q. In addition, the message length is usually denoted by k rather than n, and the codeword length is n rather than D.) The goal is to define the function Enc so that if a codeword Enc(x) is corrupted in a significant number of symbols and one only receives the corrupted string $r \in [q]^D$, the message x can still be recovered. List-decodable codes are designed for a setting where the number of corruptions is too large to hope for uniquely decoding x, and thus we settle for getting a short list of possible candidates.

Definition 2. A code $\mathrm{Enc} : [N] \to [q]^D$ is (ε, K) list-decodable if for every "received word" $r \in [q]^D$, there are at most K messages x ∈ [N] such that Enc(x) and r agree in greater than a 1/q + ε fraction of positions.
This definition says that if we receive a string $r \in [q]^D$ that we know has resulted from corrupting a codeword Enc(x) in less than a 1 − (1/q + ε) fraction of positions, then we can pin down the message x to one of at most K possibilities. K is thus called the list size. Note that we expect a uniformly random string $r \in [q]^D$ to agree with most codewords in roughly a 1/q fraction of positions, so we cannot expect to do any meaningful decoding from agreement 1/q; this is why we ask for agreement greater than 1/q + ε.
Naturally, one wants the agreement parameter ε to be as small as possible and the (relative) rate ρ = log N/(D log q) of the code to be as large as possible.
In coding theory, one typically considers both ε and ρ to be fixed constants in (0, 1), while the message length n = log N tends to infinity and the alphabet size remains small (ideally, q = O(1)). The main challenge is to achieve an optimal tradeoff between the rate and agreement, while maintaining a list size K polynomially bounded in the message length n. Indeed, we usually also want an efficient algorithm that enumerates all the possible decodings x in time polynomial in n, which implies a polynomial bound on the list size. There has been dramatic progress on this challenge in recent years; see the surveys [Sud, Gur].
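To cast list-decodable codes in our framework, note that given a code $\mathrm{Enc} : [N] \to [q]^D$, we can define a function $\Gamma : [N] \times [D] \to [D] \times [q]$ by
$$\Gamma(x, y) = (y, \mathrm{Enc}(x)_y). \qquad (1)$$
Note that the range of Γ is [D] × [q] and it has the property that the first component of Γ(x, y) is always y. Moreover, given any Γ with this property, we can obtain a corresponding code Enc.

Proposition 3. For a code $\mathrm{Enc} : [N] \to [q]^D$, let $\Gamma : [N] \times [D] \to [D] \times [q]$ be the function corresponding to Enc via Equation (1). Then Enc is (ε, K) list-decodable iff
$$\forall r \in [q]^D, \quad |\mathrm{LIST}_\Gamma(T_r, 1/q + \varepsilon)| \le K,$$
where $T_r = \{(y, r_y) : y \in [D]\}$.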
Proof. It suffices to show that for every $r \in [q]^D$ and x ∈ [N], we have $x \in \mathrm{LIST}_\Gamma(T_r, 1/q + \varepsilon)$ iff Enc(x) agrees with r in greater than a 1/q + ε fraction of places. We show this as follows:
$$x \in \mathrm{LIST}_\Gamma(T_r, 1/q + \varepsilon) \iff \Pr_y[(y, \mathrm{Enc}(x)_y) \in T_r] > 1/q + \varepsilon \iff \Pr_y[\mathrm{Enc}(x)_y = r_y] > 1/q + \varepsilon,$$
where y is chosen uniformly from [D].

In addition to the particular range of parameters typically studied (e.g. the small alphabet size q), the other feature that distinguishes list-decodable codes from many of the other objects described below is that their characterization only considers sets of the form $T_r \subseteq [D] \times [q]$ for received words $r \in [q]^D$. These sets contain only one element for each possible first component y ∈ [D], and thus are of size exactly D. Note that as the alphabet size q grows, these sets contain a vanishingly small fraction of the range [D] × [q].

S. Vadhan
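The correspondence Γ(x, y) = (y, Enc(x)_y), and the claim that decoding a received word r amounts to computing LIST_Γ(T_r, 1/q + ε) with T_r = {(y, r_y) : y ∈ [D]}, can be checked exhaustively on a toy example. The sketch below assumes a hypothetical code over q = 5 that encodes x = (a, b) as the evaluations of the line a + b·y (mod 5) at all D = 5 points, a tiny Reed-Solomon-style code chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

Q = 5          # alphabet size q (prime, so arithmetic mod Q is a field)
D = Q          # codeword length: one symbol per evaluation point y in [D]
N = Q * Q      # messages x = (a, b), encoded as the integer x = a * Q + b


def enc(x: int) -> tuple:
    # Hypothetical toy code: evaluate the line a + b*y (mod Q) at every y in [D].
    a, b = divmod(x, Q)
    return tuple((a + b * y) % Q for y in range(D))


CODEWORDS = [enc(x) for x in range(N)]


def gamma(x: int, y: int) -> tuple:
    # Gamma(x, y) = (y, Enc(x)_y): the first component of the output is always y.
    return (y, CODEWORDS[x][y])


def decode_direct(r: tuple, eps: Fraction) -> set:
    # Messages whose codeword agrees with r on more than a 1/q + eps fraction of positions.
    return {x for x in range(N)
            if Fraction(sum(c == s for c, s in zip(CODEWORDS[x], r)), D) > Fraction(1, Q) + eps}


def decode_via_list(r: tuple, eps: Fraction) -> set:
    # LIST_Gamma(T_r, 1/q + eps) with T_r = {(y, r_y) : y in [D]}, a set of exactly D pairs.
    T_r = {(y, r[y]) for y in range(D)}
    return {x for x in range(N)
            if Fraction(sum(gamma(x, y) in T_r for y in range(D)), D) > Fraction(1, Q) + eps}


# Corrupt two of the five symbols of Enc(7) = (1, 3, 0, 2, 4).
r = (1, 3, 0, 0, 0)
eps = Fraction(1, 5)
print(sorted(decode_via_list(r, eps)))   # → [0, 7]

# Proposition 3 on this toy code: both views agree on every received word.
assert all(decode_via_list(w, eps) == decode_direct(w, eps)
           for w in product(range(Q), repeat=D))
```

The list contains the intended message x = 7 and also x = 0 (the zero line), whose codeword (0, 0, 0, 0, 0) agrees with r in 3 of 5 positions as well. With this many corruptions, uniquely decoding r is impossible, but the list is short.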

Samplers
Suppose we are interested in estimating the average value of a boolean function T : [M] → {0, 1} on a huge domain [M], given an oracle for T. The Chernoff Bound tells us that if we take $D = O(\log(1/\delta)/\varepsilon^2)$ independent random samples from [M], then with probability at least 1 − δ, the average of T on the sample will approximate T's global average within an additive error of ε. However, it is well known that these samples need not be generated independently; for example, samples generated according to a k-wise independent distribution or by a random walk on an expander graph have similar properties [CG2, BR, SSS, Gil]. The advantage of using such correlated sample spaces is that the samples can be generated using many fewer random bits than independent samples; this can be useful for derandomization and/or simply because it provides a compact representation of the sequence of samples.
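The Chernoff-type bound above hides its constant. As a sketch, the one-sided Hoeffding inequality Pr[sample average ≥ µ(T) + ε] ≤ exp(−2Dε²) for D independent uniform samples makes the constant explicit, and multiplying by m = log M gives the number of coin tosses that independent samples cost, which is the baseline that correlated sample spaces improve on. The function names are hypothetical.

```python
import math


def hoeffding_samples(eps: float, delta: float) -> int:
    # Least integer D >= ln(1/delta) / (2 eps^2), so that exp(-2 D eps^2) <= delta.
    # By Hoeffding's inequality, D independent uniform samples then give
    # Pr[sample average > mu(T) + eps] <= delta for every T : [M] -> {0, 1}.
    return math.ceil(math.log(1 / delta) / (2 * eps ** 2))


def independent_coin_tosses(eps: float, delta: float, m: int) -> int:
    # Each independent sample from [M], M = 2^m, costs m random bits: n = D * m.
    return hoeffding_samples(eps, delta) * m


print(hoeffding_samples(0.1, 0.01))            # → 231
print(independent_coin_tosses(0.1, 0.01, 64))  # → 14784
```

Independently sampling 231 points from a universe of size 2^64 costs 14784 random bits; correlated sample spaces aim for comparable accuracy with many fewer.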
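The definition below abstracts this idea of a procedure that uses n = log N random bits to generate D samples from [M] with the above average-approximation property.

Definition 4. A sampler Smp for domain size M that generates D samples using coin tosses from [N] is a function $\mathrm{Smp} : [N] \to [M]^D$. Smp is a (δ, ε) averaging sampler if for every function T : [M] → {0, 1}, we have
$$\Pr_x\left[\frac{1}{D}\sum_{i=1}^{D} T(\mathrm{Smp}(x)_i) > \mu(T) + \varepsilon\right] \le \delta,$$
where x is chosen uniformly from [N] and $\mu(T) = \frac{1}{M}\sum_{z \in [M]} T(z)$.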
Note that the definition only bounds the probability that the sample-average deviates from µ(T) from above. However, a bound in both directions can be obtained by applying the above definition also to the complement of T, at the price of a factor of 2 in the error probability δ. (Considering deviations in only one direction will allow us to cast samplers in our framework without any slackness in parameters.) We note that the above definition can also be generalized to functions T that are not necessarily boolean, and instead map to the real interval [0, 1]. Nonboolean samplers and boolean samplers turn out to be equivalent up to a small loss in the parameters [Zuc2].
We note that one can consider more general notions of samplers that make adaptive oracle queries to the function T and/or produce their estimate of µ(T) by an arbitrary computation on the values returned (not necessarily taking the sample average). In fact, utilizing this additional flexibility, there are known explicit samplers that achieve better parameters than we know how to achieve with averaging samplers. (For these generalizations, constructions of such samplers, and discussion of other issues regarding samplers, see the survey [Gol1].) Nevertheless, some applications require averaging samplers, and averaging samplers are also more closely related to the other objects we are studying.
In terms of the parameters, one typically considers M, ε, and δ as given, and seeks to minimize both the number n = log N of random bits and the number D of samples. Usually, complexity is measured as a function of m = log M, with ε ranging between constant and 1/poly(m), and δ ranging between o(1) and $2^{-\mathrm{poly}(m)}$.
Samplers can be cast rather directly into our framework as follows. Given a sampler Smp for domain size M that generates D samples using coin tosses from [N], we can define $\Gamma : [N] \times [D] \to [M]$ by
$$\Gamma(x, y) = \text{the } y\text{'th sample of } \mathrm{Smp}(x). \qquad (2)$$
Conversely, any function $\Gamma : [N] \times [D] \to [M]$ yields a sampler. The property of Smp being an averaging sampler can be translated to the "list-decodability" of Γ as follows.
Proposition 5. Let Smp be a sampler for domain size M that generates D samples using coin tosses from [N], and let $\Gamma : [N] \times [D] \to [M]$ be the function corresponding to Smp via Equation (2). Then Smp is a (δ, ε) averaging sampler iff
$$\forall T \subseteq [M], \quad |\mathrm{LIST}_\Gamma(T, \mu(T) + \varepsilon)| \le K,$$
where K = δN and µ(T) = |T|/M.

Proof. We can view a function T : [M] → {0, 1} as the characteristic function of a subset of [M], which, by abuse of notation, we also denote by T. Note that $\mathrm{LIST}_\Gamma(T, \mu(T) + \varepsilon)$ is precisely the set of coin tosses x for which Smp(x) outputs a sample on which T's average is greater than µ(T) + ε. Thus, the probability of a bad sample is at most δ iff $|\mathrm{LIST}_\Gamma(T, \mu(T) + \varepsilon)| \le \delta N = K$.

Let's compare the characterization of samplers given by Proposition 5 to the characterization of list-decodable codes given by Proposition 3. One difference is that codes correspond to functions Γ where Γ(x, y) always includes y as a prefix. This turns out to be a relatively minor difference, and most known samplers can be modified to have this property. A major difference, however, is that for list-decodable codes, we only consider decoding from sets of the form $T_r$ for some received word $r \in [q]^D$. Otherwise, the two characterizations are identical. (Note that $\mu(T_r) = 1/q$, and bounding K and bounding δ are equivalent via the relation K = δN.)

Still, the settings of parameters typically considered in the two cases are quite different. In codes, the main growing parameter is the message length n = log N, and one typically wants the alphabet size q to be a constant (e.g. q = 2) and the codeword length D to be linear in n. Thus, the range of Γ is of size M = D · q = O(log N). In samplers, the main growing parameter is m = log M, which is the number of random bits needed to select a single element of the universe [M] uniformly at random, and one typically seeks samplers using a number of random bits n = log N that is linear (or possibly polynomial) in m. Thus, $M = N^{\Omega(1)}$, in sharp contrast to the typical setting for codes. Also in contrast to codes, samplers are interesting even when δ is a constant independent of N (or vanishes slowly as a function of N). In such a case, the number of samples can be independent of N (e.g. in an optimal sampler, $D = O(\log(1/\delta)/\varepsilon^2)$). But constant δ in codes means that the list size K = δN is a constant fraction of the message space, which seems too large to be useful from a coding perspective. Instead, the list size for codes is typically required to be K = poly(n) = poly(log N), which forces the codeword length D to be at least as large as the message length n = log N.
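The sampler characterization can likewise be checked by brute force. The sketch below assumes a hypothetical toy sampler with M = 5 and D = 3: a seed x = (a, b) ∈ [5] × [5] yields the samples a + b·i (mod 5) for i = 0, 1, 2, a standard pairwise-independent construction. For every T ⊆ [M] it counts the seeds whose sample average exceeds µ(T) + ε and compares that with |LIST_Γ(T, µ(T) + ε)| for Γ(x, y) = the y'th sample of Smp(x); the worst case over T is the error probability δ = K/N.

```python
from fractions import Fraction
from itertools import combinations

P, D = 5, 3      # hypothetical toy sizes: universe [M] with M = P (prime), D samples
N = P * P        # seeds x = (a, b) in [P] x [P], encoded as x = a * P + b


def smp(x: int) -> tuple:
    # Toy pairwise-independent sampler: the i'th sample is a + b*i (mod P).
    a, b = divmod(x, P)
    return tuple((a + b * i) % P for i in range(D))


def gamma(x: int, y: int) -> int:
    # Gamma(x, y) = the y'th sample of Smp(x).
    return smp(x)[y]


def mu(T: frozenset) -> Fraction:
    return Fraction(len(T), P)


def bad_seeds(T: frozenset, eps: Fraction) -> int:
    # Seeds whose sample average of T exceeds mu(T) + eps.
    return sum(1 for x in range(N)
               if Fraction(sum(z in T for z in smp(x)), D) > mu(T) + eps)


def list_size(T: frozenset, eps: Fraction) -> int:
    # |LIST_Gamma(T, mu(T) + eps)|, computed from Gamma alone.
    return sum(1 for x in range(N)
               if Fraction(sum(gamma(x, y) in T for y in range(D)), D) > mu(T) + eps)


eps = Fraction(1, 5)
subsets = [frozenset(c) for k in range(P + 1) for c in combinations(range(P), k)]
assert all(bad_seeds(T, eps) == list_size(T, eps) for T in subsets)
delta = max(Fraction(list_size(T, eps), N) for T in subsets)   # delta = K / N
print("worst-case delta:", delta)   # → worst-case delta: 8/25
```

With ε = 1/5 the worst sets T have two elements: two seeds with b = 0 repeat a point of T three times, and six seeds with b ≠ 0 hit both points of T, so δ = 8/25. A toy this small has no useful sampling guarantee; the point is only that counting bad seeds and counting the list are the same computation.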

Expander Graphs
Expanders are graphs with two seemingly contradictory properties.On one hand, they have very low degree; on the other, they are extremely well-connected.Expanders have numerous applications in theoretical computer science, and their study has also turned out to be mathematically very rich; see the survey [HLW].
There are a variety of measures of expansion, with close relationships between them, but here we will focus on the most basic measure, known as vertex expansion. We restrict attention to bipartite graphs, where the requirement is that every set of left-vertices that is not too large must have "many" neighbors on the right. We allow multiple edges between vertices. We require the graph to be left-regular, but it need not be right-regular.

Definition 6. Let G be a left-regular bipartite multigraph with left vertex set [N], right vertex set [M], and left degree D. G is an (= K, A) expander if every left-set S of size at least K has at least A · K neighbors on the right. G is a (K, A) expander if it is an (= K′, A) expander for every K′ ≤ K.
The classic setting of parameters for expanders is the balanced one, where M = N, and then the goal is for the degree D and the expansion factor A to both be constants independent of the number of vertices, with A > 1 and expansion achieved for sets of size up to K = Ω(M). However, the imbalanced case M < N is also interesting, and then even expansion factors A smaller than 1 are nontrivial (provided A > M/N).
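We can cast expanders in our framework as follows. For a left-regular bipartite multigraph G with left vertex set [N], right vertex set [M], and left degree D, let $\Gamma : [N] \times [D] \to [M]$ be its neighbor function, that is, Γ(x, y) is the y'th neighbor of x.

Proposition 7. Let G be a left-regular bipartite multigraph with neighbor function $\Gamma : [N] \times [D] \to [M]$. Then G is an (= K, A) expander iff
$$\forall T \subseteq [M] \text{ such that } |T| < AK, \quad |\mathrm{LIST}_\Gamma(T, 1)| < K. \qquad (4)$$

Proof. We show that G fails to be an (= K, A) expander iff Condition (4) is false. If G is not an (= K, A) expander, then there is a left-set S ⊆ [N] of size at least K with fewer than AK neighbors on the right. Let T be the set of neighbors of S. Then |T| < AK but $S \subseteq \mathrm{LIST}_\Gamma(T, 1)$, so $|\mathrm{LIST}_\Gamma(T, 1)| \ge K$, violating Condition (4).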
Conversely, suppose that Condition (4) fails. Then there is a right-set T ⊆ [M] of size less than AK for which $|\mathrm{LIST}_\Gamma(T, 1)| \ge K$. But the neighbors of $\mathrm{LIST}_\Gamma(T, 1)$ are all elements of T, so this left-set of size at least K has fewer than AK neighbors, violating expansion.
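For small graphs, the expander characterization can be tested by comparing the definition of (= K, A) expansion with Condition (4). The sketch assumes a hypothetical 3-left-regular bipartite graph on [6] × [6] in which left vertex x is joined to x, x + 1, and x + 3 (mod 6). Both checks enumerate all sets, so they are exponential and only suitable for toy sizes.

```python
from fractions import Fraction
from itertools import combinations

N, M, D = 6, 6, 3
SHIFTS = (0, 1, 3)  # hypothetical toy graph: left vertex x is joined to x + s (mod M)


def gamma(x: int, y: int) -> int:
    # Neighbor function: Gamma(x, y) is the y'th neighbor of left vertex x.
    return (x + SHIFTS[y]) % M


def neighbors(S) -> set:
    return {gamma(x, y) for x in S for y in range(D)}


def expands_directly(K: int, A: Fraction) -> bool:
    # (= K, A) expansion. Checking sets of size exactly K suffices, since a larger
    # set contains a K-subset and so has at least as many neighbors.
    return all(len(neighbors(S)) >= A * K for S in combinations(range(N), K))


def list_one(T: set) -> list:
    # LIST_Gamma(T, 1): left vertices all of whose neighbors lie in T.
    return [x for x in range(N) if all(gamma(x, y) in T for y in range(D))]


def expands_via_list(K: int, A: Fraction) -> bool:
    # Condition (4): every right-set T with |T| < A*K has |LIST_Gamma(T, 1)| < K.
    return all(len(list_one(set(T))) < K
               for t in range(M + 1) if t < A * K
               for T in combinations(range(M), t))


print([K for K in range(1, N + 1) if expands_via_list(K, Fraction(2))])  # → [1, 2]
```

For expansion factor A = 2, this toy graph expands sets of size 1 and 2 but not 3: the left-set {0, 1, 3} has only the five neighbors {0, 1, 2, 3, 4}.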
We now compare the characterization of expanders given in Proposition 7 to those for list-decodable codes and samplers. First, note that we quantify over all sets T of a bounded size (namely, smaller than AK). In codes, the sets T were also of a small size but also restricted to be of the form $T_r$ for a received word r. In samplers, there was no constraint on T. Second, we only need a bound on $|\mathrm{LIST}_\Gamma(T, 1)|$, which is conceivably easier to obtain than a bound on $|\mathrm{LIST}_\Gamma(T, \mu(T) + \varepsilon)|$ as in codes and samplers. Nevertheless, depending on the parameters, vertex expansion (as in Definition 6 and Proposition 7) often implies stronger measures of expansion (such as a spectral gap [Alo] and randomness condensing [TUZ]), which in turn imply bounds on $|\mathrm{LIST}_\Gamma(T, \mu(T) + \varepsilon)|$.
The typical parameter ranges for expanders are more similar to those for samplers than to those for codes. Specifically, N and M tend to be of comparable size; indeed, the classic case is N = M, and even in the unbalanced case, they are typically polynomially related. However, for expanders, there is no parameter ε. On the other hand, there is something new to optimize, namely the expansion factor A, which is the ratio between the size of T and the list size K. In particular, to have expansion factor larger than 1 (the classic setting of parameters for expansion), we must have a list size that is smaller than |T|. In samplers, however, there is no coupling of the list size and |T|; the list size K = δN depends on the error probability δ, and should apply for every T ⊆ [M]. With list-decodable codes, the set $T = T_r$ is always small (of size D), but the difference between list size D and, say, D/2 is typically insignificant.
Despite the above differences between codes and expanders, recent constructions of list-decodable codes have proved useful in constructing expanders with near-optimal expansion factors (namely, A = (1 − ε)D) via Proposition 7 [GUV].A formulation of expansion similar to Proposition 7 also appeared in [GT].

Randomness Extractors
A randomness extractor is a function that extracts almost-uniform bits from a source of biased and correlated bits.The original motivation for extractors was the simulation of randomized algorithms with physical sources of randomness, but they have turned out to have a wide variety of other applications in theoretical computer science.Moreover, they have played a unifying role in the theory of pseudorandomness, and have been the avenue through which many of the connections described in this survey were discovered.History, applications, and constructions of extractors are described in more detail in [NT, Sha].
To formalize the notion of an extractor, we need to model a "source of biased and correlated bits" and define what it means for the output of the extractor to be "almost uniform."For the former, we adopt a very general notion, advocated in [CG1,Zuc1], where we only require that the source has enough randomness in it, as measured by the following variant of entropy.
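Definition 8. The min-entropy of a random variable X is
$$H_\infty(X) = \min_{x \in \mathrm{Supp}(X)} \log \frac{1}{\Pr[X = x]}.$$
X is a k-source if $H_\infty(X) \ge k$. Equivalently, X is a k-source if $\Pr[X = x] \le 2^{-k}$ for every x.

Intuitively, we think of a k-source as having "k bits of randomness" in it. For example, a random variable that is uniformly distributed over any $K = 2^k$ strings is a k-source.
For the quality of the output of the extractor, we use a standard measure of distance between probability distributions.

Definition 9. The statistical difference between random variables X and Y taking values in a universe [M] is defined to be
$$\Delta(X, Y) = \max_{T \subseteq [M]} \left|\Pr[X \in T] - \Pr[Y \in T]\right| = \max_{T \subseteq [M]} \left(\Pr[X \in T] - \Pr[Y \in T]\right).$$
X and Y are ε-close if ∆(X, Y) ≤ ε. Otherwise, we say they are ε-far.
The equivalence between the formulations of statistical difference with and without the absolute values can be seen by observing that $\Pr[X \in T] - \Pr[Y \in T] = -\left(\Pr[X \in \overline{T}] - \Pr[Y \in \overline{T}]\right)$, where $\overline{T} = [M] \setminus T$.

Ideally we'd like an extractor to be a function $\mathrm{Ext} : [N] \to [M]$ such that for every k-source X taking values in [N], the random variable Ext(X) is ε-close to the uniform distribution on [M]. That is, given an n-bit string coming from an unknown random source with at least k bits of randomness, the extractor is guaranteed to produce m bits that are close to uniform. However, this is easily seen to be impossible even when m = 1: the larger of $\mathrm{Ext}^{-1}(0)$ and $\mathrm{Ext}^{-1}(1)$ has size at least N/2, so the uniform distribution on it is an (n − 1)-source on which the output of the extractor is constant.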
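The min-entropy of a finite distribution can be computed directly from the formula H∞(X) = min_x log(1/Pr[X = x]) (logarithms base 2). The sketch below, using hypothetical example distributions given as Python dicts from outcomes to probabilities, also computes Shannon entropy to show why min-entropy is the more demanding measure: a single heavy outcome caps it, however spread out the rest of the distribution is.

```python
import math


def min_entropy(dist: dict) -> float:
    # H_inf(X) = min_x log2(1 / Pr[X = x]) = -log2(max_x Pr[X = x]).
    return -math.log2(max(dist.values()))


def shannon_entropy(dist: dict) -> float:
    # For comparison: H(X) = sum_x Pr[X = x] * log2(1 / Pr[X = x]).
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def is_k_source(dist: dict, k: float) -> bool:
    # X is a k-source iff every outcome has probability at most 2^-k.
    return min_entropy(dist) >= k


flat = {x: 1 / 8 for x in range(8)}      # uniform over K = 8 = 2^3 strings
skewed = {0: 0.5, 1: 0.25, 2: 0.25}      # one heavy outcome

print(min_entropy(flat), is_k_source(flat, 3))       # → 3.0 True
print(min_entropy(skewed), shannon_entropy(skewed))  # → 1.0 1.5
```

The skewed distribution has 1.5 bits of Shannon entropy but only 1 bit of min-entropy, because its most likely outcome occurs with probability 1/2.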
Nisan and Zuckerman [NZ] proposed to get around this difficulty by allowing the extractor a small number of truly random bits as a seed for the extraction. This leads to the following definition.

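Definition 10 ([NZ]). $\mathrm{Ext} : [N] \times [D] \to [M]$ is a (k, ε) extractor if for every k-source X taking values in [N], the random variable $\mathrm{Ext}(X, U_{[D]})$ is ε-close to $U_{[M]}$, where $U_{[S]}$ denotes the uniform distribution on a set [S], chosen independently of X.
The reason extraction is still interesting is that the number d = log D of truly random bits can be much smaller than the number of almost-uniform bits extracted. Indeed, d can even be logarithmic in m = log M, and thus in many applications, the need for a seed can be eliminated by enumerating all $2^d$ possibilities.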
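The extractor condition can be checked by brute force on toy parameters, using two standard facts: the statistical difference equals half the L1 distance, (1/2) Σ_z |Pr[X = z] − Pr[Y = z]|, and for K = 2^k every k-source is a convex combination of flat sources (uniform on K strings), so it suffices to check flat sources. The sketch assumes a hypothetical toy Ext : {0,1}^3 × {0,1}^3 → {0,1} given by the inner product mod 2. Its seed is as long as its source, which real extractors avoid, and it enumerates all 2^d seeds explicitly.

```python
from fractions import Fraction
from itertools import combinations

n = 3
N = D = 2 ** n   # toy sizes: source and seed are both n-bit strings
M = 2            # one output bit


def ext(x: int, y: int) -> int:
    # Hypothetical toy extractor: inner product of the bit strings x and y, mod 2.
    return bin(x & y).count("1") % 2


def output_distribution(S: tuple) -> list:
    # Distribution of Ext(X, Y) for X uniform on S (a flat source) and Y uniform on [D].
    counts = [0] * M
    for x in S:
        for y in range(D):
            counts[ext(x, y)] += 1
    return [Fraction(c, len(S) * D) for c in counts]


def statistical_difference(p: list, q: list) -> Fraction:
    # Equals max_T (Pr[p in T] - Pr[q in T]); attained by T = {z : p(z) > q(z)}.
    return sum(abs(a - b) for a, b in zip(p, q)) / 2


def worst_error(K: int) -> Fraction:
    # Worst distance from uniform over all flat sources of size K.
    uniform = [Fraction(1, M)] * M
    return max(statistical_difference(output_distribution(S), uniform)
               for S in combinations(range(N), K))


for K in (1, 2, 4, 8):
    print(K, worst_error(K))   # → 1 1/2, 2 1/4, 4 1/8, 8 1/16 (one pair per line)
```

The worst source always contains the all-zeros string, on which the output is constant; the error works out to 1/(2K), so for this toy each extra bit of min-entropy halves the error.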
The ranges of the min-entropy threshold k most commonly studied in the extractor literature are k = αn or k = n^α for constants α ∈ (0, 1), where n = log N is the length of the source. The error parameter ε is often taken to be a small constant, but vanishing ε is important for some applications (especially in cryptography). One usually aims to have a seed length d = O(log n) or d = polylog(n), and to have the output length m = log M be as close to k as possible, corresponding to extracting almost all of the randomness from the source. (Ideally, m ≈ k + d, but m = Ω(k) or $m = k^{\Omega(1)}$ often suffices.)

Notice that the syntax of extractors already matches that of the functions $\Gamma : [N] \times [D] \to [M]$ studied in our framework. The extraction property can be captured, with a small slackness in parameters, as follows.

Proposition 11. Let $\mathrm{Ext} : [N] \times [D] \to [M]$ and let $K = 2^k$.
1. If Ext is a (k, ε) extractor, then
$$\forall T \subseteq [M], \quad |\mathrm{LIST}_{\mathrm{Ext}}(T, \mu(T) + \varepsilon)| < K, \qquad (5)$$
where µ(T) = |T|/M.
2. Conversely, if Condition (5) holds, then Ext is a (k + log(1/ε), 2ε) extractor.

Proof.
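1. Suppose that Condition (5) fails. That is, there is a set T ⊆ [M] such that $|\mathrm{LIST}_{\mathrm{Ext}}(T, \mu(T) + \varepsilon)| \ge K$. Let X be uniformly distributed over $\mathrm{LIST}_{\mathrm{Ext}}(T, \mu(T) + \varepsilon)$. Then X is a k-source, and
$$\Pr[\mathrm{Ext}(X, U_{[D]}) \in T] > \mu(T) + \varepsilon = \Pr[U_{[M]} \in T] + \varepsilon.$$
Thus, Ext is not a (k, ε) extractor.
2. Suppose Condition (5) holds. To show that Ext is a (k + log(1/ε), 2ε) extractor, let X be any (k + log(1/ε))-source taking values in [N]. We need to show that $\mathrm{Ext}(X, U_{[D]})$ is 2ε-close to $U_{[M]}$. That is, we need to show that for every T ⊆ [M],
$$\Pr[\mathrm{Ext}(X, U_{[D]}) \in T] \le \mu(T) + 2\varepsilon.$$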

So let T be any subset of [M], and write $L = \mathrm{LIST}_{\mathrm{Ext}}(T, \mu(T) + \varepsilon)$. Then
$$\Pr[\mathrm{Ext}(X, U_{[D]}) \in T] \le \Pr[X \in L] + \Pr[\mathrm{Ext}(X, U_{[D]}) \in T \mid X \notin L] \le |L| \cdot 2^{-(k + \log(1/\varepsilon))} + (\mu(T) + \varepsilon) < K \cdot \frac{\varepsilon}{K} + \mu(T) + \varepsilon = \mu(T) + 2\varepsilon.$$

The slackness in parameters in the above characterization is typically insignificant for extractors. Indeed, it is known that extractors must lose at least Θ(log(1/ε)) bits of the source entropy [RT], and the above slackness only affects the leading constant.
Notice that the condition characterizing extractors here is identical to the one characterizing averaging samplers in Proposition 5. Thus, the only real difference between extractors and averaging samplers is one of perspective, and both perspectives can be useful. For example, recall that in samplers, we measure the error probability $\delta = K/N = 2^k/2^n$, whereas in extractors we measure the min-entropy threshold k on its own. Thus, the sampler perspective can be more natural when δ is relatively large compared to 1/N, and the extractor perspective when δ becomes quite close to 1/N. Indeed, an extractor for min-entropy k = o(n) corresponds to a sampler with error probability $\delta = 1/2^{(1-o(1))n}$, which means that each of the n bits of randomness used by the sampler reduces the error probability by almost a factor of 2! This connection between extractors and samplers was proven and exploited by Zuckerman [Zuc2]. The characterization of extractors in Proposition 11 was implicit in [Zuc2, Tre1], and was explicitly formalized in coding-theoretic terms by Ta-Shma and Zuckerman [TZ].
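The conversion between the two parametrizations is a one-line calculation, shown here with illustrative numbers (n = 1000 and k = 100 are arbitrary choices, not parameters from the survey):

$$\delta = \frac{K}{N} = \frac{2^k}{2^n} = 2^{-(n-k)}, \qquad k = o(n) \;\Longrightarrow\; \delta = 2^{-(1-o(1))n}.$$

For instance, a sampler using n = 1000 coin tosses with at most K = 2^{100} bad seeds has error probability δ = 2^{-900}, while in extractor language the same object is described by the min-entropy threshold k = 100.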

Hardness Amplifiers
The connections described in the following two sections, which emerged from the work of Trevisan [Tre1], are perhaps the most surprising of all, because they establish a link between complexity-theoretic objects (which refer to computational intractability) and the purely information-theoretic and combinatorial objects we have been discussing so far.
Complexity Measures. In this section, we will be referring to a couple of different measures of computational complexity, which we informally review here. A boolean circuit C computes a finite function $C : \{0,1\}^\ell \to \{0,1\}^m$ using bit operations (such as AND, OR, and NOT). The size of a circuit C is the number of bit operations it uses. When we say that a circuit C computes a function C : [n] → [q], we mean that it maps the ⌈log n⌉-bit binary representation of any element x ∈ [n] to the corresponding ⌈log q⌉-bit binary representation of C(x).
As a measure of computational complexity, boolean circuit size is known to be very closely related to the running time of algorithms. However, boolean circuits compute functions on finite domains, so one needs to design a circuit separately for each input length, whereas an algorithm is typically required to be a single "uniform" procedure that works for all input lengths. This gap can be overcome by considering algorithms that are augmented with a "nonuniform advice string" for each input length:

Fact 12 ([KL]). Let $f : \{0,1\}^* \to \{0,1\}^*$ be a function defined on bit-strings of every length, and s : ℕ → ℕ (with s(n) ≥ n). Then the following are equivalent:
1. There is a sequence of circuits $C_1, C_2, \ldots$ such that $C_n(x) = f(x)$ for every $x \in \{0,1\}^n$, and the size of $C_n$ is Õ(s(n)).
2. There is an algorithm A and a sequence of advice strings $\alpha_1, \alpha_2, \ldots \in \{0,1\}^*$ such that $A(x, \alpha_n) = f(x)$ for every $x \in \{0,1\}^n$, and both the running time of A on inputs of length n and $|\alpha_n|$ are Õ(s(n)).
Thus "circuit size" equals "running time of algorithms with advice," up to polylogarithmic factors (hidden by the Õ(•) notation).Notice that, for the equivalence with circuit size, the running time of A and the length of its advice string are equated; below we will sometimes consider what happens when we decouple the two (e.g.having bounded-length advice but unbounded running time).
We will also sometimes refer to computations with "oracles". Running an algorithm A with oracle access to a function f (denoted $A^f$) means that as many times as it wishes during its execution, A can make a query x to the function f and receive the answer f(x) in one time step. That is, A can use f as a subroutine, but we do not charge A for the time to evaluate f. But note that if A runs in time t and f can be evaluated in time s, then $A^f$ can be simulated by a non-oracle algorithm B that runs in time t · s. The same is true if we use circuit size instead of running time as the complexity measure.
Hardness Amplification. Hardness amplification is the task of increasing the average-case hardness of a function. We measure the average-case hardness of a function by the fraction of inputs on which every efficient algorithm (or circuit) must err.

Definition 13. A function f : [n] → [q] is (s, δ) hard if every circuit of size s fails to compute f on at least a δ fraction of inputs.
Hardness amplification is concerned with transforming a function so as to increase δ, the fraction of inputs on which it is hard.Ideally, we would like to go from δ = 0, corresponding to worst-case hardness, to δ = 1−1/q −ε, which is the largest value we can hope for (since every function with a range of [q] can be computed correctly on a 1/q fraction of inputs by a constant circuit).In addition to the basic motivation of relating worst-case and average-case hardness, such hardness amplifications also are useful in constructing pseudorandom generators (see Section 8), because it is easier to construct pseudorandom generators from average-case hard functions (specifically, when q = 2 and δ = 1/2 − ε) [NW, BFNW].
To make the goal more precise, we are interested in transformations for converting a function f : [n] → {0, 1} that is (s, 0) hard to a function f′ : [n′] → [q] that is (s′, 1 − 1/q − ε) hard for a constant q (ideally q = 2) and small ε. (The restriction of f to have range {0, 1} is without loss of generality when considering worst-case hardness; otherwise we can use the function that outputs the j'th bit of f(i) on input (i, j).) The price that we usually pay for such hardness amplifications is that the circuit size for which the function is hard decreases (i.e. s′ < s) and the domain size increases (i.e. n′ > n); we would like these losses to be moderate (e.g. polynomial). Also, the complexity of computing the function correctly often increases, and we again would like this increase to be moderate (e.g. f′ should be computable in exponential time if f is). However, this latter property turns out to correspond to the "explicitness" of the construction, and thus we will not discuss it further below.
Several transformations achieving the above goal of converting worst-case hardness into average-case hardness are known; see the surveys [Kab, Tre2]. Like most (but not all!) results in complexity theory, these transformations are typically "black box" in the following sense. First, a single "universal" transformation algorithm Amp is given that shows how to compute f′ given oracle access to f, and this transformation is well-defined for every oracle f, regardless of its complexity (even though we are ultimately interested only in functions f within some complexity class, such as exponential time). Second, the property that f′ is average-case hard when f is worst-case hard is proven by giving a "reduction" algorithm Red that efficiently converts algorithms r computing f′ well on average into algorithms computing f in the worst case. (Thus if f is hard in the worst case, there can be no efficient r computing f′ well on average.) Again, even though we are ultimately interested in applying the reduction to efficient algorithms r, this property of the reduction should hold given any oracle r, regardless of its efficiency. Since our notion of hardness refers to nonuniform circuits, we will allow the reduction Red to use some nonuniform advice, which may depend on both f and r.
Black-box worst-case-to-average-case hardness amplifiers as described here are captured by the following definition.
Definition 14. Let $\mathrm{Amp}^f : [D] \to [q]$ be an algorithm that is defined for every oracle $f : [n] \to \{0, 1\}$. We say that $\mathrm{Amp}$ is a $(t, k, \varepsilon)$ black-box worst-case-to-average-case hardness amplifier if there is an oracle algorithm $\mathrm{Red}$, called the reduction, running in time $t$ such that for every $f$ and every function $r : [D] \to [q]$ such that
$$\Pr[r(U_{[D]}) = \mathrm{Amp}^f(U_{[D]})] > 1/q + \varepsilon,$$
there is an advice string $z \in [K]$, where $K = 2^k$, such that
$$\forall i \in [n] \quad \mathrm{Red}^r(i, z) = f(i).$$
The amplified function is $f' = \mathrm{Amp}^f$; we have denoted the domain size as $D$ rather than $n'$ for convenience below. Note that without loss of generality, $k \le t$, because an algorithm running in time $t$ cannot read more than $t$ bits of its advice string.
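The agreement condition in Definition 14 can be checked directly when $r$ and $\mathrm{Amp}^f$ are given as length-$D$ tables. The sketch below uses hypothetical helper names and exact arithmetic via `fractions.Fraction`. It also shows why the threshold is $1/q + \varepsilon$ rather than just $\varepsilon$: by pigeonhole, some symbol of $[q]$ fills at least a $1/q$ fraction of any table, so a constant function always achieves agreement at least $1/q$.

```python
from collections import Counter
from fractions import Fraction
from typing import Sequence

def agreement(r: Sequence[int], amp_f: Sequence[int]) -> Fraction:
    """Pr[r(U_[D]) = Amp^f(U_[D])] for tables r, amp_f of length D."""
    if len(r) != len(amp_f):
        raise ValueError("both tables must have length D")
    return Fraction(sum(a == b for a, b in zip(r, amp_f)), len(r))

def best_constant_agreement(amp_f: Sequence[int]) -> Fraction:
    """Agreement of the best constant function; at least 1/q by pigeonhole."""
    _, count = Counter(amp_f).most_common(1)[0]
    return Fraction(count, len(amp_f))

def reduction_must_succeed(r: Sequence[int], amp_f: Sequence[int], q: int, eps: Fraction) -> bool:
    """True when r exceeds the 1/q + eps agreement threshold of Definition 14."""
    return agreement(r, amp_f) > Fraction(1, q) + eps

# Usage with q = 3 and a hypothetical table of Amp^f over D = 8 inputs.
amp_f = [0, 1, 2, 2, 1, 0, 2, 2]
print(best_constant_agreement(amp_f))                              # 1/2
print(reduction_must_succeed([2] * 8, amp_f, 3, Fraction(1, 10)))  # True
```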
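The following proposition shows that transformations meeting Definition 14 suffice for amplifying hardness.
Proposition 15. If $\mathrm{Amp}$ is a $(t, k, \varepsilon)$ black-box hardness amplifier and $f$ is $(s, 0)$ hard, then $\mathrm{Amp}^f$ is $(s/\tilde{O}(t), 1 - 1/q - \varepsilon)$ hard.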
Proof. Suppose for contradiction there is a circuit $r : [D] \to [q]$ of size $s'$ computing $\mathrm{Amp}^f$ on greater than a $1/q + \varepsilon$ fraction of inputs. Then there is an advice string $z$ such that $\mathrm{Red}^r(\cdot, z)$ computes $f$ correctly on all inputs. Hardwiring $z$ and using the fact that algorithms running in time $t$ can be simulated by circuits of size $\tilde{O}(t)$, we get a circuit of size $\tilde{O}(t) \cdot s'$ computing $f$ correctly on all inputs. This is a contradiction for $s' = s/\tilde{O}(t)$.
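The size accounting in the last two sentences of the proof can be spelled out as follows. This is a sketch under the assumptions the proof already uses: $\mathrm{Red}$ makes at most $t$ oracle queries, each answered by a copy of the size-$s'$ circuit $r$, and the polylogarithmic factor hidden in $s' = s/\tilde{O}(t)$ is chosen large enough.

```latex
\begin{align*}
\operatorname{size}(C)
  &\le \underbrace{\tilde{O}(t)}_{\text{simulating } \mathrm{Red}(\cdot, z)}
     + \underbrace{t \cdot s'}_{\text{one copy of } r \text{ per query}}
   \;\le\; \tilde{O}(t) \cdot s' \\
  &= \tilde{O}(t) \cdot \frac{s}{\tilde{O}(t)} \;\le\; s
  \qquad (\text{for a suitable choice of the } \tilde{O}(t) \text{ factor in } s')
\end{align*}
```

So $C$ would be a circuit of size at most $s$ computing $f$ on every input, contradicting the assumption that $f$ is $(s, 0)$ hard.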
Typical settings of parameters for hardness amplification are $q = 2$, $\varepsilon$ ranging from $o(1)$ to $1/n^{\Omega(1)}$, and $t = \mathrm{poly}(\log n, 1/\varepsilon)$. Note that we make no reference to the length $k$ of the advice string, and it does not appear in the conclusion of Proposition 15. Indeed, for the purposes of hardness amplification against nonuniform circuits, $k$ may as well be set equal to the running time $t$ of the reduction. However, below it will be clarifying to separate these two parameters.
Now we place black-box hardness amplifiers in our framework. Given $\mathrm{Amp}^f : [D] \to [q]$ defined for every oracle $f : [n] \to \{0, 1\}$, we can define $\Gamma : [N] \times [D] \to [D] \times [q]$ by
$$\Gamma(f, y) = (y, \mathrm{Amp}^f(y)), \qquad (6)$$
where $N = 2^n$ and we view $[N]$ as consisting of all boolean functions on $[n]$. Just as with list-decodable codes, the second input $y$ is a prefix of the output of $\Gamma$. Moreover, any function $\Gamma$ with this property yields a corresponding amplification algorithm $\mathrm{Amp}$ in the natural way. This syntactic similarity between codes and hardness amplifiers is no coincidence. The next proposition shows that, if we allow reductions $\mathrm{Red}$ of unbounded running time $t$ (but still bounded advice length $k$), then black-box hardness amplifiers are equivalent to list-decodable codes.
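The correspondence between $\mathrm{Amp}$ and $\Gamma$ in Equation (6) is purely syntactic, which a few lines make concrete. In this sketch, oracles $f$ are represented as truth-table tuples, the helper names are hypothetical, and the parity amplifier $\mathrm{Amp}^f(y) = \langle f, y \rangle \bmod 2$ (the Hadamard encoding of the truth table) is an illustrative stand-in chosen for this example, not a construction discussed in the text.

```python
from itertools import product
from typing import Callable, Tuple

TruthTable = Tuple[int, ...]  # f : [n] -> {0, 1}, i.e. one element of [N] with N = 2**n

def gamma_from_amp(amp: Callable[[TruthTable, int], int]):
    """Equation (6): Gamma(f, y) = (y, Amp^f(y))."""
    def gamma(f: TruthTable, y: int) -> Tuple[int, int]:
        return (y, amp(f, y))
    return gamma

def amp_from_gamma(gamma: Callable[[TruthTable, int], Tuple[int, int]]):
    """Recover Amp from any Gamma whose output starts with its second input y."""
    def amp(f: TruthTable, y: int) -> int:
        y_out, symbol = gamma(f, y)
        if y_out != y:
            raise ValueError("Gamma does not output y as a prefix")
        return symbol
    return amp

def parity_amp(f: TruthTable, y: int) -> int:
    """Hypothetical Amp^f(y) = <f, y> mod 2, reading y as an n-bit mask (D = 2**n, q = 2)."""
    return sum(f[i] for i in range(len(f)) if (y >> i) & 1) % 2

# Usage: the round trip Amp -> Gamma -> Amp is the identity on every (f, y).
n = 3
gamma = gamma_from_amp(parity_amp)
amp = amp_from_gamma(gamma)
assert all(amp(f, y) == parity_amp(f, y)
           for f in product((0, 1), repeat=n) for y in range(2 ** n))
```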
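Proposition 16. Let $\mathrm{Amp}^f : [D] \to [q]$ be an algorithm that is defined for every oracle $f : [n] \to \{0, 1\}$, and let $\Gamma : [N] \times [D] \to [D] \times [q]$ be the function corresponding to $\mathrm{Amp}$ via (6), where $N = 2^n$. Then $\mathrm{Amp}$ is an $(\infty, k, \varepsilon)$ black-box hardness amplifier if and only if
$$\forall r : [D] \to [q] \quad |\mathrm{LIST}_\Gamma(T_r, 1/q + \varepsilon)| \le K,$$
where $T_r = \{(y, r_y) : y \in [D]\}$ and $K = 2^k$.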
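For tiny parameters, the condition in Proposition 16 can be checked by brute force: enumerate every received word $r$, count the oracles $f$ whose encodings agree with $r$ on more than a $1/q + \varepsilon$ fraction of positions, and take the maximum. The sketch below again uses the hypothetical parity amplifier, whose codeword length $D = 2^n$ is far too long for hardness amplification and serves only as an illustration. For this code, the standard Parseval argument bounds every list by $1/(4\varepsilon^2)$, which the final assertion checks.

```python
from fractions import Fraction
from itertools import product

def parity_amp(f, y):
    """Hypothetical Amp^f(y) = <f, y> mod 2, reading y as an n-bit mask (D = 2**n, q = 2)."""
    return sum(f[i] for i in range(len(f)) if (y >> i) & 1) % 2

def list_size(r, n, q, eps):
    """|LIST_Gamma(T_r, 1/q + eps)| by brute force over all N = 2**n oracles f.

    Gamma(f, y) = (y, Amp^f(y)) lies in T_r = {(y, r[y])} exactly when Amp^f(y) == r[y],
    so f is in the list iff it agrees with r on more than a 1/q + eps fraction of y.
    """
    D = len(r)
    threshold = Fraction(1, q) + eps
    count = 0
    for f in product((0, 1), repeat=n):
        agree = sum(parity_amp(f, y) == r[y] for y in range(D))
        if Fraction(agree, D) > threshold:
            count += 1
    return count

def max_list_size(n, eps):
    """Smallest K satisfying the condition of Proposition 16 (q = 2, D = 2**n)."""
    D = 2 ** n
    return max(list_size(r, n, 2, eps) for r in product((0, 1), repeat=D))

eps = Fraction(1, 8)
K = max_list_size(3, eps)
assert K <= 1 / (4 * eps ** 2)  # Parseval bound for the Hadamard code
```

The number of advice bits needed by an unbounded-time reduction is then $k = \lceil \log_2 K \rceil$.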
Note that the characterization given here is indeed identical to that of list-decodable codes given in Proposition 3, where the number $k$ of advice bits corresponds to the list size $K = 2^k$. (It is sometimes useful to allow a more general formulation, where the correspondence between the advice strings $z$ and decodings $f$ can be determined by a randomized preprocessing phase, which is given oracle access to $r$; see [STV].)
Despite their close relationship, there are some differences in the typical parameter ranges for list-decodable codes and hardness amplification. In list-decodable codes, one typically wants the agreement parameter $\varepsilon$ to be a constant and the codeword length to be linear in the message length (i.e. $D \log q = O(n)$). In hardness amplification, $\varepsilon$ is usually taken to be vanishingly small (even as small as $1/n^{\Omega(1)}$), and one can usually afford for the codeword length to be polynomial in the message length (i.e. $D \log q = \mathrm{poly}(n)$), because this corresponds to a linear blow-up in the input length of the amplified function $\mathrm{Amp}^f$ as compared to $f$. Another difference is that in locally list-decodable codes, it is most natural for the list size $K$ to be comparable to the running time $t$ of the decoder, so the decoder has time to enumerate the elements of the list. For hardness amplification against nonuniform circuits, we may as well allow the number of advice bits $k$ to be as large as the running time $t$, which means that the list size $K = 2^k$ can be exponential in $t$.
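The two parameter comparisons above each reduce to a one-line calculation. The sketch assumes a codeword length polynomial in the message length, $D = n^c$, with $q$ and $c$ constants, and an advice length equal to the running time.

```latex
\begin{align*}
D = n^{c} &\;\Longrightarrow\; \log D = c \cdot \log n
  && (\text{input length of } \mathrm{Amp}^f \text{ is } c \text{ times that of } f)\\
k = t &\;\Longrightarrow\; K = 2^{k} = 2^{t}
  && (\text{list size exponential in the reduction's running time})
\end{align*}
```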
The fact that locally list-decodable codes imply worst-case-to-average-case hardness amplification was shown by Sudan et al. [STV].The fact that black-box amplifications imply list-decodable codes was implicit in [Tre1], and was made explicit in [TV].

Pseudorandom Generators
A pseudorandom generator is a deterministic function that stretches a short seed of truly random bits into a long string of "pseudorandom" bits that "look random" to any efficient algorithm. The idea of bits "looking random" is formalized by the notion of computational indistinguishability, which is a computational analogue of statistical difference (cf. Definition 9).

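Definition 17 ([GM]). Random variables $X$ and $Y$ are $(s, \varepsilon)$ indistinguishable if for every boolean circuit $T$ of size $s$, we have
$$\Pr[T(X) = 1] - \Pr[T(Y) = 1] \le \varepsilon.$$
This is equivalent to the more standard definition in which we bound the absolute value of the left-hand side by $\varepsilon$, because we can replace $T$ with its complement (which does not affect standard measures of circuit size). Now we can define a pseudorandom generator as a function stretching $d$ truly random bits into $m > d$ bits that are computationally indistinguishable from $m$ truly random bits.
Definition 18. A function $G : [D] \to [M]$ is an $(s, \varepsilon)$ pseudorandom generator if $G(U_{[D]})$ and $U_{[M]}$ are $(s, \varepsilon)$ indistinguishable.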
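To see why the one-sided bound in Definition 17 is equivalent to bounding the absolute value, note that complementing $T$ exactly negates the difference. The sketch below uses hypothetical helper names and computes the difference exactly for $X$ and $Y$ uniform over small explicit samples; circuit size is not modeled.

```python
from fractions import Fraction
from typing import Callable, Sequence

Distinguisher = Callable[[int], int]  # a test T with outputs in {0, 1}

def advantage(T: Distinguisher, X: Sequence[int], Y: Sequence[int]) -> Fraction:
    """Pr[T(X) = 1] - Pr[T(Y) = 1], with X and Y uniform over the given samples.

    This is the one-sided quantity bounded in Definition 17 (no absolute value).
    """
    p_x = Fraction(sum(T(x) for x in X), len(X))
    p_y = Fraction(sum(T(y) for y in Y), len(Y))
    return p_x - p_y

def complement(T: Distinguisher) -> Distinguisher:
    """The test 1 - T; its advantage is exactly the negation of T's."""
    return lambda v: 1 - T(v)

# Usage: X is biased toward 0, Y is uniform on {0, 1, 2, 3}.
X = [0, 0, 1, 3]
Y = [0, 1, 2, 3]
is_zero = lambda v: int(v == 0)
print(advantage(is_zero, X, Y))              # 1/4
print(advantage(complement(is_zero), X, Y))  # -1/4
```

Bounding both $T$ and its complement by $\varepsilon$ is therefore the same as bounding $|\Pr[T(X) = 1] - \Pr[T(Y) = 1]|$ by $\varepsilon$.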

S. Vadhan
Pseudorandom generators are powerful tools for cryptography and for derandomization (converting randomized algorithms to deterministic algorithms). See the surveys [CRT, Mil, Kab, Gol3]. As for the parameters, we would like the seed length $d = \log D$ to be as small as possible relative to the output length $m = \log M$, and we typically want generators that fool circuits of size $s = \mathrm{poly}(m)$. The error parameter $\varepsilon$ is usually not too important for derandomization (e.g. constant $\varepsilon$ suffices), but vanishing $\varepsilon$ (e.g. $\varepsilon = 1/\mathrm{poly}(m)$) is typically achievable and is crucial for cryptographic applications.
Another important parameter is the complexity of computing the generator itself. Even though this will not be explicit below, our discussions are most relevant to pseudorandom generators whose running time may be larger than that of the distinguishers $T$ they fool, e.g. polynomial in $s$ or even exponential in the seed length $d = \log D$. The study of such generators was initiated by Nisan and Wigderson [NW]. They suffice for derandomization, where we allow a polynomial slowdown in the algorithm we derandomize and anyhow enumerate over all $D = 2^d$ seeds. They are not suitable, however, for most cryptographic applications, where the generator is run by the honest parties and must fool adversaries that have much greater running time.
The advantage of "noncryptographic" pseudorandom generators, whose running time is greater than that of the distinguishers, is that they can be constructed under weaker assumptions. The existence of "cryptographic" generators is equivalent to the existence of one-way functions [HILL], whereas "noncryptographic" generators can be constructed from any boolean function (computable in time $2^{O(n)}$) with high circuit complexity [NW, BFNW].
We formalize the notion of a black-box construction of pseudorandom generators from functions of high worst-case circuit complexity analogously to Definition 14.
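Definition 19. Let $G^f : [D] \to [M]$ be an algorithm that is defined for every oracle $f : [n] \to \{0, 1\}$. We say that $G$ is a $(t, k, \varepsilon)$ black-box PRG construction if there is an oracle algorithm $\mathrm{Red}$, running in time $t$, such that for every $f$ and every $T : [M] \to \{0, 1\}$ such that
$$\Pr[T(G^f(U_{[D]})) = 1] - \Pr[T(U_{[M]}) = 1] > \varepsilon,$$
there is an advice string $z \in [K]$, where $K = 2^k$, such that
$$\forall i \in [n] \quad \mathrm{Red}^T(i, z) = f(i).$$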
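When the running time of $\mathrm{Red}$ is unbounded, Definition 19 can always be met by exhaustive search: the reduction lists every oracle $f$ that $T$ distinguishes and uses the advice $z$ as an index into that list. The sketch below illustrates this with a hypothetical toy construction `toy_G` that outputs the bit pair $(f(y), f(y+1 \bmod n))$; it has no stretch ($D = n = M = 4$ in the usage) and is not a real pseudorandom generator.

```python
from fractions import Fraction
from itertools import product

def toy_G(f, y):
    """Hypothetical G^f : [n] -> [4] that outputs the bit pair (f(y), f(y+1 mod n))."""
    n = len(f)
    return 2 * f[y] + f[(y + 1) % n]

def distinguishes(T, f, M, eps):
    """Pr[T(G^f(U_[D])) = 1] - Pr[T(U_[M]) = 1] > eps, where D = n for toy_G."""
    D = len(f)
    p_gen = Fraction(sum(T(toy_G(f, y)) for y in range(D)), D)
    p_uni = Fraction(sum(T(v) for v in range(M)), M)
    return p_gen - p_uni > eps

def unbounded_reduction(T, n, M, eps):
    """Exhaustive-search Red^T: advice z indexes the list of oracles f that T distinguishes."""
    broken = [f for f in product((0, 1), repeat=n) if distinguishes(T, f, M, eps)]
    def red(i, z):
        return broken[z][i]
    return red, len(broken)  # advice strings z range over [len(broken)]

# Usage: T accepts outputs whose two bits are equal (values 0 and 3).
T = lambda v: int(v in (0, 3))
red, num_advice = unbounded_reduction(T, n=4, M=4, eps=Fraction(1, 4))
f = (0, 0, 0, 0)  # G^f always outputs 0, so T accepts with probability 1 versus 1/2
assert any(all(red(i, z) == f[i] for i in range(4)) for z in range(num_advice))
```

The advice length this reduction needs is $\lceil \log_2(\text{number of distinguished } f) \rceil$, which is why only the list size matters once running time is unbounded.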
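Analogously to Proposition 15, black-box constructions according to the above definition do suffice for constructing pseudorandom generators from functions of high circuit complexity.
Proposition 20. If $G$ is a $(t, k, \varepsilon)$ black-box PRG construction and $f$ is $(s, 0)$ hard, then $G^f(U_{[D]})$ and $U_{[M]}$ are $(s/\tilde{O}(t), \varepsilon)$ indistinguishable.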
Again, for the purposes of this proposition, $k$ may as well be taken to be equal to $t$, and the values of interest range from the "high end" $k = t = n^{\Omega(1)}$ (applicable for functions $f$ whose circuit complexity is exponential in the input length, $\log n$) to the "low end" $k = t = (\log n)^{\Omega(1)}$ (applicable for functions $f$ of superpolynomial circuit complexity).
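We place pseudorandom generators in our framework analogously to hardness amplifiers. Given $G^f : [D] \to [M]$ defined for every oracle $f : [n] \to \{0, 1\}$, we can define $\Gamma : [N] \times [D] \to [M]$ by $\Gamma(f, y) = G^f(y)$, where $N = 2^n$ and, as before, we view $[N]$ as consisting of all boolean functions on $[n]$.
Proposition 21. Let $G^f : [D] \to [M]$ be an algorithm that is defined for every oracle $f : [n] \to \{0, 1\}$, and let $\Gamma : [N] \times [D] \to [M]$ be the corresponding function defined above, where $N = 2^n$. Then $G$ is an $(\infty, k, \varepsilon)$ black-box PRG construction if and only if
$$\forall T \subseteq [M] \quad |\mathrm{LIST}_\Gamma(T, \mu(T) + \varepsilon)| \le K,$$
where $\mu(T) = |T|/M$, $K = 2^k$, and each test $T : [M] \to \{0, 1\}$ is identified with the set of outputs it accepts.
Notice that the condition in Proposition 21 is identical to the ones in our characterizations of averaging samplers (Proposition 5) and randomness extractors (Proposition 11). Thus, black-box pseudorandom generator constructions with reductions of unbounded running time (but bounded advice length $k$) are equivalent to both averaging samplers and randomness extractors. Analogously to the discussion of hardness amplifiers, an efficient reduction corresponds to extractors and samplers with efficient "local decoding" procedures. Here the decoder is given oracle access to a statistical test $T$ that is trying to distinguish the output of the extractor $\mathrm{Ext}$ from uniform. It should be able to efficiently compute any desired bit of any source string $x = f$ for which $T$ succeeds in distinguishing the output $\mathrm{Ext}(x, U_{[D]})$ from uniform, given some $k = \log K$ bits of advice depending on $x$. Even though achieving this additional local decoding property seems only to make constructing extractors more difficult, the perspective it provides has proved useful in constructing extractors, because it suggests an algorithmic approach to establishing the extractor property (namely, designing an appropriate reduction/decoder).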
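Taking $\Gamma(f, y) = G^f(y)$, the list condition of Proposition 21 can be evaluated by brute force over every $T \subseteq [M]$ for a tiny construction. The sketch reuses the hypothetical, non-stretching `toy_G` with output pair $(f(y), f(y+1 \bmod n))$; `list_gamma` and `smallest_K` are illustrative names. The oracles collected for a set $T$ are exactly those that the indicator test of $T$ distinguishes with advantage greater than $\varepsilon$.

```python
from fractions import Fraction
from itertools import combinations, product

def toy_G(f, y):
    """Hypothetical G^f : [n] -> [4] outputting the bit pair (f(y), f(y+1 mod n))."""
    n = len(f)
    return 2 * f[y] + f[(y + 1) % n]

def list_gamma(T, n, M, eps):
    """LIST_Gamma(T, mu(T) + eps) for Gamma(f, y) = G^f(y) and mu(T) = |T| / M.

    Collects every oracle f (as a truth table) with Pr[G^f(U_[D]) in T] > mu(T) + eps.
    """
    threshold = Fraction(len(T), M) + eps
    result = []
    for f in product((0, 1), repeat=n):
        hits = sum(toy_G(f, y) in T for y in range(n))
        if Fraction(hits, n) > threshold:
            result.append(f)
    return result

def smallest_K(n, M, eps):
    """Max over all T subset of [M] of |LIST_Gamma(T, mu(T) + eps)|."""
    return max(len(list_gamma(set(T), n, M, eps))
               for size in range(M + 1)
               for T in combinations(range(M), size))

print(smallest_K(4, 4, Fraction(1, 4)))
```

The returned value is the least $K$ for which the condition holds, so an unbounded-time reduction needs $k = \lceil \log_2 K \rceil$ advice bits.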
In terms of parameters, black-box PRG constructions are closer to extractors than to samplers. In particular, the "high end" of PRG constructions has $k = t = n^{\Omega(1)}$, corresponding to extracting randomness from sources whose min-entropy is polynomially smaller than the length. However, a difference with extractors is that in pseudorandom generator constructions, one typically only looks for an output length $m$ that is polynomially related to $t = k$. This corresponds to extractors that extract $m = k^{\Omega(1)}$ bits out of the $k$ bits of min-entropy in the source, but for extractors, achieving $m = \Omega(k)$ or even $m \approx k + d$ is of interest. The connection between pseudorandom generators and extractors described here was discovered and first exploited by Trevisan [Tre1], and has inspired many subsequent works.
For a left-regular bipartite multigraph $G$ with left vertex set $[N]$, right vertex set $[M]$, and left degree $D$, we define the neighbor function $\Gamma : [N] \times [D] \to [M]$ by
$$\Gamma(x, y) = \text{the } y\text{'th neighbor of } x. \qquad (3)$$
Proposition 7. Let $G$ be a left-regular bipartite multigraph with left vertex set $[N]$, right vertex set $[M]$, and left degree $D$, and let $\Gamma : [N] \times [D] \to [M]$ be the neighbor function corresponding to $G$ via Equation (3). Then $G$ is an $(= K, A)$ expander if and only if
$$\forall T \subseteq [M] \text{ s.t. } |T| < AK \quad |\mathrm{LIST}_\Gamma(T, 1)| < K. \qquad (4)$$
Thus, $G$ is a $(K, A)$ expander iff for every $T \subseteq [M]$ of size less than $AK$, we have $|\mathrm{LIST}_\Gamma(T, 1)| < |T|/A$.
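Both sides of Proposition 7 can be computed by brute force on tiny multigraphs, which gives a quick sanity check of the equivalence. In this sketch the graph is stored as adjacency lists with `gamma[x][y]` playing the role of $\Gamma(x, y)$ (repeated neighbors allowed), $\mathrm{LIST}_\Gamma(T, 1)$ is taken to be the set of left vertices all of whose neighbors lie in $T$, and the function names and random test parameters are hypothetical.

```python
import random
from itertools import combinations

def neighbors(gamma, x):
    """Set of right vertices Gamma(x, y) over all y in [D]."""
    return set(gamma[x])

def is_exact_expander(gamma, N, K, A):
    """(= K, A) expansion: every set of exactly K left vertices has at least A*K neighbors."""
    return all(len(set().union(*(neighbors(gamma, x) for x in S))) >= A * K
               for S in combinations(range(N), K))

def list_condition(gamma, N, M, K, A):
    """Condition (4): every T of size < A*K has |LIST_Gamma(T, 1)| < K,
    where LIST_Gamma(T, 1) = {x : every neighbor of x lies in T}."""
    for size in range(M + 1):
        if size >= A * K:
            break
        for T in combinations(range(M), size):
            T_set = set(T)
            if sum(neighbors(gamma, x) <= T_set for x in range(N)) >= K:
                return False
    return True

# Usage: on random small multigraphs, the two sides of Proposition 7 always agree.
rng = random.Random(0)
N, M, D, K, A = 6, 5, 2, 2, 1.5
for _ in range(200):
    gamma = [[rng.randrange(M) for _ in range(D)] for _ in range(N)]
    assert is_exact_expander(gamma, N, K, A) == list_condition(gamma, N, M, K, A)
```

The agreement is not a coincidence of the random samples: a set $S$ of $K$ left vertices with fewer than $AK$ neighbors gives $T = N(S)$ violating (4), and conversely any $K$ elements of an oversized list form such a set $S$.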