arXiv:1103.3869v3 [math.PR] 25 Jul 2012 Spectral Statistics of Erdo˝s-R´enyi Graphs II: Eigenvalue Spacing and the Extreme Eigenvalues La´szlo´ Erdo˝s1∗ Antti Knowles2† Horng-Tzer Yau2‡ Jun Yin2§ Institute of Mathematics, University of Munich, Theresienstrasse 39, D-80333 Munich, Germany lerdos@math.lmu.de 1 Department of Mathematics, Harvard University Cambridge MA 02138, USA knowles@math.harvard.edu htyau@math.harvard.edu jyin@math.harvard.edu 2 July 27, 2012 We consider the ensemble of adjacency matrices of Erd˝os-R´enyi random graphs, i.e. graphs on N vertices where every edge is chosen independently and with probability p ≡ p(N ). We rescale the matrix so that its bulk eigenvalues are of order one. Under the assumption pN ≫ N 2/3, we prove the universality of eigenvalue distributions both in the bulk and at the edge of the spectrum. More precisely, we prove (1) that the eigenvalue spacing of the Erd˝os-R´enyi graph in the bulk of the spectrum has the same distribution as that of the Gaussian orthogonal ensemble; and (2) that the second largest eigenvalue of the Erd˝os-R´enyi graph has the same distribution as the largest eigenvalue of the Gaussian orthogonal ensemble. As an application of our method, we prove the bulk universality of generalized Wigner matrices under the assumption that the matrix entries have at least 4 + ε moments. AMS Subject Classification (2010): 15B52, 82B44 Keywords: Erdo˝s-R´enyi graphs, universality, Dyson Brownian motion. ∗Partially supported by SFB-TR 12 Grant of the German Research Council †Partially supported by NSF grant DMS-0757425 ‡Partially supported by NSF grants DMS-0757425, 0804279 §Partially supported by NSF grant DMS-1001655 1 1. Introduction The Erdo˝s-R´enyi ensemble [20, 21] is a law of a random graph on N vertices, in which each edge is chosen independently with probability p ≡ p(N ). The corresponding adjacency matrix is called the Erdo˝s-R´enyi matrix. Since each row and column has typically pN nonzero entries, the matrix is sparse as long as p ≪ 1. We shall refer to pN as the sparseness parameter of the matrix. In the companion paper [11], we established the local semicircle law for the Erdo˝s-R´enyi matrix for pN (log N )C , i.e. we showed that, assuming pN (log N )C , the eigenvalue density is given by the Wigner semicircle law in any spectral window containing on average at least (log N )C′ eigenvalues. In this paper, we use this result to prove both the bulk and edge universalities for the Erdo˝s-R´enyi matrix under the restriction that the sparseness parameter satisfies pN ≫ N 2/3. (1.1) More precisely, assuming that p satisfies (1.1), we prove that the eigenvalue spacing of the Erdo˝s-R´enyi graph in the bulk of the spectrum has the same distribution as that of the Gaussian orthogonal ensemble (GOE). In order to outline the statement of the edge universality for the Erdo˝s-R´enyi graph, we observe that, since the matrix elements of the Erdo˝s-R´enyi ensemble are either 0 or 1, they do not satisfy the mean zero condition which typically appears in the random matrix literature. In particular, the largest eigenvalue of the Erdo˝s-R´enyi matrix is very large and lies far away from the rest of the spectrum. We normalize the Erdo˝s-R´enyi matrix so that the bulk of its spectrum lies in the interval [−2, 2]. By the edge universality of the Erdo˝s-R´enyi ensemble, we therefore mean that its second largest eigenvalue has the same distribution as the largest eigenvalue of the GOE, which is the well-known Tracy-Widom distribution. 
We prove the edge universality under the assumption (1.1). Neglecting the mean zero condition, the Erdo˝s-R´enyi matrix becomes a Wigner random matrix with a Bernoulli distribution when 0 < p < 1 is a constant independent of N . Thus for p ≪ 1 we can view the Erdo˝s-R´enyi matrix, up to a shift in the expectation of the matrix entries, as a singular Wigner matrix for which the probability distributions of the matrix elements are highly concentrated at zero. Indeed, the probability for a single entry to be zero is 1 − p. Alternatively, we can express the singular nature of the Erdo˝s-R´enyi ensemble by the fact that the k-th moment of a matrix entry is bounded by N −1(pN )−(k−2)/2 . (1.2) For p ≪ 1 this decay in k is much slower than in the case of Wigner matrices. There has been spectacular progress in the understanding of the universality of eigenvalue distributions for invariant random matrix ensembles [5, 7, 8, 27, 28]. The Wigner and Erdo˝s-R´enyi matrices are not invariant ensembles, however. The moment method [31, 33, 32] is a powerful means for establishing edge universality. In the context of sparse matrices, it was applied in [32] to prove edge universality for the zero mean version of the d-regular graph, where the matrix entries take on the values −1 and 1 instead of 0 and 1. The need for this restriction can be ascribed to the two following facts. First, the moment method is suitable for treating the largest and smallest eigenvalues. But in the case of the Erdo˝s-R´enyi matrix, it is the second largest eigenvalue, not the largest one, which behaves like the largest eigenvalue of the GOE. Second, the modification of the moment method to matrices with non-symmetric distributions poses a serious technical challenge. A general approach to proving the universality of Wigner matrices was recently developed in the series of papers [12, 13, 14, 15, 16, 17, 18, 19]. In this paper, we further extend this method to cover sparse matrices such as the Erdo˝s-R´enyi matrix in the range (1.1). Our approach is based on the following three 2 ingredients. (1) A local semicircle law – a precise estimate of the local eigenvalue density down to energy scales containing around (log N )C eigenvalues. (2) Establishing universality of the eigenvalue distribution of Gaussian divisible ensembles, via an estimate on the rate of decay to local equilibrium of the Dyson Brownian motion [9]. (3) A density argument which shows that for any probability distribution of the matrix entries there exists a Gaussian divisible distribution such that the two associated Wigner ensembles have identical local eigenvalue statistics down to the scale 1/N . In the case of Wigner matrices, the edge universality can also be obtained by a modification of (1) and (3) [19]. The class of ensembles to which this method applies is extremely general. So far it includes all (generalized) Wigner matrices under the sole assumption that the distributions of the matrix elements have a uniform subexponential decay. In this paper we extend this method to the Erdo˝s-R´enyi matrix, which in fact represents a generalization in two unrelated directions: (a) the law of the matrix entries is much more singular, and (b) the matrix elements have nonzero mean. As an application of the local semicircle law for sparse matrices proved in [11], we also prove the bulk universality for generalized Wigner matrices under the sole assumption that the matrix entries have 4 + ε moments. 
This relaxes the subexponential decay condition on the tail of the distributions assumed in [17, 18, 19]. Moreover, we prove the edge universality of Wigner matrices under the assumption that the matrix entries have 12 + ε moments. These results on Wigner matrices are stated and proved in Section 7 below. We note that in [3] it was proved that the distributions of the largest eigenvalues are Poisson if the entries have at most 4 − ε moments. Numerical results [4] predict that the existence of four moments corresponds to a sharp transition point, where the transition is from the Poisson process to the determinantal point process with Airy kernel. We remark that the bulk universality for Hermitian Wigner matrices was also obtained in [34], partly by using the result of [22] and the local semicircle law from Step (1). For real symmetric Wigner matrices, the bulk universality in [34] requires that the first four moments of every matrix element coincide with those of the standard Gaussian random variable. In particular, this restriction rules out the real Bernoulli Wigner matrices, which may be regarded as the simplest kind of an Erdo˝s-R´enyi matrix (again neglecting additional difficulties arising from the nonzero mean of the entries). As a first step in our general strategy to prove universality, we proved, in the companion paper [11], a local semicircle law stating that the eigenvalue distribution of the Erdo˝s-R´enyi ensemble in any spectral window which on average contains at least (log N )C eigenvalues is given by the Wigner semicircle law. As a corollary, we proved that the eigenvalue locations are equal to those predicted by the semicircle law, up to an error of order (pN )−1. The second step of the strategy outlined above for Wigner matrices is to estimate the local relaxation time of the Dyson Brownian motion [15, 16]. This is achieved by constructing a pseudo-equilibrium measure and estimating the global relaxation time to this measure. For models with nonzero mean, such as the Erdo˝s-R´enyi matrix, the largest eigenvalue is located very far from its equilibrium position, and moves rapidly under the Dyson Brownian motion. Hence a uniform approach to equilibrium is impossible. We overcome this problem by integrating out the largest eigenvalue from the joint probability distribution of the eigenvalues, and consider the flow of the marginal distribution of the remaining N − 1 eigenvalues. This enables us to establish bulk universality for sparse matrices with nonzero mean under the restriction (1.1). This approach trivially also applies to Wigner matrices whose entries have nonzero mean. Since the eigenvalue locations are only established with accuracy (pN )−1, the local relaxation time for the Dyson Brownian motion with the initial data given by the Erdo˝s-R´enyi ensemble is only shown to be less than 1/(p2N ) ≫ 1/N . For Wigner ensembles, it was proved in [19] that the local relaxation time is of order 1/N . Moreover, the slow decay of the third moment of the Erdo˝s-R´enyi matrix entries, as given in (1.2), makes the approximation in Step (3) above less effective. These two effects impose the restriction (1.1) in our proof of bulk universality. At the end of Section 2 we give a more detailed account of how this 3 restriction arises. The reason for the same restriction’s being needed for the edge universality is different; see Section 6.3. 
We note, however, that both the bulk and edge universalities are expected to hold without this restriction, as long as the graphs are not too sparse in the sense that pN ≫ log N ; for d-regular graphs this condition is conjectured to be the weaker pN ≫ 1 [30]. A discussion of related problems on d-regular graphs can be found in [26]. Acknowledgement. We thank P. Sarnak for bringing the problem of universality of sparse matrices to our attention. 2. Definitions and results We begin this section by introducing a class of N × N sparse random matrices A ≡ AN . Here N is a large parameter. (Throughout the following we shall often refrain from explicitly indicating N -dependence.) The motivating example is the Erdo˝s-R´enyi matrix, or the adjacency matrix of the Erdo˝s-R´enyi random graph. Its entries are independent (up to the constraint that the matrix be symmetric), and equal to 1 with probability p and 0 with probability 1 − p. For our purposes it is convenient to replace p with the new parameter q ≡ q(N ), defined through p = q2/N . Moreover, we rescale the matrix in such a way that its bulk eigenvalues typically lie in an interval of size of order one. Thus we are led to the following definition. Let A = (aij) be the symmetric N × N matrix whose entries aij are independent (up to the symmetry constraint aij = aji) and each element is distributed according to aij = γ q 1 0 with probability q2 N with probability 1 − q2 N . (2.1) Here γ ..= (1 − q2/N )−1/2 is a scaling introduced for convenience. The parameter q N 1/2 expresses the sparseness of the matrix; it may depend on N . Since A typically has q2N nonvanishing entries, we find that if q ≪ N 1/2 then the matrix is sparse. We extract the mean of each matrix entry and write A = H + γq |e e| , where the entries of H (given by hij = aij − γq/N ) have mean zero, and we defined the vector e ≡ eN ..= √1 (1, . . . , 1)T . N Here we use the notation |e e| to denote the orthogonal projection onto e, i.e. (|e e|)ij ..= N −1. One readily finds that the matrix elements of H satisfy the moment bounds (2.2) Eh2ij = 1 N , E hij p 1 N qp−2 , (2.3) where p 2. More generally, we consider the following class of random matrices with non-centred entries characterized by two parameters q and f , which may be N -dependent. The parameter q expresses how singular the distribution of hij is; in particular, it expresses the sparseness of A for the special case (2.1). The parameter f determines the nonzero expectation value of the matrix elements. 4 Definition 2.1 (H). We consider N ×N random matrices H = (hij ) whose entries are real and independent up to the symmetry constraint hij = hji. We assume that the elements of H satisfy the moment conditions Ehij = 0 , E|hij |2 = 1 N , E|hij |p Cp N qp−2 (2.4) for 1 i, j N and 2 p (log N )10 log log N , where C is a positive constant. Here q ≡ q(N ) satisfies (log N )15 log log N q CN 1/2 (2.5) for some positive constant C. Definition 2.2 (A). Let H satisfy Definition 2.1. Define the matrix A = (aij) through A ..= H + f |e e| , (2.6) where f ≡ f (N ) is a deterministic number that satisfies 1 + ε0 f N C , (2.7) for some constants ε0 > 0 and C. Remark 2.3. For definiteness, and bearing the Erdo˝s-R´enyi matrix in mind, we restrict ourselves to real symmetric matrices satisfying Definition 2.2. However, our proof applies equally to complex Hermitian sparse matrices. Remark 2.4. 
As observed in [11], Remark 2.5, we may take H to be a Wigner matrix whose entries have subexponential decay E|hij |p (Cp)θpN −p/2 by choosing q = N 1/2(log N )−5θ log log N . We shall use C and c to denote generic positive constants which may only depend on the constants in assumptions such as (2.4). Typically, C denotes a large constant and c a small constant. Note that the fundamental large parameter of our model is N , and the notations ≫, ≪, O(·), o(·) always refer to the limit N → ∞. Here a ≪ b means a = o(b). We write a ∼ b for C−1a b Ca. After these preparations, we may now state our results. They concern the distribution of the eigenvalues of A, which we order in a nondecreasing fashion and denote by µ1 · · · µN . We shall only consider the distribution of the N − 1 first eigenvalues µ1, . . . , µN−1. The largest eigenvalue µN lies far removed from the others, and its distribution is known to be normal with mean f + f −1 and variance N −1/2; see [11], Theorem 6.2, for more details. First, we establish the bulk universality of eigenvalue correlations. Let p(µ1, . . . , µN ) be the probability density1 of the ordered eigenvalues µ1 · · · µN of A. Introduce the marginal density p(NN−1)(µ1, . . . , µN−1) ..= 1 (N − 1)! σ∈SN −1 dµN p(µσ(1), . . . , µσ(N−1), µN ) . In other words, p(NN−1) is the symmetrized probability density of the first N − 1 eigenvalues of H. For n N − 1 we define the n-point correlation function (marginal) through p(Nn)(µ1, . . . , µn) ..= dµn+1 · · · dµN−1 p(NN−1)(µ1, . . . , µN−1) . (2.8) Similarly, we denote by p(GnO) E,N the n-point correlation function of the symmetrized eigenvalue density of an N × N GOE matrix. 1Note that we use the density of the law of the eigenvalue density for simplicity of notation, but our results remain valid when no such density exists. 5 Theorem 2.5 (Bulk universality). Suppose that A satisfies Definition 2.2 with q N φ for some φ satisfying 0 < φ 1/2, and that f additionally satisfies f CN 1/2 for some C > 0. Let β > 0 and assume that φ > 1 3 + β 6 . (2.9) Let E ∈ (−2, 2) and take a sequence (bN ) satisfying N ε−β bN ||E| − 2|/2 for some ε > 0. Let n ∈ N and O : Rn → R be compactly supported and continuous. Then lim E+bN dE′ N →∞ E−bN 2bN dα1 · · · dαn O(α1, . . . , αn) × 1 ̺sc(E)n p(Nn) − p(GnO) E,N E′ + N α1 ̺sc(E ) , . . . , E′ + αn N ̺sc(E) = 0, where we abbreviated for the density of the semicircle law. ̺sc(E) ..= 1 2π [4 − E2]+ (2.10) Remark 2.6. Theorem 2.5 implies bulk universality for sparse matrices provided that 1/3 < φ the end of this section for an account on the origin of the condition (2.9). We also prove the universality of the extreme eigenvalues. 1/2. See Theorem 2.7 (Edge universality). Suppose that A satisfies Definition 2.2 with q N φ for some φ satisfying 1/3 < φ 1/2. Let V be an N × N GOE matrix whose eigenvalues we denote by λV1 · · · λVN . Then there is a δ > 0 such that for any s we have PV N 2/3(λVN − 2) s − N −δ −N −δ as well as PA N 2/3(µN−1 − 2) s PV N 2/3(λVN − 2) s + N −δ +N −δ (2.11) PV N 2/3(λV1 + 2) s − N −δ − N −δ PA N 2/3(µ1 + 2) s PV N 2/3(λV1 + 2) s + N −δ + N −δ , (2.12) for N N0, where N0 is independent of s. Here PV denotes the law of the GOE matrix V , and PA the law of the sparse matrix A. Remark 2.8. Theorem 6.4 can be easily extended to correlation functions of a finite collection of extreme eigenvalues. Remark 2.9. 
The GOE distribution function F1(s) ..= limN PV N 2/3(λVN − 2) s of the largest eigenvalue of V has been identified by Tracy and Widom [36, 37], and can be computed in terms of Painlev´e equations. A similar result holds for the smallest eigenvalue λV1 of V . Remark 2.10. A result analogous to Theorem 2.7 holds for the extreme eigenvalues of the centred sparse matrix H; see (6.15) below. We conclude this section by giving a sketch of the origin of the restriction φ > 1/3 in Theorem 2.5. To simplify the outline of the argument, we set β = 0 in Theorem 2.5 and ignore any powers of N ε. The proof of Theorem 2.5 is based on an analysis of the local relaxation properties of the marginal Dyson Brownian 6 motion, obtained from the usual Dyson Brownian motion by integrating out the largest eigenvalue µN . As an input, we need the bound N −1 Q ..= E |µα − γα|2 N 1−4φ, (2.13) α=1 where γα denotes the classical location of the α-th eigenvalue (see (3.15) below). The bound (2.13) was proved in [11]. In that paper we prove, roughly, that |µα − γα| q−2 N −2φ, from which (2.13) follows. The precise form is given in (3.16). We then take an arbitrary initial sparse matrix ensemble A0 and evolve it according to the Dyson Brownian motion up to a time τ = N −ρ, for some ρ > 0. We prove that the local spectral statistics, in the first N − 1 eigenvalues, of the evolved ensemble Aτ at time τ coincide with those of a GOE matrix V , provided that Qτ −1 = QN ρ ≪ 1 . (2.14) The precise statement is given in (4.9). This gives us the condition 1 − 4φ + ρ < 0 . (2.15) Next, we compare the local spectral statistics of a given Erdo˝s-R´enyi matrix A with those of the time-evolved ensemble Aτ by constructing an appropriate initial A0, chosen so that the first four moments of A and Aτ are close. More precisely, by comparing Green functions, we prove that the local spectral statistics of A and Aτ coincide if the first three moments of the entries of A and Aτ coincide and their fourth moments differ by at most N −2−δ for some δ > 0. (See Proposition 5.2.) Given A we find, by explicit construction, a sparse matrix A0 such that the first three moments of the entries of Aτ are equal to those of A, and their fourth moments differ by at most N −1−2φτ = N −1−2φ−ρ; see (5.6). Thus the local spectral statistics of A and Aτ coincide provided that 1 − 2φ − ρ < 0 . (2.16) From the two conditions (2.15) and (2.16) we find that the local spectral statistics of A and V coincide provided that φ > 1/3. 3. The strong local semicircle law and eigenvalue locations In this preliminary section we collect the main notations and tools from the companion paper [11] that we shall need for the proofs. Throughout this paper we shall make use of the parameter ξ ≡ ξN ..= 5 log log N , (3.1) which will keep track of powers of log N and probabilities of high-probability events. Note that in [11], ξ was a free parameter. In this paper we choose the special form (3.1) for simplicity. We introduce the spectral parameter z = E + iη where E ∈ R and η > 0. Let Σ 3 be a fixed but arbitrary constant and define the domain DL ..= z ∈ C .. |E| Σ , (log N )LN −1 η 3 , (3.2) 7 with a parameter L ≡ L(N ) that always satisfies L 8ξ . (3.3) For Im z > 0 we define the Stieltjes transform of the local semicircle law msc(z) ..= R ̺sc(x) x−z dx , (3.4) where the density ̺sc was defined in (2.10). The Stieltjes transform msc(z) ≡ msc may also be characterized as the unique solution of msc + z 1 + msc = 0 (3.5) satisfying Im msc(z) > 0 for Im z > 0. 
This implies that √ msc(z) = −z + z2 2 − 4 , (3.6) where the square root is chosen so that msc(z) ∼ −z−1 as z → ∞. We define the resolvent of A through G(z) ..= (A − z)−1 , as well as the Stieltjes transform of the empirical eigenvalue density m(z) ..= 1 N Tr G(z) . For x ∈ R we define the distance κx to the spectral edge through κx ..= |x| − 2 . (3.7) At this point we warn the reader that we depart from our conventions in [11]. In that paper, the quantities G(z) and m(z) defined above in terms of A bore a tilde to distinguish them from the same quantities defined in terms of H. In this paper we drop the tilde, as we shall not need resolvents defined in terms of H. We shall frequently have to deal with events of very high probability, for which the following definition is useful. It is characterized by two positive parameters, ξ and ν, where ξ is given by (3.1). Definition 3.1 (High probability events). We say that an N -dependent event Ω holds with (ξ, ν)-high probability if P(Ωc) e−ν(log N )ξ (3.8) for N N0(ν). Similarly, for a given event Ω0, we say that Ω holds with (ξ, ν)-high probability on Ω0 if P(Ω0 ∩ Ωc) e−ν(log N )ξ for N N0(ν). Remark 3.2. In the following we shall not keep track of the explicit value of ν; in fact we allow ν to decrease from one line to another without introducing a new notation. All of our results will hold for ν ν0, where ν0 depends only on the constants C in Definition 2.1 and the parameter Σ in (3.2). 8 Theorem 3.3 (Local semicircle law [11]). Suppose that A satisfies Definition 2.2 with the condition (2.7) replaced with 0 f NC . (3.9) Moreover, assume that q (log N )120ξ , L 120ξ . (3.10) (3.11) Then there is a constant ν > 0, depending on Σ and the constants C in (2.4) and (2.5), such that the following holds. We have the local semicircle law: the event m(z) − msc(z) z∈DL (log N )40ξ min (√log N )40ξ κE + η 1 q2 , 1 q + 1 Nη (3.12) holds with (ξ, ν)-high probability. Moreover, we have the following estimate on the individual matrix elements of G. If instead of (3.9) f satisfies 0 f C0N 1/2 , (3.13) for some constant C0, then the event max Gij (z) − δij msc(z) 1 i,j N z∈DL holds with (ξ, ν)-high probability. (log N )40ξ 1 q + Im msc(z) Nη + 1 Nη (3.14) Next, we recall that the N − 1 first eigenvalues of A are close the their classical locations predicted by the semicircle law. Let nsc(E) ..= E −∞ ̺sc (x) dx denote the integrated density of the local semicircle law. Denote by γα the classical location of the α-th eigenvalue, defined through nsc(γα) = α N for α = 1, . . . , N . (3.15) The following theorem compares the locations of the eigenvalues µ1, . . . , µN−1 to their classical locations γ1, . . . , γN−1. Theorem 3.4 (Eigenvalue locations [11]). Suppose that A satisfies Definition 2.2, and let φ be an exponent satisfying 0 < φ 1/2, and set q = N φ. Then there is a constant ν > 0 – depending on Σ and the constants C in (2.4), (2.5), and (2.7) – as well as a constant C > 0 such that the following holds. We have with (ξ, ν)-high probability that N −1 |µα − γα|2 α=1 (log N )Cξ N 1−4φ + N 4/3−8φ . Moreover, for all α = 1, . . . , N − 1 we have with (ξ, ν)-high probability that (3.16) |µα − γα| (log N )Cξ N −2/3 α−1/3 + 1 α where we abbreviated α ..= min{α, N − α}. (log N )Cξ(1 + N 1−3φ) + N 2/3−4φα−2/3 + N −2φ , (3.17) 9 Remark 3.5. Under the assumption φ 1/3 the estimate (3.17) simplifies to |µα − γα| (log N )Cξ N −2/3α−1/3 + N −2φ , (3.18) which holds with (ξ, ν)-high probability. Finally, we record two basic results from [11] for later reference. 
From [11], Lemmas 4.4 and 6.1, we get, with (ξ, ν)-high probability, 1 max αN |λα | 2 + (log N )Cξ q−2 + N −2/3 , 1 mα aNx−1|µα| 2 + (log N )Cξ q−2 + N −2/3 . (3.19) Moreover, from [11], Theorem 6.2, we get, with (ξ, ν)-high probability, µN = f + 1 f + o(1) . (3.20) In particular, using (2.7) we get, with (ξ, ν)-high probability, 2 + σ µN N C , (3.21) where σ > 0 is a constant spectral gap depending only on the constant ε0 from (2.7). 4. Local ergodicity of the marginal Dyson Brownian motion In Sections 4 and 5 we give the proof of Theorem 2.5. Throughout Sections 4 and 5 it is convenient to adopt a slightly different notation for the eigenvalues of A. In these two sections we shall consistently use x1 · · · xN to denote the ordered eigenvalues of A, instead of µ1 · · · µN used in the rest of this paper. We abbreviate the collection of eigenvalues by x = (x1, . . . , xN ). The main tool in the proof of Theorem 2.5 is the marginal Dyson Brownian motion, obtained from the usual Dyson Brownian motion of the eigenvalues x by integrating out the largest eigenvalue xN . In this section we establish the local ergodicity of the marginal Dyson Brownian and derive an upper bound on its local relaxation time. Let A0 = (aij,0)ij be a matrix satisfying Definition 2.2 with constants q0 N φ and f0 1 + ε0. Let (Bij,t)ij be a symmetric matrix of independent Brownian motions, whose off-diagonal entries have variance t and diagonal entries variance 2t. Let the matrix At = (aij,t)ij satisfy the stochastic differential equation daij = d√Bij N − 1 2 aij dt . (4.1) It is easy to check that the distribution of At is equal to the distribution of e−t/2A0 + (1 − e−t)1/2V , (4.2) where V is a GOE matrix independent of A0. Let ρ be a constant satisfying 0 < ρ < 1 to be chosen later. In the following we shall consider times t in the interval [t0, τ ], where t0 ..= N −ρ−1 , τ ..= N −ρ . 10 One readily checks that, for any fixed ρ as above, the matrix At satisfies Definition 2.2, with constants ft = f (1 + O(N −δ0 )) 1 + ε0 2 , qt ∼ q0 N φ , where all estimates are uniform for t ∈ [t0, τ ]. Denoting by xN,t the largest eigenvalue of At, we get in particular from (3.21) that P ∃ t ∈ [t0, τ ] .. xN,t ∈/ [2 + σ, N C ] e−ν(log N )ξ (4.3) for some σ > 0 and C > 0. From now on we shall never use the symbols ft and qt in their above sense. The only information we shall need about xN is (4.3). In this section we shall not use any information about qt, and in Section 5 we shall only need that qt cN φ uniformly in t. Throughout this section ft will denote the joint eigenvalue density evolved under the Dyson Brownian motion. (See Definition 4.1 below.) It is well known that the eigenvalues xt of At satisfy the stochastic differential equation (Dyson Brownian motion) dxi = √dBi + N − 1 4 xi + 1 2N j=i 1 xi − xj dt for i = 1, . . . , N , (4.4) where B1, . . . , BN is a family of independent standard Brownian motions. In order to describe the law of V , we define the equilibrium Hamiltonian H(x) ..= 1 4 x2i − 1 N log|xi − xj| i i 0. Then for any ρ satisfying 0 < ρ < 1 there exists a τ¯ ∈ [τ /2, τ ] such that, for any J ⊂ {1, 2, . . . , N − mn − 1}, we have 1 |J | Gi,m fτ¯ dµ − i∈J 1 |J | Gi,m dµ(N−1) i∈J CNε N 1+ρQ + N ρ |J | (4.9) for all N N0(ρ). Here µ(N−1) is the equilibrium measure of (N − 1) eigenvalues (GOE). Note that, by definition, the observables Gi,m in (4.9) only depend on the eigenvalues x1, . . . , xN−1. The rest of this section is devoted to the proof of Theorem 4.2. 
We begin by introducing a pseudo equilibrium measure. Abbreviate R ..= √ τ N −ε = N −ρ/2−ε/2 and define W (x) ..= N 1 2R2 (xi − γi)2 . i=1 Here we set γN ..= 2 + σ for convenience, but one may easily check that the proof remains valid for any larger choice of γN . Define the probability measure ω(dx) ..= ψ(x) µ(dx) where ψ(x) ..= Z e−NW (x) . Z Next, we consider marginal quantities obtained by integrating out the largest eigenvalue xN . To that end we write x = (x, xN ) , x = (x1, . . . , xN−1) and denote by ω(dx) the marginal measure of ω obtained by integrating out xN . By a slight abuse of notation, we sometimes make use of functions µ, ω, and ω, defined as the densities (with respect to Lebesgue measure) of their respective measures. Thus, µ(x) = 1 Z e−N H(x) , ω(x) = 1 e−NH(x)−NW (x) , Z ∞ ω(x) = ω(x, xN ) dxN . xN −1 12 For any function h(x) we introduce the conditional expectation h (x) ..= Eω[h|x] = ∞ xN −1 h(x, xN ) ω(x, xN ) dxN ω(x) . Throughout the following, we write gt ..= ft/ψ. In order to avoid pathological behaviour of the extreme eigenvalues, we introduce cutoffs. Let σ be the spectral gap from (4.3), and choose θ1, θ2, θ3 ∈ [0, 1] to be smooth functions that satisfy θ1(x1) = θ2(xN−1) = θ3(xN ) = 0 if x1 −4 , 1 if x1 −3 1 if xN−1 0 if xN−1 2 + σ 5 2 + 2σ 5 , 0 if xN 1 if xN 2 + 3σ 5 2 + 4σ 5 . Define θ ≡ θ(x1, xN−1, xN ) = θ1(x1) θ2(xN−1) θ3(xN ). One easily finds that |∇θ|2 θ C1(−4 x1 −3) + C1 σ 2 xN−1 − 2 2σ 5 + C1 3σ 5 xN − 2 4σ 5 , (4.10) where the left-hand side is understood to vanish outside the support of θ. Define the density ht ..= 1 Zt θgt , Zt ..= θgt dω . If ν is a probability measure and q a density such that qν is also a probability measure, we define the entropy Sν(q) ..= q log q dν . The following result is our main tool for controlling the local ergodicity of the marginal Dyson Brownian motion. Proposition 4.3. Suppose that (i) Sµ(ft0 ) N C , (ii) sup t∈[t0,τ ] 1(x1 −3) + 1 xN−1 2 + σ 5 + 1 xN (iii) sup sup (θ1θ2)(x) log θgt (x) 2 t∈[t0,τ ] x∈ΣN −1 NC . 2 + 4σ 5 ft dµ e−ν(log N )ξ , (4.11) (4.12) (4.13) Then for t ∈ [t0, τ ] we have ∂tSω( h ) −Dω h + Sω( h ) e−c(log N)ξ + CN QR−4 + C . (4.14) 13 Proof. First we note that Zt = θft dµ = 1 − O e−ν(log N)ξ (4.15) uniformly for t ∈ [t0, τ ], by (4.12). Dropping the time index to avoid cluttering the notation, we find ∂tSω( h ) = ∂t θg log θg dω − ∂t log Z = 1 ∂t ZZ θg log θg dω − 1 + log Z + Sω( h ) ∂t log Z . We find that ∂tZ = θ(Lf ) dµ = − 1 2N ∇θ · ∇f dµ 1 N 1/2 |∇θ|2 f dµ Dµ( f )1/2 . Bounding the Dirichlet form in terms of the entropy (see e.g. [10], Theorem 3.2), we find that Dµ( ft) by (4.11). Using (4.10) we therefore find 2 t Sµ (ft0 ) NC , (4.16) ∂tZ N C e−c(log N )ξ . (4.17) Thus we have ∂tSω( h ) 2∂t θg log θg dω + 1 + Sω( h ) N C e−c(log N)ξ . (4.18) We therefore need to estimate ∂t θg log θg dω = The second term of (4.19) is given by θ(Lf ) log θg dµ + θg ∂t θg θg dω . (4.19) ∂t θg dω = θ(Lf ) dµ = ∂tZ . Therefore (4.18) yields ∂tSω( h ) 2 θ(Lf ) log θg dµ + (1 + Sω( h ))N C e−c(log N)ξ . The first term of (4.20) is given by − 1 N ∇f · ∇ θ log θg dµ = − 1 N ∇(θf ) · ∇ log θg dµ + E1 + E2 , where we defined E1 ..= 1 N ∇θ · ∇(log θg ) f dµ , E2 ..= − 1 N ∇θ · ∇f log θg dµ . (4.20) (4.21) 14 Next, we estimate the error terms E1 and E2. Using (4.10) we get E1 = 1 N ∇√θ · ∇(log θg √ ) θf dµ θ e−ν(log N )ξ |∇ θg |2 θg 2 θg dω |∇θ|2 θ f dµ 1/2 |∇ θg |2 θg 2 θf dµ 1/2 1/2 e−c(log N )ξ + e−c(log N )ξ Dω h, where we used (4.15). Similarly, we find E2 1/2 |∇θ|2 log θg 2 f dµ |∇f |2 f dµ 1/2 . 
Using (4.10), (4.13), and (4.16) we therefore get E2 N C |∇θ|2 θ θ log θg 2f dµ 1/2 N C e−c(log N )ξ . Having dealt with the error terms E1 and E2, we compute the first term on the right-hand side of (4.21), − 1 N ∇(θf ) · ∇ log θg dµ = − 1 N ∇x(θg) · ∇x log θg ψ dµ − 1 N ∇x(log ψ) · ∇x log θg θgψ dµ . (4.22) The second term of (4.22) is bounded by η−1 N |∇x log ψ|2 f dµ + η N |∇ θg |2 θg 2 θg dω where η > 0. The first term of (4.22) is equal to η−1N 1 R4 N −1 (xi − γi)2 f dµ + 4ηDω( i=1 NQ ηR4 + 4ηDω ( h ), h) A simple calculation shows that − 1 N ∇x(θg) · ∇x log θg dω . ∇x(θg) = ∇x θg − θg∇x log ω + θg ∇x log ω , so that the first term of (4.22) becomes − 1 N ∇x θg · ∇x log θg dω + 1 N θg∇x log ω − θg ∇x log ω · ∇x log θg dω −4(1 − η)Dω h + 1 Nη θg∇x log ω − θg θg ∇x log ω 2 dω . 15 Using the Cauchy-Schwarz inequality ab 2 a2 b2 we find that the second term is bounded by 1 θg ∇x log ω − ∇x log ω Nη θg Thus, we have to estimate 2 dω 1 Nη = 1 Nη = 1 Nη θg ∇x log ω − ∇x log ω 2 dω ∇x log ω − ∇x log ω 2 θf dµ N −1 i=1 xN 1 − xi − 1 xN − xi 2 θf dµ . E3 ..= 1 Nη N −1 i=1 1 xN − xi 2 θf dµ , E4 ..= 1 Nη N −1 i=1 1 xN − xi Since xN − xi σ/5 on the support of θf µ, one easily gets from (3.19) that E3 C η . 2 θf dµ . In order to estimate E4, we write 1 xN − xi = dxN (xN − xi) wi(xN ) dxN wi(xN ) −1 , where wi(xN ) ..= 1(xN x ) eN−1 − N 4 x2N − N 2R2 (xN −γN )2 (xN − xj ) . j=i,N We now claim that on the support of θ, in particular for −4 x1 < xN−1 2 + 2σ/5, we have dxN (xN − xi) wi(xN ) dxN wi(xN ) c γN , uniformly for x ∈ ΣN−1. Indeed, writing γN ..= γN (1 + R−2), we have on the support of θ (4.23) dxN (xN − xi) wi(xN ) dxN wi(xN ) γN /2 + dxN (xN − γN ) wi(xN dxN wi(xN ) ) . Moreover, the second term is nonnegative: ∞ dxN (xN − γN ) wi(xN ) = −CN (x) dxN xN −1 ∂ ∂xN e− N R2 (xN −γN )2 (xN − xj) j=i,N = CN (x) e− N R2 (xN −1−γN )2 (xN−1 − xj ) + CN (x) j=i,N ∞ dxN e− N R2 (xN −γN )2 (xN − xj) xN −1 k=i,j j=i,k,N 0, 16 where CN (x) is nonnegative. This proves (4.23). Using (4.23) we get E4 C η γN−2 f dµ = C η . Summarizing, we have proved that ∂tSω( h ) − 4 − 8η − e−c(log N)ξ Dω h + 1 + Sω( h ) e−c(log N )ξ + NQ ηR4 + C η . Choosing η small enough completes the proof. Next, we derive a logarithmic convexity bound for the marginal measure ω. Lemma 4.4. We have that where ω(x) = 1 e−NH(x) , Z H(x) = − 1 N log|xi − xj | + V (x) , i 0 and define, for x ∈ ΣN−1, Vδ (x) ..= − 1 N log exp logδ (xN − xi) − N 2R2 (xi − γi)2 dxN , i 0 −∞ if x 0 . 17 Thus we find that Vδ ∈ C2(ΣN−1) and that we have the pointwise convergence, for all x ∈ ΣN−1, lim Vδ(x) = δ→0 V (x) ..= − 1 N log ∞ e−N H′′(x,xN ) dxN , xN −1 where V ∈ C2(ΣN−1) satisfies (4.24). Next, we claim that if ϕ = ϕ(x, y) satisfies ∇2ϕ(x, y) K then ψ(x), defined by e−ψ(x) ..= e−ϕ(x,y) dy , satisfies ∇2ψ(x) K. In order to prove the claim, we use subscripts to denote partial derivatives and recall the Brascamp-Lieb inequality for log-concave functions (Equation 4.7 in [6]) ψxx ϕxx − ϕxyϕ−yy1ϕyx e−ϕ dy e−ϕ dy . Then the claim follows from ϕxx ϕyz ϕxy −1 ϕyy 1 K =⇒ ϕxx − ϕxyϕ−yy1ϕyx K. Using this claim, we find that ∇2Vδ(x) R−2 for all x ∈ ΣN−1. In order to prove that ∇2V (x) R−2 – and hence complete the proof – it suffices to consider directional derivatives and prove the following claim. If (ζδ)δ>0 is a family of functions on a neighbourhood U that converges pointwise to a C2-function ζ as δ → 0, and if ζδ′′(x) K for all δ > 0 and x ∈ U , then ζ′′(x) K for all x ∈ U . 
Indeed, taking δ → 0 in h ζδ(x + h) + ζδ(x − h) − 2ζδ(x) = ζδ′′(x + ξ) + ζδ′′(x − ξ) (h − ξ) dξ 0 K h2 yields ζ(x + h) + ζ(x − h) − 2ζ(x) h−2 K, from which the claim follows by taking the limit h → 0. As a first consequence of Lemma 4.4, we derive an estimate on the expectation of observables depending only on eigenvalue differences. Proposition 4.5. Let q ∈ L∞(dω) be probability density. Then for any J ⊂ {1, 2, . . . , N − mn − 1} and any t > 0 we have 1 |J | Gi,m q dω − i∈J 1 |J | Gi,m dω i∈J C Dω(√q) t |J | + C Sω(q) e−ct/R2 . Proof. Using Lemma 4.4, the proof of Theorem 4.3 in [16] applies with merely cosmetic changes. Another, standard, consequence of Lemma 4.4 is the logarithmic Sobolev inequality Sω(q) CR2Dω(√q) . (4.25) Using (4.25) and Proposition 4.3, we get the following estimate on the Dirichlet form. 18 Proposition 4.6. Under the assumptions of Proposition 4.3, there exists a τ¯ ∈ [τ /2, τ ] such that Sω( hτ¯ ) CN R−2Q + CR2 , Dω( hτ¯ ) CN R−4Q + C . Proof. Combining (4.25) with (4.14) yields ∂tSω( ht ) −CR−2Sω( ht ) + CN QR−4 + C , which we integrate from t0 to t to get Sω( ht ) e−CR−2(t−t0)Sω( ht0 ) + CN QR−2 + CR2 . Moreover, (4.15) yields (4.26) (4.27) Sω( ht0 ) CSω ( gt0 ) + e−ν(log N)ξ CSω(gt0 ) + e−ν(log N)ξ = CSµ(ft0 ) − C log ψ ft0 dµ + e−ν(log N)ξ , where the second inequality follows from the fact that taking marginals reduces the relative entropy; see the proof of Lemma 4.7 below for more details. Thus we get Sω( ht0 ) N C + N R−2Q N C . Thus (4.27) yields Sω( ht ) N C e−CR−2(t−t0) + CN R−2Q + CR2 (4.28) for t ∈ [t0, τ ]. Integrating (4.14) from τ /2 to τ therefore gives 2 τ τ Dω ( τ /2 ht ) dt CN R−4Q + C , and the claim follows. We may finally complete the proof of Theorem 4.2. Proof of Theorem 4.2. The assumptions of Proposition 4.3 are verified in Subsection 4.1 below. Hence Propositions 4.5 and 4.6 yield 1 |J | Gi,m hτ¯ dω − i∈J Using (4.15) and (4.12) we get 1 |J | Gi,m dω i∈J CNε N 1+ρQ |J | + C N −2φ−ρ |J | . 1 |J | Gi,m fτ¯ dµ − i∈J 1 |J | Gi,m dω i∈J CNε N 1+ρQ |J | + C N −2φ−ρ |J | . (4.29) 19 In order to compare the measures ω and µ(N−1), we define the density q(x) ..= 1 Z′ exp 1 4 x2i + N 2R2 (xi − γi)2 − log|xN − xi| , i 0 we have Sµ(ft) N 2(N m2(ζ0) − log 1 − e−t) , where m2(ζ0) is the second moment of ζ0. Proof. Recall that the relative entropy is defined, for ν ≪ µ, as S(ν|µ) ..= log dν dµ dν. If ν and µ are marginals of ν and µ with respect to the same variable, it is easy to check that S(ν|µ) S(ν|µ). Therefore Sµ(ft) = S(ftµ|µ) S(At|V ) = N 2S(ζt|g2/N ) , 20 where ζt denotes the law of the off-diagonal entries of At, and gλ is a standard Gaussian with variance λ (the diagonal entries are dealt with similarly). Setting γ = 1 − e−t, we find from (4.2) that ζt has probability density ̺γ ∗ g2γ/N , where ̺γ is the probability density of (1 − γ)1/2ζ0. Therefore Jensen’s inequality yields S(ζt|g2/N ) = S dy ̺γ(y) g2γ/N (· − y) g2/N dy ̺γ (y)S g2γ/N (· − y)|g2/N . By explicit computation one finds S g2γ/N (· − y)|g2/N = 1 2 N 2 y2 − log γ + γ − 1 . Therefore and the claim follows. S(ζt|g2/N ) N m2(ζ0) − log γ , The estimate (4.12) follows from (4.3) and (3.19). It only remains to verify (4.13). Lemma 4.8. For any t ∈ [t0, τ ] we have (θ1θ2)(x) log θgt (x) 2 N C . (4.30) Proof. Let ζt be the law of an off-diagonal entry a of At (the diagonal entries are treated similarly). From (4.2) we find ζt = ̺γ ∗ g2γ/N , where γ = 1 − e−t, ̺γ is the law of (1 − γ)1/2ζ0, and gλ is a standard Gaussian with variance λ. 
Using da to denote Lebesgue measure, we find by explicit calculation that which gives e−N C −N C a2 dζt eN C − N 4 a2 , da e−N C −N C a2 dζt dg2γ/N eNC . Therefore, the density Ft(A) of the law of A with respect to the GOE measure satisfies e−N C −N C Tr A2 Ft(A) eNC . Parametrizing A = A(x, v) using the eigenvalues x and eigenvectors v, the GOE measure can be written in the factorized form µ(dx)P (dv), where µ is defined in (4.6) and P is a probability measure. Thus we get that the density ft(x) = Ft(x, v) P (dv) 21 satisfies e−N C −N C i x2i ft(x) eNC . Next, it is easy to see that e−N C −N C i x2i ψ(x) eNC . Using (4.32) we may now derive an upper bound on θ3gt : θ3gt (x) = dxN θ3(xN )ft(x, xN )µ(x, xN ) dxN ψ(x, xN )µ(x, xN ) eN C +N C i 0 and let b≡ bN satisfy |b| α ≡ α(φ) ..= min 2φ − 1 2 , 4φ − 2 3 . κ/2. Pick ε, β > 0, and (5.1) Let n ∈ N and O : Rn → R be compactly supported and continuous. Then there is a τ¯ ∈ [τ /2, τ ] such that E+b dE′ E−b 2b dα1 · · · dαn O(α1, . . . , αn) 1 ̺sc(E)n pτ(¯n,N) − p(GnO) E,N E′ + N α1 ̺sc(E) , . . . , E′ + αn N ̺sc(E) CnN ε b−1N −2φ + b−1/2N −β/2 . (5.2) Proof. The claim follows from Theorem 4.2 and Theorem 3.4, similarly to the proof of Theorem 2.1 in [16]. We use that Q (log N )CξN −2α, as follows from (3.16); the contribution of the low probability complement event to (3.16) may be easily estimated using Cauchy-Schwarz and the estimate i Etx4i = Et Tr A4 N C , uniformly for t 0. The assumption IV of [16] is a straightforward consequence of the local semicircle law, Theorem 3.3. Proposition 5.2. Let A(1) = (a(ij1)) and A(2) = (a(ij2)) be sparse random matrices, both satisfying Definition 2.2 with q(1) ∼ q(2) Nφ (in self-explanatory notation). Suppose that, for each i, j, the first three moments of a(ij1) and a(ij2) are the same, and that the fourth moments satisfy E a(ij1) 4 − E a(ij2) 4 N −2−δ , (5.3) for some δ > 0. Let n ∈ N and let F ∈ C5(Cn). We assume that, for any multi-index α ∈ Nn with 1 |α| 5 and any sufficiently small ε′ > 0, we have max ∂αF (x1, . . . , xn) .. |xi| N ε′ i N C0ε′ , max ∂αF (x1, . . . , xn) .. |xi| N 2 i N C0 , where C0 is a constant. Let κ > 0 be arbitrary. Choose a sequence of positive integers k1, . . . , kn and real parameters Ejm ∈ [−2 + κ, 2 − κ], where m = 1, . . Set zjm ..= Ejm ± iη with . , n and j = an arbitrary 1, . . . , km. Let ε choice of the ± > 0 be signs. arbitrary and choose η with N −1−ε η N −1. Then, abbreviating G(l)(z) ..= (A(l) − z)−1, we have     EF 1 N k1 Tr  k1 G(1)(zj1) , . . . , 1 N kn kn Tr  G(1)(zjn) − EF G(1) → G(2) j=1 j=1 CN 1−3φ+Cε + CN −δ+Cε . 23 Proof. The proof of Theorem 2.3 in [17] may be reproduced almost verbatim; the rest term in the Green function expansion is estimated by an L∞-L1 bound using E|a(ijl)|5 CN −1−3φ. As in [17] (Theorem 6.4), Proposition 5.2 readily implies the following correlation function comparison theorem. Theorem 5.3. Suppose the assumptions of Proposition 5.2 hold. Let p((n1)),N and p((n2)),N be n-point correlation functions of the eigenvalues of A(1) and A(2) respectively. Then for any |E| < 2, any n 1 and any compactly supported test function O : Rn → R we have lim N →∞ dα1 · · · dαn O(α1, . . . , αn) p((n1)),N − p((n2)),N E + α1 N ,. . . ,E + αn N = 0. We may now complete the proof of Theorem 2.5. Proof of Theorem 2.5. In order to invoke Theorems 5.1 and 5.3, we construct a sparse matrix A0, satisfying Definition 2.2, such that its time evolution Aτ¯ is close to A in the sense of the assumptions of Proposition 5.2. 
For definiteness, we concentrate on off-diagonal elements (the diagonal elements are dealt with similarly). For the following we fix i < j; all constants in the following are uniform in i, j, and N . Let ξ, ξ′, ξ0 be random variables equal in distribution to aij, (aτ¯)ij , (a0)ij respectively. For any random variable X we use the notation X ..= X − EX. Abbreviating γ ..= 1 − e−τ¯, we have ξ′ = 1 − γ ξ0 + √γ g , where g is a centred Gaussian with variance 1/N , independent of ξ0. We shall construct a random variable ξ0, supported on at most three points, such that A0 satisfies Definition 2.2 and the first four moments of ξ′ are sufficiently close to those of ξ. For k = 1, 2, . . . we denote by mk(X) the k-th moment of a random variable X. We set ξ0 = √ 1 f − γ N + ξ0 , (5.4) where m1(ξ0) = 0 and m2(ξ0) = N −1. It is easy to see that mk(ξ) = mk(ξ′) for k = 1, 2. We take the law of ξ0 to be of the form pδa + qδ−b + (1 − p − q)δ0 where a, b, p, q 0 are parameters satisfying p + q 1. The conditions m1(ξ0) = 0 and m2(ξ0) = N −1 imply p = 1 aN (a + b) , q = bN 1 (a + b) . Thus, we parametrize ξ0 using a and b; the condition p + q 1 reads ab N −1. Our aim is to determine a and b so that ξ0 satisfies (2.4), and so that the third and fourth moments of ξ′ and ξ are close. By explicit computation we find m3(ξ0) = a− N b , m4(ξ0) = N m3(ξ0)2 + ab N . (5.5) 24 Now we require that a and b be chosen so that ab N −1 and m3(ξ0) = (1 − γ)−3/2m3(ξ) , m4(ξ0) = N m3(ξ0)2 + m4(ξ) − N m3(ξ)2 . Using (5.5), it is easy to see that such a pair (a, b) exists provided that m4(ξ) − N m3(ξ)2 N −2. This latter estimate is generally valid for any random variable with m1 = 0; it follows from the elementary inequality m4m2 − m23 m32 valid whenever m1 = 0. Next, using (5.5) and the estimates m3(ξ) = O(N −1−φ), m4(ξ) = O(N −1−2φ), we find a − b = O(N −φ) , ab = O(N −2φ) , which implies a, b = O(N −φ). We have hence proved that A0 satisfies Definition 2.2. One readily finds that m3(ξ′) = m3(ξ). Moreover, using m4(ξ0) − m4(ξ) = N m3(ξ)2 (1 − γ)−3 − 1 = O(N −1−2φγ) , we find m4(ξ′) − m4(ξ) = (1 − γ)2m4(ξ0) + 6γ N2 − 3γ2 N2 − m4(ξ) = O(N −1−2φγ) . Summarizing, we have proved mk(ξ′) = mk(ξ) (k = 1, 2, 3), |m4(ξ′) − m4(ξ)| CN −1−2φτ¯ . (5.6) The claim follows now by setting δ = 2α(φ) + 2φ − 1 − β in (5.3), and invoking Theorems 5.1 and 5.3. 6. Edge universality: proof of Theorem 2.7 6.1. Rank-one perturbations of the GOE. We begin by deriving a simple, entirely deterministic, result on the eigenvalues of rank-one perturbations of matrices. We choose the perturbation to be proportional to |e e|, but all results of this subsection hold trivially if e is replaced with an arbitrary ℓ2-normalized vector. Lemma 6.1 (Monotonicity and interlacing). Let H be a symmetric N × N matrix. For f 0 we set A(f ) ..= H + f |e e| . Denote by λ1 · · · λN the eigenvalues of H, and by µ1(f ) · · · µN (f ) the eigenvalues of A(f ). Then for all α = 1, . . . , N − 1 and f 0 the function µα(f ) is nondecreasing, satisfies µα(0) = λα, and has the interlacing property λα µα(f ) λα+1 . (6.1) Proof. From [11], Equation (6.3), we find that µ is an eigenvalue of H + f |e e| if and only if α | uα , e |2 µ − λα = 1 f , (6.2) where uα is the eigenvector of H associated with the eigenvalue λα. The right-hand side of (6.2) has N singularities at λ1, . . . , λN , away from which it is decreasing. All claims now follow easily. 25 Next, we establish the following “eigenvalue sticking” property for GOE. Let α label an eigenvalue close to the right (say) spectral edge. 
Roughly we prove that, in the case where H = V is a GOE matrix and f > 1, the eigenvalue µα of V + f |e e| “sticks” to λα+1 with a precision (log N )CξN −1. This behaviour can be interpreted as a form of long-distance level repulsion, in which the eigenvalues µβ, β < α, repel the eigenvalue µα and push it close to its maximum possible value, λα+1. Lemma 6.2 (Eigenvalue sticking). Let V be an N × N GOE matrix. Suppose moreover that ξ satisfies (3.10) and that f satisfies f 1 + ε0. Then there is a δ ≡ δ(ε0) > 0 such that for all α satisfying N (1 − δ) α N − 1 we have with (ξ, ν)-high probability Similarly, if α instead satisfies α |λα+1 − µα| (log N )Cξ N . N δ we have with (ξ, ν)-high probability (6.3) |µα − λα| (log N )Cξ N . (6.4) For the proof of Lemma 6.2 we shall need the following result about Wigner matrices, proved in [19]. Lemma 6.3. Let H be a Wigner matrix with eigenvalues λ1 · · · λN and associated eigenvectors u1, . . . , uN . Assume that ξ is given by (3.1). Then the following two statements hold with (ξ, ν)-high probability: max α uα ∞ (log√N )Cξ , N (6.5) and |λα − γα| (log N )CξN −2/3 min{α, N + 1 − α} −1/3 . (6.6) Moreover, let L satisfy (3.11) and write GHij (z) ..= (H − z)−1 ij . Then we have, with (ξ, ν)-high probability, 1 max i,j N GHij (z) − δij msc(z) z∈DL (log N )Cξ Im msc(z) Nη + 1 Nη , (6.7) where DL was defined in (3.2) Proof of Lemma 6.2. We only prove (6.3); the proof of (6.4) is analogous. By orthogonal invariance of V , we may replace e with the vector (1, 0, . . . , 0). Let us abbreviate ζβ ..= |uβ(1)|2. Note that (6.5) implies max ζβ β (log N )CξN −1 (6.8) with (ξ, ν)-high probability. Now from we (6.2) we get µα ζα − λα+1 + β=α+1 ζβ µα − λβ = 1 f , which yields |λα+1 − µα| = ζα ζβ β=α+1 λβ − µα + 1 f −1 . (6.9) 26 We estimate from below, introducing an arbitrary η > 0, − β=α+1 λβ ζβ − µα = ζβ β<α+1 µα − λβ − β>α+1 λβ ζβ − µα β<α+1 ζβ(µα − λβ) (µα − λβ )2 + η2 − β>α+1 ζβ λβ − µα = − Re GV11(µα + iη) + β>α+1 ζβ(λβ − µα) (λβ − µα)2 + η2 − β>α+1 λβ ζβ − µα − Re GV11(µα + iη) − β>α+1 ζβ η2 (λβ − µα)3 , (6.10) where in the third step we used that λα+1 µα by (6.1). We now choose η = (log N )C1 log log N N −1. For C1 large enough, we get from (6.7) that GV11(µα + iη) = msc(µα + iη) + o(1). Therefore (3.6) yields − Re GV11(µα + iη) 1 − 2 |2 − µα| + o(1) . (6.11) From (6.6) and (6.1) we get that |µα − γα| (log N )CξN −2/3 with (ξ, ν)-high probability. Moreover, the definition (3.15) and α N (1 − δ) imply |γα − 2| Cδ2/3. Thus we get, with (ξ, ν)-high probability, that |2 − µα| = o(1) + Cδ2/3. Therefore (6.11) yields, with (ξ, ν)-high probability, − Re GV11(µα + iη) 1 + o(1) − Cδ1/3 . Recalling (6.8), we therefore get from (6.10), with (ξ, ν)-high probability, ζβ β=α+1 λβ − µα 1 + o(1) − C δ 1/3 − m(log N )Cξ N 3|λα+1 − µα|3 − (log N )Cξ N3 β>α+m |λβ 1 − µα|3 , (6.12) for any m ∈ N. Next, from (6.6) we find that, provided C2 is large enough, m ..= (log N )C2ξ, and β > α + m, then we have with (ξ, ν)-high probability |λβ − λα+1| |γβ − γα+1 | − (log N )Cξ N 2/3(N + 1 − β)1/3 c|γβ − γα+1| . Then for C2 large enough we have, with (ξ, ν)-high probability, 1 β>α+m |λβ − µα|3 C β>α+m (γβ 1 − γα+1)3 CN3 (log N )3C2ξ . Thus we get from (6.12), with (ξ, ν)-high probability, ζβ β=α+1 λβ − µα 1 + o(1) − C δ 1/3 − (log N )Cξ N 3|λα+1 − µα|3 . 27 Plugging this into (6.9) and recalling that f |λα+1 − µα| (log N )Cξ N 1 + ε0 > 1 yields, with (ξ, ν)-high probability, ε0 − C δ 1/3 − o(1) − (log N )Cξ N 3|λα+1 − µα|2 −1 , from which the claim follows. 6.2. Proof of Theorem 2.7. 
In this section we prove Theorem 2.7 by establishing the following comparison result for sparse matrices. Throughout the following we shall abbreviate the lower bound in (2.7) by f∗ ..= 1 + ε0 . (6.13) Proposition 6.4. Let Pv and Pw be laws on the symmetric N × N random matrices H, each satisfying Definition 2.1 with q N φ for some φ satisfying 1/3 < φ 1/2. In particular, we have the moment matching condition Evhij = Ewhij = 0 , Ev h2ij = Ew h2ij = 1 N . (6.14) Set f ..= f∗ in Definition 2.2: A ≡ A(f∗) = (aij ) ..= H +f∗|e e|. As usual, we denote the ordered eigenvalues of H by λ1 · · · λN and the ordered eigenvalues of A by µ1 · · · µN . Then there is a δ > 0 such that for any s ∈ R we have Pv N 2/3(λN − 2) s − N −δ − N −δ Pw N 2/3(λN − 2) s Pv N 2/3(λN − 2) s + N −δ + N −δ (6.15) as well as Pv N 2/3(µN−1 − 2) s − N −δ − N −δ Pw N 2/3(µN−1 − 2) s Pv N 2/3(µN−1 − 2) s + N −δ + N −δ (6.16) for N N0 sufficiently large, where N0 is independent of s. Assuming Proposition 6.4 is proved, we may easily complete the proof of Theorem 2.7 using the results of Section 6.1. Proof of Theorem 2.7. Choose Pv to be the law of GOE (see Remark 2.4), and choose Pw to be the law of a sparse matrix satisfying Definition 2.1 with q N φ. We prove (2.11); the proof of (2.12) is similar. For the following we write µα(f ) ≡ µα to emphasize the f -dependence of the eigenvalues of A(f ). Using first (6.1) and then (6.15) we get Pw N 2/3(µN−1(f ) − 2) s Pw N 2/3(λN − 2) s Pv N 2/3(λN − 2) s − N −δ − N −δ , for some δ > 0. Next, using first the monotonicity of µα(f ) from Lemma 6.1, then (6.16), and finally (6.3), we get Pw N 2/3(µN−1(f ) − 2) s Pw N 2/3(µN−1(f∗) − 2) s Pv N 2/3(µN−1(f∗) − 2) s + N −δ + N −δ Pv N 2/3(λN − 2) for some δ > 0. This concludes the proof of (2.11), after a renaming of δ. s + 2N −δ + 2N −δ , 28 The rest of this section is devoted to the proof of Proposition 6.4. We shall only prove (6.16). The proof of (6.15) is similar (in fact easier), and relies on the local semicircle law, Theorem 3.3, with f = 0; if f = 0 some of the following analysis simplifies (e.g. the proof of Lemma 6.8 below may be completed without the estimate from Lemma 6.9.) From now on we always assume the setup of Proposition 6.4. In particular, f will always be equal to f∗. We begin by outlining the proof of Proposition 6.4. The basic strategy is similar to the one used for Wigner matrices in [19] and [25]. For any E1 E2, let N (E1, E2) ..= {α .. E1 µα E2} denote the number of eigenvalues of A in the interval [E1, E2]. In the first step, we express the distribution function in terms of Green functions according to E∗ Pu µN−1 E = EuK N (E, ∞) − 2 ≈ EuK N (E, E∗) − 1 ≈ EuK dy N Im m(y + iη) − 1 . E (6.17) Here u stands for either v or w, η ..= N −2/3−ε for some ε > 0 small enough, K : R → R+ is a smooth cutoff function satisfying K(x) = 1 if |x| 1/9 and K(x) = 0 if |x| 2/9 , (6.18) and E∗ ..= 2 + 2(log N )C0ξN −2/3 (6.19) for some C0 large enough. The first approximate identity in (6.17) follows from Theorem 3.4 which guarantees that µN−1 E∗ with (ξ, ν)-high probability, and from (3.21) which guarantees that µN 2 + σ with (ξ, ν)high probability. The second approximate identity in (6.17) follows from the approximation E2 dy N Im m(y + iη) = E1 α E2 E1 dy (y − η µα)2 + η2 ≈ N (E1, E2) , which is valid for E1 and E2 near the spectral edge, where the typical eigenvalue separation is N −2/3 ≫ η. The second step of our proof is to compare expressions such as the right-hand side of (6.17) for u = v and u = w. 
This is done using a Lindeberg replacement strategy and a resolvent expansion of the argument of K. This step is implemented in Section 6.3, to which we also refer for a heuristic discussion of the argument. Now we give the rigorous proof of the steps outlined in (6.17). We first collect the tools we shall need. From (3.18) and (3.21) we get that there is a constant C0 > 0 such that, under both Pv and Pw, we have with (ξ, ν)-high probability |N 2/3(µN−1 − 2)| (log N )C0ξ , µN 2 + σ , (6.20) and N 2− 2(log N N )C0 2/3 ξ , 2+ 2(log N )C0ξ N 2/3 Therefore in (6.16) we can assume that s satisfies (log N )2C0ξ . (6.21) − (log N )C0ξ s (log N )C0ξ. (6.22) 29 Recall the definition (6.19) of E∗ and and introduce, for any E interval [E, E∗], χE ..= 1[E,E∗] . E∗, the characteristic function on the For any η > 0 we define θη (x) ..= η π(x2 + η2) = 1 π Im x 1 − iη (6.23) to be an approximate delta function on scale η. The following result allows us to replace the sharp counting function N (E, E∗) = Tr χE(H) with its approximation smoothed on scale η. Lemma 6.5. Suppose that E satisfies |E − 2|N 2/3 (log N )C0ξ. (6.24) Let ℓ ..= 1 2 N −2/3−ε and η ..= N −2/3−9ε, and recall the definition of the function K from (6.18). Then the following statements hold for both ensembles Pv and Pw. For some ε > 0 small enough the inequalities Tr(χE+ℓ ∗ θη)(H) − N −ε N (E, ∞) − 1 Tr(χE−ℓ ∗ θη)(H) + N −ε (6.25) hold with (ξ, ν)-high probability. Furthermore, we have E K Tr(χE−ℓ ∗ θη)(H) P(N (E, ∞) = 1) E K Tr(χE+ℓ ∗ θη)(H) + e−ν(log N)ξ (6.26) for sufficiently large N independent of E, as long as (6.24) holds. Proof. The proof of Corollary 6.2 in [19] can be reproduced almost verbatim. In the estimate (6.17) of [19], we need the bound, with (ξ, ν)-high probability, m(E + iℓ) − msc(E + iℓ) (log N )Cξ Nℓ for N −1+c ℓ N −2/3. This is an easy consequence of the local semicircle law (3.12) and the assumption q N 1/3. Note that, when compared to Corollary 6.2 in [19], the quantity N (E, ∞) has been incremented by one; the culprit is the single eigenvalue µN 2 + σ. Recalling that θη (H ) = 1 π Im G(iη), Lemma 6.5 bounds the probability of N (E, ∞) = 1 in terms of expectations of functionals of Green functions. We now show that the difference between the expectations of these functionals, with respect to the two probability distributions Pv and Pw, is negligible assuming their associated second moments of hij coincide. The precise statement is the following Green function comparison theorem at the edge. All statements are formulated for the upper spectral edge 2, but with the same proof they hold for the lower spectral edge −2 as well. For the following it is convenient to introduce the shorthand Iε ..= {x .. |x − 2| N −2/3+ε} (6.27) where ε > 0. 30 Proposition 6.6 (Green function comparison theorem on the edge). Suppose that the assumptions of Proposition 6.4 hold. Let F : R → R be a function whose derivatives satisfy sup |F (n)(x)|(1 + |x|)−C1 C1 for n = 1, 2, 3, 4 , x (6.28) with some constant C1 > 0. Then there exists ε < ε and for any real numbers E, E1, E2 ∈ Iε, a constant and setting ε η > 0, depending only on ..= N −2/3−ε, we have C1, such that for any EvF (N η Im m(z)) − EwF (N η Im m(z)) CN 1/3+Cεq−1 for z = E + iη , (6.29) and E2 E2 EvF N dy Im m(y + iη) − EwF N dy Im m(y + iη) E1 E1 N 1/3+Cεq−1 (6.30) for some constant C and large enough N . We postpone the proof of Proposition 6.6 to the next section. Assuming it proved, we now have all the ingredients needed to complete the proof of Proposition 6.4. 
Proof of Proposition 6.4. As observed after (6.20) and (6.21), we may assume that (6.22) holds. We define E := 2 + sN −2/3 that satisfies (6.24). We define E∗ as in (6.19) with the C0 such that (6.20) and (6.21) hold. From (6.26) we get, for any sufficiently small ε > 0, Ew K Tr(χE−ℓ ∗ θη)(H) Pw(N (E, ∞) = 1) (6.31) where we set ℓ ..= 1 2 N −2/3−ε , η ..= N −2/3−9ε. Now (6.30) applied to the case E1 = E − ℓ and E2 = E∗ shows that there exists a δ > 0 such that for sufficiently small ε > 0 we have EvK Tr(χE−ℓ ∗ θη)(H) Ew K Tr(χE−ℓ ∗ θη)(H) + N −δ (6.32) (note that here 9ε plays the role of ε in the Proposition 6.6). Next, the second bound of (6.26) yields Pv(N (E − 2ℓ, ∞) = 1) Ev K Tr(χE−ℓ ∗ θη)(H) + e−ν(log N)ξ (6.33) Combining these inequalities, we have Pv(N (E − 2ℓ, ∞) = 1) Pw(N (E, ∞) = 1) + 2N −δ (6.34) for sufficiently small ε > 0 and sufficiently large N . Setting E = 2 + sN −2/3 proves the first inequality of (6.16). Switching the roles of v and w in (6.34) yields the second inequality of (6.16). 31 6.3. Proof of Proposition 6.6. All that remains is the proof of Proposition 6.6, to which this section is devoted. Throughout this section we suppose that the assumptions of Proposition 6.4 hold, and in particular that f = 1 + ε0. We now set up notations to replace the matrix elements one by one. This step is identical for the proof of both (6.29) and (6.30); we use the notations of the case (6.29), for which they are less involved. For the following it is convenient to slightly modify our notation. We take two copies of our probability space, one of which carries the law Pv and the other the law Pw. We work on the product space and write Hv for the copy carrying the law Pv and Hw for the copy carrying the law Pw. The matrices Av and Aw are defined in the obvious way, and we use the notations Av = (vij ) and Aw = (wij ) for their entries. Similarly, we denote by Gv(z) and Gw(z) the Green functions of the matrices Av and Aw. Fix a bijective ordering map on the index set of the independent matrix elements, φ .. {(i, j) .. 1 i j N } → 0, . . . , γmax where γmax ..= N (N + 2 1) − 1 , and denote by Aγ the generalized Wigner matrix whose matrix elements aij follow the v-distribution if φ(i, j) γ and they follow the w-distribution otherwise; Next, set η ..= N −2/3−ε. We use the identity in particular A0 = Av and Aγmax = Aw. Im msc(E + iη) |E − 2| + η CN −1/3+ε/2 . (6.35) Therefore Theorem 2.9 of [11] yields, with (ξ, ν)-high probability, max max max 0 γ γmax 1 k,l N E∈Iε 1 Aγ − E − iη − δklmsc(E + iη) kl 1 p where we defined 1 p ..= Nε q−1 + 1 Nη N −1/3+2ε . We set z = E + iη where E ∈ Iε and η = N −2/3−ε. Using (6.36), (6.37), and the identity Im m = 1 N Im Tr G = η N Gij Gij , ij (6.36) (6.37) we find, as in (6.36) of [19], that in order to prove (6.29) it is enough to prove  EF η2 Gvij Gvji − EF (Gv → Gw) i=j CN 1/3+Cεq−1 (6.38) at z = E + iη. We write the quantity in the absolute value on the left-hand side of (6.38) as a telescopic sum,  E F η2 i=j 1 Av − z ij 1 Av − z  − E F (Av → Aw) ji γmax =− E F (Av → Aγ ) − E F (Av → Aγ−1) . (6.39) γ =2 32 Let E(ij) denote the matrix whose matrix elements are zero everywhere except at the (i, j) position, where it is 1, i.e. Ek(ilj) ..= δikδjl. Fix γ 1 and let (b, d) be determined by φ(b, d) = γ. For definiteness, we assume the off-diagonal case b = d; the case b = d can be treated similarly. Note that the number of diagonal terms is N and the number of off-diagonal terms is O(N 2). 
We shall compare $A_{\gamma-1}$ with $A_\gamma$ for each $\gamma$ and then sum up the differences in (6.39). Note that these two matrices differ only in the entries $(b,d)$ and $(d,b)$, and they can be written as
\[
A_{\gamma-1} = Q + V\,, \qquad V := (v_{bd} - \mathbb E v_{bd})\, E^{(bd)} + (v_{db} - \mathbb E v_{db})\, E^{(db)}\,, \tag{6.40}
\]
and
\[
A_\gamma = Q + W\,, \qquad W := (w_{bd} - \mathbb E w_{bd})\, E^{(bd)} + (w_{db} - \mathbb E w_{db})\, E^{(db)}\,,
\]
where the matrix $Q$ satisfies
\[
Q_{bd} = Q_{db} = f/N = \mathbb E v_{bd} = \mathbb E v_{db} = \mathbb E w_{bd} = \mathbb E w_{db}\,;
\]
here, we recall, $f = 1 + \varepsilon_0$. It is easy to see that
\[
\max_{i,j} |v_{ij}| + \max_{i,j} |w_{ij}| \le (\log N)^{C\xi}\, q^{-1} \tag{6.41}
\]
with $(\xi,\nu)$-high probability, and that
\[
\mathbb E v_{ij} = \mathbb E w_{ij} = 0\,, \qquad \mathbb E (v_{ij})^2 = \mathbb E (w_{ij})^2 \le C/N\,, \qquad \mathbb E |v_{ij}|^k + \mathbb E |w_{ij}|^k \le C N^{-1} q^{2-k} \tag{6.42}
\]
for $k = 2, 3, 4, 5, 6$. We define the Green functions
\[
R := \frac{1}{Q - z}\,, \qquad S := \frac{1}{A_{\gamma-1} - z}\,, \qquad T := \frac{1}{A_\gamma - z}\,. \tag{6.43}
\]
We now claim that the estimate (6.36) holds for the Green function $R$ as well, i.e. that
\[
\max_{1 \le k,l \le N}\; \max_{E \in I_\varepsilon}\, \bigl|R_{kl}(E + \mathrm i\eta) - \delta_{kl}\, m_{sc}(E + \mathrm i\eta)\bigr| \le p^{-1} \tag{6.44}
\]
holds with $(\xi,\nu)$-high probability. To see this, we use the resolvent expansion
\[
R = S + SVS + (SV)^2 S + \dots + (SV)^9 S + (SV)^{10} R\,. \tag{6.45}
\]
Since $V$ has at most two nonzero elements, when computing the entry $(k,l)$ of this matrix identity, each term is a sum of finitely many terms (i.e. the number of summands is independent of $N$) that involve matrix elements of $S$ or $R$ and $v_{ij}$, e.g. of the form $(SVS)_{kl} = S_{kb}\, v_{bd}\, S_{dl} + S_{kd}\, v_{db}\, S_{bl}$. Using the bound (6.36) for the $S$ matrix elements, the bound (6.41) for $v_{ij}$, and the trivial bound $|R_{ij}| \le \eta^{-1} \le N$, we get (6.44).

Having introduced these notations, we may now give an outline of the proof of Proposition 6.6. We have to estimate each summand of the telescopic sum (6.39) with $b \neq d$ (the generic case) by $o(N^{-2})$; in the non-generic case $b = d$, a bound of size $o(N^{-1})$ suffices. For simplicity, assume that we are in the generic case $b \neq d$ and that $F$ has only one argument. Fix $z = E + \mathrm i\eta$, where $E \in I_\varepsilon$ (see (6.27)) and $\eta := N^{-2/3-\varepsilon}$. Define
\[
y^S := \eta^2 \sum_{i \neq j} S_{ij}(z)\, S_{ji}(z)\,; \tag{6.46}
\]
the random variable $y^R$ is defined similarly. We shall show that
\[
\mathbb E F(y^S) = B + \mathbb E F(y^R) + O\bigl(N^{-1/3 + C\varepsilon} p^{-4} q^{-1}\bigr) \tag{6.47}
\]
for some deterministic $B$ which depends only on the law of $Q$ and the first two moments of $v_{bd}$. From (6.47) we immediately conclude that (6.29) holds.

In order to prove (6.47), we expand
\[
F(y^S) - F(y^R) = F'(y^R)(y^S - y^R) + \frac12 F''(y^R)(y^S - y^R)^2 + \frac16 F'''(\zeta)(y^S - y^R)^3\,, \tag{6.48}
\]
where $\zeta$ lies between $y^S$ and $y^R$. Next, we apply the resolvent expansion
\[
S = R - RVR + (RV)^2 R - \dots + (-1)^m (RV)^m R + (-1)^{m+1} (RV)^{m+1} S \tag{6.49}
\]
to each factor $S$ in (6.48), for some $m \ge 2$. Here we only concentrate on the linear term in (6.48). The second term is dealt with similarly. (The rest term in (6.48) requires a different treatment because $F'''(\zeta)$ is not independent of $v_{bd}$. It may however be estimated cheaply using a naive power counting.) By definition, $Q$ is independent of $v_{bd}$, and hence $F'(y^R)$ and $R$ are independent of $v_{bd}$. Therefore the expectations of the first and second order terms (in the variable $v_{bd}$) in $\mathbb E F'(y^R)(y^S - y^R)$ are put into $B$. The third order terms in $\mathbb E F'(y^R)(y^S - y^R)$ are bounded, using a naive power counting, by
\[
\eta^2\, N^2\, p^{-3}\, \mathbb E |v_{bd}|^3 \le N^{-4/3}\, N^2\, p^{-3}\, N^{-1} q^{-1}\,. \tag{6.50}
\]
Here we used that, thanks to the assumption $i \neq j$, in the generic terms $\{i,j\} \cap \{b,d\} = \emptyset$ there are at least three off-diagonal matrix elements of $R$ in the resolvent expansion of (6.46). Indeed, since $b, d \notin \{i,j\}$, the terms of order at least one in (6.49) have at least two off-diagonal resolvent matrix elements, and the other factor in (6.46) has at least one off-diagonal resolvent matrix element since $i \neq j$.
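The expansions (6.45) and (6.49) are finite algebraic identities, so they can be checked numerically for any rank-two perturbation. The sketch below (illustrative; the size, spectral parameter, and perturbation strength are arbitrary choices) verifies (6.45) to machine precision.

```python
# Numerical check of the resolvent expansion (6.45):
#   R = S + SVS + (SV)^2 S + ... + (SV)^9 S + (SV)^10 R,
# where S = (Q + V - z)^{-1}, R = (Q - z)^{-1}, and V is supported on the
# two entries (b, d) and (d, b).
import numpy as np

rng = np.random.default_rng(2)
N, z = 50, 1.0 + 0.5j
Q0 = rng.standard_normal((N, N))
Q = (Q0 + Q0.T) / np.sqrt(2 * N)
V = np.zeros((N, N))
b, d = 0, 1
V[b, d] = V[d, b] = 0.05

S = np.linalg.inv(Q + V - z * np.eye(N))
R = np.linalg.inv(Q - z * np.eye(N))

expansion = S.copy()
SVm = np.eye(N)                       # holds (SV)^m
for m in range(1, 10):
    SVm = SVm @ S @ V
    expansion += SVm @ S              # term (SV)^m S
expansion += SVm @ S @ V @ R          # rest term (SV)^10 R

print(np.max(np.abs(R - expansion)))  # ~ 1e-16
```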
This counting yields a factor $p^{-3}$ by (6.36) (the non-generic terms are suppressed by a factor $N^{-1}$). Note that the bound (6.50) is still too large compared to $N^{-2}$, since $p \le N^{1/3}$. The key observation to solve this problem is that the expectation of the leading term is much smaller than its typical size; this allows us to gain an additional factor $p^{-1}$. A similar observation was used in [19], but in the present case this estimate (performed in Lemma 6.8 below) is substantially complicated by the non-vanishing expectation of the entries of $A$. Much of the heavy notation in the following argument arises from the need to keep track of the non-generic terms, which have fewer off-diagonal elements than the generic terms, but also a smaller entropy factor. The improved bound on the difference $\mathbb E F'(y^R)(y^S - y^R)$ is
\[
N^{-4/3}\, N^2\, p^{-4}\, N^{-1} q^{-1} = N^{-1/3}\, p^{-4}\, q^{-1}\,,
\]
which is much smaller than $N^{-2}$ provided that $q \ge N^\phi$ for $\phi > 1/3$ and $\varepsilon$ is small enough.

The key step in the proof of Proposition 6.6 is the following lemma.

Lemma 6.7. Fix an index $\gamma = \phi(b,d)$ and recall the definitions of $Q$, $R$, and $S$ from (6.43). For any small enough $\varepsilon > 0$ and under the assumptions of Proposition 6.6, there exist a constant $C$ depending on $F$ (but independent of $\gamma$) and constants $B_N$ and $D_N$, depending on the law $\mathrm{law}(Q)$ of $Q$ and on the second moment $m_2(v_{bd})$ of $v_{bd}$, such that, for large enough $N$ (independent of $\gamma$), we have
\[
\biggl|\mathbb E F\biggl(\eta \int_{E_1}^{E_2} \mathrm dy \sum_{i\neq j} S_{ij} S_{ji}(y + \mathrm i\eta)\biggr) - \mathbb E F\biggl(\eta \int_{E_1}^{E_2} \mathrm dy \sum_{i\neq j} R_{ij} R_{ji}(y + \mathrm i\eta)\biggr) - B_N\bigl(m_2(v_{bd}), \mathrm{law}(Q)\bigr)\biggr| \le N^{\mathbf 1(b=d) - 5/3 + C\varepsilon}\, q^{-1}\,, \tag{6.51}
\]
where, we recall, $\eta = N^{-2/3-\varepsilon}$, as well as
\[
\biggl|\mathbb E F\biggl(\eta^2 \sum_{i\neq j} S_{ij} S_{ji}(z)\biggr) - \mathbb E F\biggl(\eta^2 \sum_{i\neq j} R_{ij} R_{ji}(z)\biggr) - D_N\bigl(m_2(v_{bd}), \mathrm{law}(Q)\bigr)\biggr| \le N^{\mathbf 1(b=d) - 5/3 + C\varepsilon}\, q^{-1}\,, \tag{6.52}
\]
where $z = E + \mathrm i\eta$. The constants $B_N$ and $D_N$ may also depend on $F$, but they depend on the centered random variable $v_{bd}$ only through its second moment.

Assuming Lemma 6.7, we now complete the proof of Proposition 6.6.

Proof of Proposition 6.6. Clearly, Lemma 6.7 also holds if $S$ is replaced by $T$. Since $Q$ is independent of $v_{bd}$ and $w_{bd}$, and $m_2(v_{bd}) = m_2(w_{bd}) = 1/N$, we have
\[
D_N\bigl(m_2(v_{bd}), \mathrm{law}(Q)\bigr) = D_N\bigl(m_2(w_{bd}), \mathrm{law}(Q)\bigr)\,.
\]
Thus we get from Lemma 6.7 that
\[
\biggl|\mathbb E F\biggl(\eta^2 \sum_{i\neq j} S_{ij} S_{ji}(z)\biggr) - \mathbb E F\biggl(\eta^2 \sum_{i\neq j} T_{ij} T_{ji}(z)\biggr)\biggr| \le C N^{\mathbf 1(b=d) - 5/3 + C\varepsilon}\, q^{-1}\,. \tag{6.53}
\]
Recalling the definitions of $S$ and $T$ from (6.43), the bound (6.53) compares the expectation of a function of the resolvent of $A_{\gamma-1}$ with that of $A_\gamma$. The telescopic summation in (6.39) then implies (6.38), since the number of summands with $b \neq d$ is of order $N^2$ but the number of summands with $b = d$ is only $N$. Similarly, (6.51) implies (6.30). This completes the proof.

Proof of Lemma 6.7. Throughout the proof we abbreviate $A_{\gamma-1} = A = (a_{ij})$, where $a_{ij} = h_{ij} + f/N$. We shall only prove the more complicated case (6.51); the proof of (6.52) is similar. In fact, we shall prove the bound
\[
\biggl|\mathbb E F\biggl(\eta \int_{E_1}^{E_2} \mathrm dy \sum_{i\neq j} S_{ij} S_{ji}(y + \mathrm i\eta)\biggr) - \mathbb E F\biggl(\eta \int_{E_1}^{E_2} \mathrm dy \sum_{i\neq j} R_{ij} R_{ji}(y + \mathrm i\eta)\biggr) - B_N\bigl(m_2(h_{bd}), \mathrm{law}(Q)\bigr)\biggr| \le C N^{\mathbf 1(b=d) - 1/3 + C\varepsilon}\, p^{-4}\, q^{-1}\,, \tag{6.54}
\]
from which (6.51) follows by (6.37). From (6.36) we get
\[
\max_{1\le k,l\le N}\; \max_{E \in I_\varepsilon}\, \bigl|S_{kl}(E + \mathrm i\eta) - \delta_{kl}\, m_{sc}(E + \mathrm i\eta)\bigr| \le p^{-1} \tag{6.55}
\]
with $(\xi,\nu)$-high probability. Define $\Omega$ as the event on which (6.55), (6.44), and (6.41) hold. We have proved that $\Omega$ holds with $(\xi,\nu)$-high probability. Since the arguments of $F$ in (6.54) are bounded by $C N^{2+2\varepsilon}$ and $F(x)$ increases at most polynomially, it is easy to see that the contribution of the event $\Omega^c$ to the expectations in (6.54) is negligible.
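Before decomposing the quantities below, it may help to see the moment hierarchy (6.42), which drives all of the $q$-power counting above, in a concrete case. The following arithmetic sketch (our illustration; the choice of $p$ is arbitrary and the Bernoulli model is a stand-in for the centered sparse entry) computes the exact moments of a centered, rescaled Bernoulli variable and compares them with $N^{-1} q^{2-k}$, where $q = \sqrt{Np}$: each additional power of the entry costs one factor of $q^{-1}$.

```python
# Moment hierarchy for a sparse entry h = (chi - p)/sqrt(N p (1 - p)),
# chi ~ Bernoulli(p). Exactly,
#   E|h|^k = (p (1-p)^k + (1-p) p^k) / (N p (1-p))^{k/2},
# which is of order N^{-1} q^{2-k} with q = sqrt(N p).
import numpy as np

N = 10 ** 6
p = N ** (-0.4)              # sparse regime; here q = (N p)^{1/2} = N^{0.3}
q = np.sqrt(N * p)

for k in range(2, 7):
    moment = (p * (1 - p) ** k + (1 - p) * p ** k) / (N * p * (1 - p)) ** (k / 2)
    bound = q ** (2 - k) / N
    print(f"k={k}:  E|h|^k = {moment:.3e},  N^-1 q^(2-k) = {bound:.3e}")
```

The two columns agree up to a factor close to one, making explicit the $q^2/N$ gain obtained below whenever a partial expectation converts two high-probability bounds $|h| \le (\log N)^{C\xi} q^{-1}$ into the second moment $1/N$.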
Define $x^S$ and $x^R$ by
\[
x^S := \eta \int_{E_1}^{E_2} \mathrm dy \sum_{i\neq j} S_{ij} S_{ji}(y + \mathrm i\eta)\,, \qquad x^R := \eta \int_{E_1}^{E_2} \mathrm dy \sum_{i\neq j} R_{ij} R_{ji}(y + \mathrm i\eta)\,, \tag{6.56}
\]
and decompose $x^S$ into three parts, $x^S = x^S_2 + x^S_1 + x^S_0$, where
\[
x^S_k := \eta \int_{E_1}^{E_2} \mathrm dy \sum_{i\neq j} \mathbf 1\bigl(|\{i,j\} \cap \{b,d\}| = k\bigr)\, S_{ij} S_{ji}(y + \mathrm i\eta)\,; \tag{6.57}
\]
$x^R_k$ is defined similarly. Here $k = |\{i,j\} \cap \{b,d\}|$ is the number of times the indices $b$ and $d$ appear among the summation indices $i, j$. Clearly $k = 0, 1$, or $2$. The number of terms in the sum defining $x^S_k$ is $O(N^{2-k})$. A resolvent expansion yields
\[
S = R - RVR + (RV)^2 R - (RV)^3 R + (RV)^4 R - (RV)^5 R + (RV)^6 S\,. \tag{6.58}
\]
In the following formulas we shall, as usual, omit the spectral parameter from the notation of the resolvents. The spectral parameter is always $y + \mathrm i\eta$ with $y \in [E_1, E_2]$; in particular, $y \in I_\varepsilon$. If $|\{i,j\} \cap \{b,d\}| = k$, recalling that $i \neq j$ we find that there are at least $2 - k$ off-diagonal resolvent elements in $\bigl((RV)^m R\bigr)_{ij}$, so that (6.36) yields, in $\Omega$,
\[
\bigl|\bigl((RV)^m R\bigr)_{ij}\bigr| \le C_m\, \bigl(N^\varepsilon q^{-1}\bigr)^m\, p^{-(2-k)}\,, \qquad \text{where } m \in \mathbb N_+\,,\ m \le 6\,,\ k = 0, 1, 2\,. \tag{6.59}
\]
Similarly, we have in $\Omega$
\[
\bigl|\bigl((RV)^m S\bigr)_{ij}\bigr| \le C_m\, \bigl(N^\varepsilon q^{-1}\bigr)^m\, p^{-(2-k)}\,, \qquad \text{where } m \in \mathbb N_+\,,\ m \le 6\,,\ k = 0, 1, 2\,. \tag{6.60}
\]
Therefore we have in $\Omega$ that
\[
|x^S_k - x^R_k| \le C N^{2/3-k}\, p^{-(3-k)}\, N^\varepsilon q^{-1} \qquad \text{for } k = 0, 1, 2\,, \tag{6.61}
\]
where the factor $N^{2/3-k}$ comes from the summation over $i \neq j$, the prefactor $\eta$, and the $\mathrm dy$-integration. Inserting these bounds into the Taylor expansion of $F$, using
\[
q \ge N^\phi \ge N^{1/3 + C\varepsilon}\,, \qquad p \ge N^{1/3 - 2\varepsilon}\,, \tag{6.62}
\]
and keeping only the terms larger than $O(N^{-1/3+C\varepsilon} p^{-4} q^{-1})$, we obtain
\[
\biggl|\mathbb E\bigl[F(x^S) - F(x^R)\bigr] - \mathbb E\Bigl[F'(x^R)(x^S_0 - x^R_0) + \frac12 F''(x^R)(x^S_0 - x^R_0)^2 + F'(x^R)(x^S_1 - x^R_1)\Bigr]\biggr| \le C N^{-1/3 + C\varepsilon} p^{-4} q^{-1}\,, \tag{6.63}
\]
where we used the remark after (6.55) to treat the contribution of the event $\Omega^c$. Since $x^S_2 - x^R_2$ does not appear in (6.63), we can focus on the cases $k = 0$ and $k = 1$. To streamline the notation, we introduce
\[
R^{(m)}_{ij} := (-1)^m \bigl((RV)^m R\bigr)_{ij}\,. \tag{6.64}
\]
Then using (6.59) and the estimate $\max_{i\neq j} |R_{ij}| \le p^{-1}$ we get
\[
\bigl|R^{(m)}_{ij}\bigr| \le C_m\, \bigl(N^\varepsilon q^{-1}\bigr)^m\, p^{-(2-k) + \delta_{0m}\delta_{0k}}\,. \tag{6.65}
\]
Now we decompose the sum $x^S_k - x^R_k$ according to the number of matrix elements $h_{bd}$ and $h_{db}$. To that end, for $k \in \{0, 1\}$ and $s, t \in \{0, 1, 2, 3, 4\}$ with $s + t \ge 1$, we define
\[
Q^{(s,t)}_k := \eta \int_{E_1}^{E_2} \mathrm dy \sum_{i\neq j} \mathbf 1\bigl(|\{i,j\} \cap \{b,d\}| = k\bigr)\, R^{(s)}_{ij} R^{(t)}_{ji}\,, \tag{6.66}
\]
and set
\[
Q^{(\ell)}_k := \sum_{s+t=\ell} Q^{(s,t)}_k\,. \tag{6.67}
\]
Using (6.65) we get the estimates, valid on $\Omega$,
\[
\bigl|Q^{(s,t)}_k\bigr| \le C_{st}\, \bigl(N^\varepsilon q^{-1}\bigr)^{s+t}\, N^{2/3-k}\, p^{-(4-2k) + \delta_{0s}\delta_{0k} + \delta_{0t}\delta_{0k}}\,, \qquad \bigl|Q^{(\ell)}_k\bigr| \le C_\ell\, \bigl(N^\varepsilon q^{-1}\bigr)^\ell\, N^{2/3-k}\, p^{-(3-k)}\,, \tag{6.68}
\]
where $\ell \ge 1$. Using (6.62), (6.59), and (6.60), we find the decomposition
\[
x^S_k - x^R_k = \sum_{1 \le s+t \le 4} Q^{(s,t)}_k + O\bigl(N^{-1/3 + C\varepsilon} p^{-4} q^{-1}\bigr)\,, \tag{6.69}
\]
where $s$ and $t$ are nonnegative. By (6.68) and (6.44) we have, for $s + t \ge 1$,
\[
\bigl|\mathbb E_{bd}\, Q^{(s,t)}_k\bigr| \le C_{st}\, q^{2-s-t}\, N^{-1/3-k}\, p^{-(4-2k) + \delta_{0s}\delta_{0k} + \delta_{0t}\delta_{0k}}\,, \tag{6.70}
\]
where $\mathbb E_{bd}$ denotes partial expectation with respect to the variable $h_{bd}$. Here we used that only terms with at least two elements $h_{bd}$ or $h_{db}$ survive. Recalling (6.42), we find that taking the partial expectation $\mathbb E_{bd}$ improves the bound (6.68) by a factor $q^2/N$. Thus we also have
\[
\bigl|\mathbb E_{bd}\, Q^{(\ell)}_k\bigr| \le C_\ell\, q^{2-\ell}\, N^{-1/3-k}\, p^{-(3-k)}\,. \tag{6.71}
\]
Similarly, for $s + t \ge 1$ and $u + v \ge 1$ we have
\[
\bigl|\mathbb E_{bd}\, Q^{(s,t)}_k Q^{(u,v)}_k\bigr| \le C\, q^{2-s-t-u-v}\, N^{1/3-2k}\, p^{-(8-4k) + \delta_{0s}\delta_{0k} + \delta_{0t}\delta_{0k} + \delta_{0u}\delta_{0k} + \delta_{0v}\delta_{0k}}\,, \tag{6.72}
\]
which implies
\[
\bigl|\mathbb E_{bd}\, Q^{(\ell_1)}_k Q^{(\ell_2)}_k\bigr| \le C\, q^{2-\ell_1-\ell_2}\, N^{(1/3-2k) + C\varepsilon}\, p^{-6+2k}\,. \tag{6.73}
\]
Inserting (6.71) and (6.73) into the second term on the left-hand side of (6.63), and using the assumption on $F$ as well as (6.62), we find
\[
\mathbb E\Bigl[F'(x^R)(x^S_0 - x^R_0) + F'(x^R)(x^S_1 - x^R_1) + \frac12 F''(x^R)(x^S_0 - x^R_0)^2\Bigr] = B + \mathbb E F'(x^R)\, Q^{(3)}_0 + O\bigl(N^{-1/3+C\varepsilon} p^{-4} q^{-1}\bigr) = B + \mathbb E F'(x^R)\bigl(Q^{(0,3)}_0 + Q^{(3,0)}_0\bigr) + O\bigl(N^{-1/3+C\varepsilon} p^{-4} q^{-1}\bigr)\,, \tag{6.74}
\]
where we defined
\[
B := \mathbb E\biggl[F'(x^R) \sum_{k=0,1}\bigl(Q^{(1)}_k + Q^{(2)}_k\bigr) + \frac12 F''(x^R)\bigl(Q^{(1)}_0\bigr)^2\biggr] = \mathbb E\biggl[F'(x^R)\, \mathbb E_{bd} \sum_{k=0,1}\bigl(Q^{(1)}_k + Q^{(2)}_k\bigr) + \frac12 F''(x^R)\, \mathbb E_{bd}\bigl(Q^{(1)}_0\bigr)^2\biggr]\,. \tag{6.75}
\]
Note that $B$ depends on $h_{bd}$ only through its expectation (which is zero) and its second moment. Thus $B$ will be the quantity $B_N(m_2(v_{bd}), \mathrm{law}(Q))$ from (6.51). In order to estimate (6.74), it only remains to estimate $\mathbb E F'(x^R)\, Q^{(0,3)}_0$ and $\mathbb E F'(x^R)\, Q^{(3,0)}_0$.

Using (6.70), (6.63), (6.74), and (6.71), we have
\[
\bigl|\mathbb E\bigl[F(x^S) - F(x^R)\bigr] - B\bigr| \le N^{-1/3 + C\varepsilon}\, p^{-3}\, q^{-1}\,, \tag{6.76}
\]
which implies (6.54) in the case $b = d$. Let us therefore from now on assume $b \neq d$. Since the terms that remain to be estimated, $Q^{(3,0)}_0$ and $Q^{(0,3)}_0$, carry the restriction $k = 0$, the indices $i, j, b, d$ are all distinct. In order to enforce this condition in sums, it is convenient to introduce the indicator function
\[
\chi \equiv \chi(i,j,b,d) := \mathbf 1\bigl(|\{i,j,b,d\}| = 4\bigr)\,.
\]
Recalling (6.64), we introduce the notation $R^{(m,s)}_{ij}$ to denote the sum of the terms in the definition (6.64) of $R^{(m)}_{ij}$ in which the number of off-diagonal elements of $R$ is $s$. For example,
\[
R^{(3,0)}_{ij} = R^{(3,1)}_{ij} = 0\,, \qquad R^{(3,2)}_{ij} = R_{ib}\, h_{bd}\, R_{dd}\, h_{db}\, R_{bb}\, h_{bd}\, R_{dj} + R_{id}\, h_{db}\, R_{bb}\, h_{bd}\, R_{dd}\, h_{db}\, R_{bj}\,. \tag{6.77}
\]
Then in the case $\chi = 1$ we have
\[
R^{(3)}_{ij} = \sum_{s=2}^4 R^{(3,s)}_{ij}\,. \tag{6.78}
\]
Now from the definition (6.66) we get
\[
Q^{(0,3)}_0 = \sum_{s=2}^4 Q^{(0,3,s)}_0\,, \qquad Q^{(0,3,s)}_0 := \eta \int_{E_1}^{E_2} \mathrm dy \sum_{i,j} \chi\, R^{(0)}_{ij}\, R^{(3,s)}_{ji}\,, \tag{6.79}
\]
and
\[
Q^{(3,0)}_0 = \sum_{s=2}^4 Q^{(3,0,s)}_0\,, \qquad Q^{(3,0,s)}_0 := \eta \int_{E_1}^{E_2} \mathrm dy \sum_{i,j} \chi\, R^{(3,s)}_{ij}\, R^{(0)}_{ji}\,. \tag{6.80}
\]
As above, it is easy to see that for $s \ge 3$ we have
\[
\bigl|\mathbb E F'(x^R)\, Q^{(0,3,s)}_0\bigr| + \bigl|\mathbb E F'(x^R)\, Q^{(3,0,s)}_0\bigr| \le N^{-1/3 + C\varepsilon}\, p^{-4}\, q^{-1}\,, \tag{6.81}
\]
which implies, using (6.74),
\[
\bigl|\mathbb E\bigl[F(x^S) - F(x^R)\bigr] - B\bigr| \le \bigl|\mathbb E F'(x^R)\, Q^{(0,3,2)}_0\bigr| + \bigl|\mathbb E F'(x^R)\, Q^{(3,0,2)}_0\bigr| + N^{-5/3 + C\varepsilon}\, q^{-1}\,. \tag{6.82}
\]
By symmetry, it only remains to prove that
\[
\bigl|\mathbb E F'(x^R)\, Q^{(3,0,2)}_0\bigr| \le N^{-1/3 + C\varepsilon}\, p^{-4}\, q^{-1}\,. \tag{6.83}
\]
Using the definition (6.80) and the estimate (6.44) to replace some diagonal resolvent matrix elements with $m_{sc}$, we find
\[
\mathbb E F'(x^R)\, Q^{(3,0,2)}_0 = \eta \int_{E_1}^{E_2} \mathrm dy\; \mathbb E F'(x^R) \sum_{i,j} \chi\, \bigl[R_{ib}\, h_{bd}\, R_{dd}\, h_{db}\, R_{bb}\, h_{bd}\, R_{dj}\, R_{ji} + (b \leftrightarrow d)\bigr] = \eta \int_{E_1}^{E_2} \mathrm dy\; \mathbb E F'(x^R) \sum_{i,j} \chi\, m_{sc}^2\, R_{ib}\, R_{dj}\, R_{ji}\, \mathbb E_{bd}\bigl(|h_{bd}|^2 h_{bd}\bigr) + (b \leftrightarrow d) + O\bigl(N^{-1/3+C\varepsilon}\, p^{-4}\, q^{-1}\bigr)\,, \tag{6.84}
\]
where we used the estimate $\bigl|\mathbb E_{bd}\, |h_{bd}|^2 h_{bd}\bigr| \le \frac{C}{Nq}$ to control the errors of the replacement. Combining (6.84) with (6.74) and (6.63), we therefore obtain
\[
\bigl|\mathbb E[F(x^S) - F(x^R)] - B\bigr| \le C N^{-1/3+C\varepsilon}\, p^{-4}\, q^{-1} + C q^{-1} N^{-1/3+C\varepsilon} \max_{y \in I_\varepsilon}\, \max_{i,j}\, \chi\, \Bigl(\bigl|\mathbb E F'(x^R)\, R_{ij} R_{jb} R_{di}\bigr| + \bigl|\mathbb E F'(x^R)\, R_{ib} R_{dj} R_{ji}\bigr| + (b \leftrightarrow d)\Bigr)\,, \tag{6.85}
\]
where we used the trivial bounds on $F'$ and $|m_{sc}|$, and the fact that every estimate is uniform in $y$. In order to complete the proof of Lemma 6.7, we need to estimate the expectations in (6.85) by a better bound than the naive high-probability bound on the argument of $\mathbb E$. This is accomplished in Lemma 6.8 below. From Lemma 6.8 and (6.85) we get, in the case $b \neq d$, that
\[
\bigl|\mathbb E[F(x^S) - F(x^R)] - B\bigr| \le N^{-1/3+C\varepsilon}\, p^{-4}\, q^{-1}\,, \tag{6.86}
\]
where $B$ was defined in (6.75). This completes the proof of Lemma 6.7.

Lemma 6.8. Under the assumptions of Lemma 6.7, in particular fixing $f = f_*$, and assuming that $b, d, i, j$ are all distinct, we have
\[
\max_{y \in I_\varepsilon}\, \bigl|\mathbb E F'(x^R)\, R_{ib} R_{dj} R_{ji}(y + \mathrm i\eta)\bigr| \le C p^{-4}\,. \tag{6.87}
\]
The same estimate holds for the other three terms on the right-hand side of (6.85).

In order to prove Lemma 6.8, we shall need the following result, which is crucial when estimating terms arising from the nonvanishing expectation $\mathbb E a_{ij} = f N^{-1}$. Before stating it, we introduce some notation. Recall that we set $A \equiv A_{\gamma-1} = (a_{ij})$, where the matrix entries are given by $a_{ij} = h_{ij} + f/N$ and $\mathbb E h_{ij} = 0$. We denote by $A^{(b)}$ the matrix obtained from $A$ by setting all entries with index $b$ to zero, i.e. $(A^{(b)})_{ij} := \mathbf 1(i \neq b)\, \mathbf 1(j \neq b)\, a_{ij}$. If $Z \equiv Z(A)$ is a function of $A$, we define $Z^{(b)} := Z(A^{(b)})$. See also Definitions 5.2 and 3.3 in [11]. We also use the notation $\mathbb E_b$ to denote partial expectation with respect to all variables $(a_{1b}, \dots, a_{Nb})$ in the $b$-th column of $A$.

Lemma 6.9.
For any fixed $i$ we have, with $(\xi,\nu)$-high probability,
\[
\biggl|\frac1N \sum_{k \neq i} \sum_{l \neq k} S^{(k)}_{il}\, h_{lk}\biggr| \le p^{-2}\,.
\]
Proof. The claim is an immediate consequence of Proposition 7.11 in [11] and the observation that, for $E \in I_\varepsilon$, $\eta = N^{-2/3-\varepsilon}$, and $q \ge N^\phi$, we have
\[
(\log N)^C \Biggl(\frac1q + \sqrt{\frac{\mathrm{Im}\, m_{sc}}{N\eta}} + \frac{1}{N\eta}\Biggr) \le p^{-1}
\]
for large enough $N$.

Another ingredient necessary for the proof of Lemma 6.8 is the following resolvent identity.

Lemma 6.10. Let $A = (a_{ij})$ be a square matrix and set $S = (S_{ij}) = (A - z)^{-1}$. Then for $i \neq j$ we have
\[
S_{ij} = -S_{ii} \sum_{k \neq i} a_{ik}\, S^{(i)}_{kj}\,, \qquad S_{ij} = -S_{jj} \sum_{k \neq j} S^{(j)}_{ik}\, a_{kj}\,. \tag{6.88}
\]
Proof. We prove the first identity in (6.88); the second one is proved analogously. We use the resolvent identity
\[
S_{ij} = S^{(k)}_{ij} + \frac{S_{ik}\, S_{kj}}{S_{kk}} \qquad \text{for } i, j \neq k\,, \tag{6.89}
\]
from [11], (3.8). Without loss of generality we assume that $z = 0$. Then (6.89) and the identity $AS = \mathbb 1$ yield
\[
\sum_{k\neq i} a_{ik}\, S^{(i)}_{kj} = \sum_{k\neq i} a_{ik}\, S_{kj} - \sum_{k\neq i} a_{ik}\, \frac{S_{ki}\, S_{ij}}{S_{ii}} = -a_{ii}\, S_{ij} - \frac{S_{ij}}{S_{ii}}\bigl(1 - a_{ii}\, S_{ii}\bigr) = -\frac{S_{ij}}{S_{ii}}\,.
\]

Armed with Lemmas 6.9 and 6.10, we may now prove Lemma 6.8.

Proof of Lemma 6.8. Using the relation between $R$ and $S$ from (6.45) and the bounds (6.59), we find that (6.87) is implied by
\[
\bigl|\mathbb E F'(x^S)\, S_{ib}\, S_{dj}\, S_{ji}\bigr| \le C p^{-4} N^{C\varepsilon}\,, \tag{6.90}
\]
under the assumption that $b, d, i, j$ are all distinct. This replacement is only a technical convenience when we apply a large deviation estimate below. Recalling the definition of $\Omega$ after (6.55), we get, using (6.89),
\[
\bigl|S_{ij} - S^{(b)}_{ij}\bigr| = \biggl|\frac{S_{ib}\, S_{bj}}{S_{bb}}\biggr| \le C p^{-2} \qquad \text{in } \Omega\,. \tag{6.91}
\]
This yields
\[
\bigl|x^S - (x^S)^{(b)}\bigr| \le p^{-1} N^{C\varepsilon} \qquad \text{in } \Omega\,. \tag{6.92}
\]
Similarly, we have
\[
\bigl|S_{ib}\, S_{dj}\, S_{ji} - S_{ib}\, S^{(b)}_{dj}\, S^{(b)}_{ji}\bigr| \le C p^{-4} \qquad \text{in } \Omega\,. \tag{6.93}
\]
Hence by the assumption on $F$ we have
\[
\bigl|\mathbb E F'(x^S)\, S_{ib}\, S_{dj}\, S_{ji}\bigr| \le \bigl|\mathbb E F'\bigl((x^S)^{(b)}\bigr)\, S_{ib}\, S^{(b)}_{dj}\, S^{(b)}_{ji}\bigr| + O\bigl(p^{-4} N^{C\varepsilon}\bigr)\,. \tag{6.94}
\]
Since $(x^S)^{(b)}$ and $S^{(b)}_{dj} S^{(b)}_{ji}$ are independent of the $b$-th row of $A$, we find from (6.94) that (6.90), and hence (6.87), is proved if we can show that
\[
\mathbb E_b\, S_{ib} = O(p^{-2}) \tag{6.95}
\]
for any fixed $i$ and $b$.

What remains therefore is to prove (6.95). Using (6.55) and (6.91) we find, in $\Omega$, that
\[
S_{bb} = m_{sc} + O(p^{-1})\,, \qquad S^{(b)}_{ik} = O(p^{-1})\,. \tag{6.96}
\]
Using $a_{kb} = h_{kb} + f/N$ we write
\[
S_{ib} = -m_{sc} \sum_{k\neq b} S^{(b)}_{ik}\Bigl(h_{kb} + \frac fN\Bigr) - (S_{bb} - m_{sc}) \sum_{k\neq b} S^{(b)}_{ik}\Bigl(h_{kb} + \frac fN\Bigr)\,. \tag{6.97}
\]
By (6.96) and the large deviation estimate (3.15) in [11], the sum in the second term of (6.97) is bounded by $O(p^{-1})$ with $(\xi,\nu)$-high probability, so that the second term is $O(p^{-2})$. Therefore, using (6.96) and $\mathbb E_b h_{kb} = 0$, we get
\[
\mathbb E_b\, S_{ib} = -\frac{m_{sc} f}{N} \sum_{k\neq b} S^{(b)}_{ik} + O(p^{-2}) = -\frac{m_{sc} f}{N} \sum_{k\neq i} S_{ik} + O(p^{-2})\,, \tag{6.98}
\]
where in the second step we used (6.89). In order to estimate the right-hand side of (6.98), we introduce the quantity
\[
X := \frac1N \sum_{k\neq i} \mathbb E_k\, S_{ik}\,.
\]
Note that $X$ depends on the index $i$, which is omitted from the notation since $i$ is fixed. Using (6.88), (6.96), and (6.89) as above, we find with $(\xi,\nu)$-high probability
\[
X = -\frac{m_{sc}}{N} \sum_{k\neq i} \mathbb E_k \sum_{l\neq k} S^{(k)}_{il}\Bigl(h_{lk} + \frac fN\Bigr) + O(p^{-2}) = -\frac{m_{sc} f}{N^2} \sum_{k\neq i} \sum_{l\neq k} S^{(k)}_{il} + O(p^{-2}) = -\frac{m_{sc} f}{N} \sum_{l\neq i} S_{il} + O(p^{-2}) = -m_{sc} f\, X + O\biggl(\frac1N \Bigl|\sum_{l\neq i}\bigl(S_{il} - \mathbb E_l S_{il}\bigr)\Bigr|\biggr) + O(p^{-2})\,.
\]
Now recall that the spectral parameter $z = E + \mathrm i\eta$ satisfies $E \in I_\varepsilon$ (see (6.27)) and $\eta = N^{-2/3-\varepsilon}$. Therefore (3.6) implies that $m_{sc}(z) = -1 + o(1)$. Recalling that $f = 1 + \varepsilon_0$, we therefore get, with $(\xi,\nu)$-high probability,
\[
X = O\biggl(\frac1N \Bigl|\sum_{l\neq i}\bigl(S_{il} - \mathbb E_l S_{il}\bigr)\Bigr|\biggr) + O(p^{-2})\,. \tag{6.99}
\]
We now return to (6.98) and estimate, with $(\xi,\nu)$-high probability,
\[
\mathbb E_b\, S_{ib} = -\frac{m_{sc} f}{N} \sum_{k\neq i} S_{ik} + O(p^{-2}) = -m_{sc} f\, X + O\biggl(\frac1N \Bigl|\sum_{k\neq i}\bigl(S_{ik} - \mathbb E_k S_{ik}\bigr)\Bigr|\biggr) + O(p^{-2})\,.
\]
Together with (6.99) this yields, with $(\xi,\nu)$-high probability,
\[
\mathbb E_b\, S_{ib} = O\biggl(\frac1N \Bigl|\sum_{k\neq i}\bigl(S_{ik} - \mathbb E_k S_{ik}\bigr)\Bigr|\biggr) + O(p^{-2})\,. \tag{6.100}
\]
In order to estimate the quantity on the right-hand side, we abbreviate $\mathrm{IE}_k Z := Z - \mathbb E_k Z$ for any random variable $Z$ and write, using (6.88),
\[
\frac1N \sum_{k\neq i}\bigl(S_{ik} - \mathbb E_k S_{ik}\bigr) = -\frac1N \sum_{k\neq i} \mathrm{IE}_k\biggl(S_{kk} \sum_{l\neq k} S^{(k)}_{il}\, a_{lk}\biggr) = -\frac{m_{sc}}{N} \sum_{k\neq i} \mathrm{IE}_k \sum_{l\neq k} S^{(k)}_{il}\Bigl(h_{lk} + \frac fN\Bigr) - \frac1N \sum_{k\neq i} \mathrm{IE}_k\biggl((S_{kk} - m_{sc}) \sum_{l\neq k} S^{(k)}_{il}\Bigl(h_{lk} + \frac fN\Bigr)\biggr)\,.
\]
Using the large deviation estimate (3.15) in [11], (6.96), and the bound $|h_{lk}| \le p^{-1}$, which holds with $(\xi,\nu)$-high probability (see Lemma 3.7 in [11]), we find that the second term is bounded by $O(p^{-2})$ with $(\xi,\nu)$-high probability. Thus we get
\[
\frac1N \sum_{k\neq i}\bigl(S_{ik} - \mathbb E_k S_{ik}\bigr) = -\frac{m_{sc}}{N} \sum_{k\neq i} \sum_{l\neq k} S^{(k)}_{il}\, h_{lk} + O(p^{-2})
\]
with $(\xi,\nu)$-high probability. Therefore (6.100) and Lemma 6.9 imply (6.95), and the proof is complete.

7. Universality of generalized Wigner matrices with finite moments

This section is an application of our results to the problem of universality of generalized Wigner matrices (see Definition 7.1 below) whose entries have heavy tails. We prove the bulk universality of generalized Wigner matrices under the assumption that the matrix entries have a finite $m$-th moment for some $m > 4$. We also prove the edge universality of Wigner matrices under the assumption that $m > 12$. (This condition can in fact be improved to $m \ge 7$; see Remark 7.5 below.)

The Tracy-Widom law for the largest eigenvalue of Wigner matrices was first proved in [33] under a Gaussian decay assumption, and was later proved in [29, 35, 19, 24] under various weaker restrictions on the distributions of the matrix elements. In particular, in [24] the Tracy-Widom law was proved for entries with symmetric distribution and $m > 12$. In [23] similar results were derived for complex Hermitian Gaussian divisible matrices, where the GUE component is of order one. For this case it is proved in [23] that bulk universality holds provided the entries of the Wigner component have finite second moments, and edge universality holds provided they have finite fourth moments.

Definition 7.1. We call a Hermitian or real symmetric random matrix $H = (h_{ij})$ a generalized Wigner matrix if the two following conditions hold. First, the family of upper-triangular entries $(h_{ij} : i \le j)$ is independent. Second, we have
\[
\mathbb E h_{ij} = 0\,, \qquad \mathbb E |h_{ij}|^2 = \sigma_{ij}^2\,,
\]
where the variances $\sigma_{ij}^2$ satisfy
\[
\sum_j \sigma_{ij}^2 = 1\,, \qquad C_- \le \inf_{i,j}\bigl(N\sigma_{ij}^2\bigr) \le \sup_{i,j}\bigl(N\sigma_{ij}^2\bigr) \le C_+\,,
\]
and $0 < C_- \le C_+ < \infty$ are constants independent of $N$.

Theorem 7.2 (Bulk universality). Suppose that $H = (h_{ij})$ satisfies Definition 7.1. Let $m > 4$ and assume that for all $i$ and $j$ we have
\[
\mathbb E\, \bigl|h_{ij}/\sigma_{ij}\bigr|^m \le C_m \tag{7.1}
\]
for some constant $C_m$, independent of $i$, $j$, and $N$. Let $n \in \mathbb N$ and let $O : \mathbb R^n \to \mathbb R$ be compactly supported and continuous. Let $E$ satisfy $-2 < E < 2$ and let $\varepsilon > 0$. Then for any sequence $b_N$ satisfying $N^{-1+\varepsilon} \le b_N \le \bigl||E| - 2\bigr|/2$ we have
\[
\lim_{N\to\infty} \int_{E-b_N}^{E+b_N} \frac{\mathrm dE'}{2 b_N} \int \mathrm d\alpha_1 \cdots \mathrm d\alpha_n\, O(\alpha_1, \dots, \alpha_n)\, \frac{1}{\varrho_{sc}(E)^n}\, \bigl(p^{(n)}_N - p^{(n)}_{G,N}\bigr)\biggl(E' + \frac{\alpha_1}{N\varrho_{sc}(E)}, \dots, E' + \frac{\alpha_n}{N\varrho_{sc}(E)}\biggr) = 0\,.
\]
Here $\varrho_{sc}$ was defined in (2.10), $p^{(n)}_N$ is the $n$-point marginal of the eigenvalue distribution of $H$, and $p^{(n)}_{G,N}$ is the $n$-point marginal of an $N \times N$ GUE/GOE matrix.

Theorem 7.3 (Edge universality). Suppose that $H^{\mathbf v} = (h^{\mathbf v}_{ij})$ and $H^{\mathbf w} = (h^{\mathbf w}_{ij})$ both satisfy Definition 7.1. Assume that the entries of $H^{\mathbf v}$ and $H^{\mathbf w}$ all satisfy (7.1) for some $m > 12$, and that the first two moments of the entries of $H^{\mathbf v}$ and $H^{\mathbf w}$ match:
\[
\mathbb E^{\mathbf v}\bigl(h^{\mathbf v}_{ij}\bigr)^l \bigl(\overline{h^{\mathbf v}_{ij}}\bigr)^u = \mathbb E^{\mathbf w}\bigl(h^{\mathbf w}_{ij}\bigr)^l \bigl(\overline{h^{\mathbf w}_{ij}}\bigr)^u \qquad \text{for } 0 \le l + u \le 2\,.
\]
Then there is a $\delta > 0$ such that for any $s \in \mathbb R$ we have
\[
\mathbb P^{\mathbf v}\bigl(N^{2/3}(\lambda_N - 2) \le s - N^{-\delta}\bigr) - N^{-\delta} \le \mathbb P^{\mathbf w}\bigl(N^{2/3}(\lambda_N - 2) \le s\bigr) \le \mathbb P^{\mathbf v}\bigl(N^{2/3}(\lambda_N - 2) \le s + N^{-\delta}\bigr) + N^{-\delta}\,. \tag{7.2}
\]
Here $\mathbb P^{\mathbf v}$ and $\mathbb P^{\mathbf w}$ denote the laws of the ensembles $H^{\mathbf v}$ and $H^{\mathbf w}$ respectively, and $\lambda_N$ denotes the largest eigenvalue of $H^{\mathbf v}$ or $H^{\mathbf w}$.

Remark 7.4. A similar result holds for the smallest eigenvalue $\lambda_1$. Moreover, a result analogous to (7.2) holds for the $n$-point joint distribution functions of the extreme eigenvalues. (See [19], Equation (2.40).)

Remark 7.5.
With some additional effort, one may in fact improve the condition $m > 12$ in Theorem 7.3 to $m \ge 7$. The basic idea is to match seven instead of four moments in Lemma 7.7, and to use the resolvent expansion method from Section 6.3. We omit further details.

The rest of this section is devoted to the proof of Theorems 7.2 and 7.3.

7.1. Truncation. For definiteness, we focus on real symmetric matrices, but the following truncation argument applies trivially to complex Hermitian matrices by truncating the real and imaginary parts separately. To simplify the presentation, we consider Wigner matrices for which $\sigma_{ij} = N^{-1/2}$. The proof for the more general matrices from Definition 7.1 is the same; see also Remark 2.4 in [11].

We begin by noting that, without loss of generality, we may assume that the distributions of the entries of $H$ are absolutely continuous. Otherwise consider the matrix $H + \varepsilon_N V$, where $V$ is a GUE/GOE matrix independent of $H$, and $(\varepsilon_N)$ is a positive sequence that tends to zero arbitrarily fast. (Note that the following argument is insensitive to the size of $\varepsilon_N$.)

Let $H \equiv H^{\mathbf x} = (h^{\mathbf x}_{ij})$ be a Wigner matrix whose entries are of the form $h^{\mathbf x}_{ij} = N^{-1/2} x_{ij}$ for some $x_{ij}$. We assume that the family $(x_{ij} : i \le j)$ is independent, and that each $x_{ij}$ satisfies
\[
\mathbb E x_{ij} = 0\,, \qquad \mathbb E |x_{ij}|^2 = 1\,.
\]
Moreover, we assume that there is an $m > 4$ and a constant $C_m \ge 1$, independent of $i$, $j$, and $N$, such that
\[
\mathbb E |x_{ij}|^m \le C_m\,.
\]
In a first step we construct a truncated Wigner matrix $H^{\mathbf y}$ from $H^{\mathbf x}$. This truncation is performed in the following lemma.

Lemma 7.6. Fix $m > 2$ and let $X$ be a real random variable, with absolutely continuous law, satisfying
\[
\mathbb E X = 0\,, \qquad \mathbb E X^2 = 1\,, \qquad \mathbb E |X|^m \le C_m\,.
\]
Let $\lambda > 0$. Then there exists a real random variable $Y$ that satisfies
\[
\mathbb E Y = 0\,, \qquad \mathbb E Y^2 = 1\,, \qquad |Y| \le \lambda\,, \qquad \mathbb P(X \neq Y) \le 2 C_m \lambda^{-m}\,.
\]
Proof. We introduce the abbreviations
\[
P := \mathbb P(|X| > \lambda)\,, \qquad E := \mathbb E\bigl[X\, \mathbf 1(|X| > \lambda)\bigr]\,, \qquad V := \mathbb E\bigl[X^2\, \mathbf 1(|X| > \lambda)\bigr]\,.
\]
Using the assumption on $X$, Markov's inequality, and Hölder's inequality, we find
\[
P \le C_m \lambda^{-m}\,, \qquad |E| \le C_m \lambda^{-m+1}\,, \qquad V \le C_m \lambda^{-m+2}\,. \tag{7.3}
\]
The idea behind the construction of $Y$ is to cut out the tail $|X| > \lambda$, to add appropriate Dirac weights at $\pm\lambda$, and to adjust the total probability by cutting out the values $X \in [-a, a]$, where $a$ is an appropriately chosen small nonnegative number. For any $t$ satisfying $0 \le t \le 1/2$, we choose a nonnegative number $a_t$ such that $\mathbb P(X \in [-a_t, a_t]) = t$. Since $X$ is absolutely continuous, such a number $a_t$ exists and the map $t \mapsto a_t$ is continuous. Moreover, using $\mathbb E X^2 = 1$ and Markov's inequality we find that $a_t \le 2$ for $t \le 1/2$. We define the quantities
\[
e_t := \mathbb E\bigl[X\, \mathbf 1(-a_t \le X \le a_t)\bigr]\,, \qquad v_t := \mathbb E\bigl[X^2\, \mathbf 1(-a_t \le X \le a_t)\bigr]\,,
\]
which satisfy the trivial bounds
\[
|e_t| \le 2t\,, \qquad v_t \le 4t\,. \tag{7.4}
\]
We shall remove the values $(-\infty, -\lambda) \cup [-a_t, a_t] \cup (\lambda, \infty)$ from the range of $X$, and replace them with Dirac weights at $\lambda$ and $-\lambda$ with respective probabilities $p$ and $q$. Thus we are led to the system
\[
p + q = P + t\,, \qquad p - q = \lambda^{-1}(E + e_t)\,, \qquad p + q = \lambda^{-2}(V + v_t)\,. \tag{7.5}
\]
In order to solve (7.5), we abbreviate the right-hand sides of the equations in (7.5) by $\alpha(t)$, $\beta(t)$, and $\gamma(t)$ respectively. In a first step, we solve the equation $\alpha(t) = \gamma(t)$ for $t$. To that end, we observe that $\alpha(0) \le \gamma(0)$, as follows from the trivial inequality $V \ge \lambda^2 P$. Moreover, $\alpha(1/2) \gg \gamma(1/2)$ by (7.3) and (7.4). Since $\alpha(t)$ and $\gamma(t)$ are continuous, the equation $\alpha(t) = \gamma(t)$ has a solution $t_0$. Moreover, (7.3) and (7.4) imply that $t_0 \le C_m \lambda^{-m} + 4\lambda^{-2} t_0$, from which we get that $t_0 \le 2 C_m \lambda^{-m}$. For the following we fix $t := t_0$.
In a second step, we solve the equations $p + q = \alpha(t)$ and $p - q = \beta(t)$ to get
\[
p = \frac{\alpha(t) + \beta(t)}{2}\,, \qquad q = \frac{\alpha(t) - \beta(t)}{2}\,.
\]
We now claim that $|\beta(t)| \le \alpha(t)$. Indeed, a simple application of the Cauchy-Schwarz inequality yields $|\beta(t)| \le (\alpha(t) + \gamma(t))/2 = \alpha(t)$. Hence $p$ and $q$ are nonnegative. Moreover, the bounds (7.3) and (7.4) yield
\[
p + q \le 2 C_m \lambda^{-m}\,.
\]
Thus we have proved that (7.5) has a solution $(p, q, t)$ satisfying
\[
0 \le p, q, t \le 2 C_m \lambda^{-m}\,.
\]
Next, let $I := (-\infty, -\lambda) \cup [-a_t, a_t] \cup (\lambda, \infty)$. Thus, $\mathbb P(X \in I) = p + q$. Partition $I = I_1 \cup I_2$ such that $\mathbb P(X \in I_1) = p$ and $\mathbb P(X \in I_2) = q$. Then we define
\[
Y := X\, \mathbf 1(X \notin I) + \lambda\, \mathbf 1(X \in I_1) - \lambda\, \mathbf 1(X \in I_2)\,.
\]
Recalling (7.5), we find that $Y$ satisfies $\mathbb E Y = 0$ and $\mathbb E Y^2 = 1$. Moreover,
\[
\mathbb P(X \neq Y) = \mathbb P(X \in I) = p + q \le 2 C_m \lambda^{-m}\,.
\]
This concludes the proof.

Note that the random variable $Y$ constructed in Lemma 7.6 satisfies $\mathbb E|Y|^m \le 3 C_m$. Let $\rho$ be some exponent satisfying $0 < \rho < 1/2$ and assume that $m > 4$. Using Lemma 7.6 with $\lambda := N^\rho$, we construct, for each $x_{ij}$, a random variable $y_{ij}$ such that the family $(y_{ij} : i \le j)$ is independent and
\[
\mathbb E y_{ij} = 0\,, \qquad \mathbb E y_{ij}^2 = 1\,, \qquad |y_{ij}| \le N^\rho\,, \qquad \mathbb P(x_{ij} \neq y_{ij}) \le 2 C_m N^{-\rho m}\,, \qquad \mathbb E |y_{ij}|^m \le 3 C_m\,. \tag{7.6}
\]
We define the new matrix $H^{\mathbf y} = (h^{\mathbf y}_{ij})$ through $h^{\mathbf y}_{ij} := N^{-1/2} y_{ij}$. In particular, we have
\[
\mathbb E h^{\mathbf y}_{ij} = 0\,, \qquad \mathbb E |h^{\mathbf y}_{ij}|^2 = \frac1N\,, \qquad \mathbb E |h^{\mathbf y}_{ij}|^p \le \frac{1}{N q^{p-2}}\,, \tag{7.7}
\]
where we set
\[
q := N^{1/2 - \rho}\,. \tag{7.8}
\]
Thus, $H^{\mathbf y}$ satisfies Definition 2.1.

7.2. Moment matching. Next, we construct a third Wigner matrix, $H^{\mathbf z} = (h^{\mathbf z}_{ij})$, whose entries are of the form $h^{\mathbf z}_{ij} = N^{-1/2} z_{ij}$. We require that the $z_{ij}$ have uniformly subexponential decay, i.e.
\[
\mathbb E z_{ij} = 0\,, \qquad \mathbb E |z_{ij}|^2 = 1\,, \qquad \mathbb P(|z_{ij}| \ge \xi) \le \theta^{-1} \mathrm e^{-\xi^\theta}\,, \tag{7.9}
\]
for some $\theta > 0$ independent of $i$, $j$, and $N$. We choose $z_{ij}$ so as to match the first four moments of $y_{ij}$.

Lemma 7.7. Let $y_{ij}$ satisfy (7.6) with some $m > 4$. Then there exists a $z_{ij}$ satisfying (7.9) such that $\mathbb E z_{ij}^l = \mathbb E y_{ij}^l$ for $l = 1, \dots, 4$.

Proof. In fact, using an explicit construction similar to the one used in the proof of Theorem 2.5, $z_{ij}$ can be chosen to be supported at only three points. We omit further details.

It was proved in [19], Section 2.1, that the statement of Theorem 7.2 holds if the entries of $H$ satisfy the subexponential decay condition (7.9). Theorem 7.2 will therefore follow if we can prove that, for $b_N$ and $O$ as in Theorem 7.2, we have
\[
\lim_{N\to\infty} \int_{E-b_N}^{E+b_N} \frac{\mathrm dE'}{2 b_N} \int \mathrm d\alpha_1 \cdots \mathrm d\alpha_n\, O(\alpha_1, \dots, \alpha_n)\, \frac{1}{\varrho_{sc}(E)^n}\, \bigl(p^{(n)}_{\mathbf x,N} - p^{(n)}_{\mathbf z,N}\bigr)\biggl(E' + \frac{\alpha_1}{N\varrho_{sc}(E)}, \dots, E' + \frac{\alpha_n}{N\varrho_{sc}(E)}\biggr) = 0\,, \tag{7.10}
\]
where $p^{(n)}_{\mathbf x,N}$ and $p^{(n)}_{\mathbf z,N}$ are the $n$-point marginals of the eigenvalue distributions of $H^{\mathbf x}$ and $H^{\mathbf z}$, respectively. Similarly, it was proved in [19], Section 2.2, that the statement of Theorem 7.3 holds if the entries of $H^{\mathbf v}$ and $H^{\mathbf w}$ both satisfy the subexponential decay condition (7.9). Thus, Theorem 7.3 will follow if we can prove that
\[
\mathbb P^{\mathbf x}\bigl(N^{2/3}(\lambda_N - 2) \le s - N^{-\delta}\bigr) - N^{-\delta} \le \mathbb P^{\mathbf z}\bigl(N^{2/3}(\lambda_N - 2) \le s\bigr) \le \mathbb P^{\mathbf x}\bigl(N^{2/3}(\lambda_N - 2) \le s + N^{-\delta}\bigr) + N^{-\delta} \tag{7.11}
\]
for some $\delta > 0$. Here we use $\mathbb P^{\mathbf x}$ and $\mathbb P^{\mathbf z}$ to denote the laws of the ensembles $H^{\mathbf x}$ and $H^{\mathbf z}$ respectively.

We shall prove both (7.10) and (7.11) by first comparing $H^{\mathbf x}$ to $H^{\mathbf y}$, and then comparing $H^{\mathbf y}$ to $H^{\mathbf z}$. The first step is easy: from (7.6) we get
\[
\mathbb P(H^{\mathbf x} \neq H^{\mathbf y}) \le 2 C_m N^{2 - \rho m}\,. \tag{7.12}
\]
Thus, (7.10) and (7.11) hold with $\mathbf z$ replaced by $\mathbf y$ provided that
\[
\rho m > 2\,. \tag{7.13}
\]

7.3. Comparison of $H^{\mathbf y}$ and $H^{\mathbf z}$ in the bulk. In this section we prove that (7.10) holds with $\mathbf x$ replaced by $\mathbf y$, and hence complete the proof of Theorem 7.2. We compare the local spectral statistics of $H^{\mathbf y}$ and $H^{\mathbf z}$ using the Green function comparison method from [17], Section 8. The key additional input is the local semicircle law for sparse matrices, Theorem 3.3. We merely sketch the differences to [17].
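The strategy of this subsection, replacing the truncated ensemble $H^{\mathbf y}$ by a subexponential ensemble $H^{\mathbf z}$ with matching low moments, can be illustrated by direct simulation. The sketch below is a crude Monte Carlo illustration with ad hoc parameters: its truncation is a simple clipping rather than the construction of Lemma 7.6, and it matches only two moments, not the four required by Lemma 7.7 for local statistics; it only exhibits the agreement of the bulk densities.

```python
# Comparing bulk spectral statistics of a truncated heavy-tailed ensemble
# (stand-in for H^y) and a bounded two-valued ensemble (stand-in for H^z).
import numpy as np

rng = np.random.default_rng(3)
N, trials = 300, 20
lam = N ** 0.25                          # truncation level, lambda = N^rho

def sample_y(shape):
    # Student-t entries (df = 5, so more than 4 finite moments),
    # standardized to unit variance, then clipped at level lam
    x = rng.standard_t(df=5, size=shape) / np.sqrt(5 / 3)
    return np.clip(x, -lam, lam)

def sample_z(shape):
    # symmetric two-valued law: mean 0, variance 1, all odd moments 0
    return rng.choice([-1.0, 1.0], size=shape)

def bulk_fraction(sampler):
    vals = []
    for _ in range(trials):
        M = sampler((N, N))
        H = (np.triu(M) + np.triu(M, 1).T) / np.sqrt(N)
        eigs = np.linalg.eigvalsh(H)
        vals.append(np.mean((eigs > -0.5) & (eigs < 0.5)))
    return np.mean(vals)

print("y-ensemble bulk fraction:", bulk_fraction(sample_y))
print("z-ensemble bulk fraction:", bulk_fraction(sample_z))
```

Both fractions approximate the semicircle mass of $(-\frac12, \frac12)$; resolving the difference at the scale of individual eigenvalues is precisely what the four-moment matching and the Green function comparison below are needed for.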
As explained in [17], the $n$-point correlation functions $p^{(n)}_N$ can be expressed (up to an error $N^{-c}$) in terms of expectations of observables $F$ whose arguments are products of expressions of the form $m(z_i + \mathrm i\eta)$, where $\eta := N^{-1-\varepsilon}$. We assume that the first five derivatives of $F$ are polynomially bounded, uniformly in $N$. Using the local semicircle law for sparse matrices, Theorem 3.3, we may control the Green function matrix elements down to scales $N^{-1-\varepsilon}$, uniformly in $E$. (Note that in [17] the spectral edge had to be excluded, since the bounds derived there were unstable near the edge, unlike our bound (3.14).) This allows us to compare the local eigenvalue statistics of the matrix ensembles at scales $N^{-1-\varepsilon}$, which is sufficiently accurate for both Theorems 7.2 and 7.3.

We use the telescopic summation and the Lindeberg replacement argument from [17], Chapter 8, whose notations we take over without further comment; see also Section 6.3. A resolvent expansion yields
\[
S = R - N^{-1/2}\, RVR + N^{-1}\, (RV)^2 R - N^{-3/2}\, (RV)^3 R + N^{-2}\, (RV)^4 R - N^{-5/2}\, (RV)^5 S\,.
\]
Next, note that, by (7.7) and (7.9), both ensembles $H^{\mathbf y}$ and $H^{\mathbf z}$ satisfy Definition 2.1 with $q$ defined in (7.8). Hence we may invoke Theorem 3.3 with $f = 0$, and in particular (3.14), for the matrices $H^{\mathbf y}$ and $H^{\mathbf z}$. Choosing $\eta = N^{-1-\varepsilon}$ for some $\varepsilon > 0$, we therefore get
\[
|R_{ij}(E + \mathrm i\eta)| \le N^{2\varepsilon}\,, \qquad |S_{ij}(E + \mathrm i\eta)| \le N^{2\varepsilon}
\]
with $(\xi,\nu)$-high probability. Consider now the difference $\mathbb E^{\mathbf y} - \mathbb E^{\mathbf z}$ applied to some fixed observable $F$ depending on traces (normalized by $N^{-1}$) of resolvents and whose derivatives have at most polynomial growth. Since the first four moments of the entries of $H^{\mathbf y}$ and $H^{\mathbf z}$ coincide by Lemma 7.7, the error in one step of the telescopic summation is bounded by the expectation of the rest term in the resolvent expansion, i.e.
\[
N^{C\varepsilon}\, N^{-5/2}\, \mathbb E\bigl|(RV)^5 S\bigr| \le N^{-5/2 + C\varepsilon}\, \max_{a,b}\, \mathbb E |V_{ab}|^5 \le N^{-5/2 + C\varepsilon}\, C_m N^{\rho}\,,
\]
where in the last step we used (7.6). The first factor $N^{C\varepsilon}$ comes from the polynomially bounded derivatives of $F$. Summing up all $O(N^2)$ terms of the telescopic sum, we find that the difference $\mathbb E^{\mathbf y} - \mathbb E^{\mathbf z}$ applied to $F$ is bounded by
\[
N^{-1/2 + C\varepsilon + \rho}\,. \tag{7.14}
\]
Combining (7.14) and (7.12), we find that (7.10) follows provided that
\[
-\frac12 + C\varepsilon + \rho < 0\,, \qquad \rho m > 2\,. \tag{7.15}
\]
Since $m > 4$ is fixed, choosing first $1/2 - \rho$ small enough and then $\varepsilon$ small enough yields (7.15). This completes the proof of Theorem 7.2.

7.4. Comparison of $H^{\mathbf y}$ and $H^{\mathbf z}$ at the edge. In order to prove (7.2) under the assumption $m > 12$, we may invoke Proposition 6.4, which implies that (7.11) holds with $\mathbf x$ replaced by $\mathbf y$, provided that $\phi = 1/2 - \rho > 1/3$, i.e. $\rho < 1/6$. Together with the condition (7.13), this implies that (7.2) holds if $m > 12$.

A. Regularization of the Dyson Brownian motion

In this appendix we sketch a simple regularization argument needed to prove two results concerning the Dyson Brownian motion (DBM). This argument can be used as a substitute for the earlier, more involved, proofs given in Appendices A and B of [16] on the existence of the dynamics restricted to the subdomain $\Sigma_N := \{\mathbf x \,:\, x_1 < x_2 < \cdots < x_N\}$, and on the applicability of the Bakry-Émery method. The argument presented in this section is also more probabilistic in nature than the earlier proofs of [16]. For the applications in Section 4 of this paper, some minor adjustments to the argument below are needed to incorporate the separate treatment of the largest eigenvalue. These modifications are straightforward, and we shall only sketch the argument for the standard DBM.

Theorem A.1. Fix $n \ge 1$ and let $\mathbf m = (m_1, \dots, m_n) \in \mathbb N^n$ be an increasing family of indices. Let $G : \mathbb R^n \to \mathbb R$ be a continuous function of compact support and set
\[
G_{i,\mathbf m}(\mathbf x) := G\bigl(N(x_i - x_{i+m_1}),\, N(x_{i+m_1} - x_{i+m_2}),\, \dots,\, N(x_{i+m_{n-1}} - x_{i+m_n})\bigr)\,.
\]
Let $\gamma_1, \dots, \gamma_N$ denote the classical locations of the eigenvalues and set
\[
Q := \sup_{t \in [t_0, \tau]} \int \sum_{i=1}^N (x_i - \gamma_i)^2\, f_t\, \mathrm d\mu\,. \tag{A.1}
\]
Choose an $\varepsilon > 0$. Then for any $\rho$ satisfying $0 < \rho < 1$, and setting $\tau = N^{-\rho}$, there exists a $\bar\tau \in [\tau/2, \tau]$ such that, for any $J \subset \{1, 2, \dots, N - m_n - 1\}$, we have
\[
\biggl|\int \frac{1}{|J|} \sum_{i\in J} G_{i,\mathbf m}\, f_{\bar\tau}\, \mathrm d\mu - \int \frac{1}{|J|} \sum_{i\in J} G_{i,\mathbf m}\, \mathrm d\mu\biggr| \le C N^\varepsilon \sqrt{\frac{N Q + 1}{|J|\, \tau}} \tag{A.2}
\]
for all $N \ge N_0(\rho)$. Here $\mu = \mu^{(N)}$ is the equilibrium measure of the $N$ eigenvalues of the GOE.

Define $\mu_\beta(\mathbf x) = C \mathrm e^{-N\beta \mathcal H(\mathbf x)}$ as in (4.6) and (4.5), but introducing a parameter $\beta$ so that $\mu_\beta$ is the equilibrium measure of the usual $\beta$-ensemble, which is invariant under the ($\beta$-dependent) DBM. We remark that Theorem A.1 holds for all $\beta \ge 1$ with an identical proof. The following lemma holds more generally for $\beta > 0$. Let $\omega := C \mu_\beta\, \mathrm e^{-N \sum_j U_j(x_j)}$, where $U_j$ is a $C^2$-function satisfying
\[
\min_j U_j''(x) \ge \tau^{-1} \tag{A.3}
\]
for some $\tau < 1$. For the following lemma we recall the definition (4.7) of the Dirichlet form.

Lemma A.2. Let $\beta > 0$ and let $q \in H^1(\mathrm d\omega)$ be a probability density with respect to $\omega$. Then for any $J \subset \{1, 2, \dots, N - m_n - 1\}$ and any $t > 0$ we have
\[
\biggl|\int \frac{1}{|J|}\sum_{i\in J} G_{i,\mathbf m}\, q\, \mathrm d\omega - \int \frac{1}{|J|}\sum_{i\in J} G_{i,\mathbf m}\, \mathrm d\omega\biggr| \le C \sqrt{\frac{D_\omega(\sqrt q)\, t}{|J|}} + C \sqrt{S_\omega(q)}\, \mathrm e^{-ct/\tau}\,. \tag{A.4}
\]
Recall that the DBM is defined via the stochastic differential equation
\[
\mathrm dx_i = \frac{\mathrm dB_i}{\sqrt N} + \beta\biggl(-\frac{x_i}{4} + \frac{1}{2N}\sum_{j\neq i}\frac{1}{x_i - x_j}\biggr)\mathrm dt \qquad \text{for } i = 1, \dots, N\,, \tag{A.5}
\]
where $B_1, \dots, B_N$ is a family of independent standard Brownian motions. It was proved in [1], Lemma 4.3.3, that there is a unique strong solution to (A.5) for all $\beta \ge 1$.

For any $\delta > 0$ define the extension $\mu^\delta_\beta$ of the measure $\mu_\beta$ from $\Sigma_N$ to $\mathbb R^N$ by replacing the singular logarithm with a $C^2$-function. To that end, we introduce the approximation parameter $\delta > 0$ and define, as in Section 4, for $\mathbf x \in \mathbb R^N$,
\[
\mathcal H_\delta(\mathbf x) := \sum_i \frac{x_i^2}{4} - \frac1N \sum_{i<j} \log_\delta(x_j - x_i)\,,
\]
where $\log_\delta$ is a $C^2$-regularization of the logarithm which converges, as $\delta \to 0$, to $\log x$ if $x > 0$ and to $-\infty$ if $x \le 0$. Furthermore, we have the lower bound
\[
\partial_x^2 \log_\delta(x) \ge \begin{cases} -x^{-2} & \text{if } x > \delta\,, \\ -\delta^{-2} & \text{if } x \le \delta\,. \end{cases}
\]
Similarly, we can extend the measure $\omega$ to $\mathbb R^N$ by setting $\omega_\delta = C \mathrm e^{-N\sum_j U_j(x_j)}\, \mu^\delta_\beta$.

Lemma A.3. Let $q \in L^\infty(\mathrm d\omega_\delta)$ be a $C^2$ probability density. Then for $\delta \le 1/N$, $\beta > 0$, any $J \subset \{1, 2, \dots, N - m_n - 1\}$, and any $t > 0$ we have
\[
\biggl|\int \frac{1}{|J|}\sum_{i\in J} G_{i,\mathbf m}\, q\, \mathrm d\omega_\delta - \int \frac{1}{|J|}\sum_{i\in J} G_{i,\mathbf m}\, \mathrm d\omega_\delta\biggr| \le C \sqrt{\frac{D_{\omega_\delta}(\sqrt q)\, t}{|J|}} + C \sqrt{S_{\omega_\delta}(q)}\, \mathrm e^{-ct/\tau}\,. \tag{A.6}
\]
Proof. The proof of Theorem 4.3 in [16] applies with merely cosmetic changes; now, however, the dynamics is defined on $\mathbb R^N$ instead of $\Sigma_N$, so that complications arising from the boundary are absent. The condition $\delta \le 1/N$ is needed since we use the singularity of $\partial_x^2 \log x$ to generate a factor $1/N^2$ in the regime $x \le C/N$ in the proof.

Proof of Lemma A.2. Suppose that $q$ is a probability density in $\Sigma_N$ with respect to $\omega$. We extend $q$ to be zero outside $\Sigma_N$ and let $q_\varepsilon \in C^2$ be any regularization of $q$ that converges to $q$ in $H^1(\omega)$. Then there is a constant $C_{\varepsilon,\delta}$ such that $q^\delta_\varepsilon := C_{\varepsilon,\delta}\, q_\varepsilon$ is a probability density on $\mathbb R^N$ with respect to $\omega_\delta$. Thus (A.6) holds with $q$ replaced by $q^\delta_\varepsilon$. Taking the limit $\delta \to 0$ and then $\varepsilon \to 0$, we get
\[
\biggl|\int \frac{1}{|J|}\sum_{i\in J} G_{i,\mathbf m}\, q\, \mathrm d\omega - \int \frac{1}{|J|}\sum_{i\in J} G_{i,\mathbf m}\, \mathrm d\omega\biggr| \le C \lim_{\varepsilon\to0}\lim_{\delta\to0} \sqrt{\frac{D_{\omega_\delta}\bigl(\sqrt{q^\delta_\varepsilon}\bigr)\, t}{|J|}} + C \lim_{\varepsilon\to0}\lim_{\delta\to0} \sqrt{S_{\omega_\delta}(q^\delta_\varepsilon)}\, \mathrm e^{-ct/\tau}\,. \tag{A.7}
\]
Notice that $\omega_\delta \to \mathbf 1(\Sigma_N)\, \omega$ weakly as $\delta \to 0$. Thus
\[
\lim_{\varepsilon\to0}\lim_{\delta\to0} D_{\omega_\delta}\bigl(\sqrt{q^\delta_\varepsilon}\bigr) = \lim_{\varepsilon\to0} D_\omega(\sqrt{q_\varepsilon}) = D_\omega(\sqrt q)\,,
\]
provided that $q \in H^1(\omega)$. This proves Lemma A.2. Notice that the proof did not use the existence of the DBM; instead, it used the existence of the regularized DBM.

Proof of Theorem A.1.
Write
\[
\int \frac{1}{|J|}\sum_{i\in J} G_{i,\mathbf m}\, f_t\, \mathrm d\mu = \mathbb E^{f_0\mu}\, \mathbb E^{\mathbf x_0}\, \frac{1}{|J|}\sum_{i\in J} G_{i,\mathbf m}(\mathbf x(t))\,.
\]
Here $\mathbb E^{\mathbf x_0}$ denotes expectation with respect to the law of the DBM $(\mathbf x(t))_t$ starting from $\mathbf x_0$, and $\mathbb E^{f_0\mu}$ denotes expectation of $\mathbf x_0$ with respect to the measure $f_0\mu$. Let $\mathbb E^{\mathbf x_0}_\delta$ denote expectation with respect to the regularized DBM. Then we have
\[
\mathbb E^{\mathbf x_0}\, \frac{1}{|J|}\sum_{i\in J} G_{i,\mathbf m}(\mathbf x(t)) = \lim_{\delta\to0} \mathbb E^{\mathbf x_0}_\delta\, \frac{1}{|J|}\sum_{i\in J} G_{i,\mathbf m}(\mathbf x(t))\,,
\]
where we have used the existence of a strong solution to the DBM (see [1], Lemma 4.3.3) and the fact that the dynamics remains in $\Sigma_N$ almost surely. Hence
\[
\int \frac{1}{|J|}\sum_{i\in J} G_{i,\mathbf m}\, f_t\, \mathrm d\mu = \lim_{\delta\to0} \int \frac{1}{|J|}\sum_{i\in J} G_{i,\mathbf m}\, f^\delta_t\, \mathrm d\mu^\delta\,,
\]
where $f^\delta_t$ is the solution of the regularized DBM at time $t$ with initial data $f_0\mu/\mu^\delta$. Using that (A.2) holds for the regularized dynamics, and taking the limit $\delta \to 0$, we complete the proof.

References

[1] Anderson, G., Guionnet, A., Zeitouni, O.: An Introduction to Random Matrices. Studies in Advanced Mathematics, 118, Cambridge University Press, 2009.

[2] Aptekarev, A., Khabibullin, R.: Asymptotic expansions for polynomials orthogonal with respect to a complex non-constant weight function. Trans. Moscow Math. Soc. 68, 1–37 (2007).

[3] Auffinger, A., Ben Arous, G., Péché, S.: Poisson convergence for the largest eigenvalues of heavy-tailed matrices. Ann. Inst. Henri Poincaré Probab. Stat. 45, no. 3, 589–610 (2009).

[4] Biroli, G., Bouchaud, J.-P., Potters, M.: On the top eigenvalue of heavy-tailed random matrices. Europhysics Lett. 78, 10001 (2007).

[5] Bleher, P., Its, A.: Semiclassical asymptotics of orthogonal polynomials, Riemann-Hilbert problem, and universality in the matrix model. Ann. of Math. 150, 185–266 (1999).

[6] Brascamp, H. J., Lieb, E. H.: On extensions of the Brunn-Minkowski and Prékopa-Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation. J. Funct. Anal. 22, 366–389 (1976).

[7] Deift, P., Kriecherbauer, T., McLaughlin, K. T-R, Venakides, S., Zhou, X.: Uniform asymptotics for polynomials orthogonal with respect to varying exponential weights and applications to universality questions in random matrix theory. Comm. Pure Appl. Math. 52, 1335–1425 (1999).

[8] Deift, P., Kriecherbauer, T., McLaughlin, K. T-R, Venakides, S., Zhou, X.: Strong asymptotics of orthogonal polynomials with respect to exponential weights. Comm. Pure Appl. Math. 52, 1491–1552 (1999).

[9] Dyson, F. J.: A Brownian-motion model for the eigenvalues of a random matrix. J. Math. Phys. 3, 1191–1198 (1962).

[10] Erdős, L.: Universality of Wigner random matrices: a survey of recent results (lecture notes). Preprint arXiv:1004.0861.

[11] Erdős, L., Knowles, A., Yau, H.-T., Yin, J.: Spectral statistics of sparse random matrices I: local semicircle law. Preprint arXiv:1103.1919, to appear in Ann. Prob.

[12] Erdős, L., Schlein, B., Yau, H.-T.: Semicircle law on short scales and delocalization of eigenvectors for Wigner random matrices. Ann. Probab. 37, no. 3, 815–852 (2009).

[13] Erdős, L., Schlein, B., Yau, H.-T.: Local semicircle law and complete delocalization for Wigner random matrices. Commun. Math. Phys. 287, 641–655 (2009).

[14] Erdős, L., Schlein, B., Yau, H.-T.: Wegner estimate and level repulsion for Wigner random matrices. Int. Math. Res. Notices 2010, no. 3, 436–479 (2010).

[15] Erdős, L., Schlein, B., Yau, H.-T.: Universality of random matrices and local relaxation flow. Preprint arXiv:0907.5605.

[16] Erdős, L., Schlein, B., Yau, H.-T., Yin, J.: The local relaxation flow approach to universality of the local statistics for random matrices. To appear in Ann. Inst. H. Poincaré Probab. Statist. Preprint arXiv:0911.3687.
[17] Erdős, L., Yau, H.-T., Yin, J.: Bulk universality for generalized Wigner matrices. Preprint arXiv:1001.3453.

[18] Erdős, L., Yau, H.-T., Yin, J.: Universality for generalized Wigner matrices with Bernoulli distribution. To appear in J. Combinatorics. Preprint arXiv:1003.3813.

[19] Erdős, L., Yau, H.-T., Yin, J.: Rigidity of eigenvalues of generalized Wigner matrices. Preprint arXiv:1007.4652.

[20] Erdős, P., Rényi, A.: On random graphs. I. Publicationes Mathematicae 6, 290–297 (1959).

[21] Erdős, P., Rényi, A.: The evolution of random graphs. Magyar Tud. Akad. Mat. Kutató Int. Közl. 5, 17–61 (1960).

[22] Johansson, K.: Universality of the local spacing distribution in certain ensembles of Hermitian Wigner matrices. Comm. Math. Phys. 215, no. 3, 683–705 (2001).

[23] Johansson, K.: Universality for certain Hermitian Wigner matrices under weak moment conditions. Ann. Inst. H. Poincaré Probab. Statist. 48, 47–79 (2012).

[24] Khorunzhi, O.: High moments of large Wigner random matrices and asymptotic properties of the spectral norm. To appear in Rand. Op. Stoch. Eqs.

[25] Knowles, A., Yin, J.: Eigenvector distribution of Wigner matrices. Preprint arXiv:1102.0057, to appear in Prob. Theor. Rel. Fields.

[26] Miller, S. J., Novikoff, T., Sabelli, A.: The distribution of the largest nontrivial eigenvalues in families of random regular graphs. Exper. Math. 17, 231–244 (2008).

[27] Pastur, L., Shcherbina, M.: Universality of the local eigenvalue statistics for a class of unitary invariant random matrix ensembles. J. Stat. Phys. 86, 109–147 (1997).

[28] Pastur, L., Shcherbina, M.: Bulk universality and related properties of Hermitian matrix models. J. Stat. Phys. 130, 205–250 (2008).

[29] Ruzmaikina, A.: Universality of the edge distribution of eigenvalues of Wigner random matrices with polynomially decaying distributions of entries. Comm. Math. Phys. 261, no. 2, 277–296 (2006).

[30] Sarnak, P.: Private communication.

[31] Sinai, Y., Soshnikov, A.: A refinement of Wigner's semicircle law in a neighborhood of the spectrum edge. Functional Anal. and Appl. 32, no. 2, 114–131 (1998).

[32] Sodin, S.: The Tracy-Widom law for some sparse random matrices. Preprint arXiv:0903.4295.

[33] Soshnikov, A.: Universality at the edge of the spectrum in Wigner random matrices. Comm. Math. Phys. 207, no. 3, 697–733 (1999).

[34] Tao, T., Vu, V.: Random matrices: universality of the local eigenvalue statistics. To appear in Acta Math. Preprint arXiv:0906.0510.

[35] Tao, T., Vu, V.: Random matrices: universality of local eigenvalue statistics up to the edge. Preprint arXiv:0908.1982.

[36] Tracy, C., Widom, H.: Level-spacing distributions and the Airy kernel. Comm. Math. Phys. 159, 151–174 (1994).

[37] Tracy, C., Widom, H.: On orthogonal and symplectic matrix ensembles. Comm. Math. Phys. 177, no. 3, 727–754 (1996).