arXiv:1103.3869v3 [math.PR] 25 Jul 2012 Spectral Statistics of Erdo˝s-R´enyi Graphs II: Eigenvalue Spacing and the Extreme Eigenvalues La´szlo´ Erdo˝s1∗ Antti Knowles2† Horng-Tzer Yau2‡ Jun Yin2§ Institute of Mathematics, University of Munich, Theresienstrasse 39, D-80333 Munich, Germany lerdos@math.lmu.de 1 Department of Mathematics, Harvard University Cambridge MA 02138, USA knowles@math.harvard.edu htyau@math.harvard.edu jyin@math.harvard.edu 2 July 27, 2012 We consider the ensemble of adjacency matrices of Erd˝os-R´enyi random graphs, i.e. graphs on N vertices where every edge is chosen independently and with probability p ≡ p(N ). We rescale the matrix so that its bulk eigenvalues are of order one. Under the assumption pN ≫ N 2/3, we prove the universality of eigenvalue distributions both in the bulk and at the edge of the spectrum. More precisely, we prove (1) that the eigenvalue spacing of the Erd˝os-R´enyi graph in the bulk of the spectrum has the same distribution as that of the Gaussian orthogonal ensemble; and (2) that the second largest eigenvalue of the Erd˝os-R´enyi graph has the same distribution as the largest eigenvalue of the Gaussian orthogonal ensemble. As an application of our method, we prove the bulk universality of generalized Wigner matrices under the assumption that the matrix entries have at least 4 + ε moments. AMS Subject Classification (2010): 15B52, 82B44 Keywords: Erdo˝s-R´enyi graphs, universality, Dyson Brownian motion. ∗Partially supported by SFB-TR 12 Grant of the German Research Council †Partially supported by NSF grant DMS-0757425 ‡Partially supported by NSF grants DMS-0757425, 0804279 §Partially supported by NSF grant DMS-1001655 1 1. Introduction The Erdo˝s-R´enyi ensemble [20, 21] is a law of a random graph on N vertices, in which each edge is chosen independently with probability p ≡ p(N ). The corresponding adjacency matrix is called the Erdo˝s-R´enyi matrix. Since each row and column has typically pN nonzero entries, the matrix is sparse as long as p ≪ 1. We shall refer to pN as the sparseness parameter of the matrix. In the companion paper [11], we established the local semicircle law for the Erdo˝s-R´enyi matrix for pN (log N )C , i.e. we showed that, assuming pN (log N )C , the eigenvalue density is given by the Wigner semicircle law in any spectral window containing on average at least (log N )C′ eigenvalues. In this paper, we use this result to prove both the bulk and edge universalities for the Erdo˝s-R´enyi matrix under the restriction that the sparseness parameter satisfies pN ≫ N 2/3. (1.1) More precisely, assuming that p satisfies (1.1), we prove that the eigenvalue spacing of the Erdo˝s-R´enyi graph in the bulk of the spectrum has the same distribution as that of the Gaussian orthogonal ensemble (GOE). In order to outline the statement of the edge universality for the Erdo˝s-R´enyi graph, we observe that, since the matrix elements of the Erdo˝s-R´enyi ensemble are either 0 or 1, they do not satisfy the mean zero condition which typically appears in the random matrix literature. In particular, the largest eigenvalue of the Erdo˝s-R´enyi matrix is very large and lies far away from the rest of the spectrum. We normalize the Erdo˝s-R´enyi matrix so that the bulk of its spectrum lies in the interval [−2, 2]. By the edge universality of the Erdo˝s-R´enyi ensemble, we therefore mean that its second largest eigenvalue has the same distribution as the largest eigenvalue of the GOE, which is the well-known Tracy-Widom distribution. 
We prove the edge universality under the assumption (1.1). Neglecting the mean zero condition, the Erdo˝s-R´enyi matrix becomes a Wigner random matrix with a Bernoulli distribution when 0 < p < 1 is a constant independent of N . Thus for p ≪ 1 we can view the Erdo˝s-R´enyi matrix, up to a shift in the expectation of the matrix entries, as a singular Wigner matrix for which the probability distributions of the matrix elements are highly concentrated at zero. Indeed, the probability for a single entry to be zero is 1 − p. Alternatively, we can express the singular nature of the Erdo˝s-R´enyi ensemble by the fact that the k-th moment of a matrix entry is bounded by N −1(pN )−(k−2)/2 . (1.2) For p ≪ 1 this decay in k is much slower than in the case of Wigner matrices. There has been spectacular progress in the understanding of the universality of eigenvalue distributions for invariant random matrix ensembles [5, 7, 8, 27, 28]. The Wigner and Erdo˝s-R´enyi matrices are not invariant ensembles, however. The moment method [31, 33, 32] is a powerful means for establishing edge universality. In the context of sparse matrices, it was applied in [32] to prove edge universality for the zero mean version of the d-regular graph, where the matrix entries take on the values −1 and 1 instead of 0 and 1. The need for this restriction can be ascribed to the two following facts. First, the moment method is suitable for treating the largest and smallest eigenvalues. But in the case of the Erdo˝s-R´enyi matrix, it is the second largest eigenvalue, not the largest one, which behaves like the largest eigenvalue of the GOE. Second, the modification of the moment method to matrices with non-symmetric distributions poses a serious technical challenge. A general approach to proving the universality of Wigner matrices was recently developed in the series of papers [12, 13, 14, 15, 16, 17, 18, 19]. In this paper, we further extend this method to cover sparse matrices such as the Erdo˝s-R´enyi matrix in the range (1.1). Our approach is based on the following three 2 ingredients. (1) A local semicircle law – a precise estimate of the local eigenvalue density down to energy scales containing around (log N )C eigenvalues. (2) Establishing universality of the eigenvalue distribution of Gaussian divisible ensembles, via an estimate on the rate of decay to local equilibrium of the Dyson Brownian motion [9]. (3) A density argument which shows that for any probability distribution of the matrix entries there exists a Gaussian divisible distribution such that the two associated Wigner ensembles have identical local eigenvalue statistics down to the scale 1/N . In the case of Wigner matrices, the edge universality can also be obtained by a modification of (1) and (3) [19]. The class of ensembles to which this method applies is extremely general. So far it includes all (generalized) Wigner matrices under the sole assumption that the distributions of the matrix elements have a uniform subexponential decay. In this paper we extend this method to the Erdo˝s-R´enyi matrix, which in fact represents a generalization in two unrelated directions: (a) the law of the matrix entries is much more singular, and (b) the matrix elements have nonzero mean. As an application of the local semicircle law for sparse matrices proved in [11], we also prove the bulk universality for generalized Wigner matrices under the sole assumption that the matrix entries have 4 + ε moments. 
This relaxes the subexponential decay condition on the tail of the distributions assumed in [17, 18, 19]. Moreover, we prove the edge universality of Wigner matrices under the assumption that the matrix entries have 12 + ε moments. These results on Wigner matrices are stated and proved in Section 7 below. We note that in [3] it was proved that the distributions of the largest eigenvalues are Poisson if the entries have at most 4 − ε moments. Numerical results [4] predict that the existence of four moments corresponds to a sharp transition point, where the transition is from the Poisson process to the determinantal point process with Airy kernel. We remark that the bulk universality for Hermitian Wigner matrices was also obtained in [34], partly by using the result of [22] and the local semicircle law from Step (1). For real symmetric Wigner matrices, the bulk universality in [34] requires that the first four moments of every matrix element coincide with those of the standard Gaussian random variable. In particular, this restriction rules out the real Bernoulli Wigner matrices, which may be regarded as the simplest kind of an Erdo˝s-R´enyi matrix (again neglecting additional difficulties arising from the nonzero mean of the entries). As a first step in our general strategy to prove universality, we proved, in the companion paper [11], a local semicircle law stating that the eigenvalue distribution of the Erdo˝s-R´enyi ensemble in any spectral window which on average contains at least (log N )C eigenvalues is given by the Wigner semicircle law. As a corollary, we proved that the eigenvalue locations are equal to those predicted by the semicircle law, up to an error of order (pN )−1. The second step of the strategy outlined above for Wigner matrices is to estimate the local relaxation time of the Dyson Brownian motion [15, 16]. This is achieved by constructing a pseudo-equilibrium measure and estimating the global relaxation time to this measure. For models with nonzero mean, such as the Erdo˝s-R´enyi matrix, the largest eigenvalue is located very far from its equilibrium position, and moves rapidly under the Dyson Brownian motion. Hence a uniform approach to equilibrium is impossible. We overcome this problem by integrating out the largest eigenvalue from the joint probability distribution of the eigenvalues, and consider the flow of the marginal distribution of the remaining N − 1 eigenvalues. This enables us to establish bulk universality for sparse matrices with nonzero mean under the restriction (1.1). This approach trivially also applies to Wigner matrices whose entries have nonzero mean. Since the eigenvalue locations are only established with accuracy (pN )−1, the local relaxation time for the Dyson Brownian motion with the initial data given by the Erdo˝s-R´enyi ensemble is only shown to be less than 1/(p2N ) ≫ 1/N . For Wigner ensembles, it was proved in [19] that the local relaxation time is of order 1/N . Moreover, the slow decay of the third moment of the Erdo˝s-R´enyi matrix entries, as given in (1.2), makes the approximation in Step (3) above less effective. These two effects impose the restriction (1.1) in our proof of bulk universality. At the end of Section 2 we give a more detailed account of how this 3 restriction arises. The reason for the same restriction’s being needed for the edge universality is different; see Section 6.3. 
We note, however, that both the bulk and edge universalities are expected to hold without this restriction, as long as the graphs are not too sparse in the sense that pN ≫ log N ; for d-regular graphs this condition is conjectured to be the weaker pN ≫ 1 [30]. A discussion of related problems on d-regular graphs can be found in [26]. Acknowledgement. We thank P. Sarnak for bringing the problem of universality of sparse matrices to our attention. 2. Definitions and results We begin this section by introducing a class of N × N sparse random matrices A ≡ AN . Here N is a large parameter. (Throughout the following we shall often refrain from explicitly indicating N -dependence.) The motivating example is the Erdo˝s-R´enyi matrix, or the adjacency matrix of the Erdo˝s-R´enyi random graph. Its entries are independent (up to the constraint that the matrix be symmetric), and equal to 1 with probability p and 0 with probability 1 − p. For our purposes it is convenient to replace p with the new parameter q ≡ q(N ), defined through p = q2/N . Moreover, we rescale the matrix in such a way that its bulk eigenvalues typically lie in an interval of size of order one. Thus we are led to the following definition. Let A = (aij) be the symmetric N × N matrix whose entries aij are independent (up to the symmetry constraint aij = aji) and each element is distributed according to aij = γ q 1 0 with probability q2 N with probability 1 − q2 N . (2.1) Here γ ..= (1 − q2/N )−1/2 is a scaling introduced for convenience. The parameter q N 1/2 expresses the sparseness of the matrix; it may depend on N . Since A typically has q2N nonvanishing entries, we find that if q ≪ N 1/2 then the matrix is sparse. We extract the mean of each matrix entry and write A = H + γq |e e| , where the entries of H (given by hij = aij − γq/N ) have mean zero, and we defined the vector e ≡ eN ..= √1 (1, . . . , 1)T . N Here we use the notation |e e| to denote the orthogonal projection onto e, i.e. (|e e|)ij ..= N −1. One readily finds that the matrix elements of H satisfy the moment bounds (2.2) Eh2ij = 1 N , E hij p 1 N qp−2 , (2.3) where p 2. More generally, we consider the following class of random matrices with non-centred entries characterized by two parameters q and f , which may be N -dependent. The parameter q expresses how singular the distribution of hij is; in particular, it expresses the sparseness of A for the special case (2.1). The parameter f determines the nonzero expectation value of the matrix elements. 4 Definition 2.1 (H). We consider N ×N random matrices H = (hij ) whose entries are real and independent up to the symmetry constraint hij = hji. We assume that the elements of H satisfy the moment conditions Ehij = 0 , E|hij |2 = 1 N , E|hij |p Cp N qp−2 (2.4) for 1 i, j N and 2 p (log N )10 log log N , where C is a positive constant. Here q ≡ q(N ) satisfies (log N )15 log log N q CN 1/2 (2.5) for some positive constant C. Definition 2.2 (A). Let H satisfy Definition 2.1. Define the matrix A = (aij) through A ..= H + f |e e| , (2.6) where f ≡ f (N ) is a deterministic number that satisfies 1 + ε0 f N C , (2.7) for some constants ε0 > 0 and C. Remark 2.3. For definiteness, and bearing the Erdo˝s-R´enyi matrix in mind, we restrict ourselves to real symmetric matrices satisfying Definition 2.2. However, our proof applies equally to complex Hermitian sparse matrices. Remark 2.4. 
As observed in [11], Remark 2.5, we may take H to be a Wigner matrix whose entries have subexponential decay E|hij |p (Cp)θpN −p/2 by choosing q = N 1/2(log N )−5θ log log N . We shall use C and c to denote generic positive constants which may only depend on the constants in assumptions such as (2.4). Typically, C denotes a large constant and c a small constant. Note that the fundamental large parameter of our model is N , and the notations ≫, ≪, O(·), o(·) always refer to the limit N → ∞. Here a ≪ b means a = o(b). We write a ∼ b for C−1a b Ca. After these preparations, we may now state our results. They concern the distribution of the eigenvalues of A, which we order in a nondecreasing fashion and denote by µ1 · · · µN . We shall only consider the distribution of the N − 1 first eigenvalues µ1, . . . , µN−1. The largest eigenvalue µN lies far removed from the others, and its distribution is known to be normal with mean f + f −1 and variance N −1/2; see [11], Theorem 6.2, for more details. First, we establish the bulk universality of eigenvalue correlations. Let p(µ1, . . . , µN ) be the probability density1 of the ordered eigenvalues µ1 · · · µN of A. Introduce the marginal density p(NN−1)(µ1, . . . , µN−1) ..= 1 (N − 1)! σ∈SN −1 dµN p(µσ(1), . . . , µσ(N−1), µN ) . In other words, p(NN−1) is the symmetrized probability density of the first N − 1 eigenvalues of H. For n N − 1 we define the n-point correlation function (marginal) through p(Nn)(µ1, . . . , µn) ..= dµn+1 · · · dµN−1 p(NN−1)(µ1, . . . , µN−1) . (2.8) Similarly, we denote by p(GnO) E,N the n-point correlation function of the symmetrized eigenvalue density of an N × N GOE matrix. 1Note that we use the density of the law of the eigenvalue density for simplicity of notation, but our results remain valid when no such density exists. 5 Theorem 2.5 (Bulk universality). Suppose that A satisfies Definition 2.2 with q N φ for some φ satisfying 0 < φ 1/2, and that f additionally satisfies f CN 1/2 for some C > 0. Let β > 0 and assume that φ > 1 3 + β 6 . (2.9) Let E ∈ (−2, 2) and take a sequence (bN ) satisfying N ε−β bN ||E| − 2|/2 for some ε > 0. Let n ∈ N and O : Rn → R be compactly supported and continuous. Then lim E+bN dE′ N →∞ E−bN 2bN dα1 · · · dαn O(α1, . . . , αn) × 1 ̺sc(E)n p(Nn) − p(GnO) E,N E′ + N α1 ̺sc(E ) , . . . , E′ + αn N ̺sc(E) = 0, where we abbreviated for the density of the semicircle law. ̺sc(E) ..= 1 2π [4 − E2]+ (2.10) Remark 2.6. Theorem 2.5 implies bulk universality for sparse matrices provided that 1/3 < φ the end of this section for an account on the origin of the condition (2.9). We also prove the universality of the extreme eigenvalues. 1/2. See Theorem 2.7 (Edge universality). Suppose that A satisfies Definition 2.2 with q N φ for some φ satisfying 1/3 < φ 1/2. Let V be an N × N GOE matrix whose eigenvalues we denote by λV1 · · · λVN . Then there is a δ > 0 such that for any s we have PV N 2/3(λVN − 2) s − N −δ −N −δ as well as PA N 2/3(µN−1 − 2) s PV N 2/3(λVN − 2) s + N −δ +N −δ (2.11) PV N 2/3(λV1 + 2) s − N −δ − N −δ PA N 2/3(µ1 + 2) s PV N 2/3(λV1 + 2) s + N −δ + N −δ , (2.12) for N N0, where N0 is independent of s. Here PV denotes the law of the GOE matrix V , and PA the law of the sparse matrix A. Remark 2.8. Theorem 6.4 can be easily extended to correlation functions of a finite collection of extreme eigenvalues. Remark 2.9. 
The GOE distribution function F1(s) ..= limN PV N 2/3(λVN − 2) s of the largest eigenvalue of V has been identified by Tracy and Widom [36, 37], and can be computed in terms of Painlev´e equations. A similar result holds for the smallest eigenvalue λV1 of V . Remark 2.10. A result analogous to Theorem 2.7 holds for the extreme eigenvalues of the centred sparse matrix H; see (6.15) below. We conclude this section by giving a sketch of the origin of the restriction φ > 1/3 in Theorem 2.5. To simplify the outline of the argument, we set β = 0 in Theorem 2.5 and ignore any powers of N ε. The proof of Theorem 2.5 is based on an analysis of the local relaxation properties of the marginal Dyson Brownian 6 motion, obtained from the usual Dyson Brownian motion by integrating out the largest eigenvalue µN . As an input, we need the bound N −1 Q ..= E |µα − γα|2 N 1−4φ, (2.13) α=1 where γα denotes the classical location of the α-th eigenvalue (see (3.15) below). The bound (2.13) was proved in [11]. In that paper we prove, roughly, that |µα − γα| q−2 N −2φ, from which (2.13) follows. The precise form is given in (3.16). We then take an arbitrary initial sparse matrix ensemble A0 and evolve it according to the Dyson Brownian motion up to a time τ = N −ρ, for some ρ > 0. We prove that the local spectral statistics, in the first N − 1 eigenvalues, of the evolved ensemble Aτ at time τ coincide with those of a GOE matrix V , provided that Qτ −1 = QN ρ ≪ 1 . (2.14) The precise statement is given in (4.9). This gives us the condition 1 − 4φ + ρ < 0 . (2.15) Next, we compare the local spectral statistics of a given Erdo˝s-R´enyi matrix A with those of the time-evolved ensemble Aτ by constructing an appropriate initial A0, chosen so that the first four moments of A and Aτ are close. More precisely, by comparing Green functions, we prove that the local spectral statistics of A and Aτ coincide if the first three moments of the entries of A and Aτ coincide and their fourth moments differ by at most N −2−δ for some δ > 0. (See Proposition 5.2.) Given A we find, by explicit construction, a sparse matrix A0 such that the first three moments of the entries of Aτ are equal to those of A, and their fourth moments differ by at most N −1−2φτ = N −1−2φ−ρ; see (5.6). Thus the local spectral statistics of A and Aτ coincide provided that 1 − 2φ − ρ < 0 . (2.16) From the two conditions (2.15) and (2.16) we find that the local spectral statistics of A and V coincide provided that φ > 1/3. 3. The strong local semicircle law and eigenvalue locations In this preliminary section we collect the main notations and tools from the companion paper [11] that we shall need for the proofs. Throughout this paper we shall make use of the parameter ξ ≡ ξN ..= 5 log log N , (3.1) which will keep track of powers of log N and probabilities of high-probability events. Note that in [11], ξ was a free parameter. In this paper we choose the special form (3.1) for simplicity. We introduce the spectral parameter z = E + iη where E ∈ R and η > 0. Let Σ 3 be a fixed but arbitrary constant and define the domain DL ..= z ∈ C .. |E| Σ , (log N )LN −1 η 3 , (3.2) 7 with a parameter L ≡ L(N ) that always satisfies L 8ξ . (3.3) For Im z > 0 we define the Stieltjes transform of the local semicircle law msc(z) ..= R ̺sc(x) x−z dx , (3.4) where the density ̺sc was defined in (2.10). The Stieltjes transform msc(z) ≡ msc may also be characterized as the unique solution of msc + z 1 + msc = 0 (3.5) satisfying Im msc(z) > 0 for Im z > 0. 
This implies that √ msc(z) = −z + z2 2 − 4 , (3.6) where the square root is chosen so that msc(z) ∼ −z−1 as z → ∞. We define the resolvent of A through G(z) ..= (A − z)−1 , as well as the Stieltjes transform of the empirical eigenvalue density m(z) ..= 1 N Tr G(z) . For x ∈ R we define the distance κx to the spectral edge through κx ..= |x| − 2 . (3.7) At this point we warn the reader that we depart from our conventions in [11]. In that paper, the quantities G(z) and m(z) defined above in terms of A bore a tilde to distinguish them from the same quantities defined in terms of H. In this paper we drop the tilde, as we shall not need resolvents defined in terms of H. We shall frequently have to deal with events of very high probability, for which the following definition is useful. It is characterized by two positive parameters, ξ and ν, where ξ is given by (3.1). Definition 3.1 (High probability events). We say that an N -dependent event Ω holds with (ξ, ν)-high probability if P(Ωc) e−ν(log N )ξ (3.8) for N N0(ν). Similarly, for a given event Ω0, we say that Ω holds with (ξ, ν)-high probability on Ω0 if P(Ω0 ∩ Ωc) e−ν(log N )ξ for N N0(ν). Remark 3.2. In the following we shall not keep track of the explicit value of ν; in fact we allow ν to decrease from one line to another without introducing a new notation. All of our results will hold for ν ν0, where ν0 depends only on the constants C in Definition 2.1 and the parameter Σ in (3.2). 8 Theorem 3.3 (Local semicircle law [11]). Suppose that A satisfies Definition 2.2 with the condition (2.7) replaced with 0 f NC . (3.9) Moreover, assume that q (log N )120ξ , L 120ξ . (3.10) (3.11) Then there is a constant ν > 0, depending on Σ and the constants C in (2.4) and (2.5), such that the following holds. We have the local semicircle law: the event m(z) − msc(z) z∈DL (log N )40ξ min (√log N )40ξ κE + η 1 q2 , 1 q + 1 Nη (3.12) holds with (ξ, ν)-high probability. Moreover, we have the following estimate on the individual matrix elements of G. If instead of (3.9) f satisfies 0 f C0N 1/2 , (3.13) for some constant C0, then the event max Gij (z) − δij msc(z) 1 i,j N z∈DL holds with (ξ, ν)-high probability. (log N )40ξ 1 q + Im msc(z) Nη + 1 Nη (3.14) Next, we recall that the N − 1 first eigenvalues of A are close the their classical locations predicted by the semicircle law. Let nsc(E) ..= E −∞ ̺sc (x) dx denote the integrated density of the local semicircle law. Denote by γα the classical location of the α-th eigenvalue, defined through nsc(γα) = α N for α = 1, . . . , N . (3.15) The following theorem compares the locations of the eigenvalues µ1, . . . , µN−1 to their classical locations γ1, . . . , γN−1. Theorem 3.4 (Eigenvalue locations [11]). Suppose that A satisfies Definition 2.2, and let φ be an exponent satisfying 0 < φ 1/2, and set q = N φ. Then there is a constant ν > 0 – depending on Σ and the constants C in (2.4), (2.5), and (2.7) – as well as a constant C > 0 such that the following holds. We have with (ξ, ν)-high probability that N −1 |µα − γα|2 α=1 (log N )Cξ N 1−4φ + N 4/3−8φ . Moreover, for all α = 1, . . . , N − 1 we have with (ξ, ν)-high probability that (3.16) |µα − γα| (log N )Cξ N −2/3 α−1/3 + 1 α where we abbreviated α ..= min{α, N − α}. (log N )Cξ(1 + N 1−3φ) + N 2/3−4φα−2/3 + N −2φ , (3.17) 9 Remark 3.5. Under the assumption φ 1/3 the estimate (3.17) simplifies to |µα − γα| (log N )Cξ N −2/3α−1/3 + N −2φ , (3.18) which holds with (ξ, ν)-high probability. Finally, we record two basic results from [11] for later reference. 
From [11], Lemmas 4.4 and 6.1, we get, with (ξ, ν)-high probability, 1 max αN |λα | 2 + (log N )Cξ q−2 + N −2/3 , 1 mα aNx−1|µα| 2 + (log N )Cξ q−2 + N −2/3 . (3.19) Moreover, from [11], Theorem 6.2, we get, with (ξ, ν)-high probability, µN = f + 1 f + o(1) . (3.20) In particular, using (2.7) we get, with (ξ, ν)-high probability, 2 + σ µN N C , (3.21) where σ > 0 is a constant spectral gap depending only on the constant ε0 from (2.7). 4. Local ergodicity of the marginal Dyson Brownian motion In Sections 4 and 5 we give the proof of Theorem 2.5. Throughout Sections 4 and 5 it is convenient to adopt a slightly different notation for the eigenvalues of A. In these two sections we shall consistently use x1 · · · xN to denote the ordered eigenvalues of A, instead of µ1 · · · µN used in the rest of this paper. We abbreviate the collection of eigenvalues by x = (x1, . . . , xN ). The main tool in the proof of Theorem 2.5 is the marginal Dyson Brownian motion, obtained from the usual Dyson Brownian motion of the eigenvalues x by integrating out the largest eigenvalue xN . In this section we establish the local ergodicity of the marginal Dyson Brownian and derive an upper bound on its local relaxation time. Let A0 = (aij,0)ij be a matrix satisfying Definition 2.2 with constants q0 N φ and f0 1 + ε0. Let (Bij,t)ij be a symmetric matrix of independent Brownian motions, whose off-diagonal entries have variance t and diagonal entries variance 2t. Let the matrix At = (aij,t)ij satisfy the stochastic differential equation daij = d√Bij N − 1 2 aij dt . (4.1) It is easy to check that the distribution of At is equal to the distribution of e−t/2A0 + (1 − e−t)1/2V , (4.2) where V is a GOE matrix independent of A0. Let ρ be a constant satisfying 0 < ρ < 1 to be chosen later. In the following we shall consider times t in the interval [t0, τ ], where t0 ..= N −ρ−1 , τ ..= N −ρ . 10 One readily checks that, for any fixed ρ as above, the matrix At satisfies Definition 2.2, with constants ft = f (1 + O(N −δ0 )) 1 + ε0 2 , qt ∼ q0 N φ , where all estimates are uniform for t ∈ [t0, τ ]. Denoting by xN,t the largest eigenvalue of At, we get in particular from (3.21) that P ∃ t ∈ [t0, τ ] .. xN,t ∈/ [2 + σ, N C ] e−ν(log N )ξ (4.3) for some σ > 0 and C > 0. From now on we shall never use the symbols ft and qt in their above sense. The only information we shall need about xN is (4.3). In this section we shall not use any information about qt, and in Section 5 we shall only need that qt cN φ uniformly in t. Throughout this section ft will denote the joint eigenvalue density evolved under the Dyson Brownian motion. (See Definition 4.1 below.) It is well known that the eigenvalues xt of At satisfy the stochastic differential equation (Dyson Brownian motion) dxi = √dBi + N − 1 4 xi + 1 2N j=i 1 xi − xj dt for i = 1, . . . , N , (4.4) where B1, . . . , BN is a family of independent standard Brownian motions. In order to describe the law of V , we define the equilibrium Hamiltonian H(x) ..= 1 4 x2i − 1 N log|xi − xj| i i 0. Then for any ρ satisfying 0 < ρ < 1 there exists a τ¯ ∈ [τ /2, τ ] such that, for any J ⊂ {1, 2, . . . , N − mn − 1}, we have 1 |J | Gi,m fτ¯ dµ − i∈J 1 |J | Gi,m dµ(N−1) i∈J CNε N 1+ρQ + N ρ |J | (4.9) for all N N0(ρ). Here µ(N−1) is the equilibrium measure of (N − 1) eigenvalues (GOE). Note that, by definition, the observables Gi,m in (4.9) only depend on the eigenvalues x1, . . . , xN−1. The rest of this section is devoted to the proof of Theorem 4.2. 
We begin by introducing a pseudo equilibrium measure. Abbreviate R ..= √ τ N −ε = N −ρ/2−ε/2 and define W (x) ..= N 1 2R2 (xi − γi)2 . i=1 Here we set γN ..= 2 + σ for convenience, but one may easily check that the proof remains valid for any larger choice of γN . Define the probability measure ω(dx) ..= ψ(x) µ(dx) where ψ(x) ..= Z e−NW (x) . Z Next, we consider marginal quantities obtained by integrating out the largest eigenvalue xN . To that end we write x = (x, xN ) , x = (x1, . . . , xN−1) and denote by ω(dx) the marginal measure of ω obtained by integrating out xN . By a slight abuse of notation, we sometimes make use of functions µ, ω, and ω, defined as the densities (with respect to Lebesgue measure) of their respective measures. Thus, µ(x) = 1 Z e−N H(x) , ω(x) = 1 e−NH(x)−NW (x) , Z ∞ ω(x) = ω(x, xN ) dxN . xN −1 12 For any function h(x) we introduce the conditional expectation h (x) ..= Eω[h|x] = ∞ xN −1 h(x, xN ) ω(x, xN ) dxN ω(x) . Throughout the following, we write gt ..= ft/ψ. In order to avoid pathological behaviour of the extreme eigenvalues, we introduce cutoffs. Let σ be the spectral gap from (4.3), and choose θ1, θ2, θ3 ∈ [0, 1] to be smooth functions that satisfy θ1(x1) = θ2(xN−1) = θ3(xN ) = 0 if x1 −4 , 1 if x1 −3 1 if xN−1 0 if xN−1 2 + σ 5 2 + 2σ 5 , 0 if xN 1 if xN 2 + 3σ 5 2 + 4σ 5 . Define θ ≡ θ(x1, xN−1, xN ) = θ1(x1) θ2(xN−1) θ3(xN ). One easily finds that |∇θ|2 θ C1(−4 x1 −3) + C1 σ 2 xN−1 − 2 2σ 5 + C1 3σ 5 xN − 2 4σ 5 , (4.10) where the left-hand side is understood to vanish outside the support of θ. Define the density ht ..= 1 Zt θgt , Zt ..= θgt dω . If ν is a probability measure and q a density such that qν is also a probability measure, we define the entropy Sν(q) ..= q log q dν . The following result is our main tool for controlling the local ergodicity of the marginal Dyson Brownian motion. Proposition 4.3. Suppose that (i) Sµ(ft0 ) N C , (ii) sup t∈[t0,τ ] 1(x1 −3) + 1 xN−1 2 + σ 5 + 1 xN (iii) sup sup (θ1θ2)(x) log θgt (x) 2 t∈[t0,τ ] x∈ΣN −1 NC . 2 + 4σ 5 ft dµ e−ν(log N )ξ , (4.11) (4.12) (4.13) Then for t ∈ [t0, τ ] we have ∂tSω( h ) −Dω h + Sω( h ) e−c(log N)ξ + CN QR−4 + C . (4.14) 13 Proof. First we note that Zt = θft dµ = 1 − O e−ν(log N)ξ (4.15) uniformly for t ∈ [t0, τ ], by (4.12). Dropping the time index to avoid cluttering the notation, we find ∂tSω( h ) = ∂t θg log θg dω − ∂t log Z = 1 ∂t ZZ θg log θg dω − 1 + log Z + Sω( h ) ∂t log Z . We find that ∂tZ = θ(Lf ) dµ = − 1 2N ∇θ · ∇f dµ 1 N 1/2 |∇θ|2 f dµ Dµ( f )1/2 . Bounding the Dirichlet form in terms of the entropy (see e.g. [10], Theorem 3.2), we find that Dµ( ft) by (4.11). Using (4.10) we therefore find 2 t Sµ (ft0 ) NC , (4.16) ∂tZ N C e−c(log N )ξ . (4.17) Thus we have ∂tSω( h ) 2∂t θg log θg dω + 1 + Sω( h ) N C e−c(log N)ξ . (4.18) We therefore need to estimate ∂t θg log θg dω = The second term of (4.19) is given by θ(Lf ) log θg dµ + θg ∂t θg θg dω . (4.19) ∂t θg dω = θ(Lf ) dµ = ∂tZ . Therefore (4.18) yields ∂tSω( h ) 2 θ(Lf ) log θg dµ + (1 + Sω( h ))N C e−c(log N)ξ . The first term of (4.20) is given by − 1 N ∇f · ∇ θ log θg dµ = − 1 N ∇(θf ) · ∇ log θg dµ + E1 + E2 , where we defined E1 ..= 1 N ∇θ · ∇(log θg ) f dµ , E2 ..= − 1 N ∇θ · ∇f log θg dµ . (4.20) (4.21) 14 Next, we estimate the error terms E1 and E2. Using (4.10) we get E1 = 1 N ∇√θ · ∇(log θg √ ) θf dµ θ e−ν(log N )ξ |∇ θg |2 θg 2 θg dω |∇θ|2 θ f dµ 1/2 |∇ θg |2 θg 2 θf dµ 1/2 1/2 e−c(log N )ξ + e−c(log N )ξ Dω h, where we used (4.15). Similarly, we find E2 1/2 |∇θ|2 log θg 2 f dµ |∇f |2 f dµ 1/2 . 
Using (4.10), (4.13), and (4.16) we therefore get E2 N C |∇θ|2 θ θ log θg 2f dµ 1/2 N C e−c(log N )ξ . Having dealt with the error terms E1 and E2, we compute the first term on the right-hand side of (4.21), − 1 N ∇(θf ) · ∇ log θg dµ = − 1 N ∇x(θg) · ∇x log θg ψ dµ − 1 N ∇x(log ψ) · ∇x log θg θgψ dµ . (4.22) The second term of (4.22) is bounded by η−1 N |∇x log ψ|2 f dµ + η N |∇ θg |2 θg 2 θg dω where η > 0. The first term of (4.22) is equal to η−1N 1 R4 N −1 (xi − γi)2 f dµ + 4ηDω( i=1 NQ ηR4 + 4ηDω ( h ), h) A simple calculation shows that − 1 N ∇x(θg) · ∇x log θg dω . ∇x(θg) = ∇x θg − θg∇x log ω + θg ∇x log ω , so that the first term of (4.22) becomes − 1 N ∇x θg · ∇x log θg dω + 1 N θg∇x log ω − θg ∇x log ω · ∇x log θg dω −4(1 − η)Dω h + 1 Nη θg∇x log ω − θg θg ∇x log ω 2 dω . 15 Using the Cauchy-Schwarz inequality ab 2 a2 b2 we find that the second term is bounded by 1 θg ∇x log ω − ∇x log ω Nη θg Thus, we have to estimate 2 dω 1 Nη = 1 Nη = 1 Nη θg ∇x log ω − ∇x log ω 2 dω ∇x log ω − ∇x log ω 2 θf dµ N −1 i=1 xN 1 − xi − 1 xN − xi 2 θf dµ . E3 ..= 1 Nη N −1 i=1 1 xN − xi 2 θf dµ , E4 ..= 1 Nη N −1 i=1 1 xN − xi Since xN − xi σ/5 on the support of θf µ, one easily gets from (3.19) that E3 C η . 2 θf dµ . In order to estimate E4, we write 1 xN − xi = dxN (xN − xi) wi(xN ) dxN wi(xN ) −1 , where wi(xN ) ..= 1(xN x ) eN−1 − N 4 x2N − N 2R2 (xN −γN )2 (xN − xj ) . j=i,N We now claim that on the support of θ, in particular for −4 x1 < xN−1 2 + 2σ/5, we have dxN (xN − xi) wi(xN ) dxN wi(xN ) c γN , uniformly for x ∈ ΣN−1. Indeed, writing γN ..= γN (1 + R−2), we have on the support of θ (4.23) dxN (xN − xi) wi(xN ) dxN wi(xN ) γN /2 + dxN (xN − γN ) wi(xN dxN wi(xN ) ) . Moreover, the second term is nonnegative: ∞ dxN (xN − γN ) wi(xN ) = −CN (x) dxN xN −1 ∂ ∂xN e− N R2 (xN −γN )2 (xN − xj) j=i,N = CN (x) e− N R2 (xN −1−γN )2 (xN−1 − xj ) + CN (x) j=i,N ∞ dxN e− N R2 (xN −γN )2 (xN − xj) xN −1 k=i,j j=i,k,N 0, 16 where CN (x) is nonnegative. This proves (4.23). Using (4.23) we get E4 C η γN−2 f dµ = C η . Summarizing, we have proved that ∂tSω( h ) − 4 − 8η − e−c(log N)ξ Dω h + 1 + Sω( h ) e−c(log N )ξ + NQ ηR4 + C η . Choosing η small enough completes the proof. Next, we derive a logarithmic convexity bound for the marginal measure ω. Lemma 4.4. We have that where ω(x) = 1 e−NH(x) , Z H(x) = − 1 N log|xi − xj | + V (x) , i 0 and define, for x ∈ ΣN−1, Vδ (x) ..= − 1 N log exp logδ (xN − xi) − N 2R2 (xi − γi)2 dxN , i 0 −∞ if x 0 . 17 Thus we find that Vδ ∈ C2(ΣN−1) and that we have the pointwise convergence, for all x ∈ ΣN−1, lim Vδ(x) = δ→0 V (x) ..= − 1 N log ∞ e−N H′′(x,xN ) dxN , xN −1 where V ∈ C2(ΣN−1) satisfies (4.24). Next, we claim that if ϕ = ϕ(x, y) satisfies ∇2ϕ(x, y) K then ψ(x), defined by e−ψ(x) ..= e−ϕ(x,y) dy , satisfies ∇2ψ(x) K. In order to prove the claim, we use subscripts to denote partial derivatives and recall the Brascamp-Lieb inequality for log-concave functions (Equation 4.7 in [6]) ψxx ϕxx − ϕxyϕ−yy1ϕyx e−ϕ dy e−ϕ dy . Then the claim follows from ϕxx ϕyz ϕxy −1 ϕyy 1 K =⇒ ϕxx − ϕxyϕ−yy1ϕyx K. Using this claim, we find that ∇2Vδ(x) R−2 for all x ∈ ΣN−1. In order to prove that ∇2V (x) R−2 – and hence complete the proof – it suffices to consider directional derivatives and prove the following claim. If (ζδ)δ>0 is a family of functions on a neighbourhood U that converges pointwise to a C2-function ζ as δ → 0, and if ζδ′′(x) K for all δ > 0 and x ∈ U , then ζ′′(x) K for all x ∈ U . 
Indeed, taking δ → 0 in h ζδ(x + h) + ζδ(x − h) − 2ζδ(x) = ζδ′′(x + ξ) + ζδ′′(x − ξ) (h − ξ) dξ 0 K h2 yields ζ(x + h) + ζ(x − h) − 2ζ(x) h−2 K, from which the claim follows by taking the limit h → 0. As a first consequence of Lemma 4.4, we derive an estimate on the expectation of observables depending only on eigenvalue differences. Proposition 4.5. Let q ∈ L∞(dω) be probability density. Then for any J ⊂ {1, 2, . . . , N − mn − 1} and any t > 0 we have 1 |J | Gi,m q dω − i∈J 1 |J | Gi,m dω i∈J C Dω(√q) t |J | + C Sω(q) e−ct/R2 . Proof. Using Lemma 4.4, the proof of Theorem 4.3 in [16] applies with merely cosmetic changes. Another, standard, consequence of Lemma 4.4 is the logarithmic Sobolev inequality Sω(q) CR2Dω(√q) . (4.25) Using (4.25) and Proposition 4.3, we get the following estimate on the Dirichlet form. 18 Proposition 4.6. Under the assumptions of Proposition 4.3, there exists a τ¯ ∈ [τ /2, τ ] such that Sω( hτ¯ ) CN R−2Q + CR2 , Dω( hτ¯ ) CN R−4Q + C . Proof. Combining (4.25) with (4.14) yields ∂tSω( ht ) −CR−2Sω( ht ) + CN QR−4 + C , which we integrate from t0 to t to get Sω( ht ) e−CR−2(t−t0)Sω( ht0 ) + CN QR−2 + CR2 . Moreover, (4.15) yields (4.26) (4.27) Sω( ht0 ) CSω ( gt0 ) + e−ν(log N)ξ CSω(gt0 ) + e−ν(log N)ξ = CSµ(ft0 ) − C log ψ ft0 dµ + e−ν(log N)ξ , where the second inequality follows from the fact that taking marginals reduces the relative entropy; see the proof of Lemma 4.7 below for more details. Thus we get Sω( ht0 ) N C + N R−2Q N C . Thus (4.27) yields Sω( ht ) N C e−CR−2(t−t0) + CN R−2Q + CR2 (4.28) for t ∈ [t0, τ ]. Integrating (4.14) from τ /2 to τ therefore gives 2 τ τ Dω ( τ /2 ht ) dt CN R−4Q + C , and the claim follows. We may finally complete the proof of Theorem 4.2. Proof of Theorem 4.2. The assumptions of Proposition 4.3 are verified in Subsection 4.1 below. Hence Propositions 4.5 and 4.6 yield 1 |J | Gi,m hτ¯ dω − i∈J Using (4.15) and (4.12) we get 1 |J | Gi,m dω i∈J CNε N 1+ρQ |J | + C N −2φ−ρ |J | . 1 |J | Gi,m fτ¯ dµ − i∈J 1 |J | Gi,m dω i∈J CNε N 1+ρQ |J | + C N −2φ−ρ |J | . (4.29) 19 In order to compare the measures ω and µ(N−1), we define the density q(x) ..= 1 Z′ exp 1 4 x2i + N 2R2 (xi − γi)2 − log|xN − xi| , i 0 we have Sµ(ft) N 2(N m2(ζ0) − log 1 − e−t) , where m2(ζ0) is the second moment of ζ0. Proof. Recall that the relative entropy is defined, for ν ≪ µ, as S(ν|µ) ..= log dν dµ dν. If ν and µ are marginals of ν and µ with respect to the same variable, it is easy to check that S(ν|µ) S(ν|µ). Therefore Sµ(ft) = S(ftµ|µ) S(At|V ) = N 2S(ζt|g2/N ) , 20 where ζt denotes the law of the off-diagonal entries of At, and gλ is a standard Gaussian with variance λ (the diagonal entries are dealt with similarly). Setting γ = 1 − e−t, we find from (4.2) that ζt has probability density ̺γ ∗ g2γ/N , where ̺γ is the probability density of (1 − γ)1/2ζ0. Therefore Jensen’s inequality yields S(ζt|g2/N ) = S dy ̺γ(y) g2γ/N (· − y) g2/N dy ̺γ (y)S g2γ/N (· − y)|g2/N . By explicit computation one finds S g2γ/N (· − y)|g2/N = 1 2 N 2 y2 − log γ + γ − 1 . Therefore and the claim follows. S(ζt|g2/N ) N m2(ζ0) − log γ , The estimate (4.12) follows from (4.3) and (3.19). It only remains to verify (4.13). Lemma 4.8. For any t ∈ [t0, τ ] we have (θ1θ2)(x) log θgt (x) 2 N C . (4.30) Proof. Let ζt be the law of an off-diagonal entry a of At (the diagonal entries are treated similarly). From (4.2) we find ζt = ̺γ ∗ g2γ/N , where γ = 1 − e−t, ̺γ is the law of (1 − γ)1/2ζ0, and gλ is a standard Gaussian with variance λ. 
Using da to denote Lebesgue measure, we find by explicit calculation that which gives e−N C −N C a2 dζt eN C − N 4 a2 , da e−N C −N C a2 dζt dg2γ/N eNC . Therefore, the density Ft(A) of the law of A with respect to the GOE measure satisfies e−N C −N C Tr A2 Ft(A) eNC . Parametrizing A = A(x, v) using the eigenvalues x and eigenvectors v, the GOE measure can be written in the factorized form µ(dx)P (dv), where µ is defined in (4.6) and P is a probability measure. Thus we get that the density ft(x) = Ft(x, v) P (dv) 21 satisfies e−N C −N C i x2i ft(x) eNC . Next, it is easy to see that e−N C −N C i x2i ψ(x) eNC . Using (4.32) we may now derive an upper bound on θ3gt : θ3gt (x) = dxN θ3(xN )ft(x, xN )µ(x, xN ) dxN ψ(x, xN )µ(x, xN ) eN C +N C i 0 and let b≡ bN satisfy |b| α ≡ α(φ) ..= min 2φ − 1 2 , 4φ − 2 3 . κ/2. Pick ε, β > 0, and (5.1) Let n ∈ N and O : Rn → R be compactly supported and continuous. Then there is a τ¯ ∈ [τ /2, τ ] such that E+b dE′ E−b 2b dα1 · · · dαn O(α1, . . . , αn) 1 ̺sc(E)n pτ(¯n,N) − p(GnO) E,N E′ + N α1 ̺sc(E) , . . . , E′ + αn N ̺sc(E) CnN ε b−1N −2φ + b−1/2N −β/2 . (5.2) Proof. The claim follows from Theorem 4.2 and Theorem 3.4, similarly to the proof of Theorem 2.1 in [16]. We use that Q (log N )CξN −2α, as follows from (3.16); the contribution of the low probability complement event to (3.16) may be easily estimated using Cauchy-Schwarz and the estimate i Etx4i = Et Tr A4 N C , uniformly for t 0. The assumption IV of [16] is a straightforward consequence of the local semicircle law, Theorem 3.3. Proposition 5.2. Let A(1) = (a(ij1)) and A(2) = (a(ij2)) be sparse random matrices, both satisfying Definition 2.2 with q(1) ∼ q(2) Nφ (in self-explanatory notation). Suppose that, for each i, j, the first three moments of a(ij1) and a(ij2) are the same, and that the fourth moments satisfy E a(ij1) 4 − E a(ij2) 4 N −2−δ , (5.3) for some δ > 0. Let n ∈ N and let F ∈ C5(Cn). We assume that, for any multi-index α ∈ Nn with 1 |α| 5 and any sufficiently small ε′ > 0, we have max ∂αF (x1, . . . , xn) .. |xi| N ε′ i N C0ε′ , max ∂αF (x1, . . . , xn) .. |xi| N 2 i N C0 , where C0 is a constant. Let κ > 0 be arbitrary. Choose a sequence of positive integers k1, . . . , kn and real parameters Ejm ∈ [−2 + κ, 2 − κ], where m = 1, . . Set zjm ..= Ejm ± iη with . , n and j = an arbitrary 1, . . . , km. Let ε choice of the ± > 0 be signs. arbitrary and choose η with N −1−ε η N −1. Then, abbreviating G(l)(z) ..= (A(l) − z)−1, we have     EF 1 N k1 Tr  k1 G(1)(zj1) , . . . , 1 N kn kn Tr  G(1)(zjn) − EF G(1) → G(2) j=1 j=1 CN 1−3φ+Cε + CN −δ+Cε . 23 Proof. The proof of Theorem 2.3 in [17] may be reproduced almost verbatim; the rest term in the Green function expansion is estimated by an L∞-L1 bound using E|a(ijl)|5 CN −1−3φ. As in [17] (Theorem 6.4), Proposition 5.2 readily implies the following correlation function comparison theorem. Theorem 5.3. Suppose the assumptions of Proposition 5.2 hold. Let p((n1)),N and p((n2)),N be n-point correlation functions of the eigenvalues of A(1) and A(2) respectively. Then for any |E| < 2, any n 1 and any compactly supported test function O : Rn → R we have lim N →∞ dα1 · · · dαn O(α1, . . . , αn) p((n1)),N − p((n2)),N E + α1 N ,. . . ,E + αn N = 0. We may now complete the proof of Theorem 2.5. Proof of Theorem 2.5. In order to invoke Theorems 5.1 and 5.3, we construct a sparse matrix A0, satisfying Definition 2.2, such that its time evolution Aτ¯ is close to A in the sense of the assumptions of Proposition 5.2. 
For definiteness, we concentrate on off-diagonal elements (the diagonal elements are dealt with similarly). For the following we fix i < j; all constants in the following are uniform in i, j, and N . Let ξ, ξ′, ξ0 be random variables equal in distribution to aij, (aτ¯)ij , (a0)ij respectively. For any random variable X we use the notation X ..= X − EX. Abbreviating γ ..= 1 − e−τ¯, we have ξ′ = 1 − γ ξ0 + √γ g , where g is a centred Gaussian with variance 1/N , independent of ξ0. We shall construct a random variable ξ0, supported on at most three points, such that A0 satisfies Definition 2.2 and the first four moments of ξ′ are sufficiently close to those of ξ. For k = 1, 2, . . . we denote by mk(X) the k-th moment of a random variable X. We set ξ0 = √ 1 f − γ N + ξ0 , (5.4) where m1(ξ0) = 0 and m2(ξ0) = N −1. It is easy to see that mk(ξ) = mk(ξ′) for k = 1, 2. We take the law of ξ0 to be of the form pδa + qδ−b + (1 − p − q)δ0 where a, b, p, q 0 are parameters satisfying p + q 1. The conditions m1(ξ0) = 0 and m2(ξ0) = N −1 imply p = 1 aN (a + b) , q = bN 1 (a + b) . Thus, we parametrize ξ0 using a and b; the condition p + q 1 reads ab N −1. Our aim is to determine a and b so that ξ0 satisfies (2.4), and so that the third and fourth moments of ξ′ and ξ are close. By explicit computation we find m3(ξ0) = a− N b , m4(ξ0) = N m3(ξ0)2 + ab N . (5.5) 24 Now we require that a and b be chosen so that ab N −1 and m3(ξ0) = (1 − γ)−3/2m3(ξ) , m4(ξ0) = N m3(ξ0)2 + m4(ξ) − N m3(ξ)2 . Using (5.5), it is easy to see that such a pair (a, b) exists provided that m4(ξ) − N m3(ξ)2 N −2. This latter estimate is generally valid for any random variable with m1 = 0; it follows from the elementary inequality m4m2 − m23 m32 valid whenever m1 = 0. Next, using (5.5) and the estimates m3(ξ) = O(N −1−φ), m4(ξ) = O(N −1−2φ), we find a − b = O(N −φ) , ab = O(N −2φ) , which implies a, b = O(N −φ). We have hence proved that A0 satisfies Definition 2.2. One readily finds that m3(ξ′) = m3(ξ). Moreover, using m4(ξ0) − m4(ξ) = N m3(ξ)2 (1 − γ)−3 − 1 = O(N −1−2φγ) , we find m4(ξ′) − m4(ξ) = (1 − γ)2m4(ξ0) + 6γ N2 − 3γ2 N2 − m4(ξ) = O(N −1−2φγ) . Summarizing, we have proved mk(ξ′) = mk(ξ) (k = 1, 2, 3), |m4(ξ′) − m4(ξ)| CN −1−2φτ¯ . (5.6) The claim follows now by setting δ = 2α(φ) + 2φ − 1 − β in (5.3), and invoking Theorems 5.1 and 5.3. 6. Edge universality: proof of Theorem 2.7 6.1. Rank-one perturbations of the GOE. We begin by deriving a simple, entirely deterministic, result on the eigenvalues of rank-one perturbations of matrices. We choose the perturbation to be proportional to |e e|, but all results of this subsection hold trivially if e is replaced with an arbitrary ℓ2-normalized vector. Lemma 6.1 (Monotonicity and interlacing). Let H be a symmetric N × N matrix. For f 0 we set A(f ) ..= H + f |e e| . Denote by λ1 · · · λN the eigenvalues of H, and by µ1(f ) · · · µN (f ) the eigenvalues of A(f ). Then for all α = 1, . . . , N − 1 and f 0 the function µα(f ) is nondecreasing, satisfies µα(0) = λα, and has the interlacing property λα µα(f ) λα+1 . (6.1) Proof. From [11], Equation (6.3), we find that µ is an eigenvalue of H + f |e e| if and only if α | uα , e |2 µ − λα = 1 f , (6.2) where uα is the eigenvector of H associated with the eigenvalue λα. The right-hand side of (6.2) has N singularities at λ1, . . . , λN , away from which it is decreasing. All claims now follow easily. 25 Next, we establish the following “eigenvalue sticking” property for GOE. Let α label an eigenvalue close to the right (say) spectral edge. 
Roughly we prove that, in the case where H = V is a GOE matrix and f > 1, the eigenvalue µα of V + f |e e| “sticks” to λα+1 with a precision (log N )CξN −1. This behaviour can be interpreted as a form of long-distance level repulsion, in which the eigenvalues µβ, β < α, repel the eigenvalue µα and push it close to its maximum possible value, λα+1. Lemma 6.2 (Eigenvalue sticking). Let V be an N × N GOE matrix. Suppose moreover that ξ satisfies (3.10) and that f satisfies f 1 + ε0. Then there is a δ ≡ δ(ε0) > 0 such that for all α satisfying N (1 − δ) α N − 1 we have with (ξ, ν)-high probability Similarly, if α instead satisfies α |λα+1 − µα| (log N )Cξ N . N δ we have with (ξ, ν)-high probability (6.3) |µα − λα| (log N )Cξ N . (6.4) For the proof of Lemma 6.2 we shall need the following result about Wigner matrices, proved in [19]. Lemma 6.3. Let H be a Wigner matrix with eigenvalues λ1 · · · λN and associated eigenvectors u1, . . . , uN . Assume that ξ is given by (3.1). Then the following two statements hold with (ξ, ν)-high probability: max α uα ∞ (log√N )Cξ , N (6.5) and |λα − γα| (log N )CξN −2/3 min{α, N + 1 − α} −1/3 . (6.6) Moreover, let L satisfy (3.11) and write GHij (z) ..= (H − z)−1 ij . Then we have, with (ξ, ν)-high probability, 1 max i,j N GHij (z) − δij msc(z) z∈DL (log N )Cξ Im msc(z) Nη + 1 Nη , (6.7) where DL was defined in (3.2) Proof of Lemma 6.2. We only prove (6.3); the proof of (6.4) is analogous. By orthogonal invariance of V , we may replace e with the vector (1, 0, . . . , 0). Let us abbreviate ζβ ..= |uβ(1)|2. Note that (6.5) implies max ζβ β (log N )CξN −1 (6.8) with (ξ, ν)-high probability. Now from we (6.2) we get µα ζα − λα+1 + β=α+1 ζβ µα − λβ = 1 f , which yields |λα+1 − µα| = ζα ζβ β=α+1 λβ − µα + 1 f −1 . (6.9) 26 We estimate from below, introducing an arbitrary η > 0, − β=α+1 λβ ζβ − µα = ζβ β<α+1 µα − λβ − β>α+1 λβ ζβ − µα β<α+1 ζβ(µα − λβ) (µα − λβ )2 + η2 − β>α+1 ζβ λβ − µα = − Re GV11(µα + iη) + β>α+1 ζβ(λβ − µα) (λβ − µα)2 + η2 − β>α+1 λβ ζβ − µα − Re GV11(µα + iη) − β>α+1 ζβ η2 (λβ − µα)3 , (6.10) where in the third step we used that λα+1 µα by (6.1). We now choose η = (log N )C1 log log N N −1. For C1 large enough, we get from (6.7) that GV11(µα + iη) = msc(µα + iη) + o(1). Therefore (3.6) yields − Re GV11(µα + iη) 1 − 2 |2 − µα| + o(1) . (6.11) From (6.6) and (6.1) we get that |µα − γα| (log N )CξN −2/3 with (ξ, ν)-high probability. Moreover, the definition (3.15) and α N (1 − δ) imply |γα − 2| Cδ2/3. Thus we get, with (ξ, ν)-high probability, that |2 − µα| = o(1) + Cδ2/3. Therefore (6.11) yields, with (ξ, ν)-high probability, − Re GV11(µα + iη) 1 + o(1) − Cδ1/3 . Recalling (6.8), we therefore get from (6.10), with (ξ, ν)-high probability, ζβ β=α+1 λβ − µα 1 + o(1) − C δ 1/3 − m(log N )Cξ N 3|λα+1 − µα|3 − (log N )Cξ N3 β>α+m |λβ 1 − µα|3 , (6.12) for any m ∈ N. Next, from (6.6) we find that, provided C2 is large enough, m ..= (log N )C2ξ, and β > α + m, then we have with (ξ, ν)-high probability |λβ − λα+1| |γβ − γα+1 | − (log N )Cξ N 2/3(N + 1 − β)1/3 c|γβ − γα+1| . Then for C2 large enough we have, with (ξ, ν)-high probability, 1 β>α+m |λβ − µα|3 C β>α+m (γβ 1 − γα+1)3 CN3 (log N )3C2ξ . Thus we get from (6.12), with (ξ, ν)-high probability, ζβ β=α+1 λβ − µα 1 + o(1) − C δ 1/3 − (log N )Cξ N 3|λα+1 − µα|3 . 27 Plugging this into (6.9) and recalling that f |λα+1 − µα| (log N )Cξ N 1 + ε0 > 1 yields, with (ξ, ν)-high probability, ε0 − C δ 1/3 − o(1) − (log N )Cξ N 3|λα+1 − µα|2 −1 , from which the claim follows. 6.2. Proof of Theorem 2.7. 
In this section we prove Theorem 2.7 by establishing the following comparison result for sparse matrices. Throughout the following we shall abbreviate the lower bound in (2.7) by f∗ ..= 1 + ε0 . (6.13) Proposition 6.4. Let Pv and Pw be laws on the symmetric N × N random matrices H, each satisfying Definition 2.1 with q N φ for some φ satisfying 1/3 < φ 1/2. In particular, we have the moment matching condition Evhij = Ewhij = 0 , Ev h2ij = Ew h2ij = 1 N . (6.14) Set f ..= f∗ in Definition 2.2: A ≡ A(f∗) = (aij ) ..= H +f∗|e e|. As usual, we denote the ordered eigenvalues of H by λ1 · · · λN and the ordered eigenvalues of A by µ1 · · · µN . Then there is a δ > 0 such that for any s ∈ R we have Pv N 2/3(λN − 2) s − N −δ − N −δ Pw N 2/3(λN − 2) s Pv N 2/3(λN − 2) s + N −δ + N −δ (6.15) as well as Pv N 2/3(µN−1 − 2) s − N −δ − N −δ Pw N 2/3(µN−1 − 2) s Pv N 2/3(µN−1 − 2) s + N −δ + N −δ (6.16) for N N0 sufficiently large, where N0 is independent of s. Assuming Proposition 6.4 is proved, we may easily complete the proof of Theorem 2.7 using the results of Section 6.1. Proof of Theorem 2.7. Choose Pv to be the law of GOE (see Remark 2.4), and choose Pw to be the law of a sparse matrix satisfying Definition 2.1 with q N φ. We prove (2.11); the proof of (2.12) is similar. For the following we write µα(f ) ≡ µα to emphasize the f -dependence of the eigenvalues of A(f ). Using first (6.1) and then (6.15) we get Pw N 2/3(µN−1(f ) − 2) s Pw N 2/3(λN − 2) s Pv N 2/3(λN − 2) s − N −δ − N −δ , for some δ > 0. Next, using first the monotonicity of µα(f ) from Lemma 6.1, then (6.16), and finally (6.3), we get Pw N 2/3(µN−1(f ) − 2) s Pw N 2/3(µN−1(f∗) − 2) s Pv N 2/3(µN−1(f∗) − 2) s + N −δ + N −δ Pv N 2/3(λN − 2) for some δ > 0. This concludes the proof of (2.11), after a renaming of δ. s + 2N −δ + 2N −δ , 28 The rest of this section is devoted to the proof of Proposition 6.4. We shall only prove (6.16). The proof of (6.15) is similar (in fact easier), and relies on the local semicircle law, Theorem 3.3, with f = 0; if f = 0 some of the following analysis simplifies (e.g. the proof of Lemma 6.8 below may be completed without the estimate from Lemma 6.9.) From now on we always assume the setup of Proposition 6.4. In particular, f will always be equal to f∗. We begin by outlining the proof of Proposition 6.4. The basic strategy is similar to the one used for Wigner matrices in [19] and [25]. For any E1 E2, let N (E1, E2) ..= {α .. E1 µα E2} denote the number of eigenvalues of A in the interval [E1, E2]. In the first step, we express the distribution function in terms of Green functions according to E∗ Pu µN−1 E = EuK N (E, ∞) − 2 ≈ EuK N (E, E∗) − 1 ≈ EuK dy N Im m(y + iη) − 1 . E (6.17) Here u stands for either v or w, η ..= N −2/3−ε for some ε > 0 small enough, K : R → R+ is a smooth cutoff function satisfying K(x) = 1 if |x| 1/9 and K(x) = 0 if |x| 2/9 , (6.18) and E∗ ..= 2 + 2(log N )C0ξN −2/3 (6.19) for some C0 large enough. The first approximate identity in (6.17) follows from Theorem 3.4 which guarantees that µN−1 E∗ with (ξ, ν)-high probability, and from (3.21) which guarantees that µN 2 + σ with (ξ, ν)high probability. The second approximate identity in (6.17) follows from the approximation E2 dy N Im m(y + iη) = E1 α E2 E1 dy (y − η µα)2 + η2 ≈ N (E1, E2) , which is valid for E1 and E2 near the spectral edge, where the typical eigenvalue separation is N −2/3 ≫ η. The second step of our proof is to compare expressions such as the right-hand side of (6.17) for u = v and u = w. 
This is done using a Lindeberg replacement strategy and a resolvent expansion of the argument of K. This step is implemented in Section 6.3, to which we also refer for a heuristic discussion of the argument. Now we give the rigorous proof of the steps outlined in (6.17). We first collect the tools we shall need. From (3.18) and (3.21) we get that there is a constant C0 > 0 such that, under both Pv and Pw, we have with (ξ, ν)-high probability |N 2/3(µN−1 − 2)| (log N )C0ξ , µN 2 + σ , (6.20) and N 2− 2(log N N )C0 2/3 ξ , 2+ 2(log N )C0ξ N 2/3 Therefore in (6.16) we can assume that s satisfies (log N )2C0ξ . (6.21) − (log N )C0ξ s (log N )C0ξ. (6.22) 29 Recall the definition (6.19) of E∗ and and introduce, for any E interval [E, E∗], χE ..= 1[E,E∗] . E∗, the characteristic function on the For any η > 0 we define θη (x) ..= η π(x2 + η2) = 1 π Im x 1 − iη (6.23) to be an approximate delta function on scale η. The following result allows us to replace the sharp counting function N (E, E∗) = Tr χE(H) with its approximation smoothed on scale η. Lemma 6.5. Suppose that E satisfies |E − 2|N 2/3 (log N )C0ξ. (6.24) Let ℓ ..= 1 2 N −2/3−ε and η ..= N −2/3−9ε, and recall the definition of the function K from (6.18). Then the following statements hold for both ensembles Pv and Pw. For some ε > 0 small enough the inequalities Tr(χE+ℓ ∗ θη)(H) − N −ε N (E, ∞) − 1 Tr(χE−ℓ ∗ θη)(H) + N −ε (6.25) hold with (ξ, ν)-high probability. Furthermore, we have E K Tr(χE−ℓ ∗ θη)(H) P(N (E, ∞) = 1) E K Tr(χE+ℓ ∗ θη)(H) + e−ν(log N)ξ (6.26) for sufficiently large N independent of E, as long as (6.24) holds. Proof. The proof of Corollary 6.2 in [19] can be reproduced almost verbatim. In the estimate (6.17) of [19], we need the bound, with (ξ, ν)-high probability, m(E + iℓ) − msc(E + iℓ) (log N )Cξ Nℓ for N −1+c ℓ N −2/3. This is an easy consequence of the local semicircle law (3.12) and the assumption q N 1/3. Note that, when compared to Corollary 6.2 in [19], the quantity N (E, ∞) has been incremented by one; the culprit is the single eigenvalue µN 2 + σ. Recalling that θη (H ) = 1 π Im G(iη), Lemma 6.5 bounds the probability of N (E, ∞) = 1 in terms of expectations of functionals of Green functions. We now show that the difference between the expectations of these functionals, with respect to the two probability distributions Pv and Pw, is negligible assuming their associated second moments of hij coincide. The precise statement is the following Green function comparison theorem at the edge. All statements are formulated for the upper spectral edge 2, but with the same proof they hold for the lower spectral edge −2 as well. For the following it is convenient to introduce the shorthand Iε ..= {x .. |x − 2| N −2/3+ε} (6.27) where ε > 0. 30 Proposition 6.6 (Green function comparison theorem on the edge). Suppose that the assumptions of Proposition 6.4 hold. Let F : R → R be a function whose derivatives satisfy sup |F (n)(x)|(1 + |x|)−C1 C1 for n = 1, 2, 3, 4 , x (6.28) with some constant C1 > 0. Then there exists ε < ε and for any real numbers E, E1, E2 ∈ Iε, a constant and setting ε η > 0, depending only on ..= N −2/3−ε, we have C1, such that for any EvF (N η Im m(z)) − EwF (N η Im m(z)) CN 1/3+Cεq−1 for z = E + iη , (6.29) and E2 E2 EvF N dy Im m(y + iη) − EwF N dy Im m(y + iη) E1 E1 N 1/3+Cεq−1 (6.30) for some constant C and large enough N . We postpone the proof of Proposition 6.6 to the next section. Assuming it proved, we now have all the ingredients needed to complete the proof of Proposition 6.4. 
Proof of Proposition 6.4. As observed after (6.20) and (6.21), we may assume that (6.22) holds. We define E := 2 + sN −2/3 that satisfies (6.24). We define E∗ as in (6.19) with the C0 such that (6.20) and (6.21) hold. From (6.26) we get, for any sufficiently small ε > 0, Ew K Tr(χE−ℓ ∗ θη)(H) Pw(N (E, ∞) = 1) (6.31) where we set ℓ ..= 1 2 N −2/3−ε , η ..= N −2/3−9ε. Now (6.30) applied to the case E1 = E − ℓ and E2 = E∗ shows that there exists a δ > 0 such that for sufficiently small ε > 0 we have EvK Tr(χE−ℓ ∗ θη)(H) Ew K Tr(χE−ℓ ∗ θη)(H) + N −δ (6.32) (note that here 9ε plays the role of ε in the Proposition 6.6). Next, the second bound of (6.26) yields Pv(N (E − 2ℓ, ∞) = 1) Ev K Tr(χE−ℓ ∗ θη)(H) + e−ν(log N)ξ (6.33) Combining these inequalities, we have Pv(N (E − 2ℓ, ∞) = 1) Pw(N (E, ∞) = 1) + 2N −δ (6.34) for sufficiently small ε > 0 and sufficiently large N . Setting E = 2 + sN −2/3 proves the first inequality of (6.16). Switching the roles of v and w in (6.34) yields the second inequality of (6.16). 31 6.3. Proof of Proposition 6.6. All that remains is the proof of Proposition 6.6, to which this section is devoted. Throughout this section we suppose that the assumptions of Proposition 6.4 hold, and in particular that f = 1 + ε0. We now set up notations to replace the matrix elements one by one. This step is identical for the proof of both (6.29) and (6.30); we use the notations of the case (6.29), for which they are less involved. For the following it is convenient to slightly modify our notation. We take two copies of our probability space, one of which carries the law Pv and the other the law Pw. We work on the product space and write Hv for the copy carrying the law Pv and Hw for the copy carrying the law Pw. The matrices Av and Aw are defined in the obvious way, and we use the notations Av = (vij ) and Aw = (wij ) for their entries. Similarly, we denote by Gv(z) and Gw(z) the Green functions of the matrices Av and Aw. Fix a bijective ordering map on the index set of the independent matrix elements, φ .. {(i, j) .. 1 i j N } → 0, . . . , γmax where γmax ..= N (N + 2 1) − 1 , and denote by Aγ the generalized Wigner matrix whose matrix elements aij follow the v-distribution if φ(i, j) γ and they follow the w-distribution otherwise; Next, set η ..= N −2/3−ε. We use the identity in particular A0 = Av and Aγmax = Aw. Im msc(E + iη) |E − 2| + η CN −1/3+ε/2 . (6.35) Therefore Theorem 2.9 of [11] yields, with (ξ, ν)-high probability, max max max 0 γ γmax 1 k,l N E∈Iε 1 Aγ − E − iη − δklmsc(E + iη) kl 1 p where we defined 1 p ..= Nε q−1 + 1 Nη N −1/3+2ε . We set z = E + iη where E ∈ Iε and η = N −2/3−ε. Using (6.36), (6.37), and the identity Im m = 1 N Im Tr G = η N Gij Gij , ij (6.36) (6.37) we find, as in (6.36) of [19], that in order to prove (6.29) it is enough to prove  EF η2 Gvij Gvji − EF (Gv → Gw) i=j CN 1/3+Cεq−1 (6.38) at z = E + iη. We write the quantity in the absolute value on the left-hand side of (6.38) as a telescopic sum,  E F η2 i=j 1 Av − z ij 1 Av − z  − E F (Av → Aw) ji γmax =− E F (Av → Aγ ) − E F (Av → Aγ−1) . (6.39) γ =2 32 Let E(ij) denote the matrix whose matrix elements are zero everywhere except at the (i, j) position, where it is 1, i.e. Ek(ilj) ..= δikδjl. Fix γ 1 and let (b, d) be determined by φ(b, d) = γ. For definiteness, we assume the off-diagonal case b = d; the case b = d can be treated similarly. Note that the number of diagonal terms is N and the number of off-diagonal terms is O(N 2). 
We shall compare $A_{\gamma-1}$ with $A_\gamma$ for each $\gamma$ and then sum up the differences in (6.39). Note that these two matrices differ only in the entries $(b,d)$ and $(d,b)$, and they can be written as
\[
A_{\gamma-1} = Q + V\,, \qquad V := (v_{bd} - \mathbb E v_{bd})\, E^{(bd)} + (v_{db} - \mathbb E v_{db})\, E^{(db)}\,, \tag{6.40}
\]
and
\[
A_\gamma = Q + W\,, \qquad W := (w_{bd} - \mathbb E w_{bd})\, E^{(bd)} + (w_{db} - \mathbb E w_{db})\, E^{(db)}\,,
\]
where the matrix $Q$ satisfies
\[
Q_{bd} = Q_{db} = f/N = \mathbb E v_{bd} = \mathbb E v_{db} = \mathbb E w_{bd} = \mathbb E w_{db}\,;
\]
here, we recall, $f = 1 + \varepsilon_0$. It is easy to see that
\[
\max_{i,j} |v_{ij}| + \max_{i,j} |w_{ij}| \le (\log N)^{C\xi}\, q^{-1} \tag{6.41}
\]
with $(\xi,\nu)$-high probability, and that
\[
\mathbb E v_{ij} = \mathbb E w_{ij} = 0\,, \qquad \mathbb E (v_{ij})^2 = \mathbb E (w_{ij})^2 \le C/N\,, \qquad \mathbb E |v_{ij}|^k + \mathbb E |w_{ij}|^k \le C N^{-1} q^{2-k} \tag{6.42}
\]
for $k = 2, 3, 4, 5, 6$. We define the Green functions
\[
R := \frac{1}{Q - z}\,, \qquad S := \frac{1}{A_{\gamma-1} - z}\,, \qquad T := \frac{1}{A_\gamma - z}\,. \tag{6.43}
\]
We now claim that the estimate (6.36) holds for the Green function $R$ as well, i.e. that
\[
\max_{1 \le k,l \le N}\; \max_{E \in I_\varepsilon}\, \bigl|R_{kl}(E + \mathrm i\eta) - \delta_{kl}\, m_{sc}(E + \mathrm i\eta)\bigr| \le p^{-1} \tag{6.44}
\]
holds with $(\xi,\nu)$-high probability. To see this, we use the resolvent expansion
\[
R = S + SVS + (SV)^2 S + \dots + (SV)^9 S + (SV)^{10} R\,. \tag{6.45}
\]
Since $V$ has at most two nonzero elements, when computing the entry $(k,l)$ of this matrix identity, each term is a sum of finitely many terms (i.e. the number of summands is independent of $N$) that involve matrix elements of $S$ or $R$ and $v_{ij}$, e.g. of the form $(SVS)_{kl} = S_{kb}\, v_{bd}\, S_{dl} + S_{kd}\, v_{db}\, S_{bl}$. Using the bound (6.36) for the $S$ matrix elements, the bound (6.41) for $v_{ij}$, and the trivial bound $|R_{ij}| \le \eta^{-1} \le N$, we get (6.44).

Having introduced these notations, we may now give an outline of the proof of Proposition 6.6. We have to estimate each summand of the telescopic sum (6.39) with $b \neq d$ (the generic case) by $o(N^{-2})$; in the non-generic case $b = d$, a bound of size $o(N^{-1})$ suffices. For simplicity, assume that we are in the generic case $b \neq d$ and that $F$ has only one argument. Fix $z = E + \mathrm i\eta$, where $E \in I_\varepsilon$ (see (6.27)) and $\eta := N^{-2/3-\varepsilon}$. Define
\[
y^S := \eta^2 \sum_{i \neq j} S_{ij}(z)\, S_{ji}(z)\,; \tag{6.46}
\]
the random variable $y^R$ is defined similarly. We shall show that
\[
\mathbb E F(y^S) = B + \mathbb E F(y^R) + O\bigl(N^{-1/3 + C\varepsilon} p^{-4} q^{-1}\bigr) \tag{6.47}
\]
for some deterministic $B$ which depends only on the law of $Q$ and the first two moments of $v_{bd}$. From (6.47) we immediately conclude that (6.29) holds.

In order to prove (6.47), we expand
\[
F(y^S) - F(y^R) = F'(y^R)(y^S - y^R) + \frac12 F''(y^R)(y^S - y^R)^2 + \frac16 F'''(\zeta)(y^S - y^R)^3\,, \tag{6.48}
\]
where $\zeta$ lies between $y^S$ and $y^R$. Next, we apply the resolvent expansion
\[
S = R - RVR + (RV)^2 R - \dots + (-1)^m (RV)^m R + (-1)^{m+1} (RV)^{m+1} S \tag{6.49}
\]
to each factor $S$ in (6.48), for some $m \ge 2$. Here we only concentrate on the linear term in (6.48). The second term is dealt with similarly. (The rest term in (6.48) requires a different treatment because $F'''(\zeta)$ is not independent of $v_{bd}$. It may however be estimated cheaply using a naive power counting.) By definition, $Q$ is independent of $v_{bd}$, and hence $F'(y^R)$ and $R$ are independent of $v_{bd}$. Therefore the expectations of the first and second order terms (in the variable $v_{bd}$) in $\mathbb E F'(y^R)(y^S - y^R)$ are put into $B$. The third order terms in $\mathbb E F'(y^R)(y^S - y^R)$ are bounded, using a naive power counting, by
\[
\eta^2\, N^2\, p^{-3}\, \mathbb E |v_{bd}|^3 \le N^{-4/3}\, N^2\, p^{-3}\, N^{-1} q^{-1}\,. \tag{6.50}
\]
Here we used that, thanks to the assumption $i \neq j$, in the generic terms $\{i,j\} \cap \{b,d\} = \emptyset$ there are at least three off-diagonal matrix elements of $R$ in the resolvent expansion of (6.46). Indeed, since $b, d \notin \{i,j\}$, the terms of order at least one in (6.49) have at least two off-diagonal resolvent matrix elements, and the other factor in (6.46) has at least one off-diagonal resolvent matrix element since $i \neq j$.
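The expansions (6.45) and (6.49) are finite algebraic identities, so they can be checked numerically for any rank-two perturbation. The sketch below (illustrative; the size, spectral parameter, and perturbation strength are arbitrary choices) verifies (6.45) to machine precision.

```python
# Numerical check of the resolvent expansion (6.45):
#   R = S + SVS + (SV)^2 S + ... + (SV)^9 S + (SV)^10 R,
# where S = (Q + V - z)^{-1}, R = (Q - z)^{-1}, and V is supported on the
# two entries (b, d) and (d, b).
import numpy as np

rng = np.random.default_rng(2)
N, z = 50, 1.0 + 0.5j
Q0 = rng.standard_normal((N, N))
Q = (Q0 + Q0.T) / np.sqrt(2 * N)
V = np.zeros((N, N))
b, d = 0, 1
V[b, d] = V[d, b] = 0.05

S = np.linalg.inv(Q + V - z * np.eye(N))
R = np.linalg.inv(Q - z * np.eye(N))

expansion = S.copy()
SVm = np.eye(N)                       # holds (SV)^m
for m in range(1, 10):
    SVm = SVm @ S @ V
    expansion += SVm @ S              # term (SV)^m S
expansion += SVm @ S @ V @ R          # rest term (SV)^10 R

print(np.max(np.abs(R - expansion)))  # ~ 1e-16
```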
This counting yields a factor $p^{-3}$ by (6.36) (the non-generic terms are suppressed by a factor $N^{-1}$). Note that the bound (6.50) is still too large compared to $N^{-2}$, since $p \le N^{1/3}$. The key observation to solve this problem is that the expectation of the leading term is much smaller than its typical size; this allows us to gain an additional factor $p^{-1}$. A similar observation was used in [19], but in the present case this estimate (performed in Lemma 6.8 below) is substantially complicated by the non-vanishing expectation of the entries of $A$. Much of the heavy notation in the following argument arises from the need to keep track of the non-generic terms, which have fewer off-diagonal elements than the generic terms, but also a smaller entropy factor. The improved bound on the difference $\mathbb E F'(y^R)(y^S - y^R)$ is
\[
N^{-4/3}\, N^2\, p^{-4}\, N^{-1} q^{-1} = N^{-1/3}\, p^{-4}\, q^{-1}\,,
\]
which is much smaller than $N^{-2}$ provided that $q \ge N^\phi$ for $\phi > 1/3$ and $\varepsilon$ is small enough.

The key step in the proof of Proposition 6.6 is the following lemma.

Lemma 6.7. Fix an index $\gamma = \phi(b,d)$ and recall the definitions of $Q$, $R$, and $S$ from (6.43). For any small enough $\varepsilon > 0$ and under the assumptions of Proposition 6.6, there exist a constant $C$ depending on $F$ (but independent of $\gamma$) and constants $B_N$ and $D_N$, depending on the law $\mathrm{law}(Q)$ of $Q$ and on the second moment $m_2(v_{bd})$ of $v_{bd}$, such that, for large enough $N$ (independent of $\gamma$), we have
\[
\biggl|\mathbb E F\biggl(\eta \int_{E_1}^{E_2} \mathrm dy \sum_{i\neq j} S_{ij} S_{ji}(y + \mathrm i\eta)\biggr) - \mathbb E F\biggl(\eta \int_{E_1}^{E_2} \mathrm dy \sum_{i\neq j} R_{ij} R_{ji}(y + \mathrm i\eta)\biggr) - B_N\bigl(m_2(v_{bd}), \mathrm{law}(Q)\bigr)\biggr| \le N^{\mathbf 1(b=d) - 5/3 + C\varepsilon}\, q^{-1}\,, \tag{6.51}
\]
where, we recall, $\eta = N^{-2/3-\varepsilon}$, as well as
\[
\biggl|\mathbb E F\biggl(\eta^2 \sum_{i\neq j} S_{ij} S_{ji}(z)\biggr) - \mathbb E F\biggl(\eta^2 \sum_{i\neq j} R_{ij} R_{ji}(z)\biggr) - D_N\bigl(m_2(v_{bd}), \mathrm{law}(Q)\bigr)\biggr| \le N^{\mathbf 1(b=d) - 5/3 + C\varepsilon}\, q^{-1}\,, \tag{6.52}
\]
where $z = E + \mathrm i\eta$. The constants $B_N$ and $D_N$ may also depend on $F$, but they depend on the centered random variable $v_{bd}$ only through its second moment.

Assuming Lemma 6.7, we now complete the proof of Proposition 6.6.

Proof of Proposition 6.6. Clearly, Lemma 6.7 also holds if $S$ is replaced by $T$. Since $Q$ is independent of $v_{bd}$ and $w_{bd}$, and $m_2(v_{bd}) = m_2(w_{bd}) = 1/N$, we have
\[
D_N\bigl(m_2(v_{bd}), \mathrm{law}(Q)\bigr) = D_N\bigl(m_2(w_{bd}), \mathrm{law}(Q)\bigr)\,.
\]
Thus we get from Lemma 6.7 that
\[
\biggl|\mathbb E F\biggl(\eta^2 \sum_{i\neq j} S_{ij} S_{ji}(z)\biggr) - \mathbb E F\biggl(\eta^2 \sum_{i\neq j} T_{ij} T_{ji}(z)\biggr)\biggr| \le C N^{\mathbf 1(b=d) - 5/3 + C\varepsilon}\, q^{-1}\,. \tag{6.53}
\]
Recalling the definitions of $S$ and $T$ from (6.43), the bound (6.53) compares the expectation of a function of the resolvent of $A_{\gamma-1}$ with that of $A_\gamma$. The telescopic summation in (6.39) then implies (6.38), since the number of summands with $b \neq d$ is of order $N^2$ but the number of summands with $b = d$ is only $N$. Similarly, (6.51) implies (6.30). This completes the proof.

Proof of Lemma 6.7. Throughout the proof we abbreviate $A_{\gamma-1} = A = (a_{ij})$, where $a_{ij} = h_{ij} + f/N$. We shall only prove the more complicated case (6.51); the proof of (6.52) is similar. In fact, we shall prove the bound
\[
\biggl|\mathbb E F\biggl(\eta \int_{E_1}^{E_2} \mathrm dy \sum_{i\neq j} S_{ij} S_{ji}(y + \mathrm i\eta)\biggr) - \mathbb E F\biggl(\eta \int_{E_1}^{E_2} \mathrm dy \sum_{i\neq j} R_{ij} R_{ji}(y + \mathrm i\eta)\biggr) - B_N\bigl(m_2(h_{bd}), \mathrm{law}(Q)\bigr)\biggr| \le C N^{\mathbf 1(b=d) - 1/3 + C\varepsilon}\, p^{-4}\, q^{-1}\,, \tag{6.54}
\]
from which (6.51) follows by (6.37). From (6.36) we get
\[
\max_{1\le k,l\le N}\; \max_{E \in I_\varepsilon}\, \bigl|S_{kl}(E + \mathrm i\eta) - \delta_{kl}\, m_{sc}(E + \mathrm i\eta)\bigr| \le p^{-1} \tag{6.55}
\]
with $(\xi,\nu)$-high probability. Define $\Omega$ as the event on which (6.55), (6.44), and (6.41) hold. We have proved that $\Omega$ holds with $(\xi,\nu)$-high probability. Since the arguments of $F$ in (6.54) are bounded by $C N^{2+2\varepsilon}$ and $F(x)$ increases at most polynomially, it is easy to see that the contribution of the event $\Omega^c$ to the expectations in (6.54) is negligible.
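Before decomposing the quantities below, it may help to see the moment hierarchy (6.42), which drives all of the $q$-power counting above, in a concrete case. The following arithmetic sketch (our illustration; the choice of $p$ is arbitrary and the Bernoulli model is a stand-in for the centered sparse entry) computes the exact moments of a centered, rescaled Bernoulli variable and compares them with $N^{-1} q^{2-k}$, where $q = \sqrt{Np}$: each additional power of the entry costs one factor of $q^{-1}$.

```python
# Moment hierarchy for a sparse entry h = (chi - p)/sqrt(N p (1 - p)),
# chi ~ Bernoulli(p). Exactly,
#   E|h|^k = (p (1-p)^k + (1-p) p^k) / (N p (1-p))^{k/2},
# which is of order N^{-1} q^{2-k} with q = sqrt(N p).
import numpy as np

N = 10 ** 6
p = N ** (-0.4)              # sparse regime; here q = (N p)^{1/2} = N^{0.3}
q = np.sqrt(N * p)

for k in range(2, 7):
    moment = (p * (1 - p) ** k + (1 - p) * p ** k) / (N * p * (1 - p)) ** (k / 2)
    bound = q ** (2 - k) / N
    print(f"k={k}:  E|h|^k = {moment:.3e},  N^-1 q^(2-k) = {bound:.3e}")
```

The two columns agree up to a factor close to one, making explicit the $q^2/N$ gain obtained below whenever a partial expectation converts two high-probability bounds $|h| \le (\log N)^{C\xi} q^{-1}$ into the second moment $1/N$.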
Define $x^S$ and $x^R$ by
\[
x^S := \eta \int_{E_1}^{E_2} \mathrm dy \sum_{i\neq j} S_{ij} S_{ji}(y + \mathrm i\eta)\,, \qquad x^R := \eta \int_{E_1}^{E_2} \mathrm dy \sum_{i\neq j} R_{ij} R_{ji}(y + \mathrm i\eta)\,, \tag{6.56}
\]
and decompose $x^S$ into three parts, $x^S = x^S_2 + x^S_1 + x^S_0$, where
\[
x^S_k := \eta \int_{E_1}^{E_2} \mathrm dy \sum_{i\neq j} \mathbf 1\bigl(|\{i,j\} \cap \{b,d\}| = k\bigr)\, S_{ij} S_{ji}(y + \mathrm i\eta)\,; \tag{6.57}
\]
$x^R_k$ is defined similarly. Here $k = |\{i,j\} \cap \{b,d\}|$ is the number of times the indices $b$ and $d$ appear among the summation indices $i, j$. Clearly $k = 0, 1$, or $2$. The number of terms in the sum defining $x^S_k$ is $O(N^{2-k})$. A resolvent expansion yields
\[
S = R - RVR + (RV)^2 R - (RV)^3 R + (RV)^4 R - (RV)^5 R + (RV)^6 S\,. \tag{6.58}
\]
In the following formulas we shall, as usual, omit the spectral parameter from the notation of the resolvents. The spectral parameter is always $y + \mathrm i\eta$ with $y \in [E_1, E_2]$; in particular, $y \in I_\varepsilon$. If $|\{i,j\} \cap \{b,d\}| = k$, recalling that $i \neq j$ we find that there are at least $2 - k$ off-diagonal resolvent elements in $\bigl((RV)^m R\bigr)_{ij}$, so that (6.36) yields, in $\Omega$,
\[
\bigl|\bigl((RV)^m R\bigr)_{ij}\bigr| \le C_m\, \bigl(N^\varepsilon q^{-1}\bigr)^m\, p^{-(2-k)}\,, \qquad \text{where } m \in \mathbb N_+\,,\ m \le 6\,,\ k = 0, 1, 2\,. \tag{6.59}
\]
Similarly, we have in $\Omega$
\[
\bigl|\bigl((RV)^m S\bigr)_{ij}\bigr| \le C_m\, \bigl(N^\varepsilon q^{-1}\bigr)^m\, p^{-(2-k)}\,, \qquad \text{where } m \in \mathbb N_+\,,\ m \le 6\,,\ k = 0, 1, 2\,. \tag{6.60}
\]
Therefore we have in $\Omega$ that
\[
|x^S_k - x^R_k| \le C N^{2/3-k}\, p^{-(3-k)}\, N^\varepsilon q^{-1} \qquad \text{for } k = 0, 1, 2\,, \tag{6.61}
\]
where the factor $N^{2/3-k}$ comes from the summation over $i \neq j$, the prefactor $\eta$, and the $\mathrm dy$-integration. Inserting these bounds into the Taylor expansion of $F$, using
\[
q \ge N^\phi \ge N^{1/3 + C\varepsilon}\,, \qquad p \ge N^{1/3 - 2\varepsilon}\,, \tag{6.62}
\]
and keeping only the terms larger than $O(N^{-1/3+C\varepsilon} p^{-4} q^{-1})$, we obtain
\[
\biggl|\mathbb E\bigl[F(x^S) - F(x^R)\bigr] - \mathbb E\Bigl[F'(x^R)(x^S_0 - x^R_0) + \frac12 F''(x^R)(x^S_0 - x^R_0)^2 + F'(x^R)(x^S_1 - x^R_1)\Bigr]\biggr| \le C N^{-1/3 + C\varepsilon} p^{-4} q^{-1}\,, \tag{6.63}
\]
where we used the remark after (6.55) to treat the contribution of the event $\Omega^c$. Since $x^S_2 - x^R_2$ does not appear in (6.63), we can focus on the cases $k = 0$ and $k = 1$. To streamline the notation, we introduce
\[
R^{(m)}_{ij} := (-1)^m \bigl((RV)^m R\bigr)_{ij}\,. \tag{6.64}
\]
Then using (6.59) and the estimate $\max_{i\neq j} |R_{ij}| \le p^{-1}$ we get
\[
\bigl|R^{(m)}_{ij}\bigr| \le C_m\, \bigl(N^\varepsilon q^{-1}\bigr)^m\, p^{-(2-k) + \delta_{0m}\delta_{0k}}\,. \tag{6.65}
\]
Now we decompose the sum $x^S_k - x^R_k$ according to the number of matrix elements $h_{bd}$ and $h_{db}$. To that end, for $k \in \{0, 1\}$ and $s, t \in \{0, 1, 2, 3, 4\}$ with $s + t \ge 1$, we define
\[
Q^{(s,t)}_k := \eta \int_{E_1}^{E_2} \mathrm dy \sum_{i\neq j} \mathbf 1\bigl(|\{i,j\} \cap \{b,d\}| = k\bigr)\, R^{(s)}_{ij} R^{(t)}_{ji}\,, \tag{6.66}
\]
and set
\[
Q^{(\ell)}_k := \sum_{s+t=\ell} Q^{(s,t)}_k\,. \tag{6.67}
\]
Using (6.65) we get the estimates, valid on $\Omega$,
\[
\bigl|Q^{(s,t)}_k\bigr| \le C_{st}\, \bigl(N^\varepsilon q^{-1}\bigr)^{s+t}\, N^{2/3-k}\, p^{-(4-2k) + \delta_{0s}\delta_{0k} + \delta_{0t}\delta_{0k}}\,, \qquad \bigl|Q^{(\ell)}_k\bigr| \le C_\ell\, \bigl(N^\varepsilon q^{-1}\bigr)^\ell\, N^{2/3-k}\, p^{-(3-k)}\,, \tag{6.68}
\]
where $\ell \ge 1$. Using (6.62), (6.59), and (6.60), we find the decomposition
\[
x^S_k - x^R_k = \sum_{1 \le s+t \le 4} Q^{(s,t)}_k + O\bigl(N^{-1/3 + C\varepsilon} p^{-4} q^{-1}\bigr)\,, \tag{6.69}
\]
where $s$ and $t$ are nonnegative. By (6.68) and (6.44) we have, for $s + t \ge 1$,
\[
\bigl|\mathbb E_{bd}\, Q^{(s,t)}_k\bigr| \le C_{st}\, q^{2-s-t}\, N^{-1/3-k}\, p^{-(4-2k) + \delta_{0s}\delta_{0k} + \delta_{0t}\delta_{0k}}\,, \tag{6.70}
\]
where $\mathbb E_{bd}$ denotes partial expectation with respect to the variable $h_{bd}$. Here we used that only terms with at least two elements $h_{bd}$ or $h_{db}$ survive. Recalling (6.42), we find that taking the partial expectation $\mathbb E_{bd}$ improves the bound (6.68) by a factor $q^2/N$. Thus we also have
\[
\bigl|\mathbb E_{bd}\, Q^{(\ell)}_k\bigr| \le C_\ell\, q^{2-\ell}\, N^{-1/3-k}\, p^{-(3-k)}\,. \tag{6.71}
\]
Similarly, for $s + t \ge 1$ and $u + v \ge 1$ we have
\[
\bigl|\mathbb E_{bd}\, Q^{(s,t)}_k Q^{(u,v)}_k\bigr| \le C\, q^{2-s-t-u-v}\, N^{1/3-2k}\, p^{-(8-4k) + \delta_{0s}\delta_{0k} + \delta_{0t}\delta_{0k} + \delta_{0u}\delta_{0k} + \delta_{0v}\delta_{0k}}\,, \tag{6.72}
\]
which implies
\[
\bigl|\mathbb E_{bd}\, Q^{(\ell_1)}_k Q^{(\ell_2)}_k\bigr| \le C\, q^{2-\ell_1-\ell_2}\, N^{(1/3-2k) + C\varepsilon}\, p^{-6+2k}\,. \tag{6.73}
\]
Inserting (6.71) and (6.73) into the second term on the left-hand side of (6.63), and using the assumption on $F$ as well as (6.62), we find
\[
\mathbb E\Bigl[F'(x^R)(x^S_0 - x^R_0) + F'(x^R)(x^S_1 - x^R_1) + \frac12 F''(x^R)(x^S_0 - x^R_0)^2\Bigr] = B + \mathbb E F'(x^R)\, Q^{(3)}_0 + O\bigl(N^{-1/3+C\varepsilon} p^{-4} q^{-1}\bigr) = B + \mathbb E F'(x^R)\bigl(Q^{(0,3)}_0 + Q^{(3,0)}_0\bigr) + O\bigl(N^{-1/3+C\varepsilon} p^{-4} q^{-1}\bigr)\,, \tag{6.74}
\]
where we defined
\[
B := \mathbb E\biggl[F'(x^R) \sum_{k=0,1}\bigl(Q^{(1)}_k + Q^{(2)}_k\bigr) + \frac12 F''(x^R)\bigl(Q^{(1)}_0\bigr)^2\biggr] = \mathbb E\biggl[F'(x^R)\, \mathbb E_{bd} \sum_{k=0,1}\bigl(Q^{(1)}_k + Q^{(2)}_k\bigr) + \frac12 F''(x^R)\, \mathbb E_{bd}\bigl(Q^{(1)}_0\bigr)^2\biggr]\,. \tag{6.75}
\]
Note that $B$ depends on $h_{bd}$ only through its expectation (which is zero) and its second moment. Thus $B$ will be the quantity $B_N(m_2(v_{bd}), \mathrm{law}(Q))$ from (6.51). In order to estimate (6.74), it only remains to estimate $\mathbb E F'(x^R)\, Q^{(0,3)}_0$ and $\mathbb E F'(x^R)\, Q^{(3,0)}_0$.

Using (6.70), (6.63), (6.74), and (6.71), we have
\[
\bigl|\mathbb E\bigl[F(x^S) - F(x^R)\bigr] - B\bigr| \le N^{-1/3 + C\varepsilon}\, p^{-3}\, q^{-1}\,, \tag{6.76}
\]
which implies (6.54) in the case $b = d$. Let us therefore from now on assume $b \neq d$. Since the terms that remain to be estimated, $Q^{(3,0)}_0$ and $Q^{(0,3)}_0$, carry the restriction $k = 0$, the indices $i, j, b, d$ are all distinct. In order to enforce this condition in sums, it is convenient to introduce the indicator function
\[
\chi \equiv \chi(i,j,b,d) := \mathbf 1\bigl(|\{i,j,b,d\}| = 4\bigr)\,.
\]
Recalling (6.64), we introduce the notation $R^{(m,s)}_{ij}$ to denote the sum of the terms in the definition (6.64) of $R^{(m)}_{ij}$ in which the number of off-diagonal elements of $R$ is $s$. For example,
\[
R^{(3,0)}_{ij} = R^{(3,1)}_{ij} = 0\,, \qquad R^{(3,2)}_{ij} = R_{ib}\, h_{bd}\, R_{dd}\, h_{db}\, R_{bb}\, h_{bd}\, R_{dj} + R_{id}\, h_{db}\, R_{bb}\, h_{bd}\, R_{dd}\, h_{db}\, R_{bj}\,. \tag{6.77}
\]
Then in the case $\chi = 1$ we have
\[
R^{(3)}_{ij} = \sum_{s=2}^4 R^{(3,s)}_{ij}\,. \tag{6.78}
\]
Now from the definition (6.66) we get
\[
Q^{(0,3)}_0 = \sum_{s=2}^4 Q^{(0,3,s)}_0\,, \qquad Q^{(0,3,s)}_0 := \eta \int_{E_1}^{E_2} \mathrm dy \sum_{i,j} \chi\, R^{(0)}_{ij}\, R^{(3,s)}_{ji}\,, \tag{6.79}
\]
and
\[
Q^{(3,0)}_0 = \sum_{s=2}^4 Q^{(3,0,s)}_0\,, \qquad Q^{(3,0,s)}_0 := \eta \int_{E_1}^{E_2} \mathrm dy \sum_{i,j} \chi\, R^{(3,s)}_{ij}\, R^{(0)}_{ji}\,. \tag{6.80}
\]
As above, it is easy to see that for $s \ge 3$ we have
\[
\bigl|\mathbb E F'(x^R)\, Q^{(0,3,s)}_0\bigr| + \bigl|\mathbb E F'(x^R)\, Q^{(3,0,s)}_0\bigr| \le N^{-1/3 + C\varepsilon}\, p^{-4}\, q^{-1}\,, \tag{6.81}
\]
which implies, using (6.74),
\[
\bigl|\mathbb E\bigl[F(x^S) - F(x^R)\bigr] - B\bigr| \le \bigl|\mathbb E F'(x^R)\, Q^{(0,3,2)}_0\bigr| + \bigl|\mathbb E F'(x^R)\, Q^{(3,0,2)}_0\bigr| + N^{-5/3 + C\varepsilon}\, q^{-1}\,. \tag{6.82}
\]
By symmetry, it only remains to prove that
\[
\bigl|\mathbb E F'(x^R)\, Q^{(3,0,2)}_0\bigr| \le N^{-1/3 + C\varepsilon}\, p^{-4}\, q^{-1}\,. \tag{6.83}
\]
Using the definition (6.80) and the estimate (6.44) to replace some diagonal resolvent matrix elements with $m_{sc}$, we find
\[
\mathbb E F'(x^R)\, Q^{(3,0,2)}_0 = \eta \int_{E_1}^{E_2} \mathrm dy\; \mathbb E F'(x^R) \sum_{i,j} \chi\, \bigl[R_{ib}\, h_{bd}\, R_{dd}\, h_{db}\, R_{bb}\, h_{bd}\, R_{dj}\, R_{ji} + (b \leftrightarrow d)\bigr] = \eta \int_{E_1}^{E_2} \mathrm dy\; \mathbb E F'(x^R) \sum_{i,j} \chi\, m_{sc}^2\, R_{ib}\, R_{dj}\, R_{ji}\, \mathbb E_{bd}\bigl(|h_{bd}|^2 h_{bd}\bigr) + (b \leftrightarrow d) + O\bigl(N^{-1/3+C\varepsilon}\, p^{-4}\, q^{-1}\bigr)\,, \tag{6.84}
\]
where we used the estimate $\bigl|\mathbb E_{bd}\, |h_{bd}|^2 h_{bd}\bigr| \le \frac{C}{Nq}$ to control the errors of the replacement. Combining (6.84) with (6.74) and (6.63), we therefore obtain
\[
\bigl|\mathbb E[F(x^S) - F(x^R)] - B\bigr| \le C N^{-1/3+C\varepsilon}\, p^{-4}\, q^{-1} + C q^{-1} N^{-1/3+C\varepsilon} \max_{y \in I_\varepsilon}\, \max_{i,j}\, \chi\, \Bigl(\bigl|\mathbb E F'(x^R)\, R_{ij} R_{jb} R_{di}\bigr| + \bigl|\mathbb E F'(x^R)\, R_{ib} R_{dj} R_{ji}\bigr| + (b \leftrightarrow d)\Bigr)\,, \tag{6.85}
\]
where we used the trivial bounds on $F'$ and $|m_{sc}|$, and the fact that every estimate is uniform in $y$. In order to complete the proof of Lemma 6.7, we need to estimate the expectations in (6.85) by a better bound than the naive high-probability bound on the argument of $\mathbb E$. This is accomplished in Lemma 6.8 below. From Lemma 6.8 and (6.85) we get, in the case $b \neq d$, that
\[
\bigl|\mathbb E[F(x^S) - F(x^R)] - B\bigr| \le N^{-1/3+C\varepsilon}\, p^{-4}\, q^{-1}\,, \tag{6.86}
\]
where $B$ was defined in (6.75). This completes the proof of Lemma 6.7.

Lemma 6.8. Under the assumptions of Lemma 6.7, in particular fixing $f = f_*$, and assuming that $b, d, i, j$ are all distinct, we have
\[
\max_{y \in I_\varepsilon}\, \bigl|\mathbb E F'(x^R)\, R_{ib} R_{dj} R_{ji}(y + \mathrm i\eta)\bigr| \le C p^{-4}\,. \tag{6.87}
\]
The same estimate holds for the other three terms on the right-hand side of (6.85).

In order to prove Lemma 6.8, we shall need the following result, which is crucial when estimating terms arising from the nonvanishing expectation $\mathbb E a_{ij} = f N^{-1}$. Before stating it, we introduce some notation. Recall that we set $A \equiv A_{\gamma-1} = (a_{ij})$, where the matrix entries are given by $a_{ij} = h_{ij} + f/N$ and $\mathbb E h_{ij} = 0$. We denote by $A^{(b)}$ the matrix obtained from $A$ by setting all entries with index $b$ to zero, i.e. $(A^{(b)})_{ij} := \mathbf 1(i \neq b)\, \mathbf 1(j \neq b)\, a_{ij}$. If $Z \equiv Z(A)$ is a function of $A$, we define $Z^{(b)} := Z(A^{(b)})$. See also Definitions 5.2 and 3.3 in [11]. We also use the notation $\mathbb E_b$ to denote partial expectation with respect to all variables $(a_{1b}, \dots, a_{Nb})$ in the $b$-th column of $A$.

Lemma 6.9.
For any fixed $i$ we have, with $(\xi,\nu)$-high probability,
\[
\biggl|\frac1N \sum_{k \neq i} \sum_{l \neq k} S^{(k)}_{il}\, h_{lk}\biggr| \le p^{-2}\,.
\]
Proof. The claim is an immediate consequence of Proposition 7.11 in [11] and the observation that, for $E \in I_\varepsilon$, $\eta = N^{-2/3-\varepsilon}$, and $q \ge N^\phi$, we have
\[
(\log N)^C \Biggl(\frac1q + \sqrt{\frac{\mathrm{Im}\, m_{sc}}{N\eta}} + \frac{1}{N\eta}\Biggr) \le p^{-1}
\]
for large enough $N$.

Another ingredient necessary for the proof of Lemma 6.8 is the following resolvent identity.

Lemma 6.10. Let $A = (a_{ij})$ be a square matrix and set $S = (S_{ij}) = (A - z)^{-1}$. Then for $i \neq j$ we have
\[
S_{ij} = -S_{ii} \sum_{k \neq i} a_{ik}\, S^{(i)}_{kj}\,, \qquad S_{ij} = -S_{jj} \sum_{k \neq j} S^{(j)}_{ik}\, a_{kj}\,. \tag{6.88}
\]
Proof. We prove the first identity in (6.88); the second one is proved analogously. We use the resolvent identity
\[
S_{ij} = S^{(k)}_{ij} + \frac{S_{ik}\, S_{kj}}{S_{kk}} \qquad \text{for } i, j \neq k\,, \tag{6.89}
\]
from [11], (3.8). Without loss of generality we assume that $z = 0$. Then (6.89) and the identity $AS = \mathbb 1$ yield
\[
\sum_{k\neq i} a_{ik}\, S^{(i)}_{kj} = \sum_{k\neq i} a_{ik}\, S_{kj} - \sum_{k\neq i} a_{ik}\, \frac{S_{ki}\, S_{ij}}{S_{ii}} = -a_{ii}\, S_{ij} - \frac{S_{ij}}{S_{ii}}\bigl(1 - a_{ii}\, S_{ii}\bigr) = -\frac{S_{ij}}{S_{ii}}\,.
\]

Armed with Lemmas 6.9 and 6.10, we may now prove Lemma 6.8.

Proof of Lemma 6.8. Using the relation between $R$ and $S$ from (6.45) and the bounds (6.59), we find that (6.87) is implied by
\[
\bigl|\mathbb E F'(x^S)\, S_{ib}\, S_{dj}\, S_{ji}\bigr| \le C p^{-4} N^{C\varepsilon}\,, \tag{6.90}
\]
under the assumption that $b, d, i, j$ are all distinct. This replacement is only a technical convenience when we apply a large deviation estimate below. Recalling the definition of $\Omega$ after (6.55), we get, using (6.89),
\[
\bigl|S_{ij} - S^{(b)}_{ij}\bigr| = \biggl|\frac{S_{ib}\, S_{bj}}{S_{bb}}\biggr| \le C p^{-2} \qquad \text{in } \Omega\,. \tag{6.91}
\]
This yields
\[
\bigl|x^S - (x^S)^{(b)}\bigr| \le p^{-1} N^{C\varepsilon} \qquad \text{in } \Omega\,. \tag{6.92}
\]
Similarly, we have
\[
\bigl|S_{ib}\, S_{dj}\, S_{ji} - S_{ib}\, S^{(b)}_{dj}\, S^{(b)}_{ji}\bigr| \le C p^{-4} \qquad \text{in } \Omega\,. \tag{6.93}
\]
Hence by the assumption on $F$ we have
\[
\bigl|\mathbb E F'(x^S)\, S_{ib}\, S_{dj}\, S_{ji}\bigr| \le \bigl|\mathbb E F'\bigl((x^S)^{(b)}\bigr)\, S_{ib}\, S^{(b)}_{dj}\, S^{(b)}_{ji}\bigr| + O\bigl(p^{-4} N^{C\varepsilon}\bigr)\,. \tag{6.94}
\]
Since $(x^S)^{(b)}$ and $S^{(b)}_{dj} S^{(b)}_{ji}$ are independent of the $b$-th row of $A$, we find from (6.94) that (6.90), and hence (6.87), is proved if we can show that
\[
\mathbb E_b\, S_{ib} = O(p^{-2}) \tag{6.95}
\]
for any fixed $i$ and $b$.

What remains therefore is to prove (6.95). Using (6.55) and (6.91) we find, in $\Omega$, that
\[
S_{bb} = m_{sc} + O(p^{-1})\,, \qquad S^{(b)}_{ik} = O(p^{-1})\,. \tag{6.96}
\]
Using $a_{kb} = h_{kb} + f/N$ we write
\[
S_{ib} = -m_{sc} \sum_{k\neq b} S^{(b)}_{ik}\Bigl(h_{kb} + \frac fN\Bigr) - (S_{bb} - m_{sc}) \sum_{k\neq b} S^{(b)}_{ik}\Bigl(h_{kb} + \frac fN\Bigr)\,. \tag{6.97}
\]
By (6.96) and the large deviation estimate (3.15) in [11], the sum in the second term of (6.97) is bounded by $O(p^{-1})$ with $(\xi,\nu)$-high probability, so that the second term is $O(p^{-2})$. Therefore, using (6.96) and $\mathbb E_b h_{kb} = 0$, we get
\[
\mathbb E_b\, S_{ib} = -\frac{m_{sc} f}{N} \sum_{k\neq b} S^{(b)}_{ik} + O(p^{-2}) = -\frac{m_{sc} f}{N} \sum_{k\neq i} S_{ik} + O(p^{-2})\,, \tag{6.98}
\]
where in the second step we used (6.89). In order to estimate the right-hand side of (6.98), we introduce the quantity
\[
X := \frac1N \sum_{k\neq i} \mathbb E_k\, S_{ik}\,.
\]
Note that $X$ depends on the index $i$, which is omitted from the notation since $i$ is fixed. Using (6.88), (6.96), and (6.89) as above, we find with $(\xi,\nu)$-high probability
\[
X = -\frac{m_{sc}}{N} \sum_{k\neq i} \mathbb E_k \sum_{l\neq k} S^{(k)}_{il}\Bigl(h_{lk} + \frac fN\Bigr) + O(p^{-2}) = -\frac{m_{sc} f}{N^2} \sum_{k\neq i} \sum_{l\neq k} S^{(k)}_{il} + O(p^{-2}) = -\frac{m_{sc} f}{N} \sum_{l\neq i} S_{il} + O(p^{-2}) = -m_{sc} f\, X + O\biggl(\frac1N \Bigl|\sum_{l\neq i}\bigl(S_{il} - \mathbb E_l S_{il}\bigr)\Bigr|\biggr) + O(p^{-2})\,.
\]
Now recall that the spectral parameter $z = E + \mathrm i\eta$ satisfies $E \in I_\varepsilon$ (see (6.27)) and $\eta = N^{-2/3-\varepsilon}$. Therefore (3.6) implies that $m_{sc}(z) = -1 + o(1)$. Recalling that $f = 1 + \varepsilon_0$, we therefore get, with $(\xi,\nu)$-high probability,
\[
X = O\biggl(\frac1N \Bigl|\sum_{l\neq i}\bigl(S_{il} - \mathbb E_l S_{il}\bigr)\Bigr|\biggr) + O(p^{-2})\,. \tag{6.99}
\]
We now return to (6.98) and estimate, with $(\xi,\nu)$-high probability,
\[
\mathbb E_b\, S_{ib} = -\frac{m_{sc} f}{N} \sum_{k\neq i} S_{ik} + O(p^{-2}) = -m_{sc} f\, X + O\biggl(\frac1N \Bigl|\sum_{k\neq i}\bigl(S_{ik} - \mathbb E_k S_{ik}\bigr)\Bigr|\biggr) + O(p^{-2})\,.
\]
Together with (6.99) this yields, with $(\xi,\nu)$-high probability,
\[
\mathbb E_b\, S_{ib} = O\biggl(\frac1N \Bigl|\sum_{k\neq i}\bigl(S_{ik} - \mathbb E_k S_{ik}\bigr)\Bigr|\biggr) + O(p^{-2})\,. \tag{6.100}
\]
In order to estimate the quantity on the right-hand side, we abbreviate $\mathrm{IE}_k Z := Z - \mathbb E_k Z$ for any random variable $Z$ and write, using (6.88),
\[
\frac1N \sum_{k\neq i}\bigl(S_{ik} - \mathbb E_k S_{ik}\bigr) = -\frac1N \sum_{k\neq i} \mathrm{IE}_k\biggl(S_{kk} \sum_{l\neq k} S^{(k)}_{il}\, a_{lk}\biggr) = -\frac{m_{sc}}{N} \sum_{k\neq i} \mathrm{IE}_k \sum_{l\neq k} S^{(k)}_{il}\Bigl(h_{lk} + \frac fN\Bigr) - \frac1N \sum_{k\neq i} \mathrm{IE}_k\biggl((S_{kk} - m_{sc}) \sum_{l\neq k} S^{(k)}_{il}\Bigl(h_{lk} + \frac fN\Bigr)\biggr)\,.
\]
Using the large deviation estimate (3.15) in [11], (6.96), and the bound $|h_{lk}| \le p^{-1}$, which holds with $(\xi,\nu)$-high probability (see Lemma 3.7 in [11]), we find that the second term is bounded by $O(p^{-2})$ with $(\xi,\nu)$-high probability. Thus we get
\[
\frac1N \sum_{k\neq i}\bigl(S_{ik} - \mathbb E_k S_{ik}\bigr) = -\frac{m_{sc}}{N} \sum_{k\neq i} \sum_{l\neq k} S^{(k)}_{il}\, h_{lk} + O(p^{-2})
\]
with $(\xi,\nu)$-high probability. Therefore (6.100) and Lemma 6.9 imply (6.95), and the proof is complete.

7. Universality of generalized Wigner matrices with finite moments

This section is an application of our results to the problem of universality of generalized Wigner matrices (see Definition 7.1 below) whose entries have heavy tails. We prove the bulk universality of generalized Wigner matrices under the assumption that the matrix entries have a finite $m$-th moment for some $m > 4$. We also prove the edge universality of Wigner matrices under the assumption that $m > 12$. (This condition can in fact be improved to $m \ge 7$; see Remark 7.5 below.)

The Tracy-Widom law for the largest eigenvalue of Wigner matrices was first proved in [33] under a Gaussian decay assumption, and was later proved in [29, 35, 19, 24] under various weaker restrictions on the distributions of the matrix elements. In particular, in [24] the Tracy-Widom law was proved for entries with symmetric distribution and $m > 12$. In [23] similar results were derived for complex Hermitian Gaussian divisible matrices, where the GUE component is of order one. For this case it is proved in [23] that bulk universality holds provided the entries of the Wigner component have finite second moments, and edge universality holds provided they have finite fourth moments.

Definition 7.1. We call a Hermitian or real symmetric random matrix $H = (h_{ij})$ a generalized Wigner matrix if the two following conditions hold. First, the family of upper-triangular entries $(h_{ij} : i \le j)$ is independent. Second, we have
\[
\mathbb E h_{ij} = 0\,, \qquad \mathbb E |h_{ij}|^2 = \sigma_{ij}^2\,,
\]
where the variances $\sigma_{ij}^2$ satisfy
\[
\sum_j \sigma_{ij}^2 = 1\,, \qquad C_- \le \inf_{i,j}\bigl(N\sigma_{ij}^2\bigr) \le \sup_{i,j}\bigl(N\sigma_{ij}^2\bigr) \le C_+\,,
\]
and $0 < C_- \le C_+ < \infty$ are constants independent of $N$.

Theorem 7.2 (Bulk universality). Suppose that $H = (h_{ij})$ satisfies Definition 7.1. Let $m > 4$ and assume that for all $i$ and $j$ we have
\[
\mathbb E\, \bigl|h_{ij}/\sigma_{ij}\bigr|^m \le C_m \tag{7.1}
\]
for some constant $C_m$, independent of $i$, $j$, and $N$. Let $n \in \mathbb N$ and let $O : \mathbb R^n \to \mathbb R$ be compactly supported and continuous. Let $E$ satisfy $-2 < E < 2$ and let $\varepsilon > 0$. Then for any sequence $b_N$ satisfying $N^{-1+\varepsilon} \le b_N \le \bigl||E| - 2\bigr|/2$ we have
\[
\lim_{N\to\infty} \int_{E-b_N}^{E+b_N} \frac{\mathrm dE'}{2 b_N} \int \mathrm d\alpha_1 \cdots \mathrm d\alpha_n\, O(\alpha_1, \dots, \alpha_n)\, \frac{1}{\varrho_{sc}(E)^n}\, \bigl(p^{(n)}_N - p^{(n)}_{G,N}\bigr)\biggl(E' + \frac{\alpha_1}{N\varrho_{sc}(E)}, \dots, E' + \frac{\alpha_n}{N\varrho_{sc}(E)}\biggr) = 0\,.
\]
Here $\varrho_{sc}$ was defined in (2.10), $p^{(n)}_N$ is the $n$-point marginal of the eigenvalue distribution of $H$, and $p^{(n)}_{G,N}$ is the $n$-point marginal of an $N \times N$ GUE/GOE matrix.

Theorem 7.3 (Edge universality). Suppose that $H^{\mathbf v} = (h^{\mathbf v}_{ij})$ and $H^{\mathbf w} = (h^{\mathbf w}_{ij})$ both satisfy Definition 7.1. Assume that the entries of $H^{\mathbf v}$ and $H^{\mathbf w}$ all satisfy (7.1) for some $m > 12$, and that the first two moments of the entries of $H^{\mathbf v}$ and $H^{\mathbf w}$ match:
\[
\mathbb E^{\mathbf v}\bigl(h^{\mathbf v}_{ij}\bigr)^l \bigl(\overline{h^{\mathbf v}_{ij}}\bigr)^u = \mathbb E^{\mathbf w}\bigl(h^{\mathbf w}_{ij}\bigr)^l \bigl(\overline{h^{\mathbf w}_{ij}}\bigr)^u \qquad \text{for } 0 \le l + u \le 2\,.
\]
Then there is a $\delta > 0$ such that for any $s \in \mathbb R$ we have
\[
\mathbb P^{\mathbf v}\bigl(N^{2/3}(\lambda_N - 2) \le s - N^{-\delta}\bigr) - N^{-\delta} \le \mathbb P^{\mathbf w}\bigl(N^{2/3}(\lambda_N - 2) \le s\bigr) \le \mathbb P^{\mathbf v}\bigl(N^{2/3}(\lambda_N - 2) \le s + N^{-\delta}\bigr) + N^{-\delta}\,. \tag{7.2}
\]
Here $\mathbb P^{\mathbf v}$ and $\mathbb P^{\mathbf w}$ denote the laws of the ensembles $H^{\mathbf v}$ and $H^{\mathbf w}$ respectively, and $\lambda_N$ denotes the largest eigenvalue of $H^{\mathbf v}$ or $H^{\mathbf w}$.

Remark 7.4. A similar result holds for the smallest eigenvalue $\lambda_1$. Moreover, a result analogous to (7.2) holds for the $n$-point joint distribution functions of the extreme eigenvalues. (See [19], Equation (2.40).)

Remark 7.5.
With some additional effort, one may in fact improve the condition $m > 12$ in Theorem 7.3 to $m \ge 7$. The basic idea is to match seven instead of four moments in Lemma 7.7, and to use the resolvent expansion method from Section 6.3. We omit further details.

The rest of this section is devoted to the proof of Theorems 7.2 and 7.3.

7.1. Truncation. For definiteness, we focus on real symmetric matrices, but the following truncation argument applies trivially to complex Hermitian matrices by truncating the real and imaginary parts separately. To simplify the presentation, we consider Wigner matrices for which $\sigma_{ij} = N^{-1/2}$. The proof for the more general matrices from Definition 7.1 is the same; see also Remark 2.4 in [11].

We begin by noting that, without loss of generality, we may assume that the distributions of the entries of $H$ are absolutely continuous. Otherwise consider the matrix $H + \varepsilon_N V$, where $V$ is a GUE/GOE matrix independent of $H$, and $(\varepsilon_N)$ is a positive sequence that tends to zero arbitrarily fast. (Note that the following argument is insensitive to the size of $\varepsilon_N$.)

Let $H \equiv H^{\mathbf x} = (h^{\mathbf x}_{ij})$ be a Wigner matrix whose entries are of the form $h^{\mathbf x}_{ij} = N^{-1/2} x_{ij}$ for some $x_{ij}$. We assume that the family $(x_{ij} : i \le j)$ is independent, and that each $x_{ij}$ satisfies
\[
\mathbb E x_{ij} = 0\,, \qquad \mathbb E |x_{ij}|^2 = 1\,.
\]
Moreover, we assume that there is an $m > 4$ and a constant $C_m \ge 1$, independent of $i$, $j$, and $N$, such that
\[
\mathbb E |x_{ij}|^m \le C_m\,.
\]
In a first step we construct a truncated Wigner matrix $H^{\mathbf y}$ from $H^{\mathbf x}$. This truncation is performed in the following lemma.

Lemma 7.6. Fix $m > 2$ and let $X$ be a real random variable, with absolutely continuous law, satisfying
\[
\mathbb E X = 0\,, \qquad \mathbb E X^2 = 1\,, \qquad \mathbb E |X|^m \le C_m\,.
\]
Let $\lambda > 0$. Then there exists a real random variable $Y$ that satisfies
\[
\mathbb E Y = 0\,, \qquad \mathbb E Y^2 = 1\,, \qquad |Y| \le \lambda\,, \qquad \mathbb P(X \neq Y) \le 2 C_m \lambda^{-m}\,.
\]
Proof. We introduce the abbreviations
\[
P := \mathbb P(|X| > \lambda)\,, \qquad E := \mathbb E\bigl[X\, \mathbf 1(|X| > \lambda)\bigr]\,, \qquad V := \mathbb E\bigl[X^2\, \mathbf 1(|X| > \lambda)\bigr]\,.
\]
Using the assumption on $X$, Markov's inequality, and Hölder's inequality, we find
\[
P \le C_m \lambda^{-m}\,, \qquad |E| \le C_m \lambda^{-m+1}\,, \qquad V \le C_m \lambda^{-m+2}\,. \tag{7.3}
\]
The idea behind the construction of $Y$ is to cut out the tail $|X| > \lambda$, to add appropriate Dirac weights at $\pm\lambda$, and to adjust the total probability by cutting out the values $X \in [-a, a]$, where $a$ is an appropriately chosen small nonnegative number. For any $t$ satisfying $0 \le t \le 1/2$, we choose a nonnegative number $a_t$ such that $\mathbb P(X \in [-a_t, a_t]) = t$. Since $X$ is absolutely continuous, such a number $a_t$ exists and the map $t \mapsto a_t$ is continuous. Moreover, using $\mathbb E X^2 = 1$ and Markov's inequality we find that $a_t \le 2$ for $t \le 1/2$. We define the quantities
\[
e_t := \mathbb E\bigl[X\, \mathbf 1(-a_t \le X \le a_t)\bigr]\,, \qquad v_t := \mathbb E\bigl[X^2\, \mathbf 1(-a_t \le X \le a_t)\bigr]\,,
\]
which satisfy the trivial bounds
\[
|e_t| \le 2t\,, \qquad v_t \le 4t\,. \tag{7.4}
\]
We shall remove the values $(-\infty, -\lambda) \cup [-a_t, a_t] \cup (\lambda, \infty)$ from the range of $X$, and replace them with Dirac weights at $\lambda$ and $-\lambda$ with respective probabilities $p$ and $q$. Thus we are led to the system
\[
p + q = P + t\,, \qquad p - q = \lambda^{-1}(E + e_t)\,, \qquad p + q = \lambda^{-2}(V + v_t)\,. \tag{7.5}
\]
In order to solve (7.5), we abbreviate the right-hand sides of the equations in (7.5) by $\alpha(t)$, $\beta(t)$, and $\gamma(t)$ respectively. In a first step, we solve the equation $\alpha(t) = \gamma(t)$ for $t$. To that end, we observe that $\alpha(0) \le \gamma(0)$, as follows from the trivial inequality $V \ge \lambda^2 P$. Moreover, $\alpha(1/2) \gg \gamma(1/2)$ by (7.3) and (7.4). Since $\alpha(t)$ and $\gamma(t)$ are continuous, the equation $\alpha(t) = \gamma(t)$ has a solution $t_0$. Moreover, (7.3) and (7.4) imply that $t_0 \le C_m \lambda^{-m} + 4\lambda^{-2} t_0$, from which we get that $t_0 \le 2 C_m \lambda^{-m}$. For the following we fix $t := t_0$.
In a second step, we solve the equations $p + q = \alpha(t)$ and $p - q = \beta(t)$ to get
\[
p = \frac{\alpha(t) + \beta(t)}{2}\,, \qquad q = \frac{\alpha(t) - \beta(t)}{2}\,.
\]
We now claim that $|\beta(t)| \le \alpha(t)$. Indeed, a simple application of the Cauchy-Schwarz inequality yields $|\beta(t)| \le (\alpha(t) + \gamma(t))/2 = \alpha(t)$. Hence $p$ and $q$ are nonnegative. Moreover, the bounds (7.3) and (7.4) yield
\[
p + q \le 2 C_m \lambda^{-m}\,.
\]
Thus we have proved that (7.5) has a solution $(p, q, t)$ satisfying
\[
0 \le p, q, t \le 2 C_m \lambda^{-m}\,.
\]
Next, let $I := (-\infty, -\lambda) \cup [-a_t, a_t] \cup (\lambda, \infty)$. Thus, $\mathbb P(X \in I) = p + q$. Partition $I = I_1 \cup I_2$ such that $\mathbb P(X \in I_1) = p$ and $\mathbb P(X \in I_2) = q$. Then we define
\[
Y := X\, \mathbf 1(X \notin I) + \lambda\, \mathbf 1(X \in I_1) - \lambda\, \mathbf 1(X \in I_2)\,.
\]
Recalling (7.5), we find that $Y$ satisfies $\mathbb E Y = 0$ and $\mathbb E Y^2 = 1$. Moreover,
\[
\mathbb P(X \neq Y) = \mathbb P(X \in I) = p + q \le 2 C_m \lambda^{-m}\,.
\]
This concludes the proof.

Note that the random variable $Y$ constructed in Lemma 7.6 satisfies $\mathbb E|Y|^m \le 3 C_m$. Let $\rho$ be some exponent satisfying $0 < \rho < 1/2$ and assume that $m > 4$. Using Lemma 7.6 with $\lambda := N^\rho$, we construct, for each $x_{ij}$, a random variable $y_{ij}$ such that the family $(y_{ij} : i \le j)$ is independent and
\[
\mathbb E y_{ij} = 0\,, \qquad \mathbb E y_{ij}^2 = 1\,, \qquad |y_{ij}| \le N^\rho\,, \qquad \mathbb P(x_{ij} \neq y_{ij}) \le 2 C_m N^{-\rho m}\,, \qquad \mathbb E |y_{ij}|^m \le 3 C_m\,. \tag{7.6}
\]
We define the new matrix $H^{\mathbf y} = (h^{\mathbf y}_{ij})$ through $h^{\mathbf y}_{ij} := N^{-1/2} y_{ij}$. In particular, we have
\[
\mathbb E h^{\mathbf y}_{ij} = 0\,, \qquad \mathbb E |h^{\mathbf y}_{ij}|^2 = \frac1N\,, \qquad \mathbb E |h^{\mathbf y}_{ij}|^p \le \frac{1}{N q^{p-2}}\,, \tag{7.7}
\]
where we set
\[
q := N^{1/2 - \rho}\,. \tag{7.8}
\]
Thus, $H^{\mathbf y}$ satisfies Definition 2.1.

7.2. Moment matching. Next, we construct a third Wigner matrix, $H^{\mathbf z} = (h^{\mathbf z}_{ij})$, whose entries are of the form $h^{\mathbf z}_{ij} = N^{-1/2} z_{ij}$. We require that the $z_{ij}$ have uniformly subexponential decay, i.e.
\[
\mathbb E z_{ij} = 0\,, \qquad \mathbb E |z_{ij}|^2 = 1\,, \qquad \mathbb P(|z_{ij}| \ge \xi) \le \theta^{-1} \mathrm e^{-\xi^\theta}\,, \tag{7.9}
\]
for some $\theta > 0$ independent of $i$, $j$, and $N$. We choose $z_{ij}$ so as to match the first four moments of $y_{ij}$.

Lemma 7.7. Let $y_{ij}$ satisfy (7.6) with some $m > 4$. Then there exists a $z_{ij}$ satisfying (7.9) such that $\mathbb E z_{ij}^l = \mathbb E y_{ij}^l$ for $l = 1, \dots, 4$.

Proof. In fact, using an explicit construction similar to the one used in the proof of Theorem 2.5, $z_{ij}$ can be chosen to be supported at only three points. We omit further details.

It was proved in [19], Section 2.1, that the statement of Theorem 7.2 holds if the entries of $H$ satisfy the subexponential decay condition (7.9). Theorem 7.2 will therefore follow if we can prove that, for $b_N$ and $O$ as in Theorem 7.2, we have
\[
\lim_{N\to\infty} \int_{E-b_N}^{E+b_N} \frac{\mathrm dE'}{2 b_N} \int \mathrm d\alpha_1 \cdots \mathrm d\alpha_n\, O(\alpha_1, \dots, \alpha_n)\, \frac{1}{\varrho_{sc}(E)^n}\, \bigl(p^{(n)}_{\mathbf x,N} - p^{(n)}_{\mathbf z,N}\bigr)\biggl(E' + \frac{\alpha_1}{N\varrho_{sc}(E)}, \dots, E' + \frac{\alpha_n}{N\varrho_{sc}(E)}\biggr) = 0\,, \tag{7.10}
\]
where $p^{(n)}_{\mathbf x,N}$ and $p^{(n)}_{\mathbf z,N}$ are the $n$-point marginals of the eigenvalue distributions of $H^{\mathbf x}$ and $H^{\mathbf z}$, respectively. Similarly, it was proved in [19], Section 2.2, that the statement of Theorem 7.3 holds if the entries of $H^{\mathbf v}$ and $H^{\mathbf w}$ both satisfy the subexponential decay condition (7.9). Thus, Theorem 7.3 will follow if we can prove that
\[
\mathbb P^{\mathbf x}\bigl(N^{2/3}(\lambda_N - 2) \le s - N^{-\delta}\bigr) - N^{-\delta} \le \mathbb P^{\mathbf z}\bigl(N^{2/3}(\lambda_N - 2) \le s\bigr) \le \mathbb P^{\mathbf x}\bigl(N^{2/3}(\lambda_N - 2) \le s + N^{-\delta}\bigr) + N^{-\delta} \tag{7.11}
\]
for some $\delta > 0$. Here we use $\mathbb P^{\mathbf x}$ and $\mathbb P^{\mathbf z}$ to denote the laws of the ensembles $H^{\mathbf x}$ and $H^{\mathbf z}$ respectively.

We shall prove both (7.10) and (7.11) by first comparing $H^{\mathbf x}$ to $H^{\mathbf y}$, and then comparing $H^{\mathbf y}$ to $H^{\mathbf z}$. The first step is easy: from (7.6) we get
\[
\mathbb P(H^{\mathbf x} \neq H^{\mathbf y}) \le 2 C_m N^{2 - \rho m}\,. \tag{7.12}
\]
Thus, (7.10) and (7.11) hold with $\mathbf z$ replaced by $\mathbf y$ provided that
\[
\rho m > 2\,. \tag{7.13}
\]

7.3. Comparison of $H^{\mathbf y}$ and $H^{\mathbf z}$ in the bulk. In this section we prove that (7.10) holds with $\mathbf x$ replaced by $\mathbf y$, and hence complete the proof of Theorem 7.2. We compare the local spectral statistics of $H^{\mathbf y}$ and $H^{\mathbf z}$ using the Green function comparison method from [17], Section 8. The key additional input is the local semicircle law for sparse matrices, Theorem 3.3. We merely sketch the differences to [17].
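The strategy of this subsection, replacing the truncated ensemble $H^{\mathbf y}$ by a subexponential ensemble $H^{\mathbf z}$ with matching low moments, can be illustrated by direct simulation. The sketch below is a crude Monte Carlo illustration with ad hoc parameters: its truncation is a simple clipping rather than the construction of Lemma 7.6, and it matches only two moments, not the four required by Lemma 7.7 for local statistics; it only exhibits the agreement of the bulk densities.

```python
# Comparing bulk spectral statistics of a truncated heavy-tailed ensemble
# (stand-in for H^y) and a bounded two-valued ensemble (stand-in for H^z).
import numpy as np

rng = np.random.default_rng(3)
N, trials = 300, 20
lam = N ** 0.25                          # truncation level, lambda = N^rho

def sample_y(shape):
    # Student-t entries (df = 5, so more than 4 finite moments),
    # standardized to unit variance, then clipped at level lam
    x = rng.standard_t(df=5, size=shape) / np.sqrt(5 / 3)
    return np.clip(x, -lam, lam)

def sample_z(shape):
    # symmetric two-valued law: mean 0, variance 1, all odd moments 0
    return rng.choice([-1.0, 1.0], size=shape)

def bulk_fraction(sampler):
    vals = []
    for _ in range(trials):
        M = sampler((N, N))
        H = (np.triu(M) + np.triu(M, 1).T) / np.sqrt(N)
        eigs = np.linalg.eigvalsh(H)
        vals.append(np.mean((eigs > -0.5) & (eigs < 0.5)))
    return np.mean(vals)

print("y-ensemble bulk fraction:", bulk_fraction(sample_y))
print("z-ensemble bulk fraction:", bulk_fraction(sample_z))
```

Both fractions approximate the semicircle mass of $(-\frac12, \frac12)$; resolving the difference at the scale of individual eigenvalues is precisely what the four-moment matching and the Green function comparison below are needed for.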
As explained in [17], the $n$-point correlation functions $p^{(n)}_N$ can be expressed (up to an error $N^{-c}$) in terms of expectations of observables $F$ whose arguments are products of expressions of the form $m(z_i + \mathrm i\eta)$, where $\eta := N^{-1-\varepsilon}$. We assume that the first five derivatives of $F$ are polynomially bounded, uniformly in $N$. Using the local semicircle law for sparse matrices, Theorem 3.3, we may control the Green function matrix elements down to scales $N^{-1-\varepsilon}$, uniformly in $E$. (Note that in [17] the spectral edge had to be excluded, since the bounds derived there were unstable near the edge, unlike our bound (3.14).) This allows us to compare the local eigenvalue statistics of the matrix ensembles at scales $N^{-1-\varepsilon}$, which is sufficiently accurate for both Theorems 7.2 and 7.3.

We use the telescopic summation and the Lindeberg replacement argument from [17], Chapter 8, whose notations we take over without further comment; see also Section 6.3. A resolvent expansion yields
\[
S = R - N^{-1/2}\, RVR + N^{-1}\, (RV)^2 R - N^{-3/2}\, (RV)^3 R + N^{-2}\, (RV)^4 R - N^{-5/2}\, (RV)^5 S\,.
\]
Next, note that, by (7.7) and (7.9), both ensembles $H^{\mathbf y}$ and $H^{\mathbf z}$ satisfy Definition 2.1 with $q$ defined in (7.8). Hence we may invoke Theorem 3.3 with $f = 0$, and in particular (3.14), for the matrices $H^{\mathbf y}$ and $H^{\mathbf z}$. Choosing $\eta = N^{-1-\varepsilon}$ for some $\varepsilon > 0$, we therefore get
\[
|R_{ij}(E + \mathrm i\eta)| \le N^{2\varepsilon}\,, \qquad |S_{ij}(E + \mathrm i\eta)| \le N^{2\varepsilon}
\]
with $(\xi,\nu)$-high probability. Consider now the difference $\mathbb E^{\mathbf y} - \mathbb E^{\mathbf z}$ applied to some fixed observable $F$ depending on traces (normalized by $N^{-1}$) of resolvents and whose derivatives have at most polynomial growth. Since the first four moments of the entries of $H^{\mathbf y}$ and $H^{\mathbf z}$ coincide by Lemma 7.7, the error in one step of the telescopic summation is bounded by the expectation of the rest term in the resolvent expansion, i.e.
\[
N^{C\varepsilon}\, N^{-5/2}\, \mathbb E\bigl|(RV)^5 S\bigr| \le N^{-5/2 + C\varepsilon}\, \max_{a,b}\, \mathbb E |V_{ab}|^5 \le N^{-5/2 + C\varepsilon}\, C_m N^{\rho}\,,
\]
where in the last step we used (7.6). The first factor $N^{C\varepsilon}$ comes from the polynomially bounded derivatives of $F$. Summing up all $O(N^2)$ terms of the telescopic sum, we find that the difference $\mathbb E^{\mathbf y} - \mathbb E^{\mathbf z}$ applied to $F$ is bounded by
\[
N^{-1/2 + C\varepsilon + \rho}\,. \tag{7.14}
\]
Combining (7.14) and (7.12), we find that (7.10) follows provided that
\[
-\frac12 + C\varepsilon + \rho < 0\,, \qquad \rho m > 2\,. \tag{7.15}
\]
Since $m > 4$ is fixed, choosing first $1/2 - \rho$ small enough and then $\varepsilon$ small enough yields (7.15). This completes the proof of Theorem 7.2.

7.4. Comparison of $H^{\mathbf y}$ and $H^{\mathbf z}$ at the edge. In order to prove (7.2) under the assumption $m > 12$, we may invoke Proposition 6.4, which implies that (7.11) holds with $\mathbf x$ replaced by $\mathbf y$, provided that $\phi = 1/2 - \rho > 1/3$, i.e. $\rho < 1/6$. Together with the condition (7.13), this implies that (7.2) holds if $m > 12$.

A. Regularization of the Dyson Brownian motion

In this appendix we sketch a simple regularization argument needed to prove two results concerning the Dyson Brownian motion (DBM). This argument can be used as a substitute for the earlier, more involved, proofs given in Appendices A and B of [16] on the existence of the dynamics restricted to the subdomain $\Sigma_N := \{\mathbf x \,:\, x_1 < x_2 < \cdots < x_N\}$, and on the applicability of the Bakry-Émery method. The argument presented in this section is also more probabilistic in nature than the earlier proofs of [16]. For the applications in Section 4 of this paper, some minor adjustments to the argument below are needed to incorporate the separate treatment of the largest eigenvalue. These modifications are straightforward, and we shall only sketch the argument for the standard DBM.

Theorem A.1. Fix $n \ge 1$ and let $\mathbf m = (m_1, \dots, m_n) \in \mathbb N^n$ be an increasing family of indices. Let $G : \mathbb R^n \to \mathbb R$ be a continuous function of compact support and set
\[
G_{i,\mathbf m}(\mathbf x) := G\bigl(N(x_i - x_{i+m_1}),\, N(x_{i+m_1} - x_{i+m_2}),\, \dots,\, N(x_{i+m_{n-1}} - x_{i+m_n})\bigr)\,.
\]
Let $\gamma_1, \dots, \gamma_N$ denote the classical locations of the eigenvalues and set
\[
Q := \sup_{t \in [t_0, \tau]} \int \sum_{i=1}^N (x_i - \gamma_i)^2\, f_t\, \mathrm d\mu\,. \tag{A.1}
\]
Choose an $\varepsilon > 0$. Then for any $\rho$ satisfying $0 < \rho < 1$, and setting $\tau = N^{-\rho}$, there exists a $\bar\tau \in [\tau/2, \tau]$ such that, for any $J \subset \{1, 2, \dots, N - m_n - 1\}$, we have
\[
\biggl|\int \frac{1}{|J|} \sum_{i\in J} G_{i,\mathbf m}\, f_{\bar\tau}\, \mathrm d\mu - \int \frac{1}{|J|} \sum_{i\in J} G_{i,\mathbf m}\, \mathrm d\mu\biggr| \le C N^\varepsilon \sqrt{\frac{N Q + 1}{|J|\, \tau}} \tag{A.2}
\]
for all $N \ge N_0(\rho)$. Here $\mu = \mu^{(N)}$ is the equilibrium measure of the $N$ eigenvalues of the GOE.

Define $\mu_\beta(\mathbf x) = C \mathrm e^{-N\beta \mathcal H(\mathbf x)}$ as in (4.6) and (4.5), but introducing a parameter $\beta$ so that $\mu_\beta$ is the equilibrium measure of the usual $\beta$-ensemble, which is invariant under the ($\beta$-dependent) DBM. We remark that Theorem A.1 holds for all $\beta \ge 1$ with an identical proof. The following lemma holds more generally for $\beta > 0$. Let $\omega := C \mu_\beta\, \mathrm e^{-N \sum_j U_j(x_j)}$, where $U_j$ is a $C^2$-function satisfying
\[
\min_j U_j''(x) \ge \tau^{-1} \tag{A.3}
\]
for some $\tau < 1$. For the following lemma we recall the definition (4.7) of the Dirichlet form.

Lemma A.2. Let $\beta > 0$ and let $q \in H^1(\mathrm d\omega)$ be a probability density with respect to $\omega$. Then for any $J \subset \{1, 2, \dots, N - m_n - 1\}$ and any $t > 0$ we have
\[
\biggl|\int \frac{1}{|J|}\sum_{i\in J} G_{i,\mathbf m}\, q\, \mathrm d\omega - \int \frac{1}{|J|}\sum_{i\in J} G_{i,\mathbf m}\, \mathrm d\omega\biggr| \le C \sqrt{\frac{D_\omega(\sqrt q)\, t}{|J|}} + C \sqrt{S_\omega(q)}\, \mathrm e^{-ct/\tau}\,. \tag{A.4}
\]
Recall that the DBM is defined via the stochastic differential equation
\[
\mathrm dx_i = \frac{\mathrm dB_i}{\sqrt N} + \beta\biggl(-\frac{x_i}{4} + \frac{1}{2N}\sum_{j\neq i}\frac{1}{x_i - x_j}\biggr)\mathrm dt \qquad \text{for } i = 1, \dots, N\,, \tag{A.5}
\]
where $B_1, \dots, B_N$ is a family of independent standard Brownian motions. It was proved in [1], Lemma 4.3.3, that there is a unique strong solution to (A.5) for all $\beta \ge 1$.

For any $\delta > 0$ define the extension $\mu^\delta_\beta$ of the measure $\mu_\beta$ from $\Sigma_N$ to $\mathbb R^N$ by replacing the singular logarithm with a $C^2$-function. To that end, we introduce the approximation parameter $\delta > 0$ and define, as in Section 4, for $\mathbf x \in \mathbb R^N$,
\[
\mathcal H_\delta(\mathbf x) := \sum_i \frac{x_i^2}{4} - \frac1N \sum_{i<j} \log_\delta(x_j - x_i)\,,
\]
where $\log_\delta$ is a $C^2$-regularization of the logarithm which converges, as $\delta \to 0$, to $\log x$ if $x > 0$ and to $-\infty$ if $x \le 0$. Furthermore, we have the lower bound
\[
\partial_x^2 \log_\delta(x) \ge \begin{cases} -x^{-2} & \text{if } x > \delta\,, \\ -\delta^{-2} & \text{if } x \le \delta\,. \end{cases}
\]
Similarly, we can extend the measure $\omega$ to $\mathbb R^N$ by setting $\omega_\delta = C \mathrm e^{-N\sum_j U_j(x_j)}\, \mu^\delta_\beta$.

Lemma A.3. Let $q \in L^\infty(\mathrm d\omega_\delta)$ be a $C^2$ probability density. Then for $\delta \le 1/N$, $\beta > 0$, any $J \subset \{1, 2, \dots, N - m_n - 1\}$, and any $t > 0$ we have
\[
\biggl|\int \frac{1}{|J|}\sum_{i\in J} G_{i,\mathbf m}\, q\, \mathrm d\omega_\delta - \int \frac{1}{|J|}\sum_{i\in J} G_{i,\mathbf m}\, \mathrm d\omega_\delta\biggr| \le C \sqrt{\frac{D_{\omega_\delta}(\sqrt q)\, t}{|J|}} + C \sqrt{S_{\omega_\delta}(q)}\, \mathrm e^{-ct/\tau}\,. \tag{A.6}
\]
Proof. The proof of Theorem 4.3 in [16] applies with merely cosmetic changes; now, however, the dynamics is defined on $\mathbb R^N$ instead of $\Sigma_N$, so that complications arising from the boundary are absent. The condition $\delta \le 1/N$ is needed since we use the singularity of $\partial_x^2 \log x$ to generate a factor $1/N^2$ in the regime $x \le C/N$ in the proof.

Proof of Lemma A.2. Suppose that $q$ is a probability density in $\Sigma_N$ with respect to $\omega$. We extend $q$ to be zero outside $\Sigma_N$ and let $q_\varepsilon \in C^2$ be any regularization of $q$ that converges to $q$ in $H^1(\omega)$. Then there is a constant $C_{\varepsilon,\delta}$ such that $q^\delta_\varepsilon := C_{\varepsilon,\delta}\, q_\varepsilon$ is a probability density on $\mathbb R^N$ with respect to $\omega_\delta$. Thus (A.6) holds with $q$ replaced by $q^\delta_\varepsilon$. Taking the limit $\delta \to 0$ and then $\varepsilon \to 0$, we get
\[
\biggl|\int \frac{1}{|J|}\sum_{i\in J} G_{i,\mathbf m}\, q\, \mathrm d\omega - \int \frac{1}{|J|}\sum_{i\in J} G_{i,\mathbf m}\, \mathrm d\omega\biggr| \le C \lim_{\varepsilon\to0}\lim_{\delta\to0} \sqrt{\frac{D_{\omega_\delta}\bigl(\sqrt{q^\delta_\varepsilon}\bigr)\, t}{|J|}} + C \lim_{\varepsilon\to0}\lim_{\delta\to0} \sqrt{S_{\omega_\delta}(q^\delta_\varepsilon)}\, \mathrm e^{-ct/\tau}\,. \tag{A.7}
\]
Notice that $\omega_\delta \to \mathbf 1(\Sigma_N)\, \omega$ weakly as $\delta \to 0$. Thus
\[
\lim_{\varepsilon\to0}\lim_{\delta\to0} D_{\omega_\delta}\bigl(\sqrt{q^\delta_\varepsilon}\bigr) = \lim_{\varepsilon\to0} D_\omega(\sqrt{q_\varepsilon}) = D_\omega(\sqrt q)\,,
\]
provided that $q \in H^1(\omega)$. This proves Lemma A.2. Notice that the proof did not use the existence of the DBM; instead, it used the existence of the regularized DBM.

Proof of Theorem A.1.
Write
\[
\int \frac{1}{|J|}\sum_{i\in J} G_{i,\mathbf m}\, f_t\, \mathrm d\mu = \mathbb E^{f_0\mu}\, \mathbb E^{\mathbf x_0}\, \frac{1}{|J|}\sum_{i\in J} G_{i,\mathbf m}(\mathbf x(t))\,.
\]
Here $\mathbb E^{\mathbf x_0}$ denotes expectation with respect to the law of the DBM $(\mathbf x(t))_t$ starting from $\mathbf x_0$, and $\mathbb E^{f_0\mu}$ denotes expectation of $\mathbf x_0$ with respect to the measure $f_0\mu$. Let $\mathbb E^{\mathbf x_0}_\delta$ denote expectation with respect to the regularized DBM. Then we have
\[
\mathbb E^{\mathbf x_0}\, \frac{1}{|J|}\sum_{i\in J} G_{i,\mathbf m}(\mathbf x(t)) = \lim_{\delta\to0} \mathbb E^{\mathbf x_0}_\delta\, \frac{1}{|J|}\sum_{i\in J} G_{i,\mathbf m}(\mathbf x(t))\,,
\]
where we have used the existence of a strong solution to the DBM (see [1], Lemma 4.3.3) and the fact that the dynamics remains in $\Sigma_N$ almost surely. Hence
\[
\int \frac{1}{|J|}\sum_{i\in J} G_{i,\mathbf m}\, f_t\, \mathrm d\mu = \lim_{\delta\to0} \int \frac{1}{|J|}\sum_{i\in J} G_{i,\mathbf m}\, f^\delta_t\, \mathrm d\mu^\delta\,,
\]
where $f^\delta_t$ is the solution of the regularized DBM at time $t$ with initial data $f_0\mu/\mu^\delta$. Using that (A.2) holds for the regularized dynamics, and taking the limit $\delta \to 0$, we complete the proof.

References

[1] Anderson, G., Guionnet, A., Zeitouni, O.: An Introduction to Random Matrices. Studies in Advanced Mathematics, 118, Cambridge University Press, 2009.

[2] Aptekarev, A., Khabibullin, R.: Asymptotic expansions for polynomials orthogonal with respect to a complex non-constant weight function. Trans. Moscow Math. Soc. 68, 1–37 (2007).

[3] Auffinger, A., Ben Arous, G., Péché, S.: Poisson convergence for the largest eigenvalues of heavy-tailed matrices. Ann. Inst. Henri Poincaré Probab. Stat. 45, no. 3, 589–610 (2009).

[4] Biroli, G., Bouchaud, J.-P., Potters, M.: On the top eigenvalue of heavy-tailed random matrices. Europhysics Lett. 78, 10001 (2007).

[5] Bleher, P., Its, A.: Semiclassical asymptotics of orthogonal polynomials, Riemann-Hilbert problem, and universality in the matrix model. Ann. of Math. 150, 185–266 (1999).

[6] Brascamp, H. J., Lieb, E. H.: On extensions of the Brunn-Minkowski and Prékopa-Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation. J. Funct. Anal. 22, 366–389 (1976).

[7] Deift, P., Kriecherbauer, T., McLaughlin, K. T-R, Venakides, S., Zhou, X.: Uniform asymptotics for polynomials orthogonal with respect to varying exponential weights and applications to universality questions in random matrix theory. Comm. Pure Appl. Math. 52, 1335–1425 (1999).

[8] Deift, P., Kriecherbauer, T., McLaughlin, K. T-R, Venakides, S., Zhou, X.: Strong asymptotics of orthogonal polynomials with respect to exponential weights. Comm. Pure Appl. Math. 52, 1491–1552 (1999).

[9] Dyson, F. J.: A Brownian-motion model for the eigenvalues of a random matrix. J. Math. Phys. 3, 1191–1198 (1962).

[10] Erdős, L.: Universality of Wigner random matrices: a survey of recent results (lecture notes). Preprint arXiv:1004.0861.

[11] Erdős, L., Knowles, A., Yau, H.-T., Yin, J.: Spectral statistics of sparse random matrices I: local semicircle law. Preprint arXiv:1103.1919, to appear in Ann. Prob.

[12] Erdős, L., Schlein, B., Yau, H.-T.: Semicircle law on short scales and delocalization of eigenvectors for Wigner random matrices. Ann. Probab. 37, no. 3, 815–852 (2009).

[13] Erdős, L., Schlein, B., Yau, H.-T.: Local semicircle law and complete delocalization for Wigner random matrices. Commun. Math. Phys. 287, 641–655 (2009).

[14] Erdős, L., Schlein, B., Yau, H.-T.: Wegner estimate and level repulsion for Wigner random matrices. Int. Math. Res. Notices 2010, no. 3, 436–479 (2010).

[15] Erdős, L., Schlein, B., Yau, H.-T.: Universality of random matrices and local relaxation flow. Preprint arXiv:0907.5605.

[16] Erdős, L., Schlein, B., Yau, H.-T., Yin, J.: The local relaxation flow approach to universality of the local statistics for random matrices. To appear in Ann. Inst. H. Poincaré Probab. Statist. Preprint arXiv:0911.3687.
[17] Erdős, L., Yau, H.-T., Yin, J.: Bulk universality for generalized Wigner matrices. Preprint arXiv:1001.3453.

[18] Erdős, L., Yau, H.-T., Yin, J.: Universality for generalized Wigner matrices with Bernoulli distribution. To appear in J. Combinatorics. Preprint arXiv:1003.3813.

[19] Erdős, L., Yau, H.-T., Yin, J.: Rigidity of eigenvalues of generalized Wigner matrices. Preprint arXiv:1007.4652.

[20] Erdős, P., Rényi, A.: On random graphs. I. Publicationes Mathematicae 6, 290–297 (1959).

[21] Erdős, P., Rényi, A.: The evolution of random graphs. Magyar Tud. Akad. Mat. Kutató Int. Közl. 5, 17–61 (1960).

[22] Johansson, K.: Universality of the local spacing distribution in certain ensembles of Hermitian Wigner matrices. Comm. Math. Phys. 215, no. 3, 683–705 (2001).

[23] Johansson, K.: Universality for certain Hermitian Wigner matrices under weak moment conditions. Ann. Inst. H. Poincaré Probab. Statist. 48, 47–79 (2012).

[24] Khorunzhi, O.: High moments of large Wigner random matrices and asymptotic properties of the spectral norm. To appear in Rand. Op. Stoch. Eqs.

[25] Knowles, A., Yin, J.: Eigenvector distribution of Wigner matrices. Preprint arXiv:1102.0057, to appear in Prob. Theor. Rel. Fields.

[26] Miller, S. J., Novikoff, T., Sabelli, A.: The distribution of the largest nontrivial eigenvalues in families of random regular graphs. Exper. Math. 17, 231–244 (2008).

[27] Pastur, L., Shcherbina, M.: Universality of the local eigenvalue statistics for a class of unitary invariant random matrix ensembles. J. Stat. Phys. 86, 109–147 (1997).

[28] Pastur, L., Shcherbina, M.: Bulk universality and related properties of Hermitian matrix models. J. Stat. Phys. 130, 205–250 (2008).

[29] Ruzmaikina, A.: Universality of the edge distribution of eigenvalues of Wigner random matrices with polynomially decaying distributions of entries. Comm. Math. Phys. 261, no. 2, 277–296 (2006).

[30] Sarnak, P.: Private communication.

[31] Sinai, Y., Soshnikov, A.: A refinement of Wigner's semicircle law in a neighborhood of the spectrum edge. Functional Anal. and Appl. 32, no. 2, 114–131 (1998).

[32] Sodin, S.: The Tracy-Widom law for some sparse random matrices. Preprint arXiv:0903.4295.

[33] Soshnikov, A.: Universality at the edge of the spectrum in Wigner random matrices. Comm. Math. Phys. 207, no. 3, 697–733 (1999).

[34] Tao, T., Vu, V.: Random matrices: universality of the local eigenvalue statistics. To appear in Acta Math. Preprint arXiv:0906.0510.

[35] Tao, T., Vu, V.: Random matrices: universality of local eigenvalue statistics up to the edge. Preprint arXiv:0908.1982.

[36] Tracy, C., Widom, H.: Level-spacing distributions and the Airy kernel. Comm. Math. Phys. 159, 151–174 (1994).

[37] Tracy, C., Widom, H.: On orthogonal and symplectic matrix ensembles. Comm. Math. Phys. 177, no. 3, 727–754 (1996).