A Measure of Segregation Based on Social Interactions

We develop an index of segregation based on two premises: (1) a measure of segregation should disaggregate to the level of individuals, and (2) an individual is more segregated the more segregated are the agents with whom she interacts. We present an index that satisfies (1) and (2) and that is based on agents’ social interactions: the extent to which blacks interact with blacks, whites with whites, etc. We use the index to measure school and residential segregation. Using detailed data on friendship networks, we calculate levels of within-school racial segregation in a sample of U.S. schools. We also calculate residential segregation across major U.S. cities, using block-level data from the 2000 U.S. Census.


I. INTRODUCTION
Ethnic and racial segregation is an important and well-studied social phenomenon. For over 50 years, social scientists have been concerned with measuring the extent and estimating the impact of segregation in education, housing, and the labor market. The result of this scholarship has been nearly 20 different indices of segregation and a consensus that the spatial separation of many minorities from jobs, role models, health care, and quality local public goods is a leading cause of racial and ethnic differences on many economic, social, and health-related outcomes [Kain 1968; Case and Katz 1991; Massey and Denton 1993; Borjas 1995; Cutler and Glaeser 1997; Collins and Williams 1999; Almond, Chay, and Greenstone 2003].
We propose a new approach to measuring segregation based on two premises: (1) a measure of segregation should disaggregate to the level of individuals, and (2) an individual is more segregated the more segregated are the agents with whom she interacts. Having a measure of segregation with the flexibility to disaggregate to the level of individuals opens up windows of opportunity for empirical work, and a better understanding of the mechanisms by which social interactions affect economic and social outcomes. We also desire a measure that gives a larger level of segregation for individuals whose contacts are more segregated. Consider Figure I, which depicts the distribution of blacks across metropolitan Detroit, Michigan. There is a large oval in the center of the city containing almost exclusively black households. Any measure of segregation should report that the household in the epicenter is more segregated than a household close to the edge, even when each household has all black neighbors.
We use social networks (individuals and their connections) as our mathematical framework. In this framework, we propose three specific properties that any measure of segregation in a network should satisfy. We prove that one and only one index satisfies these properties and the two broad principles above, which we label as the "Spectral Segregation Index" (SSI). The properties require that: (a) [Monotonicity] if all individuals in Network A have a larger share of their interactions with agents of the same group than in Network B, then Network A is more segregated than B; (b) [Linearity] an individual is more segregated the more segregated are the agents with whom she interacts, and this relationship takes on a linear form; and (c) [Homogeneity] if all individuals in a network have half of their interactions with members of the same group, the index of segregation is one-half. The latter condition normalizes the index.
We defer a formal definition of the SSI to Section IV. Informally, the SSI measures the connectedness of individuals of the same group. 1 Consider the following recursion. Define "first-order segregation" as the share of one's social interactions that are with individuals of one's own group. Let "second-order segregation" be the average, over all of one's own-group social interactions, of their first-order segregation. Following this line, an agent's $n$th-order segregation is the average, over her own-group connections, of their $(n-1)$th-order segregation, and so on. The SSI of an individual is the limit, as $n \to \infty$, of that individual's $n$th-order segregation.
The SSI has important advantages over existing measures of segregation. First, as a gauge of residential segregation, it is invariant to arbitrary partitions of a city; existing measures are not. 2 Second, it allows one to investigate how segregated multiple minority groups are, permitting comparisons of Asians, Blacks, Hispanics, Native Americans, and so on, within and across cities. 3 The SSI makes it possible to compare Hispanic segregation across cities, compare the Hispanics of east Los Angeles with the Hispanics in south Los Angeles, or compare them to Blacks in Chicago. Third, our index allows one to analyze the full distribution of segregation, allowing researchers to move beyond aggregate statistics, which can be misleading. The typical Black household is more segregated than the typical Hispanic household, yet the most segregated Hispanics are orders of magnitude more segregated than any Blacks. Fourth, there are inherent multiplicative effects captured by SSI, which other indices omit. An individual's susceptibility to group-transmitted influences depends on how many contacts the individual has with members of the group, the susceptibility of her contacts, the susceptibility of their contacts, and so on.

1. Groups can be defined in terms of gender, political affiliation, educational attainment, race/ethnicity, and so on. Our empirical applications are to race/ethnicity.
2. As a practical matter, we use the most disaggregated data publicly available: census blocks.
3. Another way to analyze multiple groups with existing indices is to calculate the weighted average of several dichotomous indices (see Reardon and Firebaugh [2002]). It is not clear how to interpret the findings from such an exercise.
The SSI has some disadvantages as well. First, it depends on the quality of the information one can obtain about social interactions. In the case of residential segregation, for example, the information is restricted to where individuals live within a city and not how they interact. Unlike other indices, however, as better information on the nature of social interactions is obtained, the SSI becomes a sharpened proxy of those interactions. Second, it is sensitive to the fraction of individuals in a network who have the race/ethnicity under study. We address this issue by calculating a "baseline," and adjusting actual SSI taking this into account. Finally, implementing the SSI can be computationally demanding, though our applications demonstrate that the computational tasks are often feasible. 4
After formally deriving the SSI, we apply the index to two well-known social phenomena: measuring the extent of school and residential segregation. We begin by measuring within-school segregation patterns by race using data on friendship networks available in the National Longitudinal Study of Adolescent Health (Addhealth). Our analysis unearths a rich set of new facts. First, the relationship between the share of black students in a school and their segregation is nonlinear: when black students are relatively scarce in a school, their friendship networks tend to be integrated. As their share of the student population increases, segregation increases dramatically, plateauing when blacks comprise roughly 25 percent of the student population. Schools that have 25 percent or more black students exhibit severe within-school racial segregation of social interactions. This phenomenon undermines the intuition that a school that has equal shares of black and white students is well integrated. A similar, though less pronounced, pattern exists among Asians and Hispanics, and is weaker still for Whites. The common practice of using the percentage of a racial group in a school as a proxy for within-school segregation measures for that group is deeply problematic.
We also calculate the extent of segregation across major cities in the U.S., using block-level data from the 2000 Census. We find that, on average, Blacks are more segregated than any other racial group, but the most segregated Hispanics are more segregated than the most segregated Blacks. A virtue of the SSI is the ability to measure segregation at disaggregated levels, allowing one to measure the intensity of same-race clusters or uncover the most segregated city blocks in America. For example, we find that the largest minority ghetto in the U.S. consists of Hispanics in Los Angeles, CA: 17,909 blocks are connected to each other. It is important to emphasize that these disaggregated results cannot be obtained with any of the existing measures of segregation. We also use SSI to correlate segregation with several MSA-level variables and replicate Cutler and Glaeser's [1997] classic work on ghettos.
We compare our results to existing calculations applying commonly-used measures. The rank correlation between the SSI and the popular dissimilarity index is .42. The rank correlation with the index of isolation is .93. Our index can be interpreted as a measure of segregation as isolation that is rooted in a social-interactions framework.
The organization of the paper is as follows. Section II provides a brief discussion of existing segregation indices. Section III provides an example that previews our general results. Section IV derives the SSI. Section V analyzes further properties of the SSI. Section VI uses the SSI to estimate the prevalence of within-school and residential segregation. Section VII concludes. There are two appendices. Appendix A contains the technical proofs of all formal results and additional theoretical results omitted from the text. Appendix B presents a guide to the programs we used to compute our index.

II. BACKGROUND AND PREVIOUS LITERATURE
At an abstract level, segregation is the degree to which two or more groups are separated from each other. However, practical definitions can be quite distinct from one another, conceptually and empirically. Massey and Denton [1988] group existing indices into five classes: evenness, exposure, concentration, centralization, and clustering, which they take to resemble the totality of what is usually meant by "segregation." Evenness refers to the differential distribution of two groups across areas in a city. Measures of exposure are designed to approximate the amount of potential contact and interaction between members of different groups. Concentration indices measure the relative amount of physical space occupied by a minority group. Centralization is the extent to which a group is located near the center of an urban area, and clustering measures the degree to which geographic units inhabited by minority members abut one another, or cluster spatially. Of the five dimensions of segregation, only two are used in the vast majority of applied work in the social sciences: evenness and exposure. Economists ultimately care about the degree to which segregation affects social interactions. For this purpose, concentration and centralization are inadequate, and measures of clustering are largely avoided because of their sensitivity to the number and population of census regions.
The most popular measure of segregation is the "dissimilarity" index (developed by Jahn, Schmid, and Schrag [1947]), a measure of evenness. 5 Suppose a city is divided into $N$ sections. The dissimilarity index measures the percentage of a group's population that would have to change sections for each section to have the same percentage of that group as the whole city. In symbols,

(1) $$\text{Index of dissimilarity} = \frac{1}{2} \sum_{i=1}^{N} \left| \frac{\text{Black}_i}{\text{Black}_{\text{total}}} - \frac{\text{Non-Black}_i}{\text{Non-Black}_{\text{total}}} \right|,$$

where $\text{Black}_i$ is the number of blacks in area $i$, $\text{Black}_{\text{total}}$ is the total number of blacks in the city as a whole, $\text{Non-Black}_i$ is the number of non-blacks in area $i$, and $\text{Non-Black}_{\text{total}}$ is the number of non-blacks in the city. The dissimilarity index has the appealing feature that it is invariant to the size of a minority group.
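The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `dissimilarity_index` and the section counts are hypothetical, chosen only to show the two extreme values and one intermediate case.

```python
def dissimilarity_index(black, non_black):
    """Index of dissimilarity from per-section counts of two groups."""
    black_total = sum(black)
    non_black_total = sum(non_black)
    if black_total == 0 or non_black_total == 0:
        raise ValueError("each group needs at least one member in the city")
    # Half the sum of absolute differences between the two groups'
    # distributions across sections.
    return 0.5 * sum(
        abs(b / black_total - n / non_black_total)
        for b, n in zip(black, non_black)
    )


# Hypothetical cities (all counts are made up for illustration).
even_city = dissimilarity_index([10, 10, 10, 10], [30, 30, 30, 30])
split_city = dissimilarity_index([20, 20, 0, 0], [0, 0, 60, 60])
mixed_city = dissimilarity_index([30, 10], [10, 30])
```

In the even city every section mirrors the city-wide composition, so the index is 0. In the split city no section contains both groups, so the index is 1. The mixed city falls in between.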
A second commonly-used measure of segregation is "isolation," a measure of exposure. As Blau [1977] recognized, Blacks can be evenly distributed among residential areas in a city but experience little exposure to non-blacks if they are a relatively large proportion of the city. Isolation measures the extent to which blacks are exposed only to one another, rather than to non-blacks. The index is computed as the minority-weighted average of each section's minority population share:

$$\text{Index of isolation} = \sum_{i=1}^{N} \frac{\text{Black}_i}{\text{Black}_{\text{total}}} \cdot \frac{\text{Black}_i}{\text{Persons}_i},$$

where $\text{Persons}_i$ refers to the total population of area $i$. 6

Dissimilarity and isolation possess at least two undesirable properties. First, they explicitly depend on the arbitrary ways in which cities are partitioned into sections (e.g., census tracts). 7 That is, fixing the location of minorities and nonminorities in a city and redrawing the sections can drastically change the measure of segregation. An exaggerated example is depicted in Figure II. The city depicted in the figure has a dissimilarity index of 0 (perfect integration) when sections are drawn vertically and has a dissimilarity index of 1 (extreme segregation) when sections are drawn horizontally; no household has moved. Similarly, vertical partitions yield an isolation index of .5 whereas horizontal partitions produce an index of 1. This is a highly undesirable property of any segregation index, as it may artificially indicate that a city is more or less segregated as a function of how the tracts are drawn. The key flaw is that there is no theory of how the city should be partitioned. Intuition suggests that the more disaggregated the better, but complete disaggregation results in all sections having only one race: maximum segregation, regardless of the city.

FIGURE II
A Hypothetical City

5. Other measures of evenness include the Gini coefficient (the mean absolute difference between minority proportions weighted across all pairs of geographic units, expressed as a proportion of the maximum weighted mean difference), the Atkinson index (similar to the Gini coefficient, but allows researchers to decide how to weight geographic units which are over or under the city-wide distribution), and Entropy (the weighted average of each geographic unit's deviation from the racial entropy of the city as a whole).
6. Another commonly used measure of exposure is the interaction index, which is the inverse of the isolation index presented above.
7. We are not the first to draw attention to this flaw in measures of segregation; see Cowgill and Cowgill [1951], Appendix A in Taeuber and Taeuber [1965], and Massey and Denton [1988]. While this property is problematic for measures of residential segregation, it is less likely to affect measures of occupational or school segregation, where there is a natural clustering of individuals.
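The dependence of the isolation index on how sections are drawn can be reproduced with a small sketch. The grid below is a hypothetical city chosen to mimic the behavior described for Figure II (it is not a reproduction of the figure), and `isolation_index` and `counts` are illustrative names.

```python
def isolation_index(black, persons):
    """Minority-weighted average of each section's minority share."""
    black_total = sum(black)
    if black_total == 0:
        raise ValueError("the city needs at least one minority household")
    return sum(
        (b / black_total) * (b / p)
        for b, p in zip(black, persons)
        if p > 0
    )


def counts(sections):
    """Per-section minority counts and total populations."""
    return [sum(s) for s in sections], [len(s) for s in sections]


# Hypothetical 4 x 4 city: 1 = black household, 0 = white household.
city = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

horizontal_sections = city                       # each row is a section
vertical_sections = [list(c) for c in zip(*city)]  # each column is a section

horizontal = isolation_index(*counts(horizontal_sections))
vertical = isolation_index(*counts(vertical_sections))
```

No household moves between the two calculations; only the section boundaries change, yet the index moves from 0.5 (vertical sections) to 1 (horizontal sections).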
Second, existing measures are not defined when trying to measure segregation at the level of individuals. It is difficult to correctly identify the relationship between segregation and outcomes without individual-level variation in segregation. As a descriptive matter, individual segregation may be more useful than city-wide segregation. Rather than correlate individual economic outcomes with city-wide segregation, one can correlate individual outcomes with individual measures of segregation. On the other hand, the right level of aggregation depends on the problem at hand; group-level, neighborhood, or city-level segregation may be the appropriate level of aggregation in many applications. It is an open empirical question, one that cannot be answered without a measure that disaggregates to the individual level. 8
The literature in economics involving the measurement of segregation is small [Phillipson 1993; Hutchens 2001; Frankel and Volij 2004]. Similar to our exercise, their approach is axiomatic: identifying desirable properties that an index should possess. The literature takes an arbitrary partition of a city as given and uses the partition to identify indices axiomatically. There is little in common with our approach.

III. A MOTIVATING EXAMPLE
Before moving to a full description of the model, we present a stark example that previews the Spectral Index and discusses (informally) some of its properties.
Consider City 1, depicted in Figure III. The nodes in City 1 represent households. Each household can be one of two races: black or white. In the figure, household (A, 1) is white, (B, 1) is black, and so on.
Our measure of segregation is based on the social network of the members of a race. Consider the black households in City 1. For the purposes of this example, we use the information on where an individual lives to infer whom she interacts with and trace out a network of social interactions based on residential patterns. Suppose that each individual interacts only with her immediate neighbors; (A, 1) interacts with (B, 1) and (A, 2); (D, 4) interacts with (C, 4), (E, 4), (D, 3), (D, 5), and so on. The resulting network of black households is shown on the right in Figure III. The thickness of a line connecting two individuals reflects the intensity of their relationships; thicker lines imply a node is at least one-third of an individual's social interactions. Here, (B, 2) has four neighbors, so she has a less intense relation to each one of them than (B, 1), who has only three neighbors.
Black households are partitioned in two separate networks. We call each of these subnetworks a connected component (CC). The fact that social networks are often partitioned in such CCs is of practical importance; components often correspond to ghettos or other natural clusterings of individuals. Let the CC on the left, comprising eight households, be denoted Component 1 and the component on the right, with three households, Component 2.
We envision segregation as the degree of connectivity of the race's social network. The potential effects of segregation arise because blacks tend to interact with blacks, and whites with whites. The idea that segregation is synonymous with same-race interactions has, once a network of social interactions is constructed, a formal expression in network connectivity.
The SSI is one measure of network connectivity. It arises as the unique measure that satisfies certain properties, the most important of which is a requirement that an individual be more segregated the more segregated are his direct neighbors. Concretely, an individual's segregation is the weighted sum of her neighbors' segregation, weighted by how much she interacts with each one of them. We discuss the properties in detail in the next section.
The SSI for blacks in City 1 is in Table I. Note that Component 1 is more segregated than Component 2, which reflects that the network in Component 1 is more connected than that in Component 2. The SSI also lets us disaggregate the component-wide SSI into individual household SSI: the component-wide SSI is the average of the individual SSI. Note that (C, 1) is the most segregated household in this example, which captures that this is an individual who only interacts with blacks. On the other hand, (D, 4) is the most integrated household in Component 1.
Individual SSI should be interpreted as the distribution of component-wide SSI within a network. So, a particular individual's SSI is relative to the SSI of the component she is in. Note how (D, 4)'s share in Component 1's segregation is small, while the distribution of segregation in Component 2 is quite even. So, (C, 4)'s SSI is smaller than (C, 5)'s. The component's SSI is the average of the individual SSIs; hence, an individual's SSI may be much larger than the SSI of her CC.
Finally, we remark that the SSI is invariant to the size of the population of blacks. If we double the size of City 1 by adjoining a copy of the city to itself, SSI will not change. We would have two new components and their respective SSIs, and the city SSI would be the weighted average of the four components.

The basic building blocks for our measure of segregation are a set of individuals $V$ and information on whether (and, possibly, how much) any two individuals interact. Hence, the measure depends on the network of social interactions among the individuals in $V$. Our measure identifies segregation of the members of a group with the intensity of the social interactions among the members of that group.
Given any two individuals, suppose we know whether they interact with each other and the intensity of their interaction. For any two individuals $v$ and $v'$ in $V$, let the number $r_{vv'} \ge 0$ represent the nature of their relationship. If $r_{vv'} = 0$, then there is no relation between $v$ and $v'$; if $r_{vv'} > 0$, then $v$ and $v'$ have a relationship. Abusing notation, we use $V$ to refer to the number of elements in the set $V$. The information on interactions is then summarized in a $V \times V$ matrix $R$, with typical element $r_{vv'}$.
We make two important assumptions about the numbers $r_{vv'}$ in $R$. First, we assume that individuals face a budget constraint for their social interactions: $\sum_{v' \in V} r_{vv'} = 1$ for all $v$ in $V$. Think of $r_{vv'}$ as the fraction of time that $v$ spends with $v'$. Second, we assume that if $r_{vv'} = 0$, then $r_{v'v} = 0$, though we allow $r_{vv'}$ and $r_{v'v}$ to be different when they are not zero. We allow for $r_{vv'} \neq r_{v'v}$ because a relationship can have a different level of importance or intensity to $v$ and to $v'$. In fact, this comes up in empirical applications of SSI: $v$ may interact only with $v'$, in which case $r_{vv'} = 1$, while $v'$ may split his time equally among $n$ other relationships, so $r_{v'v} = 1/n$.
Now, suppose that we know the race of each individual $v \in V$. For the rest of the section, fix one race, called Race $h$, and drop from the set $V$ all individuals from races other than $h$. Form the matrix $B$ from the matrix $R$ by retaining only those $r_{vv'}$ for which both $v$ and $v'$ belong to Race $h$. The matrix $B$ (a submatrix of $R$) reflects the network of same-race social interactions among the members of Race $h$.
Let us briefly discuss two examples, which preview our empirical applications in Section VI. First, suppose we construct $B$ using information on residential patterns (and only information on residential patterns). We would need to set a criterion for who is a neighbor of whom and set $r_{vv'} = 0$ when $v$ and $v'$ are not neighbors. The criterion could be that $v$ and $v'$ are neighbors if they live sufficiently close to each other. We can then suppose, in the absence of additional information on social interactions, that the relation with each of his neighbors is equally important to $v$, and set $r_{vv'}$ to be the inverse of the number of $v$'s neighbors. Finally, we keep only those agents that belong to the race under analysis (Race $h$). Second, suppose we construct $B$ from a survey on social interactions where individuals are asked to name their ten closest friends. We would then set $r_{vv'} = 0$ if $v$ and $v'$ do not name each other as friends and set $r_{vv'}$ to be the inverse of the number of $v$'s friends, supposing the survey does not let us infer the relative importance of each friendship. The two examples are developed in detail empirically in Section VI.
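The residential construction of $R$ and $B$ described above can be sketched as follows. This is a minimal illustration under stated assumptions: the four households, their neighbor lists, their group labels, and the function names `interaction_matrix` and `same_race_submatrix` are all hypothetical, and every neighbor is weighted equally, as in the first example.

```python
def interaction_matrix(neighbors):
    """Row-normalized R: each neighbor of v gets weight 1 / (number of v's neighbors)."""
    people = sorted(neighbors)
    index = {v: i for i, v in enumerate(people)}
    R = [[0.0] * len(people) for _ in people]
    for v, nbrs in neighbors.items():
        for w in nbrs:
            R[index[v]][index[w]] = 1.0 / len(nbrs)
    return people, R


def same_race_submatrix(people, R, race, h):
    """Keep only rows and columns of R that belong to members of group h."""
    keep = [i for i, v in enumerate(people) if race[v] == h]
    members = [people[i] for i in keep]
    B = [[R[i][j] for j in keep] for i in keep]
    return members, B


# Hypothetical households living on a line: a - b - c - d.
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
race = {"a": "h", "b": "h", "c": "other", "d": "h"}

people, R = interaction_matrix(neighbors)
members, B = same_race_submatrix(people, R, race, "h")
```

Each row of `R` sums to one (the budget constraint), while the rows of `B` generally sum to less than one. Household `d` has only an other-group neighbor, so its row in `B` is all zeros.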
It is important to note that, while we focus on the network of same-race interactions, the intensity of those interactions is affected by cross-race connections through $r_{vv'}$. For example, let $v$ be a member of Race $h$. If $v$ interacts only with $v'$, and $v'$ is in Race $h$, then $r_{vv'} = 1$, and 1 will be the only nonzero element of $v$'s row in $B$. On the other hand, if $v$ interacts with nine members of another race, besides $v'$, then $r_{vv'} = 1/10$ and $1/10$ will be the only nonzero element of $v$'s row in $B$. This difference implies that $v$ is more integrated when he has relations with individuals of other races. We discuss this feature of our measure in Section V.C.
A segregation index for Race $h$ is a function that assigns a real number $S^h(B)$ to each matrix $B$ of same-race interactions, along with functions assigning a real number $s_v^h(B)$ for each individual member $v$ of Race $h$, such that $S^h(B)$ is the average of the individual $s_v^h(B)$. Our definition of a segregation index reflects our desire that segregation be measured at the individual level. Individual segregation is measured in the same units as racial segregation; Race-$h$ segregation is the average of the segregation of all individuals of Race $h$.

IV.B. Three Properties which Define the SSI
We present three properties that jointly define our measure of segregation.
The first property requires that an increase in the intensity of same-race interactions imply an increase in segregation. Concretely, say that a matrix $B'$ has more intense interactions than matrix $B$ if all the entries of the matrix $B'$ are at least as large as those of $B$. That is, if $B = (r_{vv'})$ and $B' = (r'_{vv'})$, then $r_{vv'} \le r'_{vv'}$ for all $v$ and $v'$. A segregation index satisfies the property of monotonicity if, whenever $B'$ has more intense interactions than $B$, $S^h(B) \le S^h(B')$.
The second property is a normalization of the index. Let $d > 0$ be a real number. A matrix $B$ is homogeneous of degree $d$ if, for all $v$ in Race $h$, $\sum_{v'} r_{vv'} = d$. For example, a matrix in which every row sums to $3/4$ is homogeneous of degree $3/4$. A segregation index satisfies homogeneity if $S^h(B) = d$ whenever $B$ is homogeneous of degree $d$.
Homogeneous networks rarely occur in practice, but the property gives an interpretation to the segregation of networks one encounters in applications. For example, a measure of 0.8 can be read as the segregation Race $h$ individuals would have if they spent 80 percent of their time with individuals of the same race. Homogeneity also provides a "scale free" property: if City A has more households than City B, but each household in both cities has the same fraction of same-race neighbors, the index will report the same level of segregation for both cities.
Our third property is the most substantial and potentially controversial. We want the segregation of an individual $v$ to depend on the segregation of the individuals with whom she interacts. We require that this dependence takes a linear form. We need some auxiliary concepts to present the third property.
Let $N_v$ be the set of individuals of Race $h$ that $v$ interacts with: the set of $v'$ in Race $h$ with $r_{vv'} > 0$. In a similar vein, consider the set of individuals who interact with the members of $N_v$, those that interact with those that interact with the members of $N_v$, and so on. The resulting set of individuals with direct or indirect interactions with $v$ is called the CC of $B$ that $v$ belongs to; denote this set of individuals by $C_v$.
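The connected components $C_v$ can be found with a standard breadth-first search over the nonzero entries of $B$. The sketch below is only illustrative: the matrix is hypothetical and `connected_components` is an assumed name, not a routine from the paper's programs.

```python
from collections import deque


def connected_components(B):
    """Group row indices of B into sets linked by direct or indirect interactions."""
    n = len(B)
    seen = [False] * n
    components = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        component = []
        while queue:
            v = queue.popleft()
            component.append(v)
            for w in range(n):
                # An interaction in either direction links v and w.
                if not seen[w] and (B[v][w] > 0 or B[w][v] > 0):
                    seen[w] = True
                    queue.append(w)
        components.append(sorted(component))
    return components


# Hypothetical same-group matrix: two pairs and one isolated individual.
B = [
    [0.0, 0.5, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.25, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
components = connected_components(B)
```

An individual with no same-group interactions ends up in a component of her own.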
The third property requires that $s_v^h(B)$ be the average of $s_{v'}^h(B)$ among $v$'s Race-$h$ social interactions, relative to the average segregation of the individuals in $v$'s CC. If $S_{C_v}$ is the average segregation of individuals in $C_v$, say that a segregation index satisfies linearity if

$$s_v^h(B) = \frac{1}{S_{C_v}} \sum_{v' \in N_v} r_{vv'} \, s_{v'}^h(B).$$

There are two qualitative assumptions behind the linearity property. The first is that $v$'s segregation depends on his neighbors' segregation. As described in Section I, if one considers Figure I, which depicts the distribution of blacks across metropolitan Detroit, it seems evident that individuals in the center of the city's black ghetto should be measured as more segregated than those closer to the edge. Linearity is one embodiment of this requirement. In Section V.D we discuss the implications of relaxing this assumption. Note that, while the weights $r_{vv'}$ must add to one, an individual's SSI is not bounded by 1.
The second qualitative property is that the dependence is modulated by the CC's segregation. That is, a decrease in the segregation of one of $v$'s neighbors will affect $v$ less if $v$ lives in a highly segregated component. The key idea is that $v$ receives the effects of segregation from her different neighbors, and any one neighbor is less important when the component is highly segregated.
It is not possible to relax linearity while retaining the linear influence of neighbors' segregation. Suppose that $v$'s segregation depends directly on her neighbors' segregation, but that it does not take the form assumed in the linearity property. Suppose that the component's segregation does not play a role, and that $v$'s segregation depends directly on the sum of her neighbors' segregation. Then, an increase in a neighbor's segregation gives a one-for-one increase in $v$'s segregation, and this, in turn, directly impacts $v$'s neighbor. The result does not necessarily (in fact, generally will not) converge to new levels of segregation. Our use of the component's segregation guarantees that the effect of an increase in segregation for a neighbor does not impact fully on $v$, at least not for large values of segregation, ensuring that there is a solution to the problem of determining all individuals' segregation measures. 9
The three properties described above jointly define our index. The SSI is the (unique) segregation index that satisfies the properties of monotonicity, homogeneity, and linearity (Theorem 1, Appendix A).
On a CC, SSI is the largest eigenvalue of the corresponding irreducible submatrix of $B$. The individual SSI are obtained by distributing the component's SSI among individuals using the eigenvector corresponding to the largest eigenvalue. Thus, SSI results from familiar matrix operations and is easy to compute using standard software, such as MATLAB. The irreducible submatrices of $B$ are often very sparse, meaning that many of their entries are zeros. There are efficient algorithms for computing the largest eigenvalues of sparse matrices, and MATLAB comes with one such algorithm incorporated in its eigs command.
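As a rough illustration of this computation, the sketch below finds the largest eigenvalue and its eigenvector for a single connected component. It swaps in plain power iteration in pure Python for MATLAB's eigs, so it is suited to small examples only. The function name `spectral_segregation`, the shift by the identity, and both example matrices are assumptions made for illustration.

```python
def spectral_segregation(B, max_iter=10000, tol=1e-13):
    """Return (component SSI, individual SSIs) for one connected component.

    Power iteration is run on B + I rather than on B. The shift leaves the
    eigenvectors unchanged, raises every eigenvalue by 1, and avoids the
    oscillation plain power iteration can show on periodic networks.
    """
    n = len(B)
    x = [1.0 / n] * n  # positive start vector whose entries sum to 1
    shifted = 1.0
    for _ in range(max_iter):
        y = [x[i] + sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]
        shifted = sum(y)  # estimate of the largest eigenvalue of B + I
        new_x = [value / shifted for value in y]
        change = max(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if change < tol:
            break
    ssi = shifted - 1.0
    # Scale the eigenvector so that the individual values average to the SSI.
    individual = [ssi * n * value for value in x]
    return ssi, individual


# Hypothetical homogeneous matrix: every row sums to 3/4.
homogeneous = [
    [0.0, 0.5, 0.25],
    [0.5, 0.0, 0.25],
    [0.375, 0.375, 0.0],
]
hom_ssi, hom_individual = spectral_segregation(homogeneous)

# Hypothetical pair: the first individual interacts only with the second,
# who spends half of her time outside the group.
pair_ssi, pair_individual = spectral_segregation([[0.0, 1.0], [0.5, 0.0]])
```

For the homogeneous matrix the sketch returns an SSI of 3/4 for the component and for every individual, in line with the homogeneity property. For the pair, the component SSI is the square root of 1/2, and the individual who interacts only within the group receives the larger individual value.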

V. ANALYSIS OF THE SSI
The previous section described three properties that provide the precise assumptions underlying the SSI. In this section, we provide further properties and features of SSI, illuminate an alternative interpretation for the index, discuss other ways to incorporate cross-race interactions, and describe the implications of relaxing the linearity property.

V.A. An Alternative Interpretation of SSI
An alternative way to interpret the SSI is through a model of group-specific capital transmission. SSI is a measure of how fast same-group influences are disseminated purely as a result of social contacts. 10
Suppose that the matrix of same-group social interactions, $B$, has only one CC (without this assumption, the result will hold in each CC of $B$). Let $x_v$ be a measure of how much group-specific capital an individual $v$ has. We think of this capital as the depth of one's group identity; something that arises from repeated social interaction with people of one's own group. There is an inherent difference between visiting a church once to listen to their gospel choir and interacting constantly with people who are involved with gospel music. The intensity with which one experiences the same social phenomenon is the key to this difference. Segregation is related to this intensity, and one can show how SSI captures the intensity of same-group social phenomena.
Suppose that, in each period $t$, individual $v$'s $h$-capital grows depending on how much $h$-specific capital her contacts have, and on how much $v$ interacts with them. Specifically, suppose that

(2) $$x_{v,t} = \sum_{v' \in N_v} r_{vv'} \, x_{v',t-1},$$

and that $x_{v,0}$ is given, for all $v$.
The law of motion in (2) is our assumption that capital reflects the intensity of $v$'s own-race identity. Similar models have been used to capture cultural transmission in networks; see Brueckner and Smirnov [2004]. 11

PROPOSITION 1. For all vectors $(x_{v',0})_{v'}$ of initial stocks of capital, and all $v$,

$$\lim_{t \to \infty} \frac{x_{v,t}}{x_{v,t-1}} = S^h(B).$$

Proposition 1 shows that we can interpret SSI as the rate of growth of group-specific influences. It follows from a familiar calculation in Perron-Frobenius theory; recall that SSI is the largest eigenvalue of $B$ in the case where we have only one CC. In economics the result is reminiscent of the balanced growth result in the theory of Leontief systems (see e.g., Samuelson and Solow [1953]).
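The growth interpretation can be checked numerically on a small case. The sketch below assumes a linear law of motion in which each individual's capital is the interaction-weighted sum of her contacts' capital in the previous period. The matrix, the initial stocks, and the function name `simulate_capital` are hypothetical. Every row of the matrix sums to 3/4, so its largest eigenvalue is 3/4.

```python
def simulate_capital(B, x0, periods):
    """Iterate x_t = B x_{t-1} and return [x_0, x_1, ..., x_periods]."""
    n = len(B)
    history = [list(x0)]
    for _ in range(periods):
        prev = history[-1]
        history.append(
            [sum(B[v][w] * prev[w] for w in range(n)) for v in range(n)]
        )
    return history


# Hypothetical connected network in which every row sums to 3/4.
B = [
    [0.0, 0.5, 0.25],
    [0.5, 0.0, 0.25],
    [0.375, 0.375, 0.0],
]
history = simulate_capital(B, [1.0, 2.0, 5.0], 100)

# Period-over-period growth factor of each individual's capital.
growth = [history[-1][v] / history[-2][v] for v in range(3)]
```

For this matrix, each individual's growth factor settles at 3/4 regardless of the unequal initial stocks, which matches the largest eigenvalue.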
Examples of this type of group-specific capital transmission may include language [Lazear 1999] and the choice of first names [Fryer and Levitt 2004]. In a simple model of culture and language, Lazear [1999] shows that incentives to assimilate by learning to speak the native language are decreasing in the size of an ethnic enclave. Fryer and Levitt [2004] argue that the choice of distinctive first names is a cultural investment and show that this practice is more common in highly segregated areas. Both of these papers are consistent with the basic model of group-specific capital transmission described above and, ipso facto, our measure of segregation.

V.B. General Properties
We discuss here some important and more subtle properties of SSI.
First, SSI identifies isolated individuals by marking them as perfectly integrated. If v has no connections ($r_{vv'} = 0$) to individuals of his group, then $s_v^h(B) = 0$. If v has relations with at least one individual of his same group, $s_v^h(B) > 0$ (Proposition 3, Appendix A). Perfectly integrated groups are rare, but we do observe perfectly integrated individuals in our applications. These are individuals who only interact with others of different races. SSI singles them out by assigning them a measure of zero.
Second, small changes in the structure of social interactions will entail small changes in SSI. SSI is a continuous function of the elements of B (Proposition 5, Appendix A).
Third, SSI is related to a calculation of connections between individuals. If v has a relation to $v'$ and $v'$ has one to $v''$, then information can travel from v to $v''$ by the path $v - v' - v''$. It is intuitive to think of the number of such paths as a measure of how connected v is to $v''$. Segregation, on the other hand, is the extent to which individuals of the same group are connected, so counting paths between individuals gives rise to a natural measure of segregation. It turns out that SSI has a close connection to the number of paths that exist between individuals. Counting paths gives another interpretation of SSI.
We flesh out this connection in Appendix A. Here we give some simple calculations suggesting the nature of the relationship between counting paths between individuals within the same group and SSI.
Consider the following special case: each nonzero $r_{vv'}$ takes the same value, so $r_{vv'}$ is either 0 or $r \in (0, 1)$. Let $N_v^k$ be the set of individuals for which there is a path to v with, at most, k individuals. Then, where $\alpha_{vv'}$ is proportional to the number of paths between v and $v'$. Note how all the $v'$ in the same component as v affect v's segregation. The weight of each $v'$ is affected by the number of paths between v and $v'$. Concretely, $\alpha_{vv'}$ is obtained as the number of paths of length k (with k individuals) from v to $v'$ multiplied by $r^k/(S^h(B))^k$. The number of paths from v to $v'$, in turn, is the $vv'$ entry of the matrix $(1/r^k) B^k$.
Fourth, and related to the previous property, SSI captures certain multiplier effects in the social interactions network. An individual's susceptibility to own-group influences (patterns of speech, names, and other group-specific behavior) depends on how many contacts the individual has with his or her own group and the susceptibility of those contacts.
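The path-counting statement in the third property, that the $vv'$ entry of $(1/r^k) B^k$ counts paths of length k, can be checked on a small case. The following is a minimal sketch under stated assumptions: the four-person network, the common intensity r = 0.25, and the helper names are hypothetical, and "path" is taken to allow revisiting individuals, which is what matrix powers count.

```python
def mat_mul(X, Y):
    """Plain matrix product for square lists of lists."""
    n = len(X)
    return [[sum(X[i][m] * Y[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def path_counts(B, r, k):
    """Entries of (1/r**k) * B**k: number of length-k paths between each pair."""
    power = B
    for _ in range(k - 1):
        power = mat_mul(power, B)
    return [[round(entry / r**k) for entry in row] for row in power]

# Hypothetical same-group network: a triangle (0, 1, 2) plus individual 3,
# who is tied only to individual 2. Every nonzero r_vv' equals r.
r = 0.25
ties = [(0, 1), (0, 2), (1, 2), (2, 3)]
B = [[0.0] * 4 for _ in range(4)]
for v, w in ties:
    B[v][w] = B[w][v] = r

counts = path_counts(B, r, k=2)
print(counts[0][3])  # 1: the single two-step path 0 - 2 - 3
print(counts[2][2])  # 3: individual 2 can step to any of three contacts and back
```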
Consider the following thought experiment, depicted in Figure IV. We show the effect of changing the race of one individual in a network; the resulting changes in SSI capture the essence of the multiplier effects. Network A has three Black individuals who are connected to each other, all of whom are also connected to one White individual. To illustrate the multiplier effects captured in SSI, Network B changes the race of Individual 4 so that she is now also Black. To keep the calculations transparent, we assume that Individual 4 also has three neighbors in total. Table II shows the levels of segregation before and after Individual 4 changes race.
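The thought experiment can be reproduced in a few lines. The sketch below is illustrative rather than the authors' calculation: it assumes that each individual splits her time equally across all of her neighbors ($r_{ij} = 1/d_i$), and the function names are hypothetical, so the numbers need not match Table II if other intensities are used there. It computes each individual's index as (largest eigenvalue) x (eigenvector entry) x (component size), for a network that forms a single CC.

```python
def same_group_matrix(ties, race, group):
    """Same-group interaction matrix with r_ij = 1/d_i (assumed weighting)."""
    degree = {v: 0 for v in race}
    for a, b in ties:
        degree[a] += 1
        degree[b] += 1
    members = sorted(v for v in race if race[v] == group)
    index = {v: k for k, v in enumerate(members)}
    B = [[0.0] * len(members) for _ in members]
    for a, b in ties:
        if race[a] == group and race[b] == group:
            B[index[a]][index[b]] = 1.0 / degree[a]
            B[index[b]][index[a]] = 1.0 / degree[b]
    return members, B

def individual_ssi(members, B, iters=10000, tol=1e-13):
    """lambda * x_i * |C| for a network that forms a single connected component."""
    n = len(B)
    x, lam = [1.0 / n] * n, 0.0
    for _ in range(iters):
        # Power iteration on B + I; the shift prevents oscillation.
        y = [x[i] + sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(y)
        y = [value / total for value in y]
        converged = max(abs(a - b) for a, b in zip(x, y)) < tol
        x, lam = y, total - 1.0
        if converged:
            break
    return {v: lam * x[k] * n for k, v in enumerate(members)}

# Individuals 1-3 are tied to each other and to Individual 4.
ties = [(1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (3, 4)]
network_a = {1: "Black", 2: "Black", 3: "Black", 4: "White"}
network_b = {1: "Black", 2: "Black", 3: "Black", 4: "Black"}

ssi_a = individual_ssi(*same_group_matrix(ties, network_a, "Black"))
ssi_b = individual_ssi(*same_group_matrix(ties, network_b, "Black"))
print(ssi_a)  # individuals 1-3: about 0.667 each
print(ssi_b)  # individuals 1-4: about 1.0 each
```

Under the equal-time assumption, Individuals 1 through 3 do not change a single tie, yet their index rises from two thirds to one when Individual 4 changes race. That rise, driven entirely by a contact becoming a same-group contact, is the multiplier effect.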

V.C. More on Cross-Race Interactions
We argued that SSI captures cross-race interactions by their effect on the intensity of same-race interactions. We expand on this point here using a simple example and then discuss alternative ways of incorporating cross-race interactions.
We have argued that, if v interacts only with $v'$ and $v'$ is in race h, then v would be more segregated than if she interacts with nine other individuals who are not in Race h. We make the same point here with a concrete example. Consider ... intensity of same-race interactions due to a decrease in the $r_{vv'}$s. Note that the SSI for the city on the right follows immediately because all black agents spend exactly half their time with other blacks.
An alternative way to incorporate cross-race interactions would be to explicitly let the segregation of individual v depend on the segregation of the neighbors that are not the same race as her. There are two potential problems with this. First, we would need to decide whether a more segregated white neighbor makes a black agent more or less segregated. There are simple arguments for both effects: a black agent may be expected to interact less with a highly segregated white and thus be more isolated from whites, or she may get more white-specific capital from a segregated white and become less isolated from whites. Our approach is agnostic with respect to the effect of one race's segregation on another, and allows for the possibility of deciding the matter empirically.
The second objection is practical. The computational complexity of calculating SSI depends critically on the dimensions of the matrices B. If we need to allow explicitly for the interactions that each v has with all her neighbors, we would tend to get much more connected networks and, thus, much larger matrices B. As a result, the already slow task of calculating SSI would become extremely time consuming and likely infeasible in many applications.

V.D. Relaxing Linearity
Without assuming linearity, we would be unable to derive a unique numerical index. If, for example, the linearity assumption is replaced with a monotonicity condition (higher segregation among i's same-race neighbors implies higher $s_i^h(\beta)$), one cannot pin down a specific numerical index. The situation is analogous to that of income distribution measures, where general properties lead to orderings of Lorenz curves that do not allow one to compare every pair of distributions. In our framework a Lorenz-curve-type ordering is readily obtained: Group h is more segregated in $\beta$ than in $\beta'$ if the distribution of $(\sum_j r'_{ij})$ dominates that of $(\sum_j r_{ij})$. Atkinson [1970] presents a partial order on income distributions, in which two distributions may not be comparable in terms of income inequality. When Lorenz curves cross, one has to decide how much weight to assign to each side of the intersection. Rather than choose ad hoc weights, which could differ for each application (and which, some have argued, is the main reason researchers do not use the Atkinson index as a measure of segregation [Massey and Denton 1988]), we get implicit weights through the linearity property.

VI. TWO APPLICATIONS OF SSI: MEASURING SCHOOL AND RESIDENTIAL SEGREGATION
Here we develop two illustrative applications of SSI: estimating racial segregation of friendship networks in schools and residential segregation.12

VI.A. School Segregation
There is an impressive literature on the effects of segregation across schools on achievement. Guryan [2004] estimates that half of the decline in black dropout rates between 1970 and 1980 is attributable to desegregation plans. Crain and Strauss [1985] find that students randomly offered the chance to be bussed to a suburban school were more likely to work in professional jobs nearly 20 years after the experiment. Jencks et al. [1972] estimate that desegregation raises black achievement by 2-3 percent. Based on a meta-analysis of 93 studies, Crain and Mahard [1981] conclude that desegregation has a significant effect on black achievement, especially among younger children, though other meta-analyses are less conclusive [St. John 1975].
Yet, in the spirit of Martin Luther King, who dreamed that one day "little black boys and black girls will be able to join hands with little white boys and white girls and walk together as sisters and brothers," some argue that society should strive for integration within schools, not just across them [Lucas 1999; Mickelson 2001]. Within-school segregation, commonly referred to as "second-generation segregation," is thought to be as important as segregation across schools in inhibiting the educational opportunities of racial and ethnic minorities [Mickelson 2001]. Previous studies use traditional measures of segregation (such as exposure and dissimilarity) to measure segregation across schools. These measures do not disaggregate to the individual level and cannot use information on students' actual social contacts, which limits our ability to understand the relationship between within-school segregation and outcomes.
Data. The National Longitudinal Study of Adolescent Health (Addhealth) database is a nationally representative sample of 90,118 students entering grades 7 through 12 in the 1994-1995 school year. A stratified random sample of 20,745 students was given an additional in-home interview; 17,700 parents of these children were also interviewed. Thus far, information has been collected on these students at three separate points in time: 1995, 1996, and 2002. There are 175 schools from 80 communities included in the sample, with an average of more than 490 students per school, allowing within-school analysis. Students who are missing data on race, grade level, or friendships are dropped from the sample.
A wide range of data are gathered on the students, as described in detail on the Addhealth website (http://www.cpc.unc.edu/projects/addhealth). Our primary outcome variables are divided between measures of academic achievement and those that are more associated with social behaviors. The social variables include smoking, skipping school (without a valid excuse), interracial dating, and whether or not a student is happy at his or her school. Smoking and skipping school are answers to the question, "During the past twelve months, how often did you . . ." Answer choices range from never to nearly every day. Interracial dating is a dichotomous variable equal to 1 if the student reports ever dating interracially and zero otherwise. Happiness measures whether or not students report being happy at their school. The academic variables include Peabody Vocabulary Test (PVT) scores, whether or not a student plans to attend college, grades in the previous grading period, and a measure of how much effort the student exerts. All responses (including grades) are self-reported. For each student, grades were calculated by aggregating grades in four subjects: math, history, science, and English.
To measure school segregation, we make use of the information on friendship networks within schools available in the Addhealth database. All students contained in the in-school survey were asked, "List your closest male/female friends. List your best male/female friend first, then your next best friend, and so on." Students were allowed to list as many as five friends from each sex. Each friend can be linked in the data, and the full range of covariates in the in-school survey (race, gender, grade point average, etc.) can be gleaned from each friend. Friendship links are defined as unions: Student A is considered to be "friends" with Student B if A lists B as a friend, B lists A as a friend, or both.
Analysis. The school-level SSI is calculated by taking, for each racial group, the average SSI of each CC in the school that consists of students from that group, weighted by the size of those CCs. In other words, to calculate the black group SSI for School 1, assuming there are two black CCs in School 1, we find [(SSI of CC1)(size of CC1) + (SSI of CC2)(size of CC2)]/[size of CC1 + size of CC2]. Students who are singletons (who do not have any friends from their racial group) are considered to be CCs of size 1 with SSI equal to 0, that is, completely integrated.
To make individual SSI comparable across CCs, each individual SSI is multiplied by the size of the CC of which it is a part.
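The size-weighted average described above is simple to state in code. The following sketch uses made-up component values and a hypothetical function name purely to illustrate the aggregation; none of the numbers come from the Addhealth data.

```python
def school_level_ssi(components):
    """Size-weighted average of component SSIs for one racial group.

    components: list of (ssi_of_cc, size_of_cc) pairs; singletons enter as (0.0, 1).
    """
    total_size = sum(size for _, size in components)
    return sum(ssi * size for ssi, size in components) / total_size

# Hypothetical school: two connected components and two singleton students.
black_components = [(0.8, 30), (0.5, 10), (0.0, 1), (0.0, 1)]
school_ssi = school_level_ssi(black_components)
print(round(school_ssi, 4))  # (0.8*30 + 0.5*10) / 42
```

Because singletons enter with an SSI of zero, adding them lowers the school-level figure without changing the index of any connected student.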
Figure VI depicts the relationship between the percentage of a racial group in a school and the level of segregation for that racial group in that school, using the Addhealth database. Each observation is a school. Grade levels 7-12 are combined. School-level segregation ranges from .014 to .848 across the 175 schools in Addhealth. The mean level of segregation is .618; the standard deviation is .146.
Many researchers assume the relationship between the segregation of a racial group within a school and the percentage of that group in the school is linear (see, for example, Orfield [1983]). This approximation is a good first pass for Whites (though we find nearly all White data points above the 45° line), but less true for Hispanics and Asians. For Blacks, the relationship between percent own-race in a school and own-race segregation is even more nonlinear. As the percentage of Black students increases from 0 to 25 percent, Black segregation rises sharply. Above 25 percent, Blacks are near complete segregation.
It is important to emphasize that our data do not allow one to disentangle why these patterns exist. The segregation observed in Figure VI could be a result of own-race preferences for social interactions or the response to external discrimination or racism.
Understanding the causal model underlying these observations is of great importance to our understanding of social interactions, bussing programs, and the optimal organization of schools, among other things.
Table III presents estimates of the relationship between individual-level measures of segregation and individual outcomes. Individual-level segregation ranges from 0 to 174.973 with a mean of 0.618 and standard deviation of 2.48.
We estimate models of the form: (3) where i indexes individuals, j indexes schools, $X_i$ represents a set of individual-level controls, and $\alpha_j$ denotes school fixed effects.
The coefficient $\gamma$ measures the relationship between the segregation of individual i and a given outcome for i. We concentrate on the group-specific coefficient, which measures the differential effect of individual segregation for Group i relative to Whites, and on its sum with $\gamma$, which captures the overall relationship between segregation and outcomes for Group i.
For Blacks, individuals who are more segregated are less likely to smoke (a behavior predominant among White teens) and have lower test scores. Segregated Asians are less likely to skip school, more likely to have high test scores, put in more effort, and report being happier. Segregated Hispanics are less likely to smoke, more likely to have low test scores, low grades, and low probability of attending college. Not surprisingly, students of all races are less likely to date interracially when schools are more segregated. Similar results are obtained when one excludes school fixed effects.

VI.B. Residential Segregation
The ideal data to estimate residential segregation would contain information on the nature of each household's interactions with other households. In lieu of this, we proceed as we did for the imaginary city of the example in Section III: We use geographical distance to infer social interactions. In addition, since we lack individual-level data, we work with block-level data from the 2000 U. S. Census. We restrict our sample to the 313 Metropolitan Statistical Areas (MSAs). The data are available from Geolytics Inc. (see http://www.geolytics.com/).
Census blocks contain, on average, 300 households and are approximately 100 meters in radius. We identify a block with the race/ethnicity of the majority of its inhabitants. This assumption is not too problematic, as blocks are strikingly homogeneous: 94.3 percent of Iowans live in a homogeneous census block and so do 77 percent of Texans. Save Washington DC, more than 60 percent of the blocks in all states contain households of only one race (for half the states, 80 percent or more of the blocks contain only one race).
We assume that two blocks are neighbors if they are within one kilometer of each other.13 From this, we know when $r_{ij}$ should be nonzero. The next step is to calculate the intensities of social interactions, the values of $r_{ij}$. We obtain the total number, $d_i$, of neighbors of block i, i.e., the number of blocks that are within one kilometer of i, independent of race. Absent further information on the structure of social interactions in neighborhoods and consistent with the budget constraint described in Section IV, let $r_{ij} = 1/d_i$. With the resulting matrix B, we are in a position to calculate SSI using the characterization we present in the Appendix.14
An important caveat to our application of SSI to residential segregation is that it ignores block density.15 To correct for this, one could assign all individuals in a census block to the centroid of that block, and run the resulting individual-level estimation. This method, however, is computationally very costly.
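The construction of B from block locations can be sketched as follows. This is an illustration, not the authors' Matlab code: it assumes block centroids are given as planar coordinates in kilometers (real Census coordinates would first need a projection or a great-circle distance), treats "within one kilometer" as distance at most 1, and uses hypothetical names and a made-up four-block city.

```python
import math

def same_race_matrix(centroids, race, group, radius_km=1.0):
    """Block-level matrix B for one race: r_ij = 1/d_i for same-race neighbors."""
    n = len(centroids)
    neighbors = [
        [j for j in range(n)
         if j != i and math.dist(centroids[i], centroids[j]) <= radius_km]
        for i in range(n)
    ]
    members = [i for i in range(n) if race[i] == group]
    index = {block: k for k, block in enumerate(members)}
    B = [[0.0] * len(members) for _ in members]
    for i in members:
        d_i = len(neighbors[i])  # all neighbors of block i, independent of race
        for j in neighbors[i]:
            if race[j] == group:
                B[index[i]][index[j]] = 1.0 / d_i
    return members, B

# Hypothetical city: three nearby blocks and one remote block (coordinates in km).
centroids = [(0.0, 0.0), (0.5, 0.0), (0.9, 0.0), (3.0, 0.0)]
race = ["Black", "Black", "White", "Black"]
members, B = same_race_matrix(centroids, race, "Black")
print(members)  # [0, 1, 3]
print(B)        # blocks 0 and 1 each spend half their time on the other
```

Each row of B sums to at most one, which is the budget constraint at work: time spent with the White neighbor is time not spent with same-race neighbors. The remote block has no neighbors, so its row is zero and it forms a singleton.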
Baseline Residential Segregation. Since SSI for Race h is a measure of the connectivity of the Race-h network, it will tend to be larger in cities with larger fractions of Race-h individuals, even if individuals are located at random in the city.
We refer to the SSI one would expect to see in a city when individuals locate at random as Baseline SSI. We provide estimates of both SSI and the SSI in excess of Baseline SSI.
We have obtained measures of Baseline SSI by simulating random assignment of races to large regular (in a graph-theoretic sense) cities with the corresponding fraction of Race-h inhabitants. Concretely, for each fraction p = 0.01, 0.02, . . ., 0.99 we simulated 1,000 cities of 100 households each, where each household is of Race h with probability p.16 Figure VII shows the results of our simulations. On the horizontal axis is the fraction of Race-h inhabitants, while the vertical axis shows the average SSI. When the share of Race-h inhabitants in a city is relatively small, SSI mirrors the percent Race-h in a city closely. This is to be expected. When Race-h inhabitants are relatively few and assigned to a city at random, linearity has little power to alter SSI from percent black. As the fraction of Race-h individuals increases, however, SSI significantly departs from the percentage of Race h in a city. We have used only large cities, as we can prove (see Appendix B) that baseline SSI converges as a city grows. In fact, the simulations show the convergence to be quite fast.

13. We have used one kilometer radii because one kilometer is the median radius of a census tract (1.03), and tracts are the traditional notion of a neighborhood in the literature. Our results alter little when we change the criterion to 0.5 or 1.5 kilometers.
14. We need to calculate the largest eigenvalue of (each CC of) B. The Matlab programs to calculate all indices reported in the paper are available at http://post.economics.harvard.edu/faculty/fryer/fryer.html.
15. This likely induces little error in the estimates of segregation, given our definition of neighbor usually encompasses several blocks. In areas such as New York, however, this limitation may be quite restrictive.
16. For a few values of p we ran simulations of much larger cities, with 2,500 nodes, and we obtain the same results. For the simulation of the full range of p we chose size 100 because the larger simulations are very time intensive. All simulations were done in Matlab; the code is available from the authors.

Notes to Table III: ... is a binary value taking the value of one if the student agrees or strongly agrees that they are happy to be at their school. No college is a binary variable that equals one if the student reports a probability of .5 or greater that she will attend college. Effort is an ordered categorical variable that takes values .25 if the student never tries at all, .50 if they don't try very hard, .75 if the student reports they try hard enough, but not as hard as they could, and 1 if the student reports they try very hard to do their best. Test scores are adjusted to be standard normal. Grade composites are constructed from four reported grades: English/language arts, mathematics, history/social studies, and science. Grades are first converted to their equivalent on a 4-point scale: In all cases, dummy variables for missing values and school fixed effects are included. Robust standard errors are beneath the coefficients. *, significant at 5 percent; **, significant at 1 percent.

FIGURE VII Simulating the Baseline Spectral Segregation Index
We have obtained measures of Baseline SSI by simulating random assignment of races to large regular (in a graph-theoretic sense) cities with the corresponding fraction of Race-h inhabitants. For each fraction p = 0.01, 0.02, . . ., 0.99 we simulated 1,000 cities of 100 households each, where each household is of Race h with probability p.
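The baseline simulation described above can be sketched in a few dozen lines. The version below is an illustration under explicit assumptions, not the authors' Matlab program: the "regular city" is taken to be a ring in which every household has four neighbors (the layout actually used is not specified here), interactions follow $r_{ij} = 1/d_i$, the city-level SSI is the size-weighted average of the largest eigenvalues of the same-race components, and all function names and default parameters are hypothetical.

```python
import random

def ring_neighbors(n, k):
    """Regular city (assumed layout): each household is tied to the k nearest on each side."""
    return [[(i + step) % n for step in range(-k, k + 1) if step != 0]
            for i in range(n)]

def component_eigenvalue(members, neighbors, weight, iters=2000, tol=1e-10):
    """Largest eigenvalue of one same-race component (power iteration on B + I)."""
    inside = set(members)
    x = {i: 1.0 / len(members) for i in members}
    lam = 0.0
    for _ in range(iters):
        y = {i: x[i] + weight * sum(x[j] for j in neighbors[i] if j in inside)
             for i in members}
        total = sum(y.values())
        y = {i: value / total for i, value in y.items()}
        converged = max(abs(y[i] - x[i]) for i in members) < tol
        x, lam = y, total - 1.0
        if converged:
            break
    return lam

def city_ssi(is_h, neighbors):
    """Size-weighted average of component eigenvalues over Race-h households."""
    weight = 1.0 / len(neighbors[0])  # r_ij = 1/d_i, same degree for everyone
    seen, weighted_sum, group_size = set(), 0.0, sum(is_h)
    for start in range(len(is_h)):
        if not is_h[start] or start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:  # depth-first search for the same-race component
            i = stack.pop()
            members.append(i)
            for j in neighbors[i]:
                if is_h[j] and j not in seen:
                    seen.add(j)
                    stack.append(j)
        if len(members) > 1:  # singletons contribute zero
            weighted_sum += component_eigenvalue(members, neighbors, weight) * len(members)
    return weighted_sum / group_size if group_size else 0.0

def baseline_ssi(p, n=100, k=2, cities=1000, seed=0):
    """Average SSI over randomly assigned cities with Race-h share p."""
    rng = random.Random(seed)
    neighbors = ring_neighbors(n, k)
    draws = [city_ssi([rng.random() < p for _ in range(n)], neighbors)
             for _ in range(cities)]
    return sum(draws) / cities

print(baseline_ssi(0.3, n=30, cities=20))
```

The small call at the end keeps the run time short; a closer analogue of the exercise described above would loop p over 0.01, 0.02, ..., 0.99 with the default 1,000 cities of 100 households, which is slow in pure Python. Because the iteration count is capped, values for long, thin components are approximate.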
The Extent of Segregation across Cities. Detroit, MI, is the most segregated city for Blacks; Lowell, MA, for Whites; McAllen, TX, for Hispanics; and Honolulu, HI, for Asians.17 The list seems quite intuitive. It also confirms that SSI is correlated with the size of a minority group. The latter point begs for a distinction between SSI and "adjusted" SSI: the segregation in excess of baseline SSI. It is unclear which is most closely related to economic outcomes. Adjusted SSI tells us more about preferences, while the original SSI is a better measure of the pure connectedness in a network. The most segregated cities using adjusted SSI for Asians, Blacks, Hispanics, and Whites are Los Angeles, CA; Milwaukee, WI; Flagstaff, AZ; and Pine Bluff, AR, respectively. Approximately 11 percent of households in Milwaukee are Black, implying an expected SSI of 0.1145 if blocks were allocated at random. The actual measure of segregation is a factor of 9 larger. To generate the level of segregation in Milwaukee, assuming blocks were assigned a race at random, blacks need to comprise 80 percent of the population.
We have emphasized how the SSI allows one to consider more disaggregated units than the city. One of the most interesting units is the agglomeration of same-race blocks: racially homogeneous ghettos, which SSI identifies endogenously as CCs (see Section IV). This is related to city-wide SSI, but SSI weighs the ghetto's SSI against members of the same race in other parts of the city who are more integrated. For Blacks and Whites, the largest ghetto is Detroit, implying an enormous amount of citywide segregation. Remarkably, 87 percent of black blocks in Detroit comprise one large ghetto. The largest CC is San Francisco for Asians, and Los Angeles for Hispanics. Hispanics in Los Angeles comprise the largest minority ghetto in America; 17,909 Hispanic blocks are connected.
Along with the variation across cities in SSI, there are several MSA-level characteristics that are associated with higher levels of racial segregation. For instance, cities that exhibit higher segregation for Blacks tend to be larger cities, have a high percentage of female-headed households, and are less likely to be in the West.
Table IV presents a correlation matrix of popular measures of segregation. These measures include dissimilarity, isolation, Gini coefficient, exposure, entropy, and interaction. Also included in the matrix are SSI, SSI minus the baseline, and the ranking of cities based solely on their fraction of Blacks. All measures were calculated using data at the census block level for 326 MSAs. The Spectral index has surprisingly little correlation with dissimilarity, Gini, entropy, and interaction (averaging less than 0.5) and high correlation with isolation and exposure, averaging more than 0.90. Given the nature of the isolation and exposure indexes, it is not surprising that SSI is more correlated with these measures than with the others. As a measure of residential segregation, our measure is very similar to existing measures of exposure with the added ability to disaggregate to the level of individuals and a well-understood theoretical foundation. Adjusted SSI becomes even less correlated with dissimilarity and isolation. The fraction of blacks in a city is highly correlated with SSI, but the linearity property assures that this correlation is less than perfect.
The Relationship between Residential Segregation and Outcomes. The economic literature on the effects of segregation on outcomes is impressive. Case and Katz [1991] show that youths in a central city are affected by the characteristics of their neighbors. Almond, Chay, and Greenstone [2003] show that segregation of hospitals in the Jim Crow era had a significant negative effect on infant mortality. Using evidence from the Moving to Opportunity experiment, Katz, Kling, and Liebman [2001] and Kling, Liebman, and Katz [2005] provide evidence that moving individuals to lower poverty neighborhoods has substantial effects on mental and physical health of parents and children. Cutler and Glaeser [1997] is one of the most influential papers in economics on the impact of segregation. They use the dissimilarity index as a measure of segregation. We re-estimate the impact of black segregation on economic outcomes with Cutler and Glaeser's specification. Econometrically, we estimate models of the form ... where $\text{outcome}_i$ is measured at the individual level and $\text{segregation}_j$ is measured at the MSA level, and compare the results obtained with SSI and the dissimilarity index. Identical to Cutler and Glaeser [1997], we correlate measures of segregation with various economic and social outcomes for young people aged 20-30. We choose to focus on younger individuals for three reasons. First, they are most susceptible to group-level influences as a result of social interactions. Second, the problems of mobility across metropolitan areas are more easily avoided. Third, and most importantly, it mirrors the specifications in Cutler and Glaeser [1997]. For identical reasons, we drop individuals born in a foreign country. Data from the 1990 Census 1% Public Use Microdata Sample are used. Our sample contains 97,976 individuals aged 20-24 and 139,715 individuals between the ages of 25 and 30 residing in the 204 MSAs with at least 100,000 people and 10,000 blacks in 1990. This sample is identical to Cutler and Glaeser [1997].
Outcome measures are divided into three categories: educational attainment, labor market, and social outcomes. Educational attainment is measured as the probability an individual graduates from high school or college. There are two measures of labor market outcomes. The first is whether or not an individual is idle (not working and not in school). The second is earnings (sum of wages, salary, and self-employment income). In all specifications, we use the natural logarithm of earnings, conditional on the individual not being in school and reporting positive earnings.18 The final outcome variable is a social outcome: whether a woman is an unmarried mother.
Table V presents a series of ordinary least squares estimates of the relationship between segregation and outcomes for persons aged 20-24 and 25-30, using the dissimilarity index and the SSI, controlling for the standard set of individual and MSA-level covariates used by Cutler and Glaeser [1997]. Each measure of segregation has been normalized so that it has a mean of zero and a standard deviation of one.
The top panel of Table V replicates Cutler and Glaeser's [1997] results using the dissimilarity index. The bottom panel estimates the same specification using SSI. Results differ slightly between SSI and dissimilarity. On each outcome, cities with higher dissimilarity indices have inferior outcomes: residents are less likely to graduate from high school or college, more likely to be unemployed and not in school, earn less money, and are more likely to be single mothers. SSI paints a similar portrait, though the magnitudes are slightly weaker. No qualitative conclusions are changed. In all cases, the R-squared from regressions using the dissimilarity index and those using the Spectral index are remarkably similar.

VII. CONCLUSION
For decades, social scientists have used measures of evenness and exposure to estimate the prevalence and impact of segregation in housing, firms, and schools. These measures have many limitations, which we have discussed throughout. This paper develops a new measure of segregation based on two key ideas: a measure of segregation should disaggregate to the level of individuals, and an individual is more segregated the more segregated are the agents with whom she interacts. Developing three properties that any segregation measure should satisfy, our main result shows that one and only one segregation index, the SSI, satisfies our three properties and the two aims mentioned above. To illustrate the potential of the index, we apply it to two well-known social problems, measuring within-school and residential segregation, and glean several new facts and insights. We hope the Spectral index will be a useful tool for applied researchers interested in the agglomeration of individuals in networks.

APPENDIX A: TECHNICAL PROOFS
We present formally the results stated in Sections IV and V. Fix a race h. Let $C_k$, $k = 1, 2, \ldots, K$, be the CCs of B. Abusing notation, let $C_k$ also denote the submatrix of B with columns (and rows) indexed by the elements of $C_k$. Let $\lambda_k$ be the largest eigenvalue of $C_k$ and $x_k$ be its associated eigenvector, normalized so its entries add to one.19 The SSI is the index $B \mapsto (\hat{S}^h(B), (\hat{s}_i(B))_{i \in h})$, where $\hat{S}^h(B) = \sum_{i \in h} \hat{s}_i(B)/V$ and $\hat{s}_i(B) = \lambda_k x_{ki} |C_k|$.
THEOREM 2. A segregation index satisfies Monotonicity, Homogeneity, and Linearity if and only if it is the SSI.
We note that the properties of Monotonicity, Homogeneity, and Linearity are independent in the sense that no pair of properties implies the third.
We state two additional properties of SSI. Proposition 3 was stated informally in Section IV. Proposition 4 is informative about SSI and used in the proofs below.
PROPOSITION 3. If v has at least one same-race neighbor, $\hat{s}_v^h(B) > 0$. If v has no same-race neighbors, $\hat{s}_v^h(B) = 0$.
Proof. If $i \in h$ has at least one same-race neighbor, then i is in $C_k$ for some irreducible submatrix $C_k$. Let $\lambda_k$ be the largest eigenvalue of $C_k$ and $x_k$ be its associated eigenvector. By Lemma 6, $x_k$ is strictly positive, so $x_{ki} > 0$. Since $\lambda_k > 0$ (Lemma 6), the definition of $\hat{s}_i^h(B)$ implies that $\hat{s}_i^h(B) > 0$. Q.E.D.
and $\hat{S}_{C_k}$ is the largest eigenvalue of $C_k$. So $\hat{S}^h(B)$ is the weighted average of the components' largest eigenvalues.
Proof. We show that
Proof. This is a direct consequence of Theorem 2 and the result in Appendix D of Horn and Johnson [1985]. Q.E.D.
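The definition of the SSI and the two properties just stated can be made concrete with a short computation. The sketch below is one possible implementation, not the authors' Matlab programs: it assumes that $r_{ij} > 0$ whenever $r_{ji} > 0$ (so CCs can be found by an undirected search), obtains the largest eigenvalue by power iteration rather than a full eigendecomposition, and uses hypothetical function names and a made-up six-person matrix.

```python
def connected_components(B):
    """CCs of the same-race network, treating any positive entry as a tie."""
    n, seen, components = len(B), set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:
            i = stack.pop()
            members.append(i)
            for j in range(n):
                if j not in seen and (B[i][j] > 0 or B[j][i] > 0):
                    seen.add(j)
                    stack.append(j)
        components.append(sorted(members))
    return components

def perron_pair(C, iters=10000, tol=1e-13):
    """Largest eigenvalue and its eigenvector (entries summing to one),
    by power iteration on C + I for an irreducible nonnegative C."""
    n = len(C)
    x, lam = [1.0 / n] * n, 0.0
    for _ in range(iters):
        y = [x[i] + sum(C[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(y)
        y = [value / total for value in y]
        converged = max(abs(a - b) for a, b in zip(x, y)) < tol
        x, lam = y, total - 1.0
        if converged:
            break
    return lam, x

def spectral_index(B):
    """Return (group-level SSI, list of individual SSI values)."""
    s = [0.0] * len(B)
    for members in connected_components(B):
        if len(members) == 1:
            continue  # no same-race neighbors: the individual index stays 0
        C = [[B[i][j] for j in members] for i in members]
        lam, x = perron_pair(C)
        for position, i in enumerate(members):
            s[i] = lam * x[position] * len(members)
    return sum(s) / len(s), s

# Hypothetical group: a triangle (0, 1, 2), a pair (3, 4), and an isolate (5).
B = [[0.0] * 6 for _ in range(6)]
for i, j, r in [(0, 1, 0.3), (0, 2, 0.3), (1, 2, 0.3), (3, 4, 0.5)]:
    B[i][j] = B[j][i] = r

group_ssi, individual = spectral_index(B)
print([round(value, 3) for value in individual])  # [0.6, 0.6, 0.6, 0.5, 0.5, 0.0]
print(round(group_ssi, 4))                        # (3*0.6 + 2*0.5 + 0) / 6
```

The isolate receives exactly zero and everyone with a same-race neighbor receives a positive value, as Proposition 3 states. The group-level figure equals the size-weighted average of the components' largest eigenvalues (0.6 and 0.5 here, with the isolate counted as zero).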

A. Proof of Theorem 2
The proof of Theorem 2 proceeds by stating and proving 5 lemmas that together establish the theorem.

The first lemma unifies some standard results about irreducible matrices.
LEMMA 6. Let C be a real, nonnegative, irreducible matrix. Then C has a real, positive eigenvalue $\lambda$ with associated eigenvector y, such that
1. y is strictly positive, so $y_i > 0$ for all i, and y is the unique, up to a scalar multiple, strictly positive eigenvector of C;
2. $\lambda$ is larger than $|\mu|$, for any other eigenvalue $\mu$ of C; in particular, $\lambda$ is larger than any other real eigenvalue.
Proof. By the Perron-Frobenius Theorem (Theorem 8.4.4 in Horn and Johnson [1985]), C has a real, strictly positive eigenvalue, $\lambda$, with associated strictly positive eigenvector y. The multiplicity of $\lambda$ is one and $\lambda$ is larger than $|\mu|$, for any other eigenvalue $\mu$ of C ($\lambda$ is the spectral radius of C).
Let z be any strictly positive eigenvector; by Corollary 8.1.30 in Horn and Johnson, z is associated to eigenvalue $\lambda$. Then z is a scalar multiple of y, as $\lambda$ has multiplicity one. Q.E.D.
Now we verify that the SSI satisfies our three axioms.
Proof.Let BЈ have more intense interactions than B. Let CЈ ϭ (cЈ ij ) be an irreducible submatrix of BЈ.Then the set of rows in CЈ is the union of the rows in some collection C 1 ,C 2 , . . ., C L of irreducible submatrices of B. Let C ϭ (c ij ) be the blockdiagonal matrix with C 1 ,C 2 , . . ., C L in its diagonal.Let xЈ be an eigenvector associated to the largest eigenvalue Ј of CЈ.Then CЈxЈ ϭ ЈxЈ, x i Ͼ 0 for all i (Lemma 6), and BЈ having more intense interactions than B imply that Statements ( 5) and ( 6) imply that Յ Ј.But Ј is S CЈ (Proposition 4); so Յ S ˆCЈ .Now we prove that S ˆCl Յ , for l ϭ 1 . . .L. Let l be the largest real eigenvalue of C l .Let x l be an eigenvector of C l , associated to l ; let y ϭ ( y i ) iʦC be the vector obtained from x l by letting y i ϭ x li if i ʦ C l and 0 otherwise.Then, since C is block-diagonal, l is an eigenvalue of C, with associated eigenvector y.By definition of , since l is real, l Յ .But Proposition 4 implies that l ϭ S ˆCl , so S ˆCl Յ , for l ϭ 1 . . .L.
Let $C'_k$, $k = 1, \ldots, K$, be the irreducible submatrices of $B'_h$, and let each $C'_k$ be the union of $L_k$ irreducible submatrices of $B_h$, $C'_{kl}$ with $l = 1, \ldots, L_k$. By Proposition 4

LEMMA 8. The SSI satisfies Homogeneity.
Proof. Let $a \in A$ be $h$-homogeneous of degree $d$. Let $y = \mathbf{1}$; then homogeneity says that $By = d\mathbf{1}$, so $d$ is an eigenvalue with eigenvector $y$. By Lemma 6, $d$ must coincide with $\lambda$, the largest eigenvalue of $B$, and the rescaled eigenvector must coincide with $x$. So $\hat{S}_h(B) = d$. Q.E.D.
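The argument in this proof can be checked numerically. The sketch below is an illustration only: the 4 x 4 matrix and the degree $d = 3$ are made-up values, and the matrix is chosen symmetric purely so that a symmetric eigenvalue routine applies. It confirms that when every row of a nonnegative irreducible matrix sums to $d$, the vector of ones is an eigenvector and $d$ is the largest eigenvalue.

```python
import numpy as np

# Hypothetical interaction matrix (illustrative values only):
# nonnegative, irreducible, symmetric, and every row sums to d = 3.
d = 3.0
B = np.array([
    [0.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 1.5, 0.5],
    [1.0, 1.5, 0.0, 0.5],
    [1.0, 0.5, 0.5, 1.0],
])

ones = np.ones(B.shape[0])

# Homogeneity of degree d: B @ 1 = d * 1, so d is an eigenvalue.
row_sums_equal_d = np.allclose(B @ ones, d * ones)

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
largest = np.linalg.eigvalsh(B)[-1]

print("rows sum to d:", row_sums_equal_d)
print("largest eigenvalue:", largest)
```

Because the vector of ones is strictly positive, Lemma 6 is what rules out any other eigenvalue being the largest one.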
LEMMA 9. The SSI satisfies Linearity.

Proof. By Proposition 4, $\hat{S}_{C_k}$ is an eigenvalue with eigenvector $(x_i)$, the eigenvector in the definition of the spectral index. For any $i$, s

Second, we prove that any index that satisfies the three axioms must coincide with the spectral index. Let $(S_h(B), (s_i(B))_{i \in h})$ be a segregation index that satisfies the three axioms.
LEMMA 10. If $B$ has $b_{ij} = 0$ for all $i$ and $j$, then $s_i(B) = \hat{s}_i(B)$ for all $i$.

Proof. By Homogeneity, $S_h(B) = 0$, so we must have $s_i(B) = 0$ for all $i$, as $s_i(B) \ge 0$ and $S_h(B)$ is the average of the $s_i(B)$. Thus, the index coincides with the SSI. Q.E.D.

LEMMA 11. For any $B$, $s_i(B) = \hat{s}_i(B)$ for all $i$.

Proof. If $B$ is such that $b_{ij} = 0$ for all $i$ and $j$, we are done by Lemma 10. Suppose that $b_{ij} > 0$ for at least one $i$ and $j$.
Let $\gamma = \min\{b_{ij} : b_{ij} > 0\}$. Let $D = (d_{ij})$ be the matrix defined by $d_{ij} = 0$ if $b_{ij} = 0$ and

Note that $\sum_j d_{ij} = \gamma$ for all $i$, so $D$ is homogeneous of degree $\gamma$. Then Homogeneity implies that $S_h(D) = \gamma$. Now, by definition of $D$, $B$ has more intense interactions than $D$. So Monotonicity implies that $S_h(B) \ge S_h(D) = \gamma$. Hence, $S_h(B) > 0$.
Fix a component $C_k$ such that $S_{C_k} > 0$; since $S_h(B) > 0$, there must exist at least one such component. For

Now, $s_i(B) > 0$ for all $i$, since $s_i(B) = 0$ for some $i$ would imply, by Linearity, that all $j \in N_i$ have $s_j(B) = 0$; then, by recursion, $s_j(B) = 0$ for all $j \in C_k$, which would contradict $S_{C_k} > 0$. Hence, $x$ is a strictly positive eigenvector.

By Proposition 4 and Lemma 6, now $S_{C_k} = \hat{S}_{C_k}$, and by the rescaling $\sum_{i \in C_k} x_i = 1$, $x$ must coincide with the defining eigenvector in the definition of the SSI. Then $s_i(B) = \hat{s}_i(B)$ for all $i$.

Finally, take a component with $S_{C_k} = 0$. Then Monotonicity and Lemma 10 imply that $b_{ij} = 0$ for all $i$ and $j$ in $C_k$. Q.E.D.

Lemmas 7 through 11 establish the theorem.

B. Results in Section V
We first prove Proposition 1 and then state and prove additional results that were informally announced in Section V. The results are formalizations of the discussion of network connectivity in Section V.
Proof of Proposition 1. Let $I$ denote the $V \times V$ identity matrix. Let $D = I + B$. Then (2) implies that the vector $x_t = (x_{it})_i$ satisfies $x_t = D x_{t-1}$, for all $t$. So $x_t = D^t x_0$. By Lemma 8.4.2 in Horn and Johnson [1985], $1 + \hat{S}_h(B)$ is the largest eigenvalue of $D$. By Lemma 8.2.7 in Horn and Johnson, there is a matrix $L$ such that

We provide two results that help interpret the SSI. The first relates SSI to how many neighbors individuals have. The second result shows how SSI measures the connectivity of the $h$-race network. Both results hold in the neighborhood model, where $r_{ij}$ is either $0$ or $r > 0$.
Here we interpret $B$ as a graph, denoted $G$, for which the vertexes are the individuals, and there is an edge (link) between two individuals $i$ and $j$ if $r_{ij} > 0$. The degree of a vertex $i$, $d(i)$, is the number of edges at $i$.

Proof of Proposition 12. See Cvetkovic and Rowlinson [1990]. Q.E.D.
Let $d_i$ be the number of same-race neighbors of household $i$. Proposition 12 proves that, Homogeneity notwithstanding, $\hat{S}_h(B)$ is at least as large as the average $d_i$ over the individuals with $a(i) = h$.

Now we use walks in a graph to bring out the relation between SSI and network connectivity. A walk of length $k$ is a sequence of (not necessarily different) vertexes $v_1, v_2, \ldots, v_k, v_{k+1}$ such that for each $i = 1, 2, \ldots, k$ there is an edge from $v_i$ to $v_{i+1}$. A walk is closed if $v_{k+1} = v_1$. Let $W_i^\ell$ be the number of walks of length $\ell$ that individual $i \in V$ can take in $B$ and define $W^\ell = \sum_i W_i^\ell$. Let $W_{ij}^\ell$ be the number of walks of length $\ell$ between individual $i \in V$ and $j \in V$. A graph is bipartite if its vertex-set admits a partition into two classes such that every edge has its ends in different classes. The graphs one encounters in applications of SSI are never bipartite.

PROPOSITION 13. For $\ell$ sufficiently large: (1) $W_i^\ell / (\hat{S}_h(B))^{\ell-1}$ is approximately proportional to $\hat{s}_i^h(B)$, and the constant of proportionality is independent of $i$; (2) $\sqrt[\ell]{W^\ell / n_h}$ approximates $\hat{S}_h(B)$; and (3) if $B$ is non-bipartite, $W_{ij}^\ell$ is approximately proportional to $(\hat{S}_h(B))^{\ell-2}\, \hat{s}_i^h(B)\, \hat{s}_j^h(B)$.
Proof. Let $U = (u_i)$ be the eigenvectors of $B$, normalized to form an orthonormal basis, so $U^T U = I$. Let $D$ be the matrix with the eigenvalues $\lambda_i$ of $B$ on the diagonal, and $0$ everywhere else. So $B = UDU^T$.

If $\mathbf{1}$ is the vector with 1 in all its entries, the vector of $\ell$-long walks $(W_i^\ell)$ is defined by $(W_i^\ell) = B^\ell \mathbf{1}$. So $(W_i^\ell) = U D^\ell U^T \mathbf{1}$. The $(u_i)$ vectors form a basis, so there are scalars $(\alpha_i)$ such that $\mathbf{1} = \sum_i \alpha_i u_i$.

Then $(W_i^\ell) = \sum_i \alpha_i U D^\ell U^T u_i$. But $U^T u_i = e_i$, the vector with $i$-th entry 1, and 0 elsewhere. So $(W_i^\ell) = \sum_i \alpha_i \lambda_i^\ell U e_i = \sum_i \alpha_i \lambda_i^\ell u_i$. Let $\lambda_1 = \hat{S}_h$; $\lambda_1$ has multiplicity 1, as $B$ has a unique nontrivial eigenvector (Theorem 2.1.3 in Cvetkovic, Rowlinson, and Simic [1997]). So $\hat{S}_h(B) > \lambda_i$, $i = 2, 3, \ldots, |h|$.

Hence $(\lambda_i / \lambda_1)^\ell \to 0$ for all $i \neq 1$, and $(W_i^\ell) / \lambda_1^{\ell-1} = \lambda_1 \sum_i \alpha_i (\lambda_i / \lambda_1)^\ell u_i$ converges to $\lambda_1 \alpha_1 u_1$. Since $u_1$ is a scalar multiple of the $(x_i)$ vector in the definition of the spectral index, $\hat{S}_h(B)\, \alpha_1 u_1$ is a scalar multiple of $(\hat{s}_i^h)$. The second statement is a theorem of Cvetkovic, stated in the survey by Cvetkovic and Rowlinson [1990]. The third statement is essentially Theorem 2.2.5 in Cvetkovic, Rowlinson, and Simic. Q.E.D.
Proposition 13 (1) says that, as $\ell$ grows, $W_i^\ell / (\hat{S}_h(B))^{\ell-1}$ converges. Thus $\hat{S}_h$ measures the growth in the number of walks that $i$ can take. Further, it converges to something proportional to $\hat{s}_i$; thus, individual SSI measures explain the differences, among individuals, in how many walks they can take relative to $\hat{S}$. Statement (2) in Proposition 13 says that $W^\ell \sim V (\hat{S}_h(B))^\ell$. The total number of walks will grow at rate $\hat{S}_h(B)$ (a statement that is similar, and has a similar proof, to that of Proposition 1). Finally, (3) says that two individuals' measures are related to how many walks there are between the two individuals, relative to the total number of walks (given by $\hat{S}_h(B)$, in light of Statement (2)).
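The three statements of Proposition 13 can be illustrated on a small example. The sketch below is not part of the original analysis: the five-vertex graph is made up, it is connected (so the index equals the largest eigenvalue of the adjacency matrix), and it contains a triangle (so it is not bipartite). Following the proof, the individual measures are taken to be proportional to the entries of the leading eigenvector, so the code compares walk counts with that eigenvector rather than with a particular normalization of the individual index.

```python
import numpy as np

# Hypothetical graph: a triangle (0, 1, 2) with a path 2-3-4 attached.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
n = 5
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# eigh returns eigenvalues in ascending order for a symmetric matrix.
eigvals, eigvecs = np.linalg.eigh(A)
lam = eigvals[-1]                 # largest eigenvalue (the index for a connected graph)
u1 = np.abs(eigvecs[:, -1])       # leading eigenvector, sign fixed to be positive

length = 40
A_pow = np.linalg.matrix_power(A, length)

# W_i: number of walks of the given length that start at vertex i.
W = A_pow @ np.ones(n)

# (1) W_i / lam**(length - 1) is proportional to u1, with a constant independent of i.
ratios = (W / lam ** (length - 1)) / u1

# (2) The length-th root of (total walks / n) approximates lam.
root_estimate = (W.sum() / n) ** (1.0 / length)

# (3) W_ij is proportional to u1_i * u1_j (here the constant is lam**length).
pair_ratios = A_pow / (lam ** length * np.outer(u1, u1))

print("spread of ratios in (1):", np.ptp(ratios) / ratios.mean())
print("largest eigenvalue:", lam, "root estimate in (2):", root_estimate)
print("max deviation from 1 in (3):", np.abs(pair_ratios - 1.0).max())
```

The approximation in (2) converges slowly because it takes a root of a constant factor, which is why it remains slightly below the largest eigenvalue even at length 40.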

C. Baseline Segregation
Here we present a theoretical justification for our "baseline" simulations. SSI converges as a city's size grows, so we can estimate SSI for relatively large cities (a size of 6,400 is enough in our simulations).
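A baseline simulation of this kind might be sketched as follows. Everything specific here is an assumption made for illustration rather than the procedure actually used: households sit on a square grid, neighbors are the four adjacent cells, each household is race 1 with probability 0.5, and the group index is computed as the size-weighted average of the largest eigenvalue of each connected component (the weights are assumed to be component shares of the group). The grid sizes are kept far below 6,400 so that the sketch runs quickly.

```python
import numpy as np

def same_race_adjacency(race, h=1):
    """Adjacency among grid cells of race h; neighbors are the 4 adjacent cells (assumed)."""
    side = race.shape[0]
    cells = [(r, c) for r in range(side) for c in range(side) if race[r, c] == h]
    index = {cell: k for k, cell in enumerate(cells)}
    A = np.zeros((len(cells), len(cells)))
    for (r, c), k in index.items():
        for nb in ((r + 1, c), (r, c + 1)):   # each undirected edge is added once
            if nb in index:
                A[k, index[nb]] = A[index[nb], k] = 1.0
    return A

def components(A):
    """Connected components of the graph with adjacency matrix A (depth-first search)."""
    n = A.shape[0]
    seen = np.zeros(n, dtype=bool)
    out = []
    for start in range(n):
        if seen[start]:
            continue
        stack, comp = [start], []
        seen[start] = True
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in np.flatnonzero(A[v]):
                if not seen[w]:
                    seen[w] = True
                    stack.append(w)
        out.append(comp)
    return out

def group_index(A):
    """Size-weighted average of each component's largest eigenvalue (assumed weighting)."""
    n = A.shape[0]
    if n == 0:
        return 0.0
    total = 0.0
    for comp in components(A):
        sub = A[np.ix_(comp, comp)]
        total += len(comp) * np.linalg.eigvalsh(sub)[-1]
    return float(total / n)

def baseline(side, prob, draws, seed=0):
    """Average group index over random race assignments on a side x side grid."""
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(draws):
        race = (rng.random((side, side)) < prob).astype(int)
        values.append(group_index(same_race_adjacency(race)))
    return float(np.mean(values))

estimates = {side: baseline(side, prob=0.5, draws=20) for side in (5, 10, 20)}
print(estimates)
```

With four neighbors per cell the maximum degree is 4, so by Proposition 12 every estimate must lie between 0 and 4.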
Let $H = \{0, 1\}$ be the set of races. We are interested in only one race here, so working with $H = \{0, 1\}$ is without loss of generality. Let $V_n$ be the set of households, such that if $n \le m$, then $V_n \subseteq V_m$.
Let $\Omega_n = H^{V_n}$ be the set of possible assignments of households to races. Abusing notation, let $\omega \in \Omega_n$ represent the resulting $V_n \times V_n$ matrix of social interactions. Endow the power set of $\Omega_n$ with the probability measure $p_n$ obtained by letting each household be race 1 with a fixed probability in $(0, 1)$, independently of the races of other households. Let

Since the $E_n \hat{S}_h$ are bounded above by (1), the result follows. Let $q_{n,m}$ be the probability distribution on $H^{V_m \setminus V_n}$ induced by letting each household be race 1 with the same probability, independently of the races of other households. Abusing notation, we shall use $q_{n,m}$ for the probability distribution induced by $q_{n,m}$ on $\{\omega \in \Omega_m : \omega|_{V_n} = \{0\}^{V_n}\}$. Then,

comma-separated columns. The first column is an identifier and should be in double quotes. The second is latitude. The third is longitude. The fourth is the group identifier for that block. For example, msa_369.txt might be:

"360150102006073",42.24114,-76.81282,1
"360150108003016",42.13062,-76.82308,1
"360150102003009",42.20382,-76.88979,2

This would correspond to city 369 having eight census blocks, of which five are majority group 1, two are majority group 2, and one is majority group 4. neighbors.m uses this information to make the neighbor matrix needed to calculate the SSI.
If you wish to adapt these files for use in a nongeographic application, the main point of modification would be at line 38 of neighbors.m, which is the linking rule. If you wished to study the segregation of, for instance, a social network, this line of code (which currently calculates geographic distance and compares it with the "neighbor radius" solicited earlier) would be replaced by code that checks whether two people have a link in the social network. Other code would have to change too, of course (for instance, latitude and longitude might be replaced by a list of friends' IDs), but the essential thing that determines the type of application is the linking rule.
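The role of the linking rule can be sketched in Python. This is not a translation of neighbors.m: the function names, the haversine distance formula, the kilometer unit for the neighbor radius, and the friendship lists are all assumptions introduced for illustration. The point is only that the neighbor matrix is built from a single pairwise predicate, so swapping that predicate changes the application.

```python
import csv
import io
import math

def parse_blocks(text):
    """Parse rows of the form "id",latitude,longitude,group."""
    blocks = []
    for rec in csv.reader(io.StringIO(text)):
        if rec:
            block_id, lat, lon, group = rec
            blocks.append((block_id.strip(), float(lat), float(lon), int(group)))
    return blocks

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers (an assumed distance formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def neighbor_matrix(blocks, link):
    """Symmetric 0/1 matrix; link(a, b) is the linking rule."""
    n = len(blocks)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if link(blocks[i], blocks[j]):
                matrix[i][j] = matrix[j][i] = 1
    return matrix

def geographic_link(radius_km):
    """Geographic rule: two blocks are neighbors if they lie within radius_km."""
    def link(a, b):
        return haversine_km(a[1], a[2], b[1], b[2]) <= radius_km
    return link

sample = "\n".join([
    '"360150102006073",42.24114,-76.81282,1',
    '"360150108003016",42.13062,-76.82308,1',
    '"360150102003009",42.20382,-76.88979,2',
])
blocks = parse_blocks(sample)
geo = neighbor_matrix(blocks, geographic_link(8.0))

# Nongeographic variant: hypothetical friendship lists keyed by identifier.
friends = {"360150102006073": {"360150108003016"}}

def friendship_link(a, b):
    return b[0] in friends.get(a[0], set()) or a[0] in friends.get(b[0], set())

social = neighbor_matrix(blocks, friendship_link)
print(geo)
print(social)
```

With an 8 km radius only the first and third sample blocks are linked, while the friendship rule links the first and second regardless of distance.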
FIGURE I
Segregation in Metropolitan Detroit
Figure I is based on block-level data from the 2000 U. S. Census.

FIGURE III
A Simple Example

FIGURE IV
Individual 4 Changes Race

FIGURE V
A Change in the Number of White Neighbors

FIGURE VI
The Relationship between Group Size and Group Segregation, by Race
Figure VI is based on data from the National Study of Adolescent Health. Each data point represents segregation calculated at the school level based on students' responses about who their friends are.
data from the National Longitudinal Survey of Adolescent Health. Dependent variables vary by column. Smoking and Skip school are binary variables taking the value 1 if the student does the activity once a week or more. Interracial dating is a binary variable equal to one if a student reports ever dating someone of a different race. Happiness

using block-level data from all 313 MSAs in the 2000 U. S. Census. The sample includes all census blocks in all MSAs. Baseline SSI calculated from simulations described in Section V.1.B.

PROPOSITION 4. If $C_k$, $k = 1, \ldots, K$, are the CCs (the irreducible submatrices) of $B$, then

is the weighted average of the $S_{C_k}$ follows immediately by definition of $S_h(B)$ and $S_{C_k}$. Q.E.D.

PROPOSITION 5. $\hat{S}_h(B)$ is a continuous function of the entries of $B$.
Let $\rho = \max\{|\mu| : \mu \text{ is an eigenvalue of } C\}$ be the spectral radius of $C$. Then, by Horn and Johnson's Theorem 8.1

Let $d_{\min} = \min\{d(v) \mid v \in V\}$ denote the minimum degree of $G$, $d_{\max} = \max\{d(v) \mid v \in V\}$ represent its maximum degree, and $\bar{d} = (1/|V|) \sum_{v \in V} d(v)$ represent the average degree of $G$.

Q.E.D.

PROPOSITION 12. Let $d_{\min}$, $\bar{d}$, and $d_{\max}$ be the minimum, average, and maximum degrees of $B_h$, respectively. Then $d_{\min} \le \bar{d} \le \hat{S}_h \le d_{\max}$.
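The chain of inequalities in Proposition 12 is easy to check on a small graph. The sketch below uses a made-up example, a star with one center and four leaves, which is connected so that the index is simply the largest eigenvalue of its adjacency matrix. The graph and variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical star graph: vertex 0 is linked to vertices 1, 2, 3, 4.
n = 5
A = np.zeros((n, n))
for leaf in range(1, n):
    A[0, leaf] = A[leaf, 0] = 1.0

degrees = A.sum(axis=1)
d_min, d_avg, d_max = degrees.min(), degrees.mean(), degrees.max()

# Largest eigenvalue of the symmetric adjacency matrix; for a star with
# four leaves it equals sqrt(4) = 2.
lam = np.linalg.eigvalsh(A)[-1]

print(d_min, d_avg, lam, d_max)
```

The example also shows why the index is not just an average degree: the average degree is 1.6, while the largest eigenvalue is 2, reflecting the concentration of links at the center.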
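be the expected value of the SSI.

PROPOSITION 14. There is $S$ such that $E_n \uparrow S$ as $n \to \infty$.

Proof. We shall prove that, if $n \le m$, then $\sum_{\omega \in \Omega_n} \hat{S}_h(\omega)\, p_n(\omega) \le \sum_{\omega \in \Omega_m} \hat{S}_h(\omega)\, p_m(\omega)$.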

TABLE III THE
All regressions are estimated using the 1990 1% Census PUMS. Dependent variables vary by column. Idleness is defined as not working and not enrolled in school. Earnings are the sum of wage, salary, and self-employment income in 1989. The sample for earnings consists of individuals who are not working, not enrolled in school, and have nonnegative earnings. All regressions include the following covariates: an exhaustive set of racial dummy variables, gender, single-year age dummy variables, log of population, percent Black, log median household income, and manufacturing share. The latter four covariates are also interacted with a Black dummy. Standard errors, reported in parentheses, are corrected for heteroskedasticity and intra-MSA clustering of the residuals.
The program generates two main types of output. Summary data appears in a matrix called sipartial.mat. Information about individual blocks appears in output files called si_#.txt, where again # is the city identifier. The sipartial.mat matrix has twelve columns:

Column 1: city identifier
Column 2: group identifier
Column 3: SSI for group for city
Column 4: number of CCs for group
Column 5: number of singletons for group
Column 6: median CC size for group
Column 7: largest CC size for group
Column 8: smallest CC size for group
Column 9: total number of blocks of group
Column 10: percent of blocks belonging to group
Column 11: average number of neighbors for group
Column 12: average number of same-group neighbors for group

As you can see, columns 1 and 2 identify the unique city/group combination; column 3 gives the SSI; and columns 4-12 give supporting statistics. If you wish to find the SSI for each individual block you must look at the si_#.txt output files. These files have five columns

SSI for CC

For example, to find the individual SSI for block 360150102006073 in city 369 you would look in the file si_369.txt for the row that has 360150102006073 in the third column. The individual SSI is the value in the fourth column.
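The lookup described above can be sketched in a few lines of Python. Only the positions of the third column (block identifier) and fourth column (individual SSI) come from the description; the comma delimiter, the function name, and every value in the sample text are assumptions made for illustration.

```python
import csv
import io

def individual_ssi(si_text, block_id, delimiter=","):
    """Return the fourth-column value of the row whose third column equals block_id."""
    for row in csv.reader(io.StringIO(si_text), delimiter=delimiter):
        if len(row) >= 4 and row[2].strip().strip('"') == block_id:
            return float(row[3])
    return None  # block not found

# Made-up file contents: columns 1, 2, and 5 and all numeric values are placeholders.
sample_si = "\n".join([
    "369,1,360150102006073,2.5,3.1",
    "369,1,360150108003016,1.0,1.0",
])

value = individual_ssi(sample_si, "360150102006073")
print(value)
```

In practice the text of si_369.txt would be read from disk and passed in place of the sample string, with the delimiter adjusted to match the actual file.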
DIVISION OF THE HUMANITIES AND SOCIAL SCIENCES, CALIFORNIA INSTITUTE OF TECHNOLOGY HARVARD UNIVERSITY SOCIETY OF FELLOWS AND NATIONAL BUREAU OF ECONOMIC RESEARCH