Why Do The Poor Live In Cities? The Role of Public Transportation

More than 19 percent of people in American central cities are poor. In suburbs, just 7.5 percent of people live in poverty. The income elasticity of demand for land is too low for urban poverty to come from wealthy individuals’ wanting to live where land is cheap (the traditional explanation of urban poverty). A significant income elasticity for land exists only because the rich eschew apartment living, and that elasticity is still too low to explain the poor’s urbanization. The urbanization of poverty comes mainly from better access to public transportation in central cities.


I. Introduction
In 2000, 19.9 percent of the population in the central cities of MSAs lived in poverty, compared with just 7.5 percent of the population in suburbs. While there is substantial rural poverty, it is well established that within U.S. metropolitan areas, the poor live closer to the city center than the rich (Margo [24], Mieszkowski and Mills [26], Mills and Lubuele [29]). Moreover, this gap does not occur because the poor are stuck in cities or because inner city ghettos create poverty. The gap between city and suburban poverty rates is just as large for people who have recently moved between MSAs as it is for longtime residents. Central cities disproportionately attract the poor, at least in the U.S. Our aim is to understand the sorting of the poor, within metropolitan areas, into the dense inner cities.
This puzzle (why do the poor live disproportionately in cities?) is one of the central questions in urban economics. A primary triumph of urban land use theory (Alonso [1], Becker [5], Muth [30], Mills and Hamilton [28]) is its ability to explain the urban centralization of the poor. This monocentric urban model argues that richer consumers buy more land and therefore choose to live where land is cheap. The model can explain why the poor live in city centers as long as the income elasticity of demand for land is greater than the income elasticity of travel costs per mile. In its classic exposition, the model assumes that everyone uses the same mode of transportation and that the main cost of transport is time. In this case, the poor will live in cities if and only if the income elasticity of demand for land is greater than one (see Becker [5]).
While this result is theoretically elegant, there are two reasons why its applicability to modern American cities is limited. First, as many authors have emphasized, our cities are not monocentric: in 2001, 75.9 percent of metropolitan area employment was more than three miles from the Central Business District (Anas, Arnott and Small [2], Glaeser and Kahn [17]). Second, the income elasticity of demand for land is far less than one. Among residents of single-family detached homes, we estimate the income elasticity of land to be .1. When we include both apartment dwellers and residents of detached homes, the income elasticity can be as high as .5, but seems more likely to us to be around .25. This income elasticity of demand for land seems to occur almost exclusively because middle-income individuals like single-family detached homes, not apartments. As the income elasticity of demand for land is still far less than one, it can only explain part of the puzzle.
We follow LeRoy and Sonstelie [23] and argue that the primary reason for central city poverty is public transportation. The large financial costs of automobiles make them unattractive to the poor; public transportation offers a time-intensive alternative that will be more appealing to those with low incomes. Public transportation relies on high densities, so if inner cities have public transportation and suburbs do not, then this can explain the urbanization of the poor. This view does not require a monocentric model. If suburbs are a complete urban environment built around the car, and inner cities are a rival area built around public transportation, then it is easy to understand why the poor live and work in inner cities.
Our evidence supports the importance of public transportation in explaining the location decisions of the poor. Within cities, proximity to public transportation does well at predicting the location of the poor. This holds for rail transit stops in 16 cities that have expanded their rail transit systems over the last 30 years, and for bus stops in Los Angeles. Across cities, the poor are likely to live in cities with more public transportation and the poor are less centralized when the suburb-central city gap in public transit is less. Lower levels of central city public transportation in the West may explain why the centralization of the poor is less in that region.
Of course, transit access is endogenous and public transportation may be structured to service the poor. To address this endogeneity, we first examine the effect of proximity to subways in the outer boroughs of New York City. No subway stops have been added since 1942, so at least some claim might be made that subway-stop locations were predetermined prior to the evolution of many neighborhood characteristics. We further look at rail expansions in 16 major cities. These 16 extensions were explicitly designed to connect central city areas to richer suburbs and not to improve access in poor areas. Here, the census tracts that gained access to public transportation became poorer.
We then return to the model and calibrate it to check whether it can explain the centralization of the poor. This calibration uses data from the 2001 National Household Travel Survey (NHTS) to estimate the time costs of taking public transportation and driving. Our best estimates are that transport modes are two to three times more important than the income elasticity of demand for land in explaining the central location of the poor. Indeed, incorporating transport modes into the model makes it clear that with multiple transport modes, we should always expect the poor to centralize, at least at U.S. levels of income inequality.
While the monocentric urban model can explain the centralization of the poor when it allows for multiple transportation modes, this model is an increasingly poor description of reality. In Section VII of the paper, we show that the tendency of the poor to suburbanize is, unsurprisingly, higher in metropolitan areas where jobs are decentralized. In Section VIII, we argue that the historical evidence supports the importance of public transportation as one determinant of the centralization of poverty. Section IX concludes.

II. Preliminary Facts
Throughout this paper, many of our results will be based on geocoded census tract data from the Urban Institute and Census Geolytics' Neighborhood Change Database (see Baum-Snow and Kahn [4]). Based on year 2000 census tract data, the average poverty rate for people living within 25 miles of a Central Business District (CBD) is 11.7%. The average poverty rate for people living zero to ten miles from the CBD is 14.5%, while for people living 10 to 25 miles from the CBD, the average poverty rate is 8.3%.
We first review the basic facts on urban poverty in the United States based on 2000 Census micro data from the 1% IPUMS Sample. Table 1 reports mean poverty rates by demographic group and by geographic category. In the first column, we give the poverty rate for members of the population subgroup who are living in the central cities of metropolitan areas (based on the census designation of central cities and using the formal census definition of whether a person is in poverty). In the second column, we provide the poverty rate for comparable persons living in the metropolitan area outside of the central city (which we will refer to as suburbs). In the third column, we show the comparable poverty rate for those who live outside of metropolitan areas altogether.
The first five rows describe urbanization of poverty in the U.S. and in the four major census regions. In the U.S. as a whole, the poverty rate is 19.9 percent in central cities and 7.5 percent in metropolitan areas outside of the central city. The poverty rate outside of metropolitan areas is also high, but that is not the focus of this paper.
The second and third rows show that the biggest city-suburb poverty gaps are in the Northeast and the Midwest. In the Northeast, the poverty rate is 14.2 percentage points higher in the central cities than it is in the suburbs. In the Midwest, the poverty rate in the central cities is more than 14.2 percentage points higher than it is in the suburbs. The fourth and fifth rows show the poverty gaps for the West and the South. In both of these areas, the city-suburb poverty gap remains, but the gaps are smaller. In particular, the city-suburb poverty gap in the West is only 8.6 percentage points. Any theory about the location of the poor should also be able to explain these regional differences.
In the next rows of the table, we examine the possibility that the connection between city residence and poverty reflects treatment (i.e., cities make people poor) rather than selection (i.e., the poor disproportionately move to central cities). While ghettos may exacerbate poverty, these rows show that the selection of the poor into the city is intense. The city-suburb poverty rate gap for recent movers is generally larger than the city-suburb poverty rate gap for long-term residents. Among people who came to their MSA in the last five years, the poverty rate is 21.3 percent in the central city and 10 percent in the suburbs. Among people who switched homes within the same MSA in the last five years, the poverty rate is 21.8 percent in the city and 10.4 percent in the suburbs. The natural explanation of these facts is that cities are attracting the poor, not just making them.
Given the high proportion of the urban poor who are black, it is natural to hypothesize that central city poverty is really just another example of the segregation of minorities. Rows 11 and 12 look at the poverty rates of blacks and non-blacks by city residence. The central city-suburb poverty gap is 8.5 percent among non-blacks. This is almost as large as the 10.5 percent overall gap. In row 13, we look at the city-suburb poverty gap for MSAs that are less than 10 percent black. The poverty gap is 9.8 percent in those cities. Race is clearly important, but it is not a dominant factor in explaining the urban centralization of the poor. Moreover, racism only explains separation between blacks and whites; it does not explain the urbanization of the poor.
The poverty rates enumerated in Table 1 conceal the considerable heterogeneity that exists within metropolitan areas. Using census tract-level data from the 2000 decennial Census, Figures 1 and 2 illustrate the connection between income and distance from the city center. The figures plot average household income against distance (measured in miles) from the Central Business District. Central Business District location is from the 1982 Census, which was based on polls of local leaders. Figure 1 shows the income-distance relationship for three older metropolitan areas (New York, Chicago and Philadelphia). In these cities (and in most other older cities) there is a clear u-shaped pattern. The census tracts closest to the city center are often among the richest in the metropolitan area. The poorest census tracts come next, with the bottom of the curves generally lying between three and six miles away from the Central Business District. After that point income rises again. In most cities, income begins to fall again in the outer suburbs. Figure 2 shows the income-distance relationship for three newer cities (Los Angeles, Atlanta, and Phoenix). In these cities a different pattern emerges. Rather than a u-shaped pattern, median income shows a generally monotonic increasing relationship with distance from the Central Business District. As in the older cities, income sometimes falls in the outer suburbs.
Table 2 reports regression results to highlight the poverty patterns and household median income patterns across all cities, "older" cities, and "newer" cities. In older cities (defined as cities that were large in 1900), income generally falls with distance from the CBD for the first three miles, and then rises. In newer cities (cities that were not large in 1900), income rises monotonically with distance from the CBD. Ideally, a theory of the centralization of poverty should be able to explain these differences.
Older and newer cities differ with respect to their share of total employment located near the CBD. Based on 2000 data from the census zip code employment file, 55 percent of metropolitan area employment within 25 miles of the CBD is more than five miles from the CBD for our "old" cities, and 81 percent of metropolitan area employment within 25 miles of the CBD is more than five miles from the CBD for our "new" cities (for more on the data source, see Baum-Snow and Kahn [4]). Thus, there is a sense in which the monocentric model is relevant for a diminishing number of American cities (Mieszkowski and Smith [27], Giuliano and Small [14], Glaeser and Kahn [16], [17]). In the majority of cities, people both live and work outside of the central city. This suburbanization of employment means that most suburban residents do not drive into the city to go to work; they stay out in the suburbs.
Does living in the suburbs increase commuting time? To examine this point, we use data for 109 metropolitan areas that have 200,000 or more jobs within a 25 mile radius of their CBD. For these metropolitan areas, we examine how commute times vary as a function of distance from the CBD. As shown in column (1), in a metropolitan area with 0% of its employment within five miles of the CBD, commute times would decline by .59 minutes per extra mile of distance from the CBD. In contrast, in a metropolitan area with complete job centralization, an extra mile of distance from the CBD would increase the average one way commute time by roughly 2 minutes (2.6246 - .5857). Note how similar the slope coefficients are in columns (1) [...] In this data set the unit of analysis is a person. Zip code level identifiers allow us to identify people who live within ten miles of a CBD. In column (4), we use information in the 2001 NHTS concerning the mileage distance of a person's commute. The regression coefficients indicate that for a commuter who works in a metropolitan area with complete job centralization, an extra mile of distance from the CBD increases the distance of the commute by roughly .76 miles (.2959 + .4636).
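The commute-time arithmetic can be checked with a short Python sketch. It assumes, as a reading of the two cases quoted in the text, that the slope of commute time in distance is linear in the share of jobs within five miles of the CBD; the function names are hypothetical and only the two coefficients come from the text.

```python
# Coefficients quoted in the text for the commute-time regression, column (1).
BASE_SLOPE = -0.5857           # minutes per extra mile when no jobs are near the CBD
CENTRALIZATION_SLOPE = 2.6246  # change in that slope as the central job share goes 0 -> 1


def commute_minutes_per_mile(central_job_share: float) -> float:
    """Marginal one-way commute time (minutes) per extra mile from the CBD.

    Assumes a linear interaction between distance and the central job share.
    """
    if not 0.0 <= central_job_share <= 1.0:
        raise ValueError("central_job_share must lie in [0, 1]")
    return BASE_SLOPE + CENTRALIZATION_SLOPE * central_job_share


def break_even_share() -> float:
    """Central job share at which extra distance neither adds nor saves time."""
    return -BASE_SLOPE / CENTRALIZATION_SLOPE
```

Under this linear reading, distance from the CBD starts to lengthen commutes once a little over a fifth of metropolitan jobs lie within five miles of the CBD.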

III. Models of Urban Poverty
Many theories that seem to explain the urbanization of the poor actually explain only the separation of the non-poor and the poor. Some authors argue that crime, schools and other urban social problems explain the flight of the rich from cities (see Mieszkowski and Mills [26], Mills and Lubuele [29]). These arguments are surely right. People who leave the cities often cite these urban social problems as a primary reason for their exodus (see Katz, Kling and Liebman [22]). Suburban governments that cater to wealthier voters surely help attract the rich. The rich are willing to pay to avoid proximity to the poor, perhaps because of crime, weak public schools, or discriminatory tastes.
However, urban social problems and the presence of minorities do not explain urban poverty. Urban social problems derive more from the concentration of poor people in cities than from anything intrinsic to cities themselves. As such, urban social problems create a multiplier effect where an initial attraction of the poor to cities will then be greatly magnified to create significant poor/non-poor segregation. Perhaps the poor just ended up in the city center by chance and the rest followed. But this view seems hard to reconcile with the fact that the poor are over-represented in the central cities of every one of America's metropolitan areas. A satisfying theory of urban centralization should explain not only why the poor and the non-poor live apart, but also why, conditional upon the poor and non-poor living apart, the poor choose to live closer to the city center.
The classic Alonso-Muth-Mills (AMM) model offers just such an explanation of why the poor live in cities: the rich move to suburbs where the land is cheap so that they can own bigger houses. In the simplest AMM model, the key condition for the suburbanization of the non-poor to occur is that the elasticity of demand for land with respect to income is greater than the elasticity of the value of time with respect to income (see Becker [5]). While we believe non-monocentric models offer a better chance of being able to explain more of America's urban landscape, even the standard monocentric model can do a much better job of explaining the patterns of wealth and poverty if it incorporates different transport modes (as in LeRoy and Sonstelie [23]).
To frame the empirical work, it makes sense to consider the implications of a particularly simple version of the AMM model with two income groups and two commuting modes. We assume two classes of people: rich people with income $Y_{Rich}$ and poor people with income $Y_{Poor}$. The rich have an opportunity cost of time equal to $W_{Rich}$; the poor have an opportunity cost of time equal to $W_{Poor}$. For simplicity, we assume that the land consumption of the rich is fixed at $A_{Rich}$ and that the land consumption of the poor is fixed at $A_{Poor}$. As land consumption is fixed, locations will be chosen by individuals who are minimizing the sum of commuting costs and housing costs. For any fixed income group (with time cost W and land consumption A) using a transportation technology which requires T time units per mile, being indifferent over locations implies that the price of land at distance d from the city center must satisfy

$$\text{Price}'(d) = -\frac{W T}{A} \qquad (1)$$

The price of land must fall with distance just enough to compensate commuters for longer commutes. The bid-rent gradient, $WT/A$, determines which group lives closest to the city center. If two groups neighbor one another, the group with the steeper bid-rent gradient will live closer to the city center, since if two groups neighbor each other they will pay the same at the point of contact, but the group with the steeper bid-rent gradient will be willing to pay more for land at distances closer to the city center. If everyone has the same transportation costs, then the poor live closest to the city center if and only if $W_{Poor}/A_{Poor} > W_{Rich}/A_{Rich}$, or equivalently $\varepsilon^A_Y > \varepsilon^W_Y$, where $\varepsilon^A_Y$ is the income elasticity of demand for land and $\varepsilon^W_Y$ is the elasticity of the time cost of commuting with respect to income. This condition (which appeared in Becker [5]) says that the poor will live in the city center if and only if the elasticity of land consumption with respect to income is greater than the elasticity of time cost with respect to income.
Following LeRoy and Sonstelie [23], we assume that there are two modes of transportation: public transportation and driving. The slow mode, public transportation, requires $T_P$ time units per mile and also has a fixed time cost of F. Driving a car has a fixed financial cost of C and requires $T_C$ time units per mile of commute, where $T_P > T_C$ (although there are some areas where subways will be faster than cars).
We consider two cases that capture older and newer American cities. In the first case, corresponding to newer cities, only the poor take public transportation. In the second case, both groups take public transportation. If $W_{Rich} F > C > W_{Poor} F$, then the poor will take public transportation for distances closer than $(C - W_{Poor} F) / (W_{Poor} (T_P - T_C))$, but the rich will always drive. In this case, the poor will have a steeper bid-rent gradient and live closer to the city center if and only if $W_{Poor} T_P / A_{Poor} > W_{Rich} T_C / A_{Rich}$. This condition is more likely to hold than the standard AMM condition, because it incorporates the fact that the poor have a comparative advantage in using public transportation, and public transportation has a comparative advantage in commutes for short distances from the city center. This assumes that public transportation is available everywhere; if public transportation were only accessible close to the city center, then this would further increase the tendency of the poor to centralize. We will estimate the magnitudes of $\varepsilon^A_Y$ and $T_P / T_C$ to figure out which is more important in explaining the centralization of the poor.
If these conditions hold and $W_{Poor}/A_{Poor} < W_{Rich}/A_{Rich}$, so that the demand for land cannot alone explain the centralization of poverty, then a city will have three rings. In the innermost ring, poor people will take public transportation; in the middle ring, rich people will drive; and finally, on the outskirts, poor people will drive.
If $C > W_{Rich} F$, then some rich people will take public transportation. In that case, a city will have four rings, and the innermost circle of the city will contain rich people taking public transportation. The remaining three rings will be the same as those in the city described above. We think of these assumptions as characterizing the older cities of the U.S. and some European cities like London and Paris. In these older cities, a larger presence of public transportation tends to lower F, and higher garage and insurance costs raise C (footnote 12). In this case, again the key condition for the poor to live closer to the city center than the rich is that $W_{Poor} T_P / A_{Poor} > W_{Rich} T_C / A_{Rich}$.
If the AMM model is to be useful in understanding the centralization of the poor, then we [...]. There is an extensive literature on the value of $\varepsilon^W_Y$, and we will not try to re-estimate this coefficient. Becker's [5] analysis, where the wage is the opportunity cost of time, suggests that this elasticity should equal one. This will continue to be true if the value of time is any fixed multiple of the wage rate (e.g., everyone values commuting at .5 times their wage). Some urban research on commuting costs has reported empirical evidence of smaller commuting cost elasticities (see Small [34] and Calfee and Winston [9]). Our view is that the empirical literature is inconclusive, but that it is probably reasonable to assume that $\varepsilon^W_Y$ equals .75 (or more); we take the theoretically predicted unitary elasticity of commuting costs with respect to income as our benchmark value.
12 In a previous draft (Glaeser, Kahn and Rappaport [19]), we considered a third mode of transportation, walking: a technology that is very slow but has no fixed costs. The existence of this technology ensures that as long as $W_{Rich}/A_{Rich} > W_{Poor}/A_{Poor}$, the innermost ring of the city will contain rich walkers. This third technology further supports the idea that the older walking cities have an inner ring filled with the rich.
13 If $W_{Rich} T_C / A_{Rich} > W_{Poor} T_P / A_{Poor}$, then transport technology differences are less important relative to the differences in value of time. This is likely to be true in highly unequal societies, and in these places, we expect to see the rich living closest to the city center because their time is so valuable relative to the time of the poor.
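A minimal Python sketch of the two-mode comparison may help fix ideas. It assumes, following the verbal description of the model, that a commuter with time cost W pays W(F + T_P d) to ride public transportation a distance d and C + W T_C d to drive, and that a group's absolute bid-rent slope is W T / A. The class and function names, the wages of $10 and $20 per hour, and the land ratio implied by an income elasticity of .25 are illustrative assumptions rather than values or code from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    wage: float  # opportunity cost of time W, in dollars per minute
    land: float  # fixed land consumption A (arbitrary units)


def transit_cutoff(g: Group, fixed_time: float, car_cost: float,
                   t_public: float, t_car: float) -> float:
    """Distance (miles) below which transit is cheaper than driving for g.

    Transit costs W*(F + T_P*d); driving costs C + W*T_C*d. Requires T_P > T_C.
    Returns 0.0 when the group prefers driving at every distance.
    """
    cutoff = (car_cost - g.wage * fixed_time) / (g.wage * (t_public - t_car))
    return max(cutoff, 0.0)


def bid_rent_slope(g: Group, minutes_per_mile: float) -> float:
    """Absolute bid-rent gradient W*T/A for a group using a given mode."""
    return g.wage * minutes_per_mile / g.land


def poor_live_closer(poor: Group, rich: Group,
                     t_poor_mode: float, t_rich_mode: float) -> bool:
    """True when the poor have the steeper bid-rent gradient."""
    return bid_rent_slope(poor, t_poor_mode) > bid_rent_slope(rich, t_rich_mode)


# Illustrative values only: wages of $10 and $20 per hour, and a land ratio
# of 2 ** 0.25, i.e. an assumed income elasticity of land demand of .25.
POOR = Group(wage=10 / 60, land=1.0)
RICH = Group(wage=20 / 60, land=2 ** 0.25)
```

With these illustrative inputs, `poor_live_closer(POOR, RICH, 3.0, 1.6)` compares poor transit riders with rich drivers, while `poor_live_closer(POOR, RICH, 1.6, 1.6)` reproduces the single-mode comparison in which only the land and time elasticities matter.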

IV. The Income Elasticity of Demand for Land
Our objective now is to estimate the income elasticity of demand for land, and to compare this elasticity with the benchmark value of one. While there is a large empirical literature on the income elasticity of demand for housing as a whole, there has been little work on the income elasticity of demand for lot size.
We focus on reporting new estimates of land demand elasticities. In some variants of the AMM model, the income elasticity of demand for lot size equals the income elasticity of spending on housing, but this is obviously not true in general. Total housing consumption includes both intensive housing attributes (better infrastructure, finished basements, taller structures) and neighborhood amenities that do not necessarily get more expensive as land consumption rises. A tendency to consume fancy bathrooms and kitchen appliances (e.g., sub-zero refrigerators) does not create any incentive to live closer to or further away from the city center, once land consumption is held constant.
Our basic household level regression relates the log of lot size to the log of household income and other controls. Our primary data source for this exercise is the 2003 American Housing Survey (AHS), which provides us with information on the individual incomes, personal characteristics, and lot size for single-family homes. Conceptually, the biggest problem with this regression is that we omitted the price of land. In principle, this omission could bias the coefficient in either direction. However, the essence of the AMM model's explanation of urban poverty is that the rich are going to live in areas where the price of land is lower, and this would mean that the estimated coefficient is biased upwards.
Columns (1) and (2) of Table 3 show our results for residents who live in single-family detached housing. In regression (1) we include only income and metropolitan area fixed effects. The estimated income elasticity of demand for land is .08, and the standard error of this estimate is .008. In column (2), we include controls for age, race and household size. The coefficient remains .08. The coefficient is quite robust to alternative specifications and to the inclusion of other controls. We also tested for the possibility that income elasticities are stronger for families with children by interacting income with the presence of children. We found only a small, positive interaction.
One criticism of estimating income elasticities using current income is that current income measures permanent income with error, so that estimated coefficients are biased towards zero. In column (3), we follow a standard approach to this problem and instrument for income with years of education. This approach is only sensible if education is correlated with permanent income but has no other impact on housing consumption. The estimated coefficient rises to .26.
Estimating income elasticity using only single-family homes is problematic, however, because much of the population in inner cities lives in multi-family dwellings. Unfortunately, we do not have data on lot size for multi-family dwellings. To overcome this problem, we have constructed a lot size variable for apartment residents. For these buildings, we have taken interior area and multiplied it by 1.5 to find total area consumed by each household. This multiple is meant to accommodate hallways, lobbies and external space. As any multiplier of this sort is likely to be relatively inexact, we have duplicated our regressions for a range of multipliers from 1.25 to 2, and the coefficient on income remains almost unchanged. We then divide by the number of floors in the apartment building to calculate land area per household.
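The constructed lot size for apartment residents can be written as a one-line calculation. The Python sketch below follows the steps described above (interior area, times a common-space multiplier, divided by the number of floors); the function and parameter names are hypothetical.

```python
def apartment_land_per_household(interior_area: float,
                                 floors: int,
                                 common_space_multiplier: float = 1.5) -> float:
    """Approximate land area attributed to one apartment household.

    interior_area: interior area of the unit (e.g. square feet).
    floors: number of floors in the apartment building.
    common_space_multiplier: scale-up for hallways, lobbies and external
        space; the text uses 1.5 and checks a range from 1.25 to 2.
    """
    if interior_area <= 0:
        raise ValueError("interior_area must be positive")
    if floors < 1:
        raise ValueError("floors must be at least 1")
    return interior_area * common_space_multiplier / floors
```

Because the multiplier scales every apartment's constructed land area by the same factor, it shifts the log of land area by a constant for apartment dwellers, which is one way to see why the income coefficient moves so little across the 1.25 to 2 range.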
Using this constructed measure of land consumption, column (4) reports an estimated coefficient of .34. This coefficient includes MSA fixed effects, but no other controls. This estimate is much larger than the estimate in the regression reported above. In regression (5), we include controls for the householder's age, marital status, race and ethnicity, size of household, and number of children, and instrument for income using years of education. In this case, the coefficient rises to .55.
A related hypothesis on the income elasticity of the demand for land is that richer individuals don't care about owning large quantities of land themselves, but they do want to live in less-dense communities. This demand may occur because less-dense communities are associated with public safety and fewer social problems (see Glaeser and Sacerdote [18]). Estimating the income elasticity of demand for land based on individual lot size would therefore underestimate the true income elasticity of demand for land. To examine such a possibility, we look at the relationship between median household income and average household land use at the tract level:

Log(Land per Household) = .48*Log(Median Income) + Other Controls

The standard error on the income elasticity estimate is .008. We use census data from 1990 and 2000. We include year fixed effects and metropolitan area fixed effects, and we control for distance from the Central Business District. This approach cannot account for the large amounts of tract land space that may be used for commercial and other nonresidential, "non-open space" purposes, but one advantage of tract-level analysis is that we can control for distance from the Central Business District in our regressions.
We believe that these estimates of the elasticity of land demand with respect to income, ranging from .25 to .5, are something of an upper bound on the true income elasticity of land demand with respect to housing prices, because we are not including the public transportation-related reasons for the poor to centralize. If the rich live in suburbs (to drive), and if single-family detached housing is disproportionate in suburbs, then we will observe a connection between income and land area that is spurious. The most aggressive estimates of income elasticity of demand for land are still too low to explain the centralization of poverty, and we think that more realistic estimates make it clear that there is still much to be explained.
Brueckner and Rosenthal [6] argue that lower-quality housing in the central city can help explain the centralization of the poor. We view this theory as complementary to the public transportation hypothesis. Both theories suggest that older infrastructure that was originally designed for a poorer time now appeals to poorer residents. However, one piece of evidence that suggests that this explanation cannot explain everything is the low income elasticity of demand for new housing in the American Housing Survey. In Regressions (6) and (7) in Table 3, we regress the log of the age of the home on household income. Based on OLS, we estimate an age elasticity of -.05, and based on IV we estimate an elasticity of -.23.
Another set of complementary hypotheses emphasizes land use controls that restrict the amount of low-cost housing in suburbs. In an earlier working paper version of this paper (Glaeser, Kahn and Rappaport [19]), we address this issue by calculating whether housing costs for the poor rise disproportionately in the suburbs. We found that while prices may be cheaper in some central cities, prices are not disproportionately lower for the types of housing consumed by the poor. As such, there did not seem to be a home-price-related financial incentive for the poor to differentially locate in central cities.

V. Income Sorting and the Multi-Mode Transportation Model
At this point, we return to the LeRoy and Sonstelie [23] model [...] We include only those commuters who live within 10 miles of their workplace.
Our first regression in Table 6 shows results for walking. Commuters who walk to work take 10.2 minutes per mile. We suspect walkers of overestimating their athleticism. The second regression shows results for automobile users. Car travel takes about 1.6 minutes per mile, which suggests an average speed of 37.5 miles per hour. The fixed time cost of driving is 5.6 minutes, which presumably reflects walking to and from parking spots. Given its large sample size, we are particularly confident about this automobile regression.
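The per-mile time costs convert directly into speeds, and together with the fixed time cost they give a predicted commute time for any distance. The short Python sketch below shows both conversions; the function names are hypothetical, and the linear fixed-plus-per-mile form is taken from the description of the regressions above.

```python
def speed_mph(minutes_per_mile: float) -> float:
    """Average speed in miles per hour implied by a per-mile time cost."""
    if minutes_per_mile <= 0:
        raise ValueError("minutes_per_mile must be positive")
    return 60.0 / minutes_per_mile


def commute_minutes(distance_miles: float,
                    fixed_minutes: float,
                    minutes_per_mile: float) -> float:
    """Predicted one-way commute time: fixed time cost plus time per mile."""
    return fixed_minutes + minutes_per_mile * distance_miles
```

For example, `speed_mph(1.6)` gives the 37.5 miles per hour quoted for cars, while the walkers' 10.2 minutes per mile implies a pace close to 5.9 miles per hour, which is why the reported walking times look optimistic.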
Will the condition for the poor to live in the center hold? We have estimated that $T_C$ equals 1.6, and aggregating bus and subway results suggests that $T_P$ equals 3. With these estimates, $T_P / T_C$ is roughly 1.9. Given the levels of income inequality that we see within cities, the condition is always likely to hold (at least within the United States). As we have seen in Figures 1 and 2, income levels generally rise by much less than 100 percent within American cities.
A second question is whether these parameter values predict that the poor will use public transportation and that the rich will drive, which requires that the cash savings from public transportation be greater than the time cost for the poor and less than the time cost for the rich. Our estimates suggest that for a five-mile commute, driving saves 23 minutes per trip relative to a bus (which is close to the average time difference between car and public transportation commute times in the U.S. as a whole). If a car costs $2,000 per year or $5 per commute (50 weeks and eight commutes per week), then this time savings makes sense for someone with an opportunity cost of time above $13 per hour and not for someone whose time is less valuable. Given the U.S. income distribution, these figures predict that a 100 percent increase in income from $10 to $20 per hour should be associated with a massive shift from public transportation to driving. Lower values of $\varepsilon^W_Y$ will make the condition for the poor to live in the center hold more often.
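The arithmetic behind this threshold can be checked directly. In the Python sketch below (function names are hypothetical), $2,000 per year spread over 50 weeks of eight commutes comes to $5 per commute, and dividing that cost by the 23 minutes saved gives the hourly value of time at which driving starts to pay.

```python
def cost_per_commute(annual_cost: float, weeks: int, commutes_per_week: int) -> float:
    """Fixed annual cost of a car spread evenly over all commutes in a year."""
    return annual_cost / (weeks * commutes_per_week)


def break_even_wage(cost_per_trip: float, minutes_saved: float) -> float:
    """Hourly value of time at which the time saved is worth the cash cost."""
    if minutes_saved <= 0:
        raise ValueError("minutes_saved must be positive")
    return cost_per_trip / (minutes_saved / 60.0)


per_trip = cost_per_commute(2000.0, 50, 8)   # dollars per commute
threshold = break_even_wage(per_trip, 23.0)  # dollars per hour
```

The resulting threshold is about $13 per hour, which lies between the $10 and $20 hourly figures used in the comparison, so the two groups fall on opposite sides of the mode choice.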
The core result from this calibration is that given reasonable parameter estimates, people earning $10 an hour would be expected to take public transportation and people earning $20 an hour would be expected to drive. But given the fact that public transportation is almost twice as slow as driving, we should still expect the poorer people who take public transportation to live closer to the city center. Natural parameter estimates for the U.S. readily predict that the poor will both take public transportation when it is available and then locate close to the city center.
A final reasonable question is the relative importance of transport costs and demand elasticities in pushing the poor to the center.
or 27 percent of the total right-hand side of the inequality. This suggests to us that public transportation explains almost three-quarters of the sorting of the poor into the center and that demand for land explains about one-quarter of our basic puzzle.

IV. Public Transportation and the Location of the Poor
We now present evidence on the connection between poverty and public transportation. We seek to understand whether access to public transit can explain the relationship between the urban income gradient and distance to the city center. We first look within cities at a point in time and ask whether the poor live in places where there is access to public transportation. 16 Next, we look at whether poverty rates increase in places where access to public transportation has increased. In the next section, we calibrate the condition suggested by the LeRoy and Sonstelie [23] modified-AMM model.
In Table 4, we turn to tract level data and test whether, in a cross-section of census tracts, the poor live close to public transportation. We have two distinct samples: 16 cities where we have data on rail access; and the outer boroughs of New York City, where we have data on subway stops.
In each sample, we first regress the log of median household income on distance from the CBD. This is meant to measure the extent of the sorting of the poor in each sample. Then we control for public transportation usage in the tract. This measure is meant to capture the raw effect of public transport usage. Since usage is itself a function of poverty, we do not put too much stock in these raw regressions. Finally, for each sample, we instrument for public transport usage with our measures of access to public transportation.
Column (1) replicates the basic income-distance relationship shown earlier (in Table 2) for a subsample made up of tracts from 16 cities. We estimate a piecewise linear (spline) regression allowing the coefficient on distance to change at three and ten miles. The coefficient on distance is .099 within three miles and .062 for tracts between three and ten miles of the city center. In regression (2), we control for the share of tract workers who commute using public transport.
Column (2) shows that including public transportation usage increases explanatory power and eliminates two-thirds of the positive relationship between distance and income for distances less than ten miles from the city center. 17 In column (3), we instrument for public transportation usage using distance to train lines (see Baum-Snow and Kahn [4]). In this case the coefficient on public transportation increases dramatically, and the relationship between income and distance to the city center flips sign. Access to public transportation appears to explain all of the connection between distance and income.
Table 4, columns (7) through (9) measure transportation usage solely by subway usage, using tract-level data from the New York City boroughs of Queens, Brooklyn and the Bronx. (Staten Island has no subways, and subway coverage is far too dense in Manhattan to provide any meaningful variation.) For this sample, no subway stops have been added since 1942; thus any endogeneity in stop locations stems from poverty levels of at least 48 years earlier. As many neighborhoods have changed radically during this period, we believe that these locations can be thought of as having some degree of exogeneity. The results are quite compatible with the earlier samples. Public transportation usage appears to strongly predict poverty and to explain a substantial amount of the connection between proximity and poverty.
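To make the spline specification concrete, the following sketch builds the piecewise linear distance terms with knots at three and ten miles and uses the two reported column (1) slopes (.099 and .062) to compute the implied gap in log income between the CBD and a tract at a given distance. It assumes distance is measured in miles and that each coefficient is the marginal effect within its range; the helper names are hypothetical, and no slope is assumed beyond ten miles because none is reported in the text.

```python
KNOTS = (3.0, 10.0)           # miles from the CBD where the slope may change
SLOPES = (0.099, 0.062)       # reported slopes for 0-3 and 3-10 miles


def spline_terms(distance, knots=KNOTS):
    """Split a distance into the miles that fall inside each spline segment."""
    edges = (0.0,) + tuple(knots)
    terms = []
    for i, lower in enumerate(edges):
        upper = edges[i + 1] if i + 1 < len(edges) else float("inf")
        miles_in_segment = min(max(distance - lower, 0.0), upper - lower)
        terms.append(miles_in_segment)
    return terms


def log_income_gap(distance, slopes=SLOPES, knots=KNOTS):
    """Implied log-income difference relative to the CBD, up to ten miles."""
    if distance < 0 or distance > knots[-1]:
        raise ValueError("only distances between 0 and 10 miles are covered")
    terms = spline_terms(distance, knots)
    return sum(slope * miles for slope, miles in zip(slopes, terms))


if __name__ == "__main__":
    for miles in (1.0, 3.0, 5.0, 10.0):
        print(miles, spline_terms(miles), round(log_income_gap(miles), 3))
```

Because each term counts only the miles inside its own segment, the fitted line is continuous at the knots and each coefficient can be read as the slope within that range.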
In Table 5, we look at the effects of public transportation expansions on tract-level poverty. For our 16-city sample, public transit construction between 1980 and 2000 increased the supply of communities with close access to rail transit. As discussed in Baum-Snow and Kahn [4], these transit expansions were intended to connect suburban locations to the Central Business District. In Table 5, we look at whether poverty rates rose in tracts where rail transportation became more accessible. Presumably, public transportation's appeal to the poor arises because it eliminates the need to own a car. As such, we look at areas where new construction made it possible to walk to a transit line. Using data for the 16 metro areas, we estimate a regression of tract poverty rates on access to rail transit. In this regression, the key explanatory variable is a dummy that equals one if a census tract is within one mile of rail transit. In estimating this regression, we exploit a tract-level panel dataset where we observe each census tract in 1980, 1990 and 2000. In regression (1), we include metropolitan area fixed effects, year fixed effects, and MSA by year fixed effects. Tracts that are within a mile of rail transit have 4 percentage points higher poverty rates. In regressions (2) and (4), we include tract fixed effects, and thus we are examining how tract poverty rates change as some census tracts are "treated" with increased access to rail transit due to city-level rail transit expansions. We find small but statistically significant results. Based on the results in regressions (2) and (4), a treated tract experiences a .004 percentage point increase in poverty relative to non-treated tracts in the same metropolitan area that are equidistant to the CBD. While the results in Table 5 are modest, they continue to suggest the positive impact of access to public transportation on the location of the poor.
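The logic of the tract fixed-effects regressions can be illustrated with a simplified within estimator. The sketch below is not the specification in Table 5: it uses a made-up balanced panel of four tracts observed in 1980, 1990 and 2000, includes only tract and year effects (no MSA by year effects or other controls), and plants a treatment effect of .004 purely for illustration. All names and numbers in the synthetic panel are assumptions.

```python
from collections import defaultdict


def _mean(values):
    return sum(values) / len(values)


def double_demean(keys, values):
    """Remove tract means and year means, adding back the grand mean.

    For a balanced panel this sweeps out tract and year fixed effects.
    """
    by_tract, by_year = defaultdict(list), defaultdict(list)
    for (tract, year), value in zip(keys, values):
        by_tract[tract].append(value)
        by_year[year].append(value)
    grand = _mean(values)
    return [value - _mean(by_tract[tract]) - _mean(by_year[year]) + grand
            for (tract, year), value in zip(keys, values)]


def within_estimate(rows):
    """rows: (tract, year, near_rail_dummy, poverty_rate) from a balanced panel."""
    keys = [(tract, year) for tract, year, _, _ in rows]
    x = double_demean(keys, [float(dummy) for _, _, dummy, _ in rows])
    y = double_demean(keys, [poverty for _, _, _, poverty in rows])
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def make_synthetic_panel(effect=0.004):
    """Noise-free illustrative data; every value here is invented."""
    tract_effects = {"A": 0.10, "B": 0.20, "C": 0.15, "D": 0.30}
    year_effects = {1980: 0.00, 1990: 0.01, 2000: 0.02}
    near_rail = {("A", 1990), ("A", 2000), ("B", 2000),
                 ("D", 1980), ("D", 1990), ("D", 2000)}
    rows = []
    for tract, tract_effect in tract_effects.items():
        for year, year_effect in year_effects.items():
            dummy = 1 if (tract, year) in near_rail else 0
            rows.append((tract, year, dummy,
                         tract_effect + year_effect + effect * dummy))
    return rows


if __name__ == "__main__":
    print(within_estimate(make_synthetic_panel()))
```

Only tracts whose rail access changes over time (A and B in this toy panel) identify the coefficient, which is why the estimate reflects changes in poverty following new transit access rather than permanent differences across tracts.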
Anecdotal information also suggests that changes in public transportation can lead to increased poverty. For example, Harlem's evolution into a ghetto began with the extension of the subway into that area (see Osofsky [31]). As public transportation came to Harlem, African Americans moved from less-segregated, less-attractive areas closer to the city center into this newly accessible place.

Further Implications
We now turn to implications of the model to see if they help us understand heterogeneity across cities in the U.S. and the world. A very clear implication of the model concerns places where only one mode of transportation is used. There is really no area in America where cars are not an important part of transport, but in many areas of the U.S., only cars are used. If public transportation explains the centralization of poverty, then we should expect the rich to live closer to the city center in those metropolitan areas where almost nobody commutes using public transit. To identify these metropolitan areas, we examine public transit use in census tracts between 5 and 15 miles from the CBD. For each metropolitan area, we identify the census tract in that mileage range with the maximum public transit use. We drop all metropolitan areas where the tract with the highest public transit share of commuters exceeds 2.5%. This leaves us with a sample of 99 metropolitan areas. In column (1) of Table 7, we refer to these metropolitan areas as the "car zone".
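The selection rule for the car zone can be sketched as a simple filter. The code below assumes tract records of the form (metropolitan area, miles from the CBD, public transit share of commuters), treats the 5 and 15 mile bounds as inclusive, and leaves out metropolitan areas with no tracts in that range; the record layout, the inclusive bounds, and the sample data are all illustrative assumptions.

```python
def car_zone_msas(tracts, lower=5.0, upper=15.0, max_share=0.025):
    """Return metropolitan areas whose peak transit share in the ring is small.

    tracts: iterable of (msa, miles_from_cbd, transit_share) tuples.
    """
    peak_share = {}
    for msa, miles, share in tracts:
        if lower <= miles <= upper:
            peak_share[msa] = max(peak_share.get(msa, 0.0), share)
    # Keep an area only if even its highest-transit tract is at or below the cap.
    return sorted(msa for msa, share in peak_share.items() if share <= max_share)


# Hypothetical tracts: area "A" qualifies, "B" has one high-transit tract in
# the ring, and "C" has no tracts between 5 and 15 miles from the CBD.
SAMPLE_TRACTS = [
    ("A", 2.0, 0.30),   # inside 5 miles, so ignored by the rule
    ("A", 6.0, 0.010),
    ("A", 12.0, 0.020),
    ("B", 7.0, 0.040),
    ("B", 9.0, 0.005),
    ("C", 3.0, 0.000),
]

if __name__ == "__main__":
    print(car_zone_msas(SAMPLE_TRACTS))
```

Using the maximum rather than the average share is a strict screen: a single tract with meaningful transit use in the ring is enough to exclude the whole metropolitan area.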
Column (1) of Table 7 shows a significant negative relationship between distance from CBD and income in car zones. In an area where only one mode of transportation is being used, richer people appear to live closer to the city center. This suggests that the existence of multiple modes of transport is crucial for understanding why the poor live in cities.
As a second test of the theory, we look at the effects of subways across metropolitan areas. The theory predicts that the transition from poor to non-poor will occur when cars replace public transportation. If a different transportation technology changes the point at which cars substitute for public transportation, this will change the point where urban poverty is replaced by higher income areas. We examine the subset of metropolitan areas that have subways. The effect of these subways is to move the public transit zone much further out, since the time cost per mile of subways is much lower than the time cost per mile of buses. In column (2) of Table 7, we examine the relationship between tract median income and tract distance from the CBD in subway cities and non-subway cities. The subway cities include Boston, Chicago, New York City and Philadelphia. In subway cities, incomes first decline with respect to distance from the CBD, out to three miles. Beyond three miles from the CBD, income increases as the distance to the CBD increases. In contrast, for non-subway cities, incomes rise with distance from the CBD for tracts within three miles of the city center. In column (3) of Table 7, we examine the relationship between tract public transit use and tract distance from the CBD in subway cities and non-subway cities. Subway cities feature a positive relationship between public transit use and distance to the CBD for tracts within three miles of the CBD. Figure 3, panels A and B show the patterns of income and public transportation usage in subway cities and non-subway cities respectively. In both cases, income and public transportation usage track one another (note that we have inverted income values with respect to the vertical axis). In cities with subways, public transit use remains high even at distances relatively far from the city center. 
In the subway cities, near the city center median income falls with distance from the CBD as predicted by the three-mode model (assuming a zone in which both poor and non-poor individuals use public transit). The rise in income and fall in public transit usage beyond three miles from the CBD in subway cities presumably pick up the shift from public transit to car usage by high-income individuals.

Explaining Poverty Sorting in "New" and "Old" Cities
An additional benefit of the transportation mode model is that it can explain the different income-location patterns between old and new cities described in the stylized fact section above. In new cities, even within three miles of the city center, the non-poor drive cars and the poor take public transportation. The correlation between the logarithm of income and public transportation use at the census tract level in the new cities within three miles of the CBD is -.509. As the non-poor are driving, it is quite understandable that they live further from the city center.
However, in the old cities there is a positive connection between income and public transportation use. Indeed, the correlation between the logarithm of income and public transportation use is positive .259 in old cities within three miles of the CBD. Furthermore, in that region there is a positive relationship between walking and income: .162. As the non-poor appear to be particularly drawn to the high time cost per mile technology in older cities, it should not surprise us to find them closer to the city center in the older cities. In the newer cities, the non-poor are particularly likely to drive cars, and it should therefore not surprise us to find them living further from the city center. Thus, the transportation model can explain the differences between the old and new cities.

The Centralization of Poverty in the Past
We have presented new evidence based on the recent experience of U.S. cities to argue that the desire to minimize total commuting costs helps to explain the centralization of the poor. The U.S. historical experience and evidence from around the world today provide additional insights about this hypothesis.
As LeRoy and Sonstelie [23] and Gin and Sonstelie [13] argue, transportation technologies help us to understand the changing patterns of wealth and poverty within metropolitan areas over time. LeRoy and Sonstelie show that in New York, 52 percent of workers earning less than $10 per week walked to work in 1907. Only 12 percent of workers earning $20 per week walked; they used streetcars instead. Just as the car favors the non-poor today, the streetcar did in the past, and it helps to explain why the poor lived close to the city center 100 years ago.
Was there a time in the U.S. when only one mode was used? Urban public transportation really only began in 1828, when omnibus lines were pioneered in Paris, and prior to this date, all but the very wealthy walked to work. If the model is correct, then during this time period the rich should have lived close to the city center, and the decentralization of the rich should only have happened with the onset of public transportation.
Before 1800, Boston was tied to its wharves. The famous Bonner map of 1722 shows a massive clustering of homes of both the rich and the poor around the wharves. For example, the merchant tailor Robert Keayne's house, which became the town house, faced the market square. The Governor's house, which had originally been built privately by Peter Sargeant, was close by. Throughout the 18th and early 19th centuries high-end development occurred in Summer Street, Bowdoin Square and Bulfinch's Tontine Crescent in Franklin Street. While there were certainly occasional mansions (such as the Bromfield House on Beacon Street) built on the edge of town, the mass of development for the wealthier residents appears to be quite close to the traditional downtown. Only in the 19th century, when the omnibus became available, did the wealthy move away from the poor. The old South End became an area for the poor and Back Bay (once it was filled in) became a place for the rich. Warner [35] traces the powerful role that the streetcars played in moving wealthy Bostonians west.
New York appears to follow a similar pattern. In the 18th century, New York City ran from the Governor's Mansion, which was at the tip of Manhattan Island, where the Central Business District was and remains, to the poor house, which was almost at the palisades that marked the upper reaches of the city. Bowling Green (also at the tip of the island) and "the stylish streets west of Broadway and near the Battery," (Burrows and Wallace [8], p. 448) remained the center of fashionable New York through the early 19th century.
But in the 1830s, uptown areas, such as Washington Square and later Fifth Avenue, developed as centers for the wealthy (previously Washington Square had served as an execution ground). Their growth perfectly paralleled the development of the omnibus. While the horse-drawn omnibus had only been introduced to New York in 1831, by 1833 there were 80 of them and by 1834, one New York paper referred to Manhattan as the "city of omnibuses" (Burrows and Wallace [8], p. 565). The exodus of the non-poor from the downtown and the modern pattern of centralization of the poor really didn't begin in the U.S. until the early 19th century, when expensive forms of horse-drawn transport eliminated the need for the rich to live within walking distance of their work.

The Centralization of Poverty: The Case of London and Paris
London, like New York, has a sizable population of rich residents. The huge time costs involved in commuting, by either car or train, have made proximity attractive in both of those metropolises. Both cities were built around public transportation, not automobiles, and as such driving is difficult and proximity remains attractive for the rich. 18 Still, within London, there is considerable segregation of rich and poor, and it appears that the city basically follows a U.S. pattern. The poorest areas of the city, in the East End, directly abut the city center. The richer areas of the West End, such as Mayfair and Piccadilly, are somewhat further removed from the business center of the city, and they neighbor London's sizable parks. This social geography has remained remarkably unchanged since the Victorian era.
The wealth of the West End predates public transportation. In the 18th century, a number of extremely wealthy nobles and haute bourgeoisie built villas and townhomes in this area. This was not a large-scale phenomenon. These residents generally had carriages (a very elite form of transportation) and are quite unlikely to have worked at all. The larger group of richer, working Londoners "were scattered throughout all parts of the metropolis, but there were concentrations in the City, North London and the commercial parts of Westminster" (Sheppard [33]). The mass exodus of the upper middle class into the West End only happened in the 19th century, again accompanied by the onset of the bus and the train. Just as in the case of New York and Boston, London's wealthy left the central city when public transportation became available.
Finally, there is the example of Paris, which always had wealth at its center (in the Louvre and the Ile de la Cite) and has wealth at its center today. Two hundred years ago, the core of Paris was filled with both rich and poor, often living on top of one another. 19 As we have already seen, through the early 19th century, London, New York and Boston followed the same pattern. But in the case of Paris, the rich stayed at the center and the poor left. Why didn't improvements in transportation technology induce the rich to leave Paris?
Brueckner, Thisse and Zenou [7] argue that the urban amenities of the City of Light are so attractive that the wealthy want to stay. Today, Parisian amenities are indeed striking, but at the dawn of the omnibus era, Paris was generally thought of as an archetype of urban blight, a crowded, medieval city suffering from filth and disease. Victor Considerant described Paris in 1848 as "a great manufactory of putrefaction in which poverty, plague, and disease labor in concert, and air and sunlight barely enter. Paris is a foul hole …" (cited in Shapiro [32]). Indeed, at the beginning of the Second Empire, just as in London, the Parisian wealthy were moving away from the city center and heading progressively westward. The Louvre and the Ile de la Cite were crammed with slums and it would have been reasonable to expect Paris to end up exactly like Detroit.
But that was before Napoleon III and Baron Haussmann. In part out of an urban vision and in part out of a desire to eliminate the urban revolutions that had toppled two governments in 20 years, the Imperial government undertook a massive program of urban gentrification. They cleared slums, starting with the Louvre. In some cases, they just destroyed the housing and in others they replaced the slums with streets and squares. They built monuments, such as the Opera, and other public amenities to increase the demand of the rich for the city center. Finally, Haussmann rigorously regulated new construction and exercised "his strong partiality for expensive housing for the rich" (Jordan [21], p. 232).
The subsequent history of Paris echoed the work of Haussmann. Massive public housing was built outside of the city. Public transportation was heavily subsidized to enable the working classes to live in those suburbs and commute into the city. The central city was rigorously regulated, and innovations that would have made high-density housing possible for the poor in the city were banned (high-rise buildings, for example). Paris is an exception, and it reflects the remarkable commitment of the French government to avoiding centers of poverty in the national capital.
19 It was typical for poorer Parisians to rent floors on the upper levels of the homes of the rich.

VII. Conclusion
Traditional housing market explanations cannot explain all of the sorting of the poor into central cities. The income elasticity of demand for land is just too low. Instead, we find support for the views of Meyer, Kain and Wohl [25] and LeRoy and Sonstelie [23] that transportation-mode choice plays a key role in explaining income sorting.
None of this is meant to suggest that transport mode explains everything. First, the housing market surely matters. The centralization of older housing, and in particular apartment buildings, in central cities certainly helps explain the centralization of poverty (as in Brueckner and Rosenthal [6]). Any initial tendency of the poor to centralize, which is the result of housing and transportation, has surely been exacerbated by the social and political consequences of poverty. The marginal well-to-do suburbanite assuredly thinks more about schools and crime than driving. We do not mean to minimize these forces, but rather suggest that these are, in many cases, outcomes that reflect an initial tendency of the poor to locate in central cities, and we join LeRoy and Sonstelie [23] in thinking that public transportation plays a major role in initializing this process.

Notes: In columns (3) and (4), the sample includes all people sampled in the 2001 National Household Travel Survey who live in a zip code that is within ten miles of one of the 109 metro areas.

Notes: Numbers in parentheses are standard errors. The dependent variable in specifications (1)-(3) is the log of lot size for people who live in single detached dwellings. For apartment dwellers, the dependent variable is the log of (unit's interior square footage*1.5/(floors in their building)). In specifications (2), (3), and (5), the demographic controls include the head of household's age, race, number of people in the household and whether children are present. In specifications (3), (5) and (7), head of household's education is used as an instrumental variable for income. The data source is the 2003 American Housing Survey. The unit of analysis is a household.

Notes: In specification (1), we estimate the regression for commuters who live within 3 miles of work. MSA fixed effects are included in each specification. For specifications (2)-(4), the sample includes all workers who live within 10 miles of where they work. Numbers in parentheses are standard errors.

Notes: In each regression, the unit of analysis is a census tract. The data are from the year 2000. Subway City is a dummy variable that equals one if the tract's metropolitan area is Boston, Chicago, New York City or Philadelphia. In columns (1) and (2), the dependent variable is the log of a census tract's median household income. In column (1), the Car Zone is defined as the set of census tracts located between 5 and 15 miles from the CBD in metropolitan areas in which the highest public transit use in that mileage range is less than or equal to 2.5%. In columns (2) and (3), the sample includes all tracts within ten miles of the CBD. All of the regressions are population weighted. The explanatory variables are linear splines; thus the coefficient estimate represents the marginal effect in the stated mileage range. In columns (2) and (3), the omitted category is a city that does not have a subway system.