Discovering Diverse Mechanisms of Migration: The Mexico–US Stream 1970–2000

Migrants to the United States are a diverse population. This diversity, identified in various migration theories, is overlooked in empirical applications that describe a typical narrative for an average migrant. Using the Mexican Migration Project data from about 17,000 first-time migrants from Mexico to the US between 1970 and 2000, this study employs cluster analysis to identify four types of migrants with distinct configurations of characteristics. Each migrant type corresponds to a specific theoretical account and becomes prevalent in a specific period, depending on economic, social, and political conditions in Mexico and the US. Around the period when each migrant type becomes prevalent, a corresponding theory is also developed.

Scholars have identified diverse mechanisms that lead individuals to migrate. These mechanisms are captured in various migration theories developed in multiple disciplines. In neo-classical economics, higher wages in destination propel the migration of individuals who expect to earn more there (Harris and Todaro 1970). In new economics of migration, the uncertainty in the origin economy leads to migration from households that face risks to domestic earnings (Stark and Bloom 1985). In cumulative causation theory, the growing web of social ties between origin and destination fosters the migration of individuals who are connected to prior migrants (Massey 1990a).
In a seminal series of publications, Massey et al. (1993, 1998) argued that the various causal configurations, implied by different theories, are not mutually exclusive. Income-maximizing migrants can co-exist alongside migrants who seek to diversify risks, or those who join family or friends in destination. Massey and Espinosa (1997) provided the first empirical application of this argument in the Mexico-U.S. setting. Associating each theory with a set of independent variables, the authors used regression analysis to compare which variables, and theories, best predict who migrates. This empirical approach, although commendable in combining various theories, did not fully reflect Massey et al.'s (1993) vision, as it treated theories as competing rather than complementary accounts of migration. The approach also did not consider the conditional nature of theories, that is, the fact that each theory applies to a specific group of individuals under specific conditions. Indeed, in recent work, Massey and Taylor (2004: 383) critiqued their earlier work (Massey et al. 1998) for not being able to "state with any precision which theories were most important empirically in accounting for variations in the number, rate, and characteristics of immigrants over time and whether and why different theories may prove more or less efficacious in accounting for immigration patterns in different times and places," and identified the major challenge for migration research to be "test[ing] various theoretical explanations comparatively…to determine which ones prevail under what circumstances and why." This study considers the challenge of characterizing the causal heterogeneity of migration, and thus takes on a major methodological problem in social science: identifying the different mechanisms that work for different groups of individuals.
Quantitative social inquiry often focuses on (and generalizes from) an average case rather than studying the variability across cases (Duncan 1982; Xie 2007) in an effort to emulate the natural sciences (Lieberson and Lynn 2002). In recent years, however, new methods, such as multi-level, latent class, or growth curve models, have allowed researchers to study the variability in outcomes across different contexts, groups or trajectories (Raudenbush and Bryk 1986; D'Unger et al. 1998; Muthén and Muthén 2000).
Migration research has closely followed these developments. Studies have used split samples, interaction terms or hierarchical models to show the different factors influencing migration for men and women, among different ethnic groups, or in different contexts and time periods (e.g., Kanaiaupuni 2000; Marcelli and Cornelius 2001; Massey, Goldring, and Durand 1994). But these studies all relied on a few fixed categories, such as gender or community, to characterize the heterogeneity in migration, an approach that can be considered restrictive, even essentialist (Somers 1994).
In this study, rather than dissecting or modeling data based on a few selected attributes, we seek to discover the configurations of various attributes that characterize different migrant types. This approach is inspired by Ragin and Abbott's work in sociology. Ragin (1987) argued that there may be multiple causal bundles that lead to the same social or historical outcome, and these bundles may include various conditions that come together. To discover these causal bundles, he developed methods based on Boolean algebra and fuzzy sets (Ragin 2000). Abbott (2001) similarly defined causes as specific configurations or sequences of events, and applied sequence analysis, a method originally developed for classifying DNA patterns, to social data (Abbott and Hrycak 1990).
Similar to Ragin and Abbott, we argue that different configurations of causal factors may lead individuals to the same outcome: migrating from Mexico to the United States. To discover these configurations, we employ cluster analysis, an inductive and data-driven method for locating groups of cases with similar attributes. This method allows us to identify distinct types among migrants, and thus to characterize variation across cases, rather than focusing on an average case. Hence, instead of asking "What factors determine who migrates?" we can now ask "Are there different types of migrants in different contexts? Are these types captured in different theories?" Identifying configurations that characterize 'ideal' types has a long tradition (Weber [1922] 1978). But, today, this tradition survives mostly in qualitative work. By using cluster analysis to discover different migrant types, we appropriate a quantitative method for a distinctly qualitative approach to social science. We then relate each migrant type to a theoretical narrative and offer an alternative way of linking evidence to theory, where different narratives provide complementary, rather than competing, accounts of migration. Finally, we juxtapose the temporal distribution of each migrant type against the major trends in the economic and political context of Mexico-U.S. migration and identify when, under what conditions and for whom each theory is most relevant, a strategy closely resonating with the philosophy of small-N case studies.
This approach provides novel insights to understand the migration stream between Mexico and the United States, the largest contemporary flow in the world. The study period begins in 1970 and captures various important changes in the migration context until 2000: economic fluctuations in Mexico leading to more migration, U.S. migration policy shifting to prevent it, and still growing undocumented migration between the two countries. We use the Mexican Migration Project data from about 17,000 migrants on the year of their first migration to the United States. Our analysis applies the K-means clustering algorithm with various validation checks, and yields four distinct migrant types. Each migrant type displays a distinct configuration of individual, household and community characteristics, and corresponds to a specific theory of migration. Furthermore, each migrant type becomes prevalent in a specific period, depending on the economic, social and political conditions in the two countries.
Strikingly, each migrant type also becomes prevalent around the period in which the theory it corresponds to is developed.

Background
The Origins of Migration
Today 200 million people, roughly 3 percent of the world population, reside in a country other than the one they were born in (World Bank 2009). The increasing mobility of people, mainly for labor, has led to a rapid growth in migration research in the past four decades. This research has sharpened our understanding of the migration process, but also led to a fragmented set of theories developed in multiple disciplines (Menjívar 2010).
In neoclassical economics, labor migration is viewed as a product of wage and employment differentials between regions (Harris and Todaro 1970; Sjaastad 1962). Individuals from a low-wage origin seek to maximize their income by migrating to a high-wage destination (Todaro 1969, 1977). The most likely migrants are individuals whose education and occupation permit higher earnings in destination compared to the origin. These predictions have received substantial empirical support. At the aggregate level, for example, researchers related Mexico-U.S. migration rates to wage and employment figures in both countries (Frisbie 1975; Jenkins 1977; White et al. 1990). At the individual level, researchers showed that the expected earnings in destination determined whether an individual migrates from Mexico (Massey and Espinosa 1997; Taylor 1987), El Salvador (Funkhouser 1992), and Paraguay (Parrado and Cerrutti 2003).
The new economics perspective views labor migration as a household act to tackle the economic uncertainty in developing countries (Stark and Bloom 1985; Stark, Taylor, and Yitzhaki 1986). Given insufficient markets for insurance, households send migrants as a risk diversification strategy, where earnings in destination provide a hedge against shocks to domestic income (Stark 1984; Stark and Levhari 1982). As a result, migrants typically originate from households with substantial economic resources, a pattern observed in various settings including Mexico, the Dominican Republic (Grasmuck and Pessar 1991), and the Philippines (Root and De Jong 1991). An alternative formulation of this theory considers credit market failures in developing economies. In that case, households send migrants to overcome capital constraints and to decrease their relative deprivation in the origin community (Stark and Taylor 1989, 1991; Stark and Yitzhaki 1988). This formulation is the culmination of earlier findings from case studies, which established migration as a strategy for supporting local farm or business activities (Cornelius 1978; Roberts 1982; Wiest 1973), as well as recent results, which showed that migrants' earnings are often invested in the origin community (Durand et al. 1996; Lindstrom and Lauster 2001; Massey and Parrado 1994).
The neoclassical and new economics perspectives both focus on the economic conditions that initiate labor migration. Cumulative causation theory shifts this focus to the social structure that sustains it (Massey 1990a, 1990b). In this theory, past migration develops a growing web of social ties between origin and destination regions. These ties increase the likelihood of future movement by lowering the costs and increasing the benefits of migrating (Massey and García-España 1987). The most likely migrants are individuals who have family or community ties to prior migrants in destination. Strong evidence confirms this expectation in Mexico (Davis and Winters 2001; Massey and Espinosa 1997; Massey and Zenteno 1999; Winters et al. 2001) and Thailand (Curran et al. 2005; Garip 2008).
There are two other theories that make predictions about aggregate migration flows, but not about the specific characteristics of migrants, and hence are not elaborated in this study.
Segmented labor markets theory attributes migration to the labor demand inherent in industrialized economies (Piore 1979). Migrants fill the unskilled jobs that are undesirable to the native workers due to low wages and status. In world systems theory, migration stems from the expansion of capitalist economies into developing countries (Wallerstein 1974). Migrants seek livelihoods abroad as a response to the economic disruptions in their own countries, and by capitalizing on their increasing cultural connections to developed regions due to globalization (Castells 1989; Sassen 1988, 1991).

A Gap between Theory and Evidence
This study focuses on three theories that predict different types of migrants mobilized for different reasons. Neoclassical economics anticipates income-maximizing migrants who expect to earn higher wages in destination. New economics predicts risk-diversifying migrants who seek to complement earnings at risk in origin. Cumulative causation describes network migrants who follow family or friends in destination.
Each theory depicts a unique facet of the migration process, and combined together, they provide a more complete picture. Considering these complementarities, Massey et al. (1993, 1998) took on a massive effort to integrate various theories of international migration. These theories, the authors argued, carry distinct implications that need to be integrated in a common analytic framework and evaluated empirically. Massey and Espinosa (1997), in their comprehensive analysis of the Mexico-U.S. case, provided the first empirical application. The authors first identified variables that captured the predictions of various theories. The inflation rate in Mexico, for example, measured the level of economic uncertainty, a catalyst for migration in new economics theory. The prevalence of migration in origin community signified the density of connections to prior migrants, an important factor leading to migration according to cumulative causation theory. Using a regression model, and 41 such variables, the authors then evaluated which variables better predict who migrates in 25 Mexican communities over 25 years. The variables corresponding to the new economics and cumulative causation theories obtained substantively meaningful and statistically significant coefficients. These theories, the authors argued, received strong empirical support. The variables capturing neoclassical, segmented markets and world systems perspectives had less conclusive coefficients, leading to weak support for those theories.
This empirical approach, based on regression analysis, creates a gap between theory and evidence on migration. First, the approach juxtaposes theories against one another as competing explanations of migration, not fully reflecting Massey et al.'s (1993) vision for these theories as complementary accounts. Second, the approach produces average results that are presumed to generalize to all individuals and across time. These results imply that migration theories, conditional statements in reality, apply universally within the scope conditions of the study.
In recent years, migration scholars have made strides in addressing this issue of population heterogeneity, that is, the fact that different mechanisms may work for specific groups of cases. Gender scholars, for example, have shown the different reasons underlying the migration of men and women (Cerrutti and Massey 2001; Curran and Rivero-Fuentes 2003; Donato 1993; Hagan 1998; Hondagneu-Sotelo 1994; Kanaiaupuni 2000; Pessar 1999). Students of assimilation have demonstrated different patterns of integration to the host society among migrants from different ethnic groups (Alba and Nee 1997; Portes and Rumbaut 1996; Portes and Zhou 1993). Others have studied the varying causes of migration over time or across communities (Durand et al. 2001; Fussell and Massey 2004; Lindstrom and Lauster 2001; Marcelli and Cornelius 2001).
This study builds on these efforts, but proposes a novel approach to characterize the causal heterogeneity in migration. Rather than dissecting data based on a few selected attributes, this approach employs cluster analysis to discover the distinct configurations of causal factors that characterize different migrant types.

Major Milestones Since 1942
This study focuses on the migration from Mexico to the United States between 1970 and 2000.
This flow, the largest in the world today, gained steam with the Bracero program, which recruited 4.6 million Mexican workers to the United States for short-term farm labor from 1942 through 1964 (Cornelius 1978). The end of the Bracero program marked a shift in U.S. immigration policy. The changes to the Immigration and Nationality Act in 1965 and 1976 severely limited the number of visas available to Mexicans. This condition, combined with the economic downturn in Mexico brought on by two peso devaluations in 1976 and 1982, set off an influx of undocumented migrants to the United States. From 1965 to 1986, about 5.7 million Mexican migrants entered the country, 80 percent of whom were undocumented (Massey et al. 2003).
This period of mostly unhindered, undocumented migration ended with the Immigration Reform and Control Act (IRCA) in 1986, which increased border enforcement and imposed sanctions on employers hiring undocumented migrants. The legislation also granted amnesty to 2.3 million undocumented Mexican migrants (U.S. INS 1990). As an unintended consequence, the amnesty created incentives for the relatives of the newly legalized Mexicans to also migrate (Massey and Espinosa 1997). Undocumented migration to the United States continued as a result through the 1980s, considered the 'lost decade' for Mexico's economy (Sheahan 1991).
In 1994, two important events, the peso devaluation in Mexico and the North American Free Trade Agreement (NAFTA) between Mexico, the United States and Canada, contributed to increasing migration flows to the United States. The former led to the worst economic crisis in Mexico in decades, and the latter displaced rural farmers through deregulation in agriculture. As a result, from 1994 to 1998, U.S. border apprehensions rose from 1.1 to 1.7 million (Martin 2003). By 2000, the number of Mexican-born persons in the United States had reached 8.4 million, of whom 3.9 million were estimated to be undocumented (Bean et al. 2001).

Study Data
The majority of quantitative results on Mexico-U.S. migration are based on data from two surveys: the Mexican National Survey of Population Dynamics (ENADID) and the Mexican Migration Project (MMP). The former is a representative national sample, but contains information on only labor migrants. The latter is from specific Mexican communities, but covers all migrants, including those who have moved to the United States to join family members.
The inclusion of all migrants, not just labor-force participants, makes the MMP data better suited to studying the diversity of the Mexico-U.S. stream. These data are not strictly representative of the Mexican population. Yet, prior work found that the MMP data yield an accurate profile of the U.S. migrants in Mexico, and this profile is largely consistent with that observed in the ENADID data (Durand et al. 2001; Zenteno and Massey 1998).
The MMP data come from 124 communities located in major migrant-sending areas in 21 Mexican states. Each community was surveyed once between 1987 and 2008, during December and January, when the U.S. migrants are most likely to visit their families in Mexico. In each community, individuals (or informants for absent individuals) from about 200 randomly selected households were asked to provide demographic and economic information and to state the timing of their first and last trip to the United States. Household heads were additionally asked to report the trips in between. These data were supplemented with information from a non-random sample of migrants identified with snowball sampling in the United States (about 10% of the sample).
Because more detailed information is available for household heads, most studies of the MMP have restricted attention to this sub-population. To provide a more representative portrait of migrants, this study considers all household members. The analysis seeks to identify the diversity in the attributes of migrants on their first trip to the United States. Subsequent trips are not considered as they are recorded only for household heads, and also to avoid a complication that has haunted prior work on migration. This complication arises from the fact that many attributes related to migration behavior are also changed by it. Over successive trips, migrants gradually gain more experience, establish stronger ties to destination, and become wealthier.
Their attributes change, not as a result of the changing selectivity of the stream, but due to the changes caused by prior migration trips. Focusing on first-time migrants allows us to observe migrants' attributes independently from this reciprocal relationship.
A concern with the MMP data is the retrospective nature of the information on migrants.
Consider a household surveyed in 1990, in which the daughter migrated to the United States for the first time in 1980. Her attributes, like age and education, were recorded in 1990, but could be projected linearly to 1980. The economic status of her household could be reconstructed using the data on the timing of asset purchases. The characteristics of her community could be traced back using the retrospective community history. All these plausible steps rely on one crucial assumption: that the daughter in question was living in the same household and community in 1980. While this assumption is viable for most cases, the study cannot account for the cases for which it is not.

Cluster Analysis vs. Regression Analysis
Cluster analysis is a method for discovering groups with similar attributes in data. This method is widely used in fields as diverse as biology, physics and computer science to produce effective descriptions of typically large and complex data sets. Yet, in the social sciences, the method has been overshadowed by the overwhelming popularity of regression analysis.
Regression analysis estimates parameters that characterize a relationship between an outcome and several attributes. These parameters capture causal effects if the researcher can credibly account for the unobserved heterogeneity in data. The causal effects, if expected to remain constant over time, may lead to reliable outcome predictions.
Cluster analysis produces a very different output. Rather than search for associations with an outcome, the method discovers groups in data based on the variability in several attributes.
The results, although purely descriptive in essence, may show useful associations to outcomes of interest. For example, different groups of migrants from Mexico may display different settlement and assimilation patterns in the United States.
The two methods also assume different data structures. Regression methods envision a uniform distribution of cases over the attribute space. Yet, in most social data the attributes are correlated and the cases cluster around a few distinct configurations (Abbott 2001;Ragin 1987).
Regression methods can take into account these configurations by introducing interactions between attributes. But the number of possible interactions increases exponentially with the number of attributes and renders the model quickly unmanageable. Cluster analysis is a more efficient method for identifying the observed configurations of attributes.
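The growth in the number of interactions can be made concrete with a short count. The sketch below is illustrative only: the function name is hypothetical, and it assumes that an "interaction" means a product of two or more distinct attributes, so that p attributes allow 2^p - p - 1 such terms. The value 17 is the number of attributes in the data matrix used in this study.

```python
from math import comb


def count_interaction_terms(n_attributes: int) -> int:
    """Count products of 2 or more distinct attributes (all interaction orders)."""
    return sum(comb(n_attributes, order) for order in range(2, n_attributes + 1))


# Illustrative attribute counts; 17 matches the number of attributes in this study.
for p in (5, 10, 17):
    print(p, "attributes:", count_interaction_terms(p), "possible interaction terms")
```

Even if only a small fraction of these terms were substantively plausible, specifying them in advance would require strong prior knowledge about which configurations matter.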
Clustering and regression methods present different approaches to learning from data.
The usefulness of either approach depends on the questions of interest, as well as the structure of data. This study seeks to discover distinct types of migrants based on various attributes in the MMP data. Qualitative studies suggest the presence of distinct groups among Mexico-U.S. migrants (Portes and Rumbaut 2006), and quantitative analysis shows significant interactions among attributes in relation to migration behavior (Curran and Rivero-Fuentes 2003). Both the question of interest and the suspected structure of data point to cluster analysis as the method of choice. (Other related methods include latent class and growth curve models. The former focus on the variability in outcomes across unknown latent groups, the latter identify the variability across trajectories. Neither is appropriate for our purpose, which is to group cases based on configurations of causal factors (not outcomes), while keeping the outcome constant.)

Steps in Cluster Analysis
Choosing the Relevant Attributes
The first step in cluster analysis is selecting the attributes for partitioning the data. This process, similar to variable selection in regression analysis, involves either examining the data or relying on theories to identify salient attributes. This study exploits the vast empirical work on the MMP data and uses several attributes that have been shown to shape migration behavior (e.g., in Massey and Espinosa 1997).
The attributes, listed in Table 1, include individuals' demographic characteristics (whether they are household heads and/or male, years of education and occupation), household wealth (properties, land and businesses owned), prior migration experience (whether they migrated in Mexico, number of U.S. migrants and residents in household, and proportion of individuals who have ever migrated in their community) and community characteristics (proportion working in agriculture, proportion self-employed, proportion earning less than the minimum wage and whether the community is in a metropolitan area).

[TABLE 1 ABOUT HERE]
The average values for these attributes differ significantly (p<0.05, two-tailed t-test) for migrants and non-migrants. Migrants are individuals who have migrated at least once and non-migrants are those who have never migrated. For the sake of comparison, both groups are observed on the survey year in each community. (In subsequent cluster analysis, migrants are observed on the year of their first U.S. trip.) Compared to non-migrants, migrants are more likely to be household heads and male, to have higher levels of education, and to work in agriculture, manufacturing or service occupations, rather than being unemployed. They live in wealthier households with ties to U.S. migrants, and in poor and rural communities that contain a high proportion of self-employed individuals and agricultural workers.
Similar to the evidence in prior work, the significant differences between migrants and non-migrants observed here establish the relevance of the selected attributes for migration. Also relevant for migration, but not included in cluster analysis, are indicators that capture important economic or policy events, like the soaring Mexican inflation or interest rates in the 1980s or the passage of IRCA in 1986. These events introduce external shocks to the migration system, and typically shift the magnitude or composition of the migrant stream. Hence, they provide a perfect opportunity to evaluate the migrant clusters, which, if substantively valid, should display a temporal pattern reflecting these shifts. We explore this connection in later analyses.
The selected attributes in this study are measured on different scales. About half are binary (e.g., gender, occupation), a few are counts (e.g., number of properties or years of education), and the rest are continuous. Clustering methods are typically sensitive to scaling of attributes, which determines the importance assigned to a particular attribute. To avoid an arbitrary weighting of attributes, we dichotomize each non-binary attribute such that the values above the median are converted to 1 and those below it to 0.
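A minimal sketch of the median split described above is given here for illustration. The function name and the example values are hypothetical, and the treatment of values exactly equal to the median (coded 0 in this sketch) is an assumption, since the text specifies only the values above and below the median.

```python
from statistics import median


def dichotomize_at_median(values):
    """Code values above the median as 1 and all others as 0.

    Assumption: values equal to the median are coded 0.
    """
    cutoff = median(values)
    return [1 if value > cutoff else 0 for value in values]


# Hypothetical years of education for six individuals (not MMP data).
years_of_education = [0, 3, 6, 6, 9, 12]
print(dichotomize_at_median(years_of_education))
```

After this step every attribute takes the values 0 or 1, so no single attribute dominates the distance between two cases merely because it is measured in larger units.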
This strategy standardizes the range of attributes, and has shown superior performance in prior studies compared to other scaling methods that standardize the variance of attributes (Milligan and Cooper 1988). Similar to past work, we find that the attributes standardized to the same scale (but not the same variance) lead to the most well-separated and substantively meaningful clustering solution in the MMP data (comparisons available upon request).
Choosing an Algorithm
Clustering algorithms use a set of attributes to divide the data into a given number of groups (or "clusters") so that the cases in a group are as much alike as possible.
The output is typically a cluster membership for each case and a centroid for each cluster that represents the "mean" (or average) of the cases in that cluster. This study employs the popular K-means method, a classical clustering algorithm that iterates between computing K cluster centroids by minimizing the within-cluster variance and updating cluster memberships (Hastie, Tibshirani, and Friedman 2009).
The K-means method makes no assumptions about the data structure and thus has been generically applied to a diverse set of problems. Alternative methods typically assume a hierarchical clustering structure or rely on a probabilistic model of the data. The former (hierarchical) approach is useful if such a structure is substantively expected (e.g., evolutionary trees in biology), which is not the case in this study. The latter (model-based) approach is advantageous if the data conform to a probabilistic model, and has proven useful in low-dimensional data sets. Yet, in our experience, the available software implementations of the model-based approach perform poorly with large and high-dimensional data sets like the MMP. For substantive and practical reasons, this study uses the K-means algorithm implemented in Matlab(R) software (Matlab 2010, version 7.6). This algorithm, in addition to being generic and fast, is in fact equivalent to the model-based approach for certain probabilistic models of the data.
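The iteration between assigning cases to centroids and recomputing centroids can be sketched in a few lines. This is a minimal illustration in plain Python, not the Matlab routine used in the study: the function names, the random initialization, the toy data, and the use of squared Euclidean distance with mean centroids are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, n_iter=100, seed=0):
    """Alternate between assigning cases to the nearest centroid and updating centroids."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct cases drawn at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each case joins the cluster with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # memberships no longer change, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data with two well-separated groups (hypothetical, not MMP data).
points = [(0, 0), (0, 1), (1, 0), (100, 100), (100, 101), (101, 100)]
labels, centroids = k_means(points, k=2)
print(labels)
```

Because the result depends on the starting centroids, applications typically repeat the procedure from several random starts and keep the solution with the smallest within-cluster variance.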
Choosing a Similarity Measure
Any clustering algorithm relies on a measure of similarity, or dissimilarity, to assess how 'close' cases are to one another in the attribute space. In fact, choosing this measure is far more consequential for discovering the clustering structure in data than specifying the algorithm itself (Hastie et al. 2009). Although there are no generic guidelines, researchers typically base their decisions on the nature of the data and the substance of the question.
This study uses the city block distance to assess how close migrants are in various demographic, economic and social attributes. This measure aptly deals with binary data and reflects our substantive preference to treat two individuals who share a trait (e.g., low education) as equally similar to one another as two individuals who both lack the trait. For every pair of individuals i and j, the city block distance is the sum of the absolute differences between their values on each attribute, $d_{ij} = \sum_{k} |x_{ik} - x_{jk}|$, where $x_{ik}$ denotes the value of attribute $k$ for individual $i$.
Based on these results, and a preference for parsimony, we choose the 4-cluster solution, which is optimum for two measures and reasonable for the remaining four. This broad agreement across various measures is actually rare in clustering applications and increases our confidence in the validity of the results.
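The property of the city block distance described above, namely that a shared trait and a shared absence of a trait count equally toward similarity, can be checked with a small example. The function name and the three-attribute vectors below are hypothetical and chosen only for illustration.

```python
def city_block_distance(a, b):
    """Sum of absolute differences across attributes (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical binary attributes: (low education, owns land, male).
both_have_trait = ([1, 0, 1], [1, 1, 1])  # both individuals have low education
both_lack_trait = ([0, 0, 1], [0, 1, 1])  # neither individual has low education

# In both pairs the individuals differ only on land ownership.
print(city_block_distance(*both_have_trait))
print(city_block_distance(*both_lack_trait))
```

With binary attributes the distance equals the number of attributes on which two individuals differ, so the two pairs above are equally close.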

Assessing the Validity of Results
Another useful way to assess the clustering results is to draw a cluster heat map. Imagine each individual is represented by a vertical column of rectangles, where each rectangle corresponds to an attribute. A gray rectangle denotes the presence of an attribute, and a white one shows its absence. If we stack the columns for all individuals side by side, while keeping the individuals in the same cluster together, we end up with a heat map, an ingenious display of the entire data matrix (17 attributes × 17,049 individuals) along with the cluster structure. Figure 3 shows the heat map for the MMP data generated by the heatplus package in R. The rows show the attributes, which are ordered so that the correlated attributes are close to one another. The columns represent the migrant individuals. The vertical black lines separate the four clusters.
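The construction of the heat map matrix can be sketched as follows. This is an illustration of the layout only, not the R code that produced Figure 3: the function name and toy data are hypothetical, the output is a text rendering rather than a plot, and the rows are left in their original order rather than reordered by correlation.

```python
def heat_map_matrix(individuals, labels):
    """Arrange binary data as attributes (rows) by individuals (columns),
    keeping individuals in the same cluster next to one another."""
    order = sorted(range(len(individuals)), key=lambda i: labels[i])  # stable sort
    n_attributes = len(individuals[0])
    matrix = [[individuals[i][a] for i in order] for a in range(n_attributes)]
    sorted_labels = [labels[i] for i in order]
    # Column positions where a new cluster starts (the vertical separator lines).
    boundaries = [
        pos for pos in range(1, len(order))
        if sorted_labels[pos] != sorted_labels[pos - 1]
    ]
    return matrix, boundaries


# Four hypothetical individuals with two binary attributes, in two clusters.
individuals = [(1, 0), (0, 1), (1, 0), (0, 1)]
labels = [0, 1, 0, 1]
matrix, boundaries = heat_map_matrix(individuals, labels)
for row in matrix:
    # '#' marks the presence of an attribute and '.' its absence.
    print("".join("#" if cell else "." for cell in row))
print("cluster boundaries at columns:", boundaries)
```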

[FIGURE 3 ABOUT HERE]
Each cluster contains migrants on their first trip to the United States, but with visibly distinct characteristics. Migrants in cluster 1 are mostly male household heads; those in cluster 2 typically own many assets. Both groups live in poor rural communities. Migrants in cluster 3 are mostly females and live in households or communities with former U.S. migrants. Those in cluster 4 are relatively educated and live in urban communities.
Several attributes in the heat map are highly correlated with one another. Communities with a high number of poor individuals also have high levels of self-employment or agricultural employment. Households with former U.S. migrants are typically located in communities with high levels of migration. Individuals with a high level of education are likely to live in urban communities. It is precisely due to these correlations that our data fall into distinct groups, providing a fertile ground for cluster analysis.

Interpreting the Clusters
The four columns in Table 2 present the mean values of attributes in each of the four clusters. The last two rows show the number and proportion of migrants in each cluster, which appear to be relatively uniform. The attributes are measured on migrants' first trip to the United States. For each attribute, the highest cluster mean is shown in boldface and differs significantly (p < 0.05, two-tailed test) from the value closest to it in all cases but one (U.S. migrants in the household). We interpret these values in light of migration theories and label each cluster as a specific migrant type.
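A table of cluster means of this kind can be computed along the following lines. This is a minimal sketch with hypothetical attributes and toy data; it finds the cluster with the highest mean for each attribute but does not perform the significance tests mentioned above.

```python
from collections import defaultdict

def cluster_means(rows, labels):
    """Return {cluster: [mean of each attribute]} for binary attribute rows."""
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        if label not in sums:
            sums[label] = [0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def highest_cluster(means, attribute_index):
    """Cluster whose mean is largest for the given attribute."""
    return max(means, key=lambda c: means[c][attribute_index])

# Hypothetical data: columns are [household head, owns land]
rows = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [0, 0]]
labels = [1, 1, 2, 2, 1, 2]
means = cluster_means(rows, labels)

for cluster in sorted(means):
    print(cluster, [round(m, 2) for m in means[cluster]])
print("highest share of household heads: cluster", highest_cluster(means, 0))
```

Because the attributes are binary, each mean is the share of migrants in the cluster who have the attribute.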
A characteristic (or ideal-type) migrant in the first cluster is a male household head who has no education and, hence, no access to lucrative jobs in the local labor market. He lacks income-generating assets, like land or a business, and lives in a poor rural community with limited opportunities. Given his meager economic prospects at home, we posit that this person migrates primarily to increase his income, and acts in line with a prediction of neoclassical economics. To reflect this correspondence, which we will support with circumstantial evidence in subsequent analysis, we label this migrant, and the group he represents, as an 'income maximizer.' The average income maximizer lacks the social ties to facilitate an international move, and hence, he may migrate within Mexico first to raise the funds, or acquire the experience, necessary for a U.S. trip.
The second cluster consists of the wealthiest migrants in the sample. Of these migrants, 76% own a property, 38% own some land, and 16% own a business. [...]
Few migrants in the third cluster own any assets. About one in three owns a property, one in five owns some land, and only one in ten owns a business. Compared to the first two clusters, a lower share of them (15%) live in poor communities, but a higher share (34%) are located in metropolitan areas.
A typical migrant in this group is an unemployed daughter of the household head. At least one member of her household, probably her father or husband, is a current or prior U.S. migrant. Given that she is not economically active, but connected to other migrants, we posit that this person migrates to join her family members in destination and label her as a 'network migrant.' Network migrants, those who follow social ties rather than economic incentives, are a crucial component of cumulative causation theory, which predicts migration flows that are progressively independent of the economic conditions that initiate them. We expect, and later show, that network migrants become especially prevalent when family reunification policies are in place in the United States.
The fourth cluster contains the highest percentage of educated migrants, who mostly work in manufacturing (39%) and overwhelmingly live in urban metropolitan areas (81%). Most of them are male (80%) and twice as likely to be adult children (60%) as heads (31%) of their households. About one-third of these migrants have started, and about one-fifth have finished, secondary education. In this cluster, 67% of migrants own a property and 14% own a business, the second highest share across all clusters. About a third of these migrants have family ties to U.S. migrants, and also a third live in communities with high migration prevalence. Only a small share of them live in communities with high levels of agricultural employment (12%) or self-employment (10%), or in communities with a high share of low-wage earners (13%).
The representative migrant in this cluster is the son of the household head who has some secondary education and lives in an urban community. Given his education and place of residence, this migrant has access to more and better job opportunities than a typical migrant in the other clusters. He owns a property, which provides him with economic security, but lacks risky assets like land or business. He does not have any prior migrants in his family, and does not live in a traditionally migrant-sending community. Based on this configuration, which is not anticipated in any individual-level migration theory, we call this person an 'urban migrant' to underline one of his most distinguishing and surprising characteristics.
In the remainder of the paper, we first evaluate the temporal patterns in the prevalence of the four migrant types. We then consider the important economic and policy trends in the study period in order, first, to justify the labels we have attached to the migrant types and, second, to expose the contextual prerequisites for the emergence or dominance of those types.

Exploring Temporal Patterns
We identified the four migrant types based on migrants' own, household and community characteristics on their first trip to the United States. In this process, we included migrants observed at different time points in a single cluster analysis and deliberately excluded indicators for economic or policy trends that capture the Mexico-U.S. migration context. Despite the exclusion of these trends, we still obtained results that show a strong temporal pattern.

[FIGURE 4 ABOUT HERE]
The four panels in Figure 4 show the percentage of migrants in each migrant type over time. (We focus on percentages rather than total numbers to account for the varying sample sizes.) [...]
If each group is prevalent in a different period, then the differences between groups in attributes like education or urban origin (Table 2) may not signal inherent divisions, as we assumed, but instead reflect general trends in Mexico, like rising education levels or increasing urbanization. Put differently, an urban migrant may have higher education than an income maximizer, not because he represents a different migrant type, but because he is observed at a later period when the education levels are generally higher in Mexico. We investigate this possibility for two attributes, education and urban origin, that are most likely to change in the Mexican population over time. We find that, for each migrant type, recent cohorts have higher education than earlier cohorts. An average income maximizer has 4.7 years of education in the 1970s, which increases to 6.5 years in the 1980s and to 6.9 years in the 1990s. An average urban migrant, by contrast, has 5.9 years of education in the 1970s, 7.8 years in the 1980s and 8.3 years in the 1990s. Although the level of education is rising consistently for both migrant groups over time, the difference between the two groups stays within a narrow band of 1.2 to 1.4 years and remains significant (p < 0.05, two-tailed test) in each period.
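The gap between the two groups can be checked directly from the decade means quoted above. The short sketch below uses only the figures reported in the text; the variable names are illustrative.

```python
# Mean years of education by decade, as reported in the text.
income_maximizers = {"1970s": 4.7, "1980s": 6.5, "1990s": 6.9}
urban_migrants = {"1970s": 5.9, "1980s": 7.8, "1990s": 8.3}

# Difference between the two migrant types in each decade, rounded to
# the one-decimal precision of the reported means.
gaps = {decade: round(urban_migrants[decade] - income_maximizers[decade], 1)
        for decade in income_maximizers}

for decade, gap in gaps.items():
    print(f"{decade}: urban migrants lead by {gap} years")
```

Both series rise over time, yet the difference between them barely moves, which is the pattern the argument relies on.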
A similar analysis reveals that migrants in more recent cohorts live in larger communities than those who left earlier. An average migrant comes from a community of 95 thousand inhabitants in the 1990s, compared to 52 thousand in the 1980s and 40 thousand in the 1970s. Despite this general trend, which is due to growth in population and urbanization in Mexico, the differences across groups show remarkable stability. In each period, urban migrants live in larger communities than network migrants, who in turn live in larger communities than income maximizers or risk diversifiers. Hence, while each migrant group displays the trends in the general population, it still retains its distinguishing character vis-à-vis the other groups. (These results are confirmed with regression analysis in a subsequent section.) In the following section, we identify the contextual conditions that lead different migrant groups to assume the majority in different periods, and hence suggest potential sources of the temporal variation in migrant profiles.

Bringing in the Context
From 1970 to 2000, a number of economic and policy trends characterized the Mexico-U.S. migration context. We discuss these trends chronologically below, and consider their connection to the prevalence of different migrant types in our data. In Figure 5, we juxtapose four of these trends against the prevalence paths for the four migrant types and detect consistent patterns of co-variation that we describe in detail below.
Starting in the 1960s, Mexico experienced a prolonged decline in agricultural productivity (Heath 1988; Martin 2003). This decline led to a shortage of job opportunities (Roberts et al. 1999) and the worsening of living standards for low-income families in rural regions (Reyes-Heroles 1983). Through the 1970s, the reductions in arable land and declining prices of agricultural products swept the country into a deep agricultural crisis (Papail and Arroyo 2004). The increasing mechanization of agriculture in this period contributed to further displacement of farm workers, most of whom migrated to internal or international destinations (Arroyo 1989; Durand and Massey 1992; Yates 1981). The workers who migrated to the United States filled farm jobs, which, following the Bracero program, had come to be defined as immigrant jobs and socially unacceptable to U.S. citizens (Massey et al. 2003; Piore 1979).
In our data, the majority of migrants in the 1970s are poor and uneducated agricultural workers from rural communities. As the above description suggests, this group, labeled the income maximizers, is particularly strained by the economic conditions in Mexico at the time. In neoclassical economics theory, income maximizers are expected to migrate from a low-wage origin to a high-wage destination to increase their earnings. This proposition implies that the share of income maximizers in our sample should respond to changes in Mexican or U.S. wages.

[FIGURE 5 ABOUT HERE]
The upper-left panel of Figure 5 [...] the legalizations increased migration by women and dependent children for family reunification (Hondagneu-Sotelo 1994).
In our sample, network migrants, mostly women joining their families in the United States, although present throughout the study period, proliferate in the years following IRCA. These migrants, mobilized by social ties rather than economic pressures as predicted by cumulative causation theory, become the second largest group in 1990, comprising about 30% of all migrants, a share they maintain through the decade.
This pattern is observed in the lower-left panel of Figure 5, which shows side by side the percentage of network migrants and the ratio of available visas to Mexican migrants. The two lines both spike in the same period immediately following IRCA. Although the ratio of visas drops after 1990, network migrants retain their level due to the higher incentives for the relatives of the newly legalized Mexicans to migrate as well, albeit without documents. The correlation between the two lines is modest (+0.28) because of the pent-up demand that led to a response that is highly skewed to the first years of the policy change, and because the ratio of visas is only related to network migrants with documents, not to those who are undocumented.
[...] Oaxacans, continued on to become international migrants to the United States (Zabin et al. 1993).
The Mexican economy, which appeared solid at the signing of NAFTA, experienced a severe economic crisis in December 1994. Following a peso devaluation, the country defaulted on its foreign debt, and within a year, saw its GDP shrink by 6% and its unemployment rate double (Meza 2006). Around the same time, the United States was in the midst of the longest sustained period of job growth in its history. The economic differentials between the two countries once again ensured the continued flow of migrants. Unlike in prior years, migrants in the post-NAFTA and post-crisis era included many educated professionals who were admitted for short-term labor. From 1994 to 1997, the number of Mexicans admitted for temporary work (under the H visa program) tripled and reached 37,000 persons per year (Durand et al. 2001).
In our sample, the majority of migrants in the 1990s are relatively educated, work in [...] them to resort to international migration as a coping strategy.
In a related line of thought, the rise of the urban stream can be explained by the globalization arguments, which predict increasing migration flows with growing economic, cultural and ideological linkages between countries (Sassen 1988, 1991). The educated individuals in urban areas, the typical migrants making up the urban stream in our data, may be the first to respond to these linkages. If that is the case, the proportion of the urban stream should increase with increasing economic ties between Mexico and the United States, captured, for instance, by the trade flows. Such a link would also validate Hernández-León's hypothesis, as the periods of increasing economic ties to the U.S. (e.g., due to GATT and NAFTA) overlap with the periods of economic restructuring in Mexico.
The lower right-hand panel of Figure 5 [...]

Linking Empirical Patterns to Emergence of Theories
The temporal patterns suggest that each migrant type, corresponding to a distinct theoretical narrative, gains prevalence under specific economic, social and political conditions. Income maximizers, representing the neoclassical narrative, are most prominent in the 1970s when U.S. wages are at their highest. Risk diversifiers, personifying the new economics of migration, gain the majority in the 1980s when the Mexican inflation rate is at its peak. Network migrants, symbolizing the cumulative causation theory, obtain their highest proportion in the 1990s when visa availability is at its highest. (We do not include urban migrants in the following analysis as the specific configuration of this group is not predicted by any theoretical perspective, although its temporal patterning seems to correspond to general trends brought on by globalization (Sassen 1988) and Mexican economic restructuring (Hernández-León 2008).)

[FIGURE 6 ABOUT HERE]
Revealing a striking pattern, the temporal order of the prevalence of the three migrant types coincides with the temporal order of the emergence of theories on which these migrant types are based. The three panels in Figure 6 show the proportion of income maximizers, risk diversifiers and network migrants, respectively. The vertical lines in each panel indicate the timing of the three most-cited articles in three theoretical perspectives: neoclassical economics, new economics of migration and cumulative causation.⁴ Following two initial articles by Sjaastad (1962) and Todaro (1969) [...] Stark and Levhari (1982), Stark and Bloom (1985) and Stark, Taylor and Yitzhaki (1986) published [...] (1964).⁵ The cumulative causation theory follows from earlier work that linked social networks to chain migration (MacDonald and MacDonald 1964). But, because these theories capture the empirical trends of their time, even if incidentally, they are likely to be cited more than other work.

From Descriptive Results to Testable Hypotheses
The results so far yield a useful typology of migrants with a meaningful temporal pattern that is correlated with the trends in the economic, social and political context of Mexico-U.S. migration, as well as the temporal ordering of the prominent migration theories. Since these analyses only characterize the variation in migrants, below we raise and address three concerns that potentially challenge our interpretations.
[TABLE 3 ABOUT HERE]
To further confirm this last point, we need to address a second concern. Because each migrant type dominates in a specific time period, the differences we observe among migrant groups may be capturing the shifts in the population composition over time, rather than the shifts in the mechanisms underlying migration. For example, the differences in the education levels of urban migrants and income maximizers may be attributed to the increasing education levels in the Mexican population, and not to an increasing importance of education for U.S. migration as we posit. The results in Table 3 partly address this concern. The different coefficient estimates of education for income maximizers and urban migrants suggest that these two groups do not just come from different pools, but they are selected differently from these pools. (The results are similar if we introduce year fixed-effects to control for the temporal change in migration, thus sacrificing the identification of the macroeconomic indicators.) To better capture the trends in population composition, we ran logistic regressions of first U.S. migration in the pooled sample using data from sliding 3-year windows. Figure 7 [...] 1970-1972 and 1982-1984 periods, but declines from 1985 to 1991 and increases thereafter. Similarly, the effect of secondary education on migration is negative or insignificant (depicted with white circles) prior to 1985, but increases sharply to positive values thereafter. These patterns suggest a changing selectivity of migrants that is independent from the changing characteristics of the population, and support our interpretation that the differences we observe among migrant groups reflect shifts in the mechanisms and incentives underlying migration, not just shifts in population composition.
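The sliding-window design mentioned above can be sketched as follows. This is an illustration only: the function names and record layout are hypothetical, the one-year step between windows is an assumption, and the logistic regression that would be fitted within each window is left out.

```python
def sliding_windows(first_year, last_year, width=3):
    """Return (start, end) year pairs for overlapping windows of the given width.

    Assumption: consecutive windows advance by one year.
    """
    return [(start, start + width - 1)
            for start in range(first_year, last_year - width + 2)]

def subset_by_window(records, window):
    """Keep the records whose 'year' falls inside the window (inclusive)."""
    start, end = window
    return [r for r in records if start <= r["year"] <= end]

windows = sliding_windows(1970, 2000)
print(windows[0], windows[-1], len(windows))

# Hypothetical person-year records; a model would be estimated on each subset.
records = [{"year": 1970}, {"year": 1972}, {"year": 1973}, {"year": 1985}]
print(len(subset_by_window(records, windows[0])))
```

Estimating the same model on each window's subset yields a time series of coefficients, which is how a changing effect of an attribute such as education can be traced over the study period.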

[FIGURE 7 ABOUT HERE]
A third concern about the results is the extent of their usefulness beyond characterizing the heterogeneity in the sample on which they are based. Can we, for example, use the migrant types to develop testable hypotheses? Can we then discover meaningful associations between these types and post-migration outcomes? To answer these questions, we consider five outcomes that characterize migrants' experiences in the United States: (i) total number of U.S. trips, (ii) undocumented entry, (iii) receiving residency or citizenship, (iv) being unemployed, and (v) wages. We hypothesize that migrants will differ significantly in these outcomes based on their cluster membership.
Given their short-term economic goals, and the lower level of border enforcement at the time they predominantly migrate, income maximizers and risk diversifiers will make a higher number of total U.S. trips compared to network or urban migrants. For the same reasons, these two groups will also be more likely to cross the border without documents. Due to their eligibility to take advantage of family reunification policies, network migrants will have a higher likelihood of receiving U.S. residency or citizenship compared to the other three groups. This group, however, will also be more likely to be unemployed as it consists mostly of women and children. Finally, urban migrants will command the highest wages of the four migrant groups given their high levels of education and experience in manufacturing occupations.
The results support all our hypotheses. Network and urban migrants make 73 percent (e^-0.31) and 76 percent (e^-0.27) as many trips in total as income maximizers (model 1), that is, about 27 and 24 percent fewer. These two groups are also about 70 percent less likely to enter the U.S. without documents (model 2). Network migrants are almost 10 times more likely to obtain legal status compared to income maximizers (model 3), but about 7 times more likely to be unemployed during the first trip (model 4). Finally, urban migrants earn about 10 percent more than income maximizers, their closest follower in terms of wages.
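The trip-count comparison rests on exponentiating the model 1 coefficients. The sketch below assumes those coefficients are on a log scale, as the e^-0.31 and e^-0.27 notation implies; the function name is illustrative.

```python
import math

def rate_ratio(coefficient):
    """Exponentiate a log-scale coefficient to get a multiplicative ratio."""
    return math.exp(coefficient)

# Coefficients quoted in the text for model 1 (total number of U.S. trips),
# with income maximizers as the reference group.
for group, coef in [("network migrants", -0.31), ("urban migrants", -0.27)]:
    ratio = rate_ratio(coef)
    print(f"{group}: ratio = {ratio:.2f}, change = {100 * (ratio - 1):+.0f}%")
```

A ratio below one means fewer trips than the reference group, and the percentage change is the ratio minus one.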
The results show that the four migrant types differ not only in their characteristics, and the incentives underlying their migration, but also in their experiences in the United States. The proposed typology thus provides unique insights for extending the reach of existing theories and for designing policies that target the discovered migrant groups to encourage certain post-migration outcomes.

Conclusion
There are no dominant paradigms in population research. Most questions can be approached from a variety of theoretical perspectives. But in empirical applications, this diversity often gets lost. To use prevailing quantitative methods, like regression analysis, researchers frame their questions around average differences, for example, between persons who display a behavior and those who do not, and reduce theories to competing sets of independent variables. If the corresponding variables capture statistically significant differences between groups, a theory is accepted; otherwise it is rejected. This strategy, inevitably, leads to either-or theoretical stances, rather than an emphasis on the complementarity of varying theories.
This study proposes a strategy to capture the theoretical diversity in the migration literature. Instead of focusing on differences between groups who do and do not exhibit a particular behavior, the proposed strategy calls attention to variability within a group of individuals who display the same behavior or outcome. Are there different paths that brought them there? Are these paths captured by different theories?
The empirical approach involves cluster analysis, a method commonly used in data-intensive fields like biology, physics and computer science to identify subsets of cases with similar characteristics. In this novel application to social sciences, cluster analysis discovers distinct groups among individuals who share a behavior of interest, that is, migration. Each group is identified by a specific configuration of characteristics and the experience of each appears consistent with a specific theoretical account.
This approach provides a new perspective to understand the migrant stream between Mexico and the United States. This stream, the largest in the world today, continuously increased in the past decades leading to a Mexican-born population of 8.4 million by 2000 (Bean et al. 2001). During this period, the economic, social and political conditions in the two countries changed drastically. These changes also shaped the character of the migrant stream, leading to a Mexican population that is diverse in backgrounds and objectives in the United States. This diversity, captured in a number of theories developed in economics and sociology, is overlooked in quantitative applications that focus on describing a typical narrative for an average migrant.
Applying cluster analysis to the Mexican Migration Project (MMP) data, from about 17,000 first-time migrants over a 30-year period from 1970 to 2000, this study identified four distinct types of migrants based on individual, household and origin community characteristics. These types corresponded to specific theoretical accounts and gained prevalence at specific time periods depending on the economic, social and political conditions in both countries.
Earlier migrants consisted mainly of male household heads from rural areas with little education and few assets, who sought to increase their earnings by moving to the United States. Labeled as income maximizers, these migrants embodied the predictions of neoclassical economics. They remained the dominant migrant type when U.S. wages were at their highest value in the early 1970s, and slowly declined in number as the wages declined in real value.
In the early 1980s, another migrant type, which we call risk diversifiers, dominated the Mexico-U.S. stream. These migrants came from households with substantial assets, but were not household heads, and lived in communities where the majority of households were self-employed. As predicted by the new economics of migration theory, they migrated to the United States to secure earnings that insure against risks to household assets. These migrants reached high numbers when the Mexican inflation rate, a proxy for economic uncertainty, soared from the early to late 1980s. As the inflation rate returned to normal in the 1990s, risk diversifiers also declined in numbers.
From the mid-1980s to the early 1990s, network migrants became the majority among first-time Mexican migrants to the United States. These migrants, mostly women with family or community ties to prior U.S. migrants, exemplified a prediction of cumulative causation theory: past migration creates social ties to destination, which facilitate more migration. Network migrants remained constant in proportion, making up about one-fifth of all migrants, until the mid-1980s. In 1986, the Immigration Reform and Control Act (IRCA) legalized 2.3 million undocumented migrants and increased the number of visas available to Mexicans. As a result, network migrants doubled in proportion, and remained at that level until 2000.
Starting in the mid-1990s, a new migrant type quickly gained prevalence. These migrants, which we call the urban stream, were highly educated, worked mostly in manufacturing and lived in metropolitan areas. Constituting the majority of migrants in the 1990s, the urban stream was not predicted by any individual-level theory. Yet, a general trend of increasing economic connectivity to the United States, and a resulting economic restructuring in Mexico, may explain the rise of this group in our data. Given that the urban stream increased in proportion following the increased trade between Mexico and the United States after GATT and NAFTA, which also implied major structural changes in the Mexican economy, these hypotheses remain plausible, but need to be evaluated in future work.
Revealing a striking pattern, each migrant type became dominant around the time in which its corresponding theory was developed. Income-maximizing migrants prevailed in the 1970s when Harris and Todaro (1970) published the defining article of the neoclassical economics perspective on migration. Risk diversifiers became the majority in the mid-1980s when Stark and Bloom (1985) published the most influential article on the new economics of labor migration. Network migrants gained prevalence in the early 1990s when Massey (1990a) developed the cumulative causation theory of migration. This unanticipated finding suggests a relationship between empirical patterns and the scientific ideas that try to capture them, and merits further study by sociologists of knowledge.
The empirical patterns identified in this study showed the heterogeneity in the migration process across individuals and over time periods. The patterns suggested that different causal regimes may govern specific groups of individuals or specific periods. These causal regimes were scrutinized with regression analysis on sub-samples that included each migrant group and non-migrants. The results showed that a different set of factors mobilized each group, a variation that remained obscure in conventional regression analysis run on pooled samples. The results thus suggested cluster analysis as a potential solution to the sample-splitting or change-point problem in statistical analysis, concerned with identifying sub-samples for which the regression estimates are stable. A final set of analyses demonstrated that migrant groups vary not only in the reasons that mobilize them, but also in their experiences in the United States, a finding that confirmed the potential of the proposed typology for yielding testable hypotheses and extending existing theories.
The methodology applied here could be used to investigate any research question where there is heterogeneity in the causal mechanisms leading to a given outcome, and where that heterogeneity can be characterized by identifying groups that vary with respect to key factors suggested in competing theoretical accounts. Recent studies have employed similar methods to categorize individuals into alternative theories of popular nationalism (Bonikowski 2012) and musical tastes (Goldberg 2012), or to group legislative voting behavior into rival theories of trade policy preferences (Imai and Tingley 2012).
Contributing to this burgeoning literature, the present application of cluster analysis allowed us to combine various theoretical perspectives on migration and to embrace the diversity of migrants. By searching for groups of individuals who share the same behavior but differ on configurations of characteristics, cluster analysis revealed the various mechanisms that apply to each group. This approach, although quantitative in method, is qualitative and historical in spirit. It is in solidarity with the case-oriented approach proposed by Ragin (1987), which seeks to identify 'constellations, configurations and conjunctures' that define and distinguish each case. The approach is also similar to a 'colligation' process, which involves piecing together various factors to explain a case, imported from history to sociology by Abbott (2001). The goal, similar to these authors', is to close the gap between theory and empirical evidence, and between qualitative and quantitative methods in the social sciences.
1 Most studies have focused on the social ties to migrants as the principal mechanism of cumulative causation. But, research has also identified other factors, such as the regional distribution of human capital, the organization of agriculture or culture, that might be affected by, and eventually affect, migration in a cumulative fashion (Massey et al. 1993). These factors are difficult to assess reliably with the survey data at hand, and thus, are not discussed at length here.
2 Detailed information on the MMP is available at: http://mmp.opr.princeton.edu/.
3 In classical statistical estimation, converting continuous variables to binary attributes would lead to a severe information loss. In cluster analysis, this approach is not only acceptable, but used often to de-noise high variance variables (Legendre and Legendre 1983). More generally, because the goal in statistical estimation is to estimate or confirm a given quantity (e.g., a parameter), tuning data or methods to produce a result would lead to bias. By contrast, the goal in cluster analysis is to create categories that reveal new information, therefore tuning data or methods until we learn something useful is perfectly reasonable (Grimmer and King 2011).
4 Citation data is obtained from the Social Science Citation Index (accessed in December 2009).
5 I thank George Borjas for providing these insights.

Figure 3. Heat Map of Migrant Attributes by Cluster Membership
Note: The heat map color-codes the attributes (rows) of all migrants (columns). Gray indicates the presence of the attribute, and white indicates its absence. The vertical black lines separate the four clusters identified with cluster analysis.