Panel Paper: Within-Population Occupational Segregation By Ethnicity during the Age of Mass Migration: A Machine Learning Approach

Friday, November 8, 2019
Plaza Building: Concourse Level, Governor's Square 15 (Sheraton Denver Downtown)


Yuxin Zhang, University of Texas, Austin and Dafeng Xu, University of Washington


Many studies treat an immigrant population (defined by birthplace) as a single group and analyze occupational segregation across immigrant populations. This approach neglects possible ethnic heterogeneity within an immigrant population, which might lead to within-population occupational segregation. In this paper, we study Russian-born immigrants (including ethnic Russians, Jews, Germans, and Poles) in the early twentieth-century U.S., and use 1930 full-count U.S. census data to examine occupational segregation and job networks by ethnicity within this group.

This paper is structured in two parts. The first part focuses on methodology: studying occupational segregation by ethnicity first requires a clear definition of ethnicity. We design a machine learning approach to ethnicity classification based on the mother tongue and name information recorded in the U.S. census. Language and the linguistic origin of names are two common ways to measure ethnicity in the classical literature of anthropology and other disciplines. We combine several machine learning algorithms (SVM, probit, naive Bayes, etc.) for ethnicity classification, and find that the combined classifier performs well on different test data.
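As a rough illustration of name-based classification, the sketch below trains a character-bigram naive Bayes model (one of the algorithm families named above) on a handful of surnames. It is a toy under stated assumptions, not the authors' classifier: the surnames, the labels, the `MOTHER_TONGUE_TO_ETHNICITY` mapping, and the rule of preferring a reported mother tongue over the name model are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical mapping from a reported mother tongue to an ethnicity label.
MOTHER_TONGUE_TO_ETHNICITY = {
    "Russian": "Russian", "Yiddish": "Jewish",
    "German": "German", "Polish": "Polish",
}

def char_bigrams(name):
    """Split a name into character bigrams, with ^ and $ marking its ends."""
    padded = f"^{name.lower()}$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]

class NameNaiveBayes:
    """Multinomial naive Bayes over character bigrams, with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.gram_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, names, labels):
        for name, label in zip(names, labels):
            self.class_counts[label] += 1
            for gram in char_bigrams(name):
                self.gram_counts[label][gram] += 1
                self.vocab.add(gram)
        return self

    def predict(self, name):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each bigram.
            score = math.log(count / total)
            denom = sum(self.gram_counts[label].values()) + self.alpha * len(self.vocab)
            for gram in char_bigrams(name):
                score += math.log((self.gram_counts[label][gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def classify(mother_tongue, surname, model):
    """Assumed rule: use mother tongue when it maps to a label, else the name model."""
    label = MOTHER_TONGUE_TO_ETHNICITY.get(mother_tongue)
    return label if label is not None else model.predict(surname)

# Toy training data (illustrative surnames only, not census records).
TRAINING = [
    ("Ivanov", "Russian"), ("Petrov", "Russian"),
    ("Smirnov", "Russian"), ("Kuznetsov", "Russian"),
    ("Kowalski", "Polish"), ("Nowakowski", "Polish"),
    ("Wisniewski", "Polish"), ("Kaminski", "Polish"),
    ("Schmidt", "German"), ("Schneider", "German"),
    ("Fischer", "German"), ("Schultz", "German"),
    ("Goldberg", "Jewish"), ("Rosenberg", "Jewish"),
    ("Silverstein", "Jewish"), ("Goldstein", "Jewish"),
]
model = NameNaiveBayes().fit(*zip(*TRAINING))

print(classify(None, "Lewandowski", model))
print(classify("Yiddish", "Popov", model))
```

A real application would need far more training names, held-out evaluation, and some way of combining several classifiers, none of which this sketch attempts.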

In the second part of this paper, we turn to the substantive research questions, immigrants' occupational segregation and job networks, using the 1930 full-count census data. We first apply the ethnicity classification and find high ethnic diversity among Russian-born immigrants, consistent with historical findings. Based on this ethnicity variable, we find high degrees of occupational segregation by ethnicity within the Russian-born population. For example, compared with other groups, Jews were concentrated in wholesale and retail trade, and ethnic Germans were disproportionately likely to work in agriculture.
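The abstract does not say which segregation measure is used. One common choice is the Duncan dissimilarity index, D = 0.5 * sum over occupations of |a_i/A - b_i/B|, which ranges from 0 (identical occupational distributions) to 1 (complete segregation). The sketch below computes it for two groups; the function name and the occupation counts are made up for illustration and are not results from the paper.

```python
def dissimilarity_index(counts_a, counts_b):
    """Duncan dissimilarity index between two groups' occupation counts."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("each group needs at least one worker")
    occupations = set(counts_a) | set(counts_b)
    # Half the summed absolute gap between the two groups' occupation shares.
    return 0.5 * sum(
        abs(counts_a.get(occ, 0) / total_a - counts_b.get(occ, 0) / total_b)
        for occ in occupations
    )

# Hypothetical worker counts by sector for two ethnic groups.
group_a = {"trade": 60, "manufacturing": 30, "agriculture": 10}
group_b = {"trade": 10, "manufacturing": 30, "agriculture": 60}

d_index = dissimilarity_index(group_a, group_b)
print(d_index)
```

The index can be read as the share of either group that would have to change occupation for the two distributions to match.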

Finally, we link the pattern of occupational segregation to network economics and study its spatial dimension by examining the effects of co-ethnic residence on labor market outcomes. Using OLS and instrumental variable models (where historical settlements of different ethnic groups serve as instruments), we find that the concentration of co-ethnics, more established immigrants in particular, was positively related to employment status, occupational wage, and occupational standing. By contrast, the spatial concentration of other ethnic groups, even those also originally from Russia, had weaker or no effects on these labor market outcomes. This is consistent with classical findings in labor economics that immigrants' social networks, measured by ethnic enclave residence, have positive impacts on immigrants' labor market outcomes in the host country.
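To show why an instrument can matter here, the sketch below simulates a single endogenous regressor and compares the OLS slope with the single-instrument IV slope, cov(z, y) / cov(z, x), which coincides with two-stage least squares in this exactly identified case. Everything in it is an assumption made for illustration: the variable roles, the data-generating process, and the true effect of 1.0 are simulated and say nothing about the paper's actual specification or estimates.

```python
import random

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def ols_slope(x, y):
    """Slope from a simple regression of y on x."""
    return covariance(x, y) / covariance(x, x)

def iv_slope(z, x, y):
    """Single-instrument IV (Wald) estimator of the effect of x on y."""
    return covariance(z, y) / covariance(z, x)

rng = random.Random(0)
n = 5000
TRUE_EFFECT = 1.0  # assumed effect of x on y in this simulation

# z: instrument (think: historical settlement), unrelated to the confounder.
z = [rng.gauss(0, 1) for _ in range(n)]
# u: unobserved confounder that raises both x and y.
u = [rng.gauss(0, 1) for _ in range(n)]
# x: endogenous regressor (think: local co-ethnic concentration).
x = [zi + ui for zi, ui in zip(z, u)]
# y: outcome (think: occupational wage).
y = [TRUE_EFFECT * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

ols_estimate = ols_slope(x, y)
iv_estimate = iv_slope(z, x, y)
print(f"OLS: {ols_estimate:.2f}  IV: {iv_estimate:.2f}")
```

Under this data-generating process the OLS slope converges to 1.5 in large samples because the confounder moves x and y together, while the IV slope converges to the assumed effect of 1.0.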

This paper makes contributions to labor and immigration policy from both methodological and substantive perspectives. First, we propose a novel measure of immigrant origin, ethnicity, constructed with machine learning tools, which can be used to study a variety of research questions concerning labor markets and immigration. Second, this paper highlights the high degree of within-population occupational segregation, which suggests that occupational segregation might be underestimated when it is based on the traditional measure of immigrant origin (e.g., country of birth). Relatedly, we show that immigrants' social networks can be better measured at the finer level of ethnicity. While this paper focuses on historical contexts, we also discuss potential policy implications for contemporary U.S. economic issues regarding immigrants' labor market patterns.