Panel Paper: Identifying US-Based Refugees By Proxy in Secondary Data: Conceptual and Analytical Implications

Tuesday, July 30, 2019
40.S14 - Level -1 (Universitat Pompeu Fabra)

*Names in bold indicate Presenter

Tim Carroll, New York University


This technical paper discusses the implications of using proxy variables in secondary data to identify US-based refugees (i.e., individuals admitted through the US Refugee Resettlement Program) when the actual class of admission is absent from the data. Better data practices will enable rigorous research on the experiences and outcomes of refugee communities. This need is particularly urgent in a context of political hostility toward refugees and the lowest admissions ceiling since the modern resettlement program’s inception.

Quantitative research addressing US-based refugees has been hampered by a dearth of data sources that directly identify refugee status (Batalova et al., 2018; Chin & Cortes, 2015). One promising alternative is identification of refugees by proxy. Variables including country of origin and year of arrival have been used to identify individuals likely to be refugees based on historical admissions patterns (e.g. Capps & Newland, 2015; Cortes, 2004; Fix et al., 2017; Hooper et al., 2016). However, the selection of appropriate proxies remains underexplored, and to my knowledge, the accuracy of a proxy-driven refugee identification strategy has never been empirically tested.

Using the only nationally representative dataset to include respondents' actual class of admission, I identify refugee status by proxy and compare that identification to respondents' actual status, asking: How accurately can refugee status be imputed based on proxy variables? Who is included in and excluded from a proxy-driven refugee identification strategy? And what are the implications of such a strategy for statistical analysis of key post-resettlement outcomes such as educational attainment?
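As a rough illustration of the comparison described above, the sketch below flags respondents as likely refugees from their country of origin and year of arrival, then checks those flags against a known class of admission. It is a minimal sketch under stated assumptions, not the paper's procedure: the `PROXY_CELLS` lookup, the `Respondent` fields, the country codes, and the sample records are all hypothetical placeholders, not real admissions data.

```python
from dataclasses import dataclass

# Hypothetical (country, arrival-year) cells treated as "likely refugee".
# In practice these would be derived from historical admissions patterns;
# the values here are made-up placeholders.
PROXY_CELLS = {("A", 2005), ("A", 2006), ("B", 2010)}


@dataclass(frozen=True)
class Respondent:
    country: str        # country of origin (placeholder code)
    arrival_year: int   # year of arrival in the US
    is_refugee: bool    # actual class of admission, known only in validation data


def proxy_flag(r: Respondent) -> bool:
    """Return True if the respondent falls in a likely-refugee cell."""
    return (r.country, r.arrival_year) in PROXY_CELLS


def safe_ratio(numerator: int, denominator: int) -> float:
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def evaluate(respondents: list[Respondent]) -> dict[str, float]:
    """Cross-tabulate proxy flags against actual status and summarize."""
    tp = fp = fn = tn = 0
    for r in respondents:
        flagged = proxy_flag(r)
        if flagged and r.is_refugee:
            tp += 1   # refugee correctly included
        elif flagged and not r.is_refugee:
            fp += 1   # non-refugee wrongly included
        elif not flagged and r.is_refugee:
            fn += 1   # refugee wrongly excluded
        else:
            tn += 1   # non-refugee correctly excluded
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "sensitivity": safe_ratio(tp, tp + fn),  # share of refugees captured
        "specificity": safe_ratio(tn, tn + fp),  # share of non-refugees excluded
        "precision": safe_ratio(tp, tp + fp),    # share of flagged who are refugees
    }


# Made-up validation records for illustration only.
SAMPLE = [
    Respondent("A", 2005, True),
    Respondent("A", 2006, False),
    Respondent("B", 2010, True),
    Respondent("C", 2005, True),
    Respondent("C", 2012, False),
    Respondent("B", 2011, False),
]

if __name__ == "__main__":
    for name, value in evaluate(SAMPLE).items():
        print(f"{name}: {value}")
```

The false negatives and false positives correspond to the abstract's second question: who is excluded from, and who is wrongly included in, a proxy-driven identification strategy.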
