Panel Paper: Manual Categories Versus Machine Learning in U.S. Social Policy: The Case of Housing Vouchers and School Districts

Friday, November 8, 2019
Plaza Building: Concourse Level, Plaza Court 6 (Sheraton Denver Downtown)


Rebecca Johnson and Simone Zhang, Princeton University


Researchers have documented the rise of public organizations using large-scale data and machine learning to make decisions like whom to police (Brayne, 2017) or whom to offer housing assistance (Eubanks, 2018). These accounts argue that algorithmic prioritization exacerbates inequality. The present paper asks: what is the counterfactual? Put differently, how do social policies currently decide who should receive limited resources? By evaluating the system of prioritization that algorithmic prioritization may replace, we can reach more balanced conclusions about the rise of algorithm-guided policy (e.g., Cowgill and Tucker, 2017).

Social policies currently use categorical prioritization. Many federal social policies first take many attributes about individuals and manually choose a few deemed relevant for eligibility. They then manually set thresholds to convert continuous attributes into categorical ones and choose how the resulting categories combine to define eligibility. We draw upon our quantitative research in two policy settings (which attributes local housing authorities prioritize on Section 8 waitlists; which attributes state legislatures prioritize in school funding formulas) to highlight two ways that categorical prioritization can amplify inequality.
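To make the contrast concrete, the sketch below illustrates categorical prioritization as a hand-written eligibility rule: a few manually chosen attributes are thresholded into categories and combined in a fixed order. The attribute names, the 30% rent-burden cutoff, and the tier ordering are illustrative assumptions, not any housing authority's actual rules.

```python
# Illustrative sketch of categorical prioritization: a few hand-picked
# attributes are thresholded into categories and combined by fixed rules.
# Attribute names, the 30% cutoff, and the tier ordering are hypothetical.

def categorical_priority(household: dict) -> int:
    """Return a priority tier (lower = served first) from manual rules."""
    is_veteran = household.get("veteran", False)
    rent_burdened = household["rent"] / household["income"] > 0.30  # manual threshold
    currently_homeless = household.get("currently_homeless", False)

    if currently_homeless:
        return 1
    if is_veteran:
        return 2  # ranked ahead of rent-burdened households
    if rent_burdened:
        return 3
    return 4      # eligible, but lowest priority


applicants = [
    {"id": "A", "veteran": True, "rent": 600, "income": 2400},    # monthly figures
    {"id": "B", "veteran": False, "rent": 1200, "income": 2000},  # rent burden = 60%
]
waitlist = sorted(applicants, key=categorical_priority)
print([a["id"] for a in waitlist])  # ['A', 'B']: veteran ranked above rent-burdened
```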

First, manual selection of which categories to prioritize leads to a transparency paradox: it is transparent which categories a policy prioritizes but opaque why. For instance, housing authorities tend to prioritize veterans above households that pay over 30% of their income in rent, despite the fact that aiding rent-burdened households is arguably more directly congruent with the aims of housing assistance (Zhang and Johnson, 2018). More broadly, policies may prioritize groups with more power to navigate the politics of policy inclusion (Skocpol, 1992; Starr, 1992; Mohr, 1994; Steensland, 2006). In contrast, algorithmic prioritization requires defining an outcome on which to base prioritization, such as risk of homelessness, and then prioritizes those with the highest estimated risk. This may decrease inequality by prioritizing high-risk groups whom social stigma has led policies to de-prioritize, such as those with substance dependence issues who are at high risk of homelessness.
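For contrast, the sketch below shows the risk-based alternative described above: fit a model that estimates each applicant's probability of the chosen outcome from many attributes, then rank by predicted risk. The features, the logistic-regression model choice, and the toy data are assumptions for illustration only, not the estimation strategy used in the underlying studies.

```python
# Illustrative sketch of algorithmic prioritization: estimate each applicant's
# risk of a chosen outcome (e.g., homelessness) from many attributes, then
# rank by estimated risk. Features, model, and data are synthetic assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical historical records: many attributes per household
# (e.g., income, rent burden, prior evictions) and an observed outcome.
X_train = rng.normal(size=(500, 6))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 3] + rng.normal(size=500) > 1.0).astype(int)

model = LogisticRegression().fit(X_train, y_train)

# Score current applicants and serve those with the highest estimated risk.
X_applicants = rng.normal(size=(10, 6))
estimated_risk = model.predict_proba(X_applicants)[:, 1]
priority_order = np.argsort(-estimated_risk)  # highest estimated risk served first
print(priority_order)
```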

Second, the high stakes of prioritizing only a few categories mean that policymakers put extensive procedures in place to deter false positives: those who should not qualify for a category like disability or low income but who manage to qualify. Yet these procedures increase false negatives: those who should qualify but who lack the time, social capital, or know-how to navigate the complex procedures required to certify eligibility (Moynihan et al., 2014). We show this unequal burden of false negatives using data on inequality in parents' ability to navigate the extensive procedures that school districts use to certify that a child has a disability (Johnson, 2018). Algorithmic prioritization, by modeling eligibility on the basis of many attributes that may be drawn passively from administrative records, may decrease the false negatives induced by administrative burden.

In sum, we contrast two methods that social policies can use to define eligibility: manual categories versus machine-learning-guided prioritization. By drawing upon research in two policy settings, we aim to provide a fuller account of the forms of inequality that algorithms may exacerbate or mitigate.