Panel Paper: Predicting Family Homelessness Using Machine Learning

Saturday, November 5, 2016 : 4:10 PM
Embassy (Washington Hilton)

*Names in bold indicate Presenter

Robert Collinson (1), Maryanne Schretzman (2), Eileen Johns (2), Jessica A. Raithel (2) and Davin Reed (1, 3); (1) New York University, (2) New York City Office of the Deputy Mayor for Health & Human Services, (3) Furman Center


Homelessness is among the most pressing public policy challenges in New York City, with almost 50,000 individuals staying in family shelters in 2015. While effective interventions are known to exist, the extent to which they can reduce family homelessness depends both on the resources available and on the ability to efficiently serve the families at greatest risk of homelessness (targeting). Machine learning can help solve two key challenges to effective targeting. First, families and providers may lack information about families' true risk of homelessness. Second, families that do seek services possess private information regarding their true housing options and needs, resulting in adverse selection of families into certain services or inefficient matching of families by providers to the most cost-effective services.

To address these challenges, we assemble a novel administrative data set of disadvantaged New Yorkers and predict family shelter stays using a variety of machine learning algorithms and predictors based on benefits history, demographics, shelter histories, housing court interactions such as evictions, and building and neighborhood characteristics. Our models demonstrate considerable predictive accuracy, identifying the riskiest ten percent of actual shelter applicants with 66 percent precision and the riskiest half with 20 percent precision. We find that individual, building, and neighborhood characteristics all help predict family shelter entry and use the best predictors to develop an easily implementable heuristic risk model.
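The precision figures above are of the "precision among the top-ranked share" kind: rank cases by predicted risk, keep the riskiest fraction, and ask what share of those actually entered shelter. A minimal sketch of that calculation follows. The function name `precision_at_top_fraction` and the toy scores and labels are hypothetical illustrations, not the authors' code or data.

```python
def precision_at_top_fraction(scores, labels, fraction):
    """Share of positive labels among the highest-scoring `fraction` of cases.

    scores   -- predicted risk per case (higher means riskier)
    labels   -- 1 if the case actually entered shelter, else 0
    fraction -- share of cases to flag, in (0, 1]
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one case is required")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    # Number of cases flagged; always flag at least one.
    k = max(1, round(len(scores) * fraction))

    # Rank cases from riskiest to least risky and keep the top k.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]

    return sum(label for _, label in top) / k


# Hypothetical toy data: ten cases, three of which entered shelter.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]

top_fifth = precision_at_top_fraction(toy_scores, toy_labels, 0.2)
top_half = precision_at_top_fraction(toy_scores, toy_labels, 0.5)
everyone = precision_at_top_fraction(toy_scores, toy_labels, 1.0)
```

Precision at a fraction of 1.0 equals the base rate in the sample, which gives a natural benchmark for how much the ranking improves on flagging cases at random.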

To measure the potential gains from machine learning targeting over family self-assessment, we link our sample to data from Homebase, New York City's primary homelessness prevention program. We compare the shelter risk of individuals seeking services through Homebase to that of individuals we can identify in our models. Combining treatment effect parameters estimated from previous work with our prediction results, we simulate the number of people prevented from entering shelter under the current program design and under an algorithm-directed design. At the same level of outreach, algorithmic models could increase the precision with which Homebase services are provided to at-risk families from 19 to 35 percentage points, averting almost 1,000 shelter entries over a two-year period.
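One simple way to see how targeting precision and a treatment effect combine into averted shelter entries is the back-of-the-envelope calculation sketched below. It is not the authors' simulation: it assumes, for illustration only, that services avert shelter entry solely among families who would otherwise have entered, and that the effect is the same for each such family. The function names and every input value are hypothetical.

```python
def expected_averted_entries(n_served, precision, effect_per_at_risk_family):
    """Expected shelter entries averted under a simplified model.

    n_served                  -- number of families receiving services
    precision                 -- share of served families who would
                                 otherwise have entered shelter
    effect_per_at_risk_family -- assumed probability that services avert
                                 entry for such a family (hypothetical)
    """
    if n_served < 0:
        raise ValueError("n_served must be non-negative")
    for name, value in (("precision", precision),
                        ("effect_per_at_risk_family", effect_per_at_risk_family)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be in [0, 1]")
    return n_served * precision * effect_per_at_risk_family


def gain_from_retargeting(n_served, precision_current, precision_algorithmic,
                          effect_per_at_risk_family):
    """Additional entries averted at the same outreach level when only
    the targeting precision changes."""
    current = expected_averted_entries(
        n_served, precision_current, effect_per_at_risk_family)
    algorithmic = expected_averted_entries(
        n_served, precision_algorithmic, effect_per_at_risk_family)
    return algorithmic - current


# Hypothetical inputs chosen for round numbers, not taken from the paper.
example_gain = gain_from_retargeting(
    n_served=1000,
    precision_current=0.2,
    precision_algorithmic=0.4,
    effect_per_at_risk_family=0.5,
)
```

Under these assumptions the gain scales linearly with both the outreach level and the difference in precision, which is why holding outreach fixed isolates the contribution of better targeting.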