Panel Paper: Now You See Me: High School Dropout and Machine Learning

Saturday, November 4, 2017
Picasso (Hyatt Regency Chicago)

Dario Sansone, Georgetown University

This paper provides an algorithm to predict which students will drop out of high school, relying only on information from 9th grade. It verifies that a naïve model, of the kind implemented in many schools, leads to poor results. It shows that schools can obtain more precise predictions by combining the available high-dimensional data with machine learning tools such as Support Vector Machines, Boosted Regression, and LASSO. Goodness-of-fit criteria are carefully selected based on the context and the underlying theoretical framework: model parameters are calibrated to account for policy goals and budget constraints. Finally, unsupervised machine learning is used to divide students at risk of dropping out into clusters.
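
To make the idea of calibrating predictions to a budget constraint concrete, the following is a minimal sketch and not the paper's actual procedure. It assumes the budget simply caps how many students a school can target, so the students with the highest predicted risk are flagged until the cap is reached, and the result is then scored by precision and recall. The function names, risk scores, and outcomes are all hypothetical and chosen only for illustration.

```python
def flag_within_budget(risk_scores, budget):
    """Return the indices of the `budget` students with the highest predicted risk."""
    ranked = sorted(range(len(risk_scores)),
                    key=lambda i: risk_scores[i], reverse=True)
    return set(ranked[:budget])


def precision_recall(flagged, dropped_out):
    """Share of flagged students who dropped out, and share of dropouts who were flagged."""
    true_positives = sum(1 for i in flagged if dropped_out[i])
    total_dropouts = sum(dropped_out)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / total_dropouts if total_dropouts else 0.0
    return precision, recall


# Hypothetical predicted dropout probabilities from any fitted classifier,
# and hypothetical observed outcomes (1 = dropped out). Not real data.
risk_scores = [0.91, 0.15, 0.62, 0.08, 0.77, 0.33, 0.54, 0.05]
dropped_out = [1, 0, 1, 0, 0, 0, 1, 0]

# Assumption: the school can afford to support three students.
flagged = flag_within_budget(risk_scores, budget=3)
precision, recall = precision_recall(flagged, dropped_out)
print(sorted(flagged), precision, recall)
```

Under this assumption, a larger budget tends to raise recall at the cost of precision, which is one way a policy goal could guide the choice of cutoff.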