Probing Impact Heterogeneity Using Machine Learning Methods in the Evaluation of Early College High Schools in North Carolina

Unlu, Fatih

The proposed paper explores the use of emerging machine learning (ML) methods to probe the heterogeneity of impacts in a large-scale longitudinal experimental study of early college high schools in North Carolina. Early colleges are small schools that blur the boundary between the high school and college experiences. The schools start in ninth grade, and students are expected to graduate in four or five years with a high school diploma and either two years of college credit or an associate degree. Each year, the early colleges admitted students by lottery from the pool of eligible 8th grade students who had applied to enroll that year. The eligible applicants who were randomly chosen to receive an invitation to enroll made up the treatment group, while the rest of the eligible applicants (who generally attended the traditional high school in the district, the "business as usual" condition) made up the control group. The study sample includes over 4,000 students who participated in 44 lotteries conducted for 19 early colleges.

Results from existing analyses show that early colleges included in the study sample have had positive and statistically significant impacts on key predictors of success in college and receipt of a postsecondary credential. Through eight years after 9th grade, treatment students had significantly higher rates of ever enrolling in college (90% treatment vs. 74% control), attainment of any postsecondary credential (37% treatment vs. 22% control), and attainment of a four-year degree (18% treatment vs. 13% control).
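To make the size of these differences explicit, the short Python sketch below computes the treatment-control gap in percentage points for each outcome. The rates are the ones reported above; the dictionary layout and the function name `impact_in_points` are illustrative choices, and the simple difference shown here is not the study's estimation model, which is not described in this abstract.

```python
# Rates (in percent) reported above, through eight years after 9th grade.
# Each entry is (treatment rate, control rate).
rates = {
    "ever enrolled in college": (90, 74),
    "any postsecondary credential": (37, 22),
    "four-year degree": (18, 13),
}


def impact_in_points(treatment, control):
    """Raw treatment-minus-control gap, in percentage points."""
    return treatment - control


impacts = {
    outcome: impact_in_points(treatment, control)
    for outcome, (treatment, control) in rates.items()
}

for outcome, gap in impacts.items():
    print(f"{outcome}: {gap} percentage points")
```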

There is also some evidence of larger impacts on students who are disadvantaged and underprepared for high school, but heterogeneity patterns are not consistent across outcome measures. The proposed paper will apply machine learning methods suggested for exploring impact heterogeneity, such as regression trees and random forest techniques (Wager & Athey, 2015; Athey & Imbens, 2016; Davis & Heller, 2017), to conduct a more systematic examination of variation in impacts in the context of early colleges. The conventional approach explores impact heterogeneity through analyses of subgroups defined by observed baseline characteristics. The ML methods provide a more flexible framework that searches for heterogeneity over data-driven, high-dimensional functions of baseline covariates, and so could reveal evidence of impact heterogeneity that the conventional approach may miss. The early college high school impact study is a good candidate for the application of these methods given its fairly large sample and the availability of a rich set of baseline covariates, including student-level demographic and socioeconomic characteristics, engagement with schooling, and academic achievement.
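As a rough illustration of the data-driven search described above, the Python sketch below finds the single covariate split whose two sides differ most in estimated impact, then re-estimates the impacts on a held-out half of the sample, in the spirit of the "honest" tree estimation discussed by Athey & Imbens (2016). It is a one-split simplification under stated assumptions, not the paper's method and not a full causal tree or forest. The data are simulated, and the covariate names (`low_income`, `first_generation`, `achievement_quartile`), the effect sizes, and the scoring rule are all hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def effect(rows):
    """Treatment-minus-control difference in mean outcome."""
    treated = [r["y"] for r in rows if r["t"] == 1]
    control = [r["y"] for r in rows if r["t"] == 0]
    return mean(treated) - mean(control)


def arms_ok(rows, min_arm):
    """True if both the treatment and control arms have at least min_arm rows."""
    n_treated = sum(r["t"] for r in rows)
    return n_treated >= min_arm and len(rows) - n_treated >= min_arm


def best_split(rows, covariates, min_arm=50):
    """Search every covariate and cut point for the split whose two sides
    differ most in estimated effect, weighted by how balanced the split is."""
    best = None
    for name in covariates:
        levels = sorted({r[name] for r in rows})
        for cut in levels[:-1]:
            left = [r for r in rows if r[name] <= cut]
            right = [r for r in rows if r[name] > cut]
            if not (arms_ok(left, min_arm) and arms_ok(right, min_arm)):
                continue
            gap = effect(right) - effect(left)
            score = len(left) * len(right) / len(rows) ** 2 * gap ** 2
            if best is None or score > best["score"]:
                best = {"covariate": name, "cut": cut, "score": score}
    return best


def simulate(n, rng):
    """Simulated (not real) data: the impact is larger for low-income students."""
    rows = []
    for _ in range(n):
        row = {
            "low_income": rng.randint(0, 1),
            "first_generation": rng.randint(0, 1),
            "achievement_quartile": rng.randint(1, 4),
            "t": rng.randint(0, 1),  # lottery-style random assignment
        }
        true_effect = 0.05 + 0.15 * row["low_income"]  # assumed for illustration
        row["y"] = 0.5 + true_effect * row["t"] + rng.gauss(0, 0.2)
        rows.append(row)
    return rows


rng = random.Random(0)
rows = simulate(4000, rng)

# Use one half to choose the split and the other half to estimate impacts,
# so the reported subgroup impacts are not biased by the search itself.
half = len(rows) // 2
search_rows, estimate_rows = rows[:half], rows[half:]

covariates = ["low_income", "first_generation", "achievement_quartile"]
split = best_split(search_rows, covariates)

left = [r for r in estimate_rows if r[split["covariate"]] <= split["cut"]]
right = [r for r in estimate_rows if r[split["covariate"]] > split["cut"]]
effect_left, effect_right = effect(left), effect(right)

print(split["covariate"], split["cut"], round(effect_left, 3), round(effect_right, 3))
```

A causal tree would apply this search recursively within each side of the split, and a causal forest would average many such trees grown on random subsamples; dedicated libraries would be the natural choice for a real analysis.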

Association for Public Policy Analysis & Management

Panel Paper: Probing Impact Heterogeneity Using Machine Learning Methods in the Evaluation of Early College High Schools in North Carolina