Panel Paper: A Simple Machine Learning Approach for Program Evaluators Investigating Heterogeneity 

Thursday, November 2, 2017
McCormick (Hyatt Regency Chicago)

*Names in bold indicate Presenter

Mark Long and Grant Blume, University of Washington


Program evaluators often want to know who is most affected by a treatment, and by far the most common way evaluators explore potential heterogeneity is with a simple linear regression that interacts the treatment indicator with some characteristic of the unit of observation. Often, this approach is exploratory (e.g., where theory is lacking regarding the types of persons who should be most responsive to the treatment). Such evaluators could benefit from a simple tool that (a) maintains the linear approach with interaction terms and (b) draws on machine learning techniques to conduct this exploration more efficiently. We have developed such a tool (available as a command for the Stata and R programming languages) that helps identify the characteristics of individuals who have the highest and lowest responses to a treatment in the context of randomized controlled trials. Our method incorporates cross-validation and out-of-sample prediction. We apply this method to evaluate three programs: the national Head Start Impact Study, the Oregon Medicaid experiment, and the Washington, DC Opportunity Scholarship Program. For each of these programs, we find significant response heterogeneity and are able to identify the characteristics of the individuals most likely to be affected. Our method is designed to be simple for researchers and practitioners to use and to yield predicted heterogeneity that program administrators can use to prioritize the future delivery of the treatment to certain subpopulations. It also computes the extent to which efficiency could be increased by targeting scarce program resources and program participation toward particular groups.
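To illustrate the general idea, the R sketch below applies a cross-validated lasso to a linear model with treatment-by-covariate interactions on simulated randomized-trial data and predicts individual treatment effects out of sample. It is a minimal illustration of the approach described above, not the authors' actual command; the variable names and the simulated data are hypothetical.

# Minimal sketch (not the authors' command): cross-validated lasso over
# treatment-by-covariate interactions, with out-of-sample prediction of
# individual treatment effects.
library(glmnet)

set.seed(1)
n  <- 2000
df <- data.frame(x1 = rnorm(n), x2 = rbinom(n, 1, 0.5), x3 = rnorm(n),
                 treat = rbinom(n, 1, 0.5))
# Simulated outcome: the treatment effect varies with x1 only.
df$y <- 1 + 0.5 * df$x1 + df$treat * (0.3 + 0.6 * df$x1) + rnorm(n)

# Design matrix with covariate main effects and all treatment interactions.
make_X <- function(d) model.matrix(~ treat * (x1 + x2 + x3), d)[, -1]

# Fit on a training half with cross-validated lasso (alpha = 1).
train <- sample(n, n / 2)
fit   <- cv.glmnet(make_X(df[train, ]), df$y[train], alpha = 1)

# Out-of-sample predicted effect: fitted outcome with treat = 1 minus treat = 0.
test    <- df[-train, ]
test1   <- transform(test, treat = 1)
test0   <- transform(test, treat = 0)
tau_hat <- predict(fit, newx = make_X(test1), s = "lambda.min") -
           predict(fit, newx = make_X(test0), s = "lambda.min")
head(tau_hat)

Ranking held-out units by their predicted effects (tau_hat) gives the kind of predicted heterogeneity that could inform which subpopulations to prioritize when program resources are scarce.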