Panel: Integrating Machine Learning and Policy Evaluation to Detect Heterogeneous Treatment Effects
(Tools of Analysis: Methods, Data, Informatics and Research Design)

Thursday, November 2, 2017: 1:45 PM-3:15 PM
McCormick (Hyatt Regency Chicago)


Panel Organizers:  Jason Fletcher, University of Wisconsin - Madison
Panel Chairs:  Sarah Tahamont, University of Maryland
Discussants:  Jens Ludwig, University of Chicago

This panel will introduce the use of machine learning algorithms to detect heterogeneity in policy effects using examples from a variety of policy areas and methods.

To date, evaluations of social policies have understandably focused on the overall population average treatment effect. However, this average can mask substantial differences in a policy's effects, so that, depending on which population groups do and do not benefit, a policy could increase or decrease both outcomes and inequalities. Traditionally, analyses of treatment effect heterogeneity have been guided by priors from the literature and, because of limited statistical power, have generally examined only a few potential sources of heterogeneity. At the same time, machine learning algorithms have advanced considerably: they can search over a large number of covariates to find the models that best explain a specified outcome, while penalizing the additional degrees of freedom that such multiple comparisons consume.

This panel presents several complementary approaches to this problem that should be widely useful. Indeed, much of our analysis is nearly “off-the-shelf,” so that policy analysts, researchers, and practitioners could use these methods to evaluate their own programs, policies, and interventions. Our panel includes example analyses from a broad range of applications of interest to APPAM participants, including education policy, health policy, and social welfare policy.


Blume and Long develop a simple tool that (a) retains the conventional linear approach with interaction terms and (b) draws on machine learning techniques to explore the data more efficiently. The tool, available in Stata and R, helps identify the characteristics of individuals with the highest and lowest responses to a treatment in randomized controlled trials. They apply it to three experimental evaluations: the national Head Start Impact Study, the Oregon Medicaid experiment, and the Washington, DC Opportunity Scholarship Program.
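To illustrate the linear-interaction idea that Blume and Long's tool retains, here is a minimal Python sketch (not their Stata or R code). It assumes a binary treatment and binary covariates; in that case the regression y ~ T + X + T*X fits the four treatment-by-covariate cell means exactly, so the coefficient on T*X equals the treatment effect among X = 1 minus the treatment effect among X = 0. The function names, covariate names, and toy data are hypothetical.

```python
from statistics import mean

def interaction_estimate(y, treat, x):
    """Coefficient on T*X in the saturated OLS model y ~ T + X + T*X.

    With binary T and X the model reproduces the four cell means, so the
    interaction equals (effect among X=1) - (effect among X=0).
    Raises statistics.StatisticsError if any cell is empty.
    """
    def cell(t, v):
        return mean(yi for yi, ti, xi in zip(y, treat, x) if ti == t and xi == v)

    effect_x1 = cell(1, 1) - cell(0, 1)
    effect_x0 = cell(1, 0) - cell(0, 0)
    return effect_x1 - effect_x0

def scan_interactions(y, treat, covariates):
    """Rank binary covariates by the absolute size of their interaction estimate."""
    estimates = {name: interaction_estimate(y, treat, x) for name, x in covariates.items()}
    return sorted(estimates.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical toy data for illustration only.
y = [3, 5, 2, 8, 4, 6, 1, 9]
treat = [0, 1, 0, 1, 0, 1, 0, 1]
covs = {
    "female": [0, 0, 1, 1, 0, 0, 1, 1],
    "low_inc": [1, 1, 1, 1, 0, 0, 0, 0],
}
ranking = scan_interactions(y, treat, covs)
print(ranking)  # [('female', 5.0), ('low_inc', -1.0)]
```

Ranking many covariates by their raw interaction estimates, as `scan_interactions` does, is exactly the kind of unpenalized search that inflates false discoveries, which is why approaches like those on this panel penalize model complexity or otherwise guard against multiple comparisons.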


Rehkopf uses a set of approaches to examine potential heterogeneity in the treatment effects of the largest anti-poverty policy in the United States, the Earned Income Tax Credit. Using the NLSY, he treats spatial and temporal variation in the policy's generosity as an exogenous exposure and estimates its effects on child development outcomes. Rather than examining heterogeneity only by basic demographic factors, he uses an ensemble machine learning approach (combining multiple algorithms, including random forest, Elastic-Net, Least Angle Regression, Support Vector Machine, and Bayesian GLM) to examine whether treatment effects differ by several dozen potential demographic, socioeconomic, environmental, and behavioral factors.


Fletcher applies new methods that combine Classification and Regression Tree (CART) algorithms from machine learning with standard policy evaluation to Project STAR, the large-scale class-size reduction experiment conducted in Tennessee in the mid-1980s. The approach focuses on preserving valid causal inference when machine learning techniques are used, and it scans a broad set of student, teacher, and school characteristics to explore complex heterogeneity in treatment effects.
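As a rough illustration of how a tree-based split can be paired with the difference-in-means estimator from a randomized experiment while protecting inference, the Python sketch below chooses a single split on one half of the sample and estimates the subgroup effects on the other half ("honest" estimation). It is a hypothetical one-split example, not Fletcher's method: its split criterion (size-weighted squared subgroup effects) omits the variance adjustments used in published causal-tree methods, and the features `a` and `b` and the data are invented.

```python
from statistics import mean

def diff_in_means(rows):
    """Treated-minus-control mean outcome, or None if either arm is empty."""
    treated = [r["y"] for r in rows if r["t"] == 1]
    control = [r["y"] for r in rows if r["t"] == 0]
    if not treated or not control:
        return None
    return mean(treated) - mean(control)

def split_score(rows, feature, min_arm):
    """Sum over the two children of n_c * tau_c**2; None if an arm is too small.

    Simplified criterion (an assumption): no variance penalty is applied.
    """
    score = 0.0
    for value in (0, 1):
        child = [r for r in rows if r[feature] == value]
        n_treated = sum(r["t"] for r in child)
        if n_treated < min_arm or len(child) - n_treated < min_arm:
            return None
        score += len(child) * diff_in_means(child) ** 2
    return score

def honest_stump(train, estimation, features, min_arm=2):
    """Choose the split on `train`, then estimate child effects on `estimation`."""
    scored = [(split_score(train, f, min_arm), f) for f in features]
    scored = [(s, f) for s, f in scored if s is not None]
    if not scored:
        return None
    best_score, best_feature = max(scored)
    effects = {
        value: diff_in_means([r for r in estimation if r[best_feature] == value])
        for value in (0, 1)
    }
    return {"feature": best_feature, "train_score": best_score, "effects": effects}

# Hypothetical toy data: the effect is 4 when a == 1 and 0 otherwise; b is irrelevant.
rows = [
    {"t": i % 2, "a": (i // 2) % 2, "b": (i // 4) % 2,
     "y": 10 + 4 * (i % 2) * ((i // 2) % 2)}
    for i in range(32)
]
result = honest_stump(rows[:16], rows[16:], ["a", "b"])
# Selects "a"; estimation-half effects are 0 (a == 0) and 4 (a == 1).
print(result)
```

Because the estimation half plays no role in choosing the split, the subgroup estimates in `effects` are not inflated by the search itself, unlike the training-sample `train_score`.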