Panel Paper: Assessing Statistical Methods for Estimating Population Average Treatment Effects from Purposive Samples in Education

Thursday, November 3, 2016 : 2:15 PM
Columbia 11 (Washington Hilton)


Elizabeth Stuart (1), Robert Olsen (2), Stephen Bell (3), and Larry Orr (1); (1) Johns Hopkins University, (2) Rob Olsen LLC, (3) Abt Associates, Inc.


The ultimate goal of many educational evaluations is to inform policy: to help policy makers understand the effects of interventions or programs in target populations of policy interest. Randomized trials provide internal validity (they yield unbiased effect estimates for the subjects in the study sample), but there is growing awareness that they may not provide external validity, the ability to estimate what the effects would be in other target populations or contexts. Recently developed statistical methods that combine trial data with data on a population of interest have the potential to retain the strong internal validity of trials while enhancing their external validity. These methods fall into two broad classes: (1) flexible regression models of the outcome as a function of treatment status and covariates, and (2) reweighting methods that weight the RCT sample to reflect the covariate distribution in the population. However, there has been little formal investigation of how well, or under what conditions, these methods work. This paper presents results from simulation studies examining the performance of methods in each of the two classes. The simulations are designed to be as realistic as possible, based on data on a representative sample of public school students nationwide, empirical evidence on impact variation in two large-scale RCTs in education, and evidence on the types of schools that were selected for several RCTs in education. We find that each approach works well when its underlying assumptions are satisfied. However, when key assumptions are violated (for example, if we do not observe all of the factors that both moderate treatment effects and differ between the RCT sample and the target population), none of the methods consistently estimates the population effects.
We conclude with recommendations for practice, including the need for thorough and consistent covariate measurement and a better understanding of treatment effect heterogeneity. This work helps to identify the conditions under which different statistical methods can reduce external validity bias in educational evaluations.
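As a rough illustration of the two classes of methods described in the abstract, the following Python sketch compares a naive trial estimate, an outcome regression with a treatment-by-covariate interaction, and a simple post-stratification reweighting. It is not the authors' simulation design: the single binary moderator `x`, the selection probabilities, the effect sizes, and all variable names are assumptions made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target population with one binary moderator x
# (for instance, an indicator for urban schools). All numbers are made up.
n_pop = 500_000
x_pop = rng.binomial(1, 0.3, size=n_pop)

# Purposive selection: units with x == 1 are much more likely to enter the trial.
in_trial = rng.random(n_pop) < np.where(x_pop == 1, 0.10, 0.01)
x = x_pop[in_trial]

# The treatment effect varies with x, so the trial sample is not representative.
effect = 0.1 + 0.4 * x_pop
true_pate = effect.mean()

# Randomized treatment assignment and outcomes within the trial sample.
t = rng.binomial(1, 0.5, size=x.size)
y = 0.2 * x + effect[in_trial] * t + rng.normal(0.0, 1.0, size=x.size)

# Naive estimate: difference in means within the trial, ignoring the population.
naive = y[t == 1].mean() - y[t == 0].mean()

# Class 1: outcome regression with a treatment-by-covariate interaction,
# with predicted effects averaged over the population covariate distribution.
design = np.column_stack([np.ones(x.size), t, x, t * x])
coef = np.linalg.lstsq(design, y, rcond=None)[0]
regression = np.mean(coef[1] + coef[3] * x_pop)

# Class 2: reweighting. Post-stratification weights make the trial sample's
# distribution of x match the population's.
pop_share = np.bincount(x_pop, minlength=2) / x_pop.size
trial_share = np.bincount(x, minlength=2) / x.size
w = (pop_share / trial_share)[x]
weighted = (np.sum(w * y * t) / np.sum(w * t)
            - np.sum(w * y * (1 - t)) / np.sum(w * (1 - t)))

print(f"true PATE:          {true_pate:.3f}")
print(f"naive trial effect: {naive:.3f}")
print(f"outcome regression: {regression:.3f}")
print(f"reweighted:         {weighted:.3f}")
```

In this toy setup the moderator is fully observed, so both adjusted estimates should land near the population effect while the naive estimate does not. If `x` were unobserved, neither adjustment could be computed and the analyst would be left with the naive estimate, which mirrors the violated-assumption case the abstract describes.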