Panel Paper: On Overfitting In Experimental Analysis of Endogenous Subgroups

Saturday, November 10, 2012 : 2:05 PM
Hanover B (Radisson Plaza Lord Baltimore Hotel)

*Names in bold indicate Presenter

Laura Peck and Eleanor L. Harvill, Abt Associates, Inc.

As the art and science of social policy experimentation evolve, government and foundation funders increasingly demand more for their evaluation dollars.  Not only are they interested in learning the coarse, overall causal effects of policy interventions, but they also want to know what specifically about the intervention is responsible for any observed effects.  To date, evaluators have coupled rich implementation and process research with rigorous impact evaluations to help answer this question.  Dissatisfied with this strategy, we urge innovation in impact analysis methods to better isolate the effects of components of multi-faceted treatments on various target subgroups.

This paper reintroduces an approach that several scholars appear to have arrived at almost independently – the idea of using regression-based subgroups to explore experimental impacts on questions that would otherwise be tossed in the non-experimental bin.  By using baseline (exogenous) characteristics to predict membership in post-random-assignment (endogenous) subgroups, the approach creates symmetric subgroups within the treatment and control groups, retaining the experimental design’s internal validity.  In order to maintain treatment-control symmetry, however, the entire treatment (or control) group cannot be used for both subgroup identification and impact estimation; if it is, overfitting bias is introduced.  This problem only grows as sample sizes decrease, just when one might think the cost of holding out a subsample is too great.  This paper delves into this topic to consider various approaches to preventing overfitting bias from entering the analysis.
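As a hedged illustration of this split-sample logic (our own sketch, not the paper's actual estimator), one might fit a model predicting a hypothetical endogenous status – say, program completion – from a baseline covariate using only half of the treatment group, and then apply that same fitted rule to the held-out treatment cases and the control group to define symmetric "predicted completer" subgroups. The data-generating process and every variable name below are invented for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000

# Baseline (exogenous) covariate and random assignment
x = rng.normal(size=n)                 # hypothetical baseline characteristic
t = rng.integers(0, 2, size=n)         # 1 = treatment, 0 = control

# Endogenous status (e.g., program completion), correlated with the covariate;
# in a real experiment this would be observed only for the treatment group
complete = (x + rng.normal(size=n) > 0).astype(float)

# Outcome: baseline gradient plus a larger treatment effect for completers
y = 1.0 * x + t * (0.2 + 0.4 * complete) + rng.normal(size=n)

# Split the treatment group: fit the subgroup model on one half only
treat_idx = np.flatnonzero(t == 1)
fit_half, est_half = np.array_split(rng.permutation(treat_idx), 2)

# Linear probability model predicting completion from the baseline covariate
X_fit = np.column_stack([np.ones(fit_half.size), x[fit_half]])
beta, *_ = np.linalg.lstsq(X_fit, complete[fit_half], rcond=None)

# Apply the SAME fitted rule to everyone, yielding symmetric
# "predicted completer" subgroups in both experimental arms
pred = beta[0] + beta[1] * x > 0.5

# Experimental impact estimate: held-out treatment cases vs. all controls,
# within the predicted-completer subgroup only
ctrl_idx = np.flatnonzero(t == 0)
imp = (y[est_half][pred[est_half]].mean()
       - y[ctrl_idx][pred[ctrl_idx]].mean())
print(f"impact for predicted completers: {imp:.2f}")
```

Because the fitted rule uses only baseline characteristics, the predicted subgroups have the same composition (in expectation) in both arms, so the simple treatment-control difference within the subgroup remains an experimental contrast.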

Our analyses consider: the magnitude of the problem under various sample size assumptions; the optimal size of a held-out subsample, given variation in overall sample size; and the possibility that bootstrapping, jackknifing, or Monte Carlo simulation can eliminate overfitting bias.  The analysis uses existing data sets from multiple experiments as well as simulated data.  Each simulated data set is generated to match the characteristics of one of the experimental data sets.  Because the true data-generating process is known for the simulated data sets, we can directly measure the overfitting bias.  By working with data simulated to match a real-world experiment, we can understand how the bias varies across the contexts represented by the different experiments, and we can identify how to overcome it.
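A small Monte Carlo sketch (again our own illustration, not the paper's actual simulations) shows the bias being measured. Here the true treatment effect is zero by construction, and a latent variable drives both the endogenous "completion" status and the outcome. Reusing the entire treatment sample for both subgroup-model fitting and impact estimation lets the in-sample predictions chase that latent noise, producing a spurious positive "impact"; fitting on one half and estimating on the other does not. The design (pure-noise covariates, per-arm sample sizes) is assumed purely for demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, reps = 120, 30, 300          # per-arm size, # covariates, MC replications

def one_rep():
    # Latent noise u drives both the endogenous status and the outcome;
    # the TRUE treatment effect is zero, so any nonzero estimate is bias.
    def draw(n):
        x = rng.normal(size=(n, p))        # covariates: pure noise here
        u = rng.normal(size=n)
        y = u + 0.5 * rng.normal(size=n)
        return x, u, y

    xt, ut, yt = draw(n)                   # treatment arm
    xc, uc, yc = draw(n)                   # control arm
    complete = (ut > 0).astype(float)      # endogenous; treatment-only

    def impact(fit_idx, est_idx):
        # Fit the subgroup model on fit_idx, estimate impacts on est_idx
        X = np.column_stack([np.ones(fit_idx.size), xt[fit_idx]])
        beta, *_ = np.linalg.lstsq(X, complete[fit_idx], rcond=None)
        pt = beta[0] + xt[est_idx] @ beta[1:] > 0.5   # predicted subgroup (T)
        pc = beta[0] + xc @ beta[1:] > 0.5            # predicted subgroup (C)
        return yt[est_idx][pt].mean() - yc[pc].mean()

    all_idx = np.arange(n)
    half_a, half_b = all_idx[: n // 2], all_idx[n // 2:]
    return (impact(all_idx, all_idx),      # full-sample reuse: overfit
            impact(half_a, half_b))        # split sample: fit A, estimate B

ests = np.array([one_rep() for _ in range(reps)])
print(f"mean bias, full-sample reuse: {ests[:, 0].mean():+.2f}")
print(f"mean bias, split-sample:      {ests[:, 1].mean():+.2f}")
```

Because the truth is known (zero effect), the mean estimate across replications is a direct measurement of overfitting bias – the same logic that lets the simulated data sets described above quantify the bias in realistic settings.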

Although this topic may seem narrow, the approach it informs – analyzing “endogenous” subgroups (e.g., Orr, 1999) within an experimental context – is increasingly widely used, demanding that the nuances of the approach be considered more carefully.

[Paper proposal sidebar:  I have discussed this paper (in theory) with Hans Bos, who would make a really spectacular discussant.  Other papers that would pair well with this one would focus on subgroup analysis, innovation in experimental design and analysis, or using experimental data to answer policy questions and to better design and/or target interventions.]