Panel Paper:
Addressing External Validity in Within-Study Comparisons
This study proposes an expansion of within-study comparison (WSC) designs to address external validity. Specifically, we propose adding a second non-experimental analysis that studies the full inference population. For example, in an evaluation of a school improvement program, the second non-experimental study would use a comparative interrupted time series (CITS) design to analyze all schools using the program (defining these schools as the inference population). This yields three causal estimates: the experimental estimate, the non-experimental estimate on the experimental sample, and the non-experimental estimate on the inference population. The first two estimates potentially have external validity bias, while the last two potentially have internal validity bias. Under a set of assumptions, which we detail, these three estimates provide both an estimate of the internal validity bias due to non-random assignment to treatment in the non-experimental approaches (the difference between the non-experimental estimate on the experimental sample and the experimental estimate) and an estimate of the external validity bias due to non-random sampling of the experimental sample from the inference population (the difference between the two non-experimental estimates). Comparing the internal and external validity bias enables an evaluation of whether experimental or non-experimental approaches provide a more accurate estimate of program effectiveness in the inference population. We demonstrate this approach using data from a recent experimental evaluation of a beginning literacy program called Burst©:Reading, showing that the external validity bias is greater than the internal validity bias (at least in the point estimates), suggesting that the non-experimental approach is more accurate in this application.
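The arithmetic behind the two bias estimates can be sketched in a few lines of Python. This is a minimal illustration only, not the study's actual procedure: the class and function names are hypothetical, the sign conventions are an assumption (the abstract specifies only "the difference"), the numeric values are invented for illustration and are not results from the Burst©:Reading evaluation, and the comparison uses point estimates without accounting for sampling uncertainty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WscEstimates:
    """The three causal estimates described above (hypothetical container)."""
    experimental: float          # experimental estimate on the experimental sample
    nonexp_on_sample: float      # non-experimental estimate on the experimental sample
    nonexp_on_population: float  # non-experimental estimate on the inference population


def internal_validity_bias(e: WscEstimates) -> float:
    # Same sample, different method: non-experimental minus experimental.
    # Sign convention is an assumption.
    return e.nonexp_on_sample - e.experimental


def external_validity_bias(e: WscEstimates) -> float:
    # Same method, different population: sample minus inference population.
    # Sign convention is an assumption.
    return e.nonexp_on_sample - e.nonexp_on_population


def more_accurate_approach(e: WscEstimates) -> str:
    # Compare magnitudes of the point estimates only; no standard errors here.
    internal = abs(internal_validity_bias(e))
    external = abs(external_validity_bias(e))
    if external > internal:
        return "non-experimental"
    if internal > external:
        return "experimental"
    return "tie"


# Invented effect sizes, purely for illustration.
example = WscEstimates(
    experimental=0.10,
    nonexp_on_sample=0.12,
    nonexp_on_population=0.20,
)
print(more_accurate_approach(example))
```

With these invented values, the external validity bias is larger in magnitude than the internal validity bias, so the sketch favors the non-experimental estimate for the inference population. A fuller treatment would also carry standard errors through both differences.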
This expansion of WSC designs allows for a more authentic comparison of experimental and non-experimental approaches because it pits the strength of experimental approaches (i.e., high internal validity) against the strength of many non-experimental approaches (i.e., high external validity). As such, it more comprehensively portrays the trade-offs among different approaches to establishing evidence of program effectiveness. We discuss ways of further expanding this approach to incorporate multiple inference populations.