Panel:
The Generalizability of Impact Evaluation Findings: New Empirical Evidence
(Tools of Analysis: Methods, Data, Informatics and Research Design)
This session will offer additional evidence on the extent to which findings from impact evaluations, in particular impact evaluations of educational interventions, generalize to other populations. The first paper, by Larry Orr and colleagues, presents empirical evidence on how well results from several multi-site impact evaluations in education generalize to individual sites. The second paper, by Sean Tanner, presents empirical evidence on how representative the samples are in impact studies reviewed by the What Works Clearinghouse (WWC); it also examines whether findings from early studies reviewed by the WWC were replicated by later studies of the same intervention. The third paper, by Stephen Bell and colleagues, provides evidence on the extent to which the external validity bias arising from purposive site selection can be reduced using publicly available data on schools and districts (e.g., from the Common Core of Data) and simple statistical methods (e.g., linear regression models, propensity score matching).
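One simple form of the regression adjustment mentioned for the third paper can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: it assumes each purposively selected site has an impact estimate and a single site-level covariate of the kind available in the Common Core of Data (here a hypothetical share of low-income students), fits a linear model of impact on that covariate across the sample sites, and predicts the average impact for a target population of sites. The function names and all data values are hypothetical, and linear regression is used in place of propensity score matching only for brevity.

```python
from statistics import mean


def ols_fit(x, y):
    """Least-squares fit of y = a + b * x; returns the intercept a and slope b."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def regression_adjusted_impact(sample_x, sample_impacts, population_x):
    """Predict the average impact in a target population of sites.

    Hypothetical helper: fits a site-level impact model in the study sample,
    then evaluates it for the population. Because the model is linear, the
    average of the population predictions equals the prediction at the
    population mean of the covariate.
    """
    a, b = ols_fit(sample_x, sample_impacts)
    return a + b * mean(population_x)


# Hypothetical data: share of low-income students in each study site and
# the impact estimate (in effect-size units) observed there.
sample_x = [0.2, 0.3, 0.4, 0.5]
sample_impacts = [0.10, 0.12, 0.14, 0.16]

# Hypothetical covariate values for the target population of sites.
population_x = [0.4, 0.5, 0.6, 0.7]

unadjusted = mean(sample_impacts)
adjusted = regression_adjusted_impact(sample_x, sample_impacts, population_x)
print(f"Sample mean impact:            {unadjusted:.3f}")
print(f"Regression-adjusted estimate:  {adjusted:.3f}")
```

With these made-up numbers, the sample sites serve fewer low-income students than the population, so the unadjusted sample mean (0.13) falls below the regression-adjusted population estimate (0.17). The adjustment only removes external validity bias to the extent that the observed covariates capture how impacts vary across sites.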