The Generalizability of Impact Evaluations Findings: New Empirical Evidence

Deke, John

Panel: The Generalizability of Impact Evaluations Findings: New Empirical Evidence
(Tools of Analysis: Methods, Data, Informatics and Research Design)

Thursday, November 2, 2017: 1:45 PM-3:15 PM

Field (Hyatt Regency Chicago)


Panel Organizers: Robert Olsen, Rob Olsen LLC

Panel Chairs: Kelly Hallberg, University of Chicago

Discussants: John Deke, Mathematica Policy Research

How Much Can External Validity Bias Be Reduced by Aligning Sample and Population on School District Characteristics?

Stephen Bell¹, Robert Olsen², Larry Orr³, Elizabeth Stuart³ and Michelle Wood¹, (1)Abt Associates, Inc., (2)Rob Olsen LLC, (3)Johns Hopkins University

Using Rigorous Evaluation Results to Improve Local Policy Decisions

Larry Orr¹, Stephen Bell², Robert Olsen³, Elizabeth Stuart¹, Azim Shivji² and Ian Schmidt¹, (1)Johns Hopkins University, (2)Abt Associates, Inc., (3)Rob Olsen LLC

External Validity in U.S. Education Research: Evidence from the What Works Clearinghouse

Patrick Sean Tanner, Learning Policy Institute

While impact evaluations provide evidence on causal effects for the study sample, a key question is whether that evidence generalizes beyond the sample, allowing policymakers to predict the consequences of decisions such as whether to adopt or scale up an intervention or whether to cancel a program. To date, there has been some research on the theory of the problem (e.g., Olsen et al., 2013; Tipton, 2013) and on improved designs and analysis methods to address it (e.g., Cole and Stuart, 2010; Kern et al., 2016; Olsen and Orr, 2016; Tipton, 2014). This panel will add to a small but growing body of evidence (e.g., Allcott, 2015; Bell et al., 2015; Stuart et al., 2017) on the generalizability of impact evaluation findings beyond their samples to the populations of interest for policy.

This session will offer additional evidence on the extent to which the findings from impact evaluations, in particular impact evaluations of educational interventions, generalize to other populations. The first paper, by Larry Orr and colleagues, presents empirical evidence on the generalizability of results from several multi-site impact evaluations in education to individual sites. The second paper, by Sean Tanner, presents empirical evidence on the generalizability of the samples selected for impact studies that have been reviewed by the What Works Clearinghouse (WWC); it also provides evidence on whether findings from early studies reviewed by the WWC were replicated by later studies of the same intervention. The third paper, by Stephen Bell and colleagues, provides evidence on the extent to which the external validity bias from purposive site selection can be reduced using publicly available data on schools and districts (e.g., from the Common Core of Data) and simple statistical methods (e.g., linear regression models, propensity score matching).

References:

Allcott, H. (2015). Site selection bias in program evaluation. The Quarterly Journal of Economics, 130(3), 1117-1165.

Cole, S. R., & Stuart, E. A. (2010). Generalizing evidence from randomized clinical trials to target populations: the ACTG-320 trial. American Journal of Epidemiology, 172, 107-115.

Kern, H. L., Stuart, E. A., Hill, J., & Green, D. P. (2016). Assessing methods for generalizing experimental impact estimates to target populations. Journal of Research on Educational Effectiveness, 9(1), 103-127.

Olsen, R. B., & Orr, L. L. (2016). On the “where” of social experiments: Selecting more representative samples to inform policy. Special issue on Social Experiments in Practice, New Directions for Evaluation.

Olsen, R. B., Orr, L. L., Bell, S. H., & Stuart, E. A. (2013). External validity in policy evaluations that choose sites purposively. Journal of Policy Analysis and Management, 32(1), 107-121.

Stuart, E. A., Bell, S. H., Ebnesajjad, C., Olsen, R. B., & Orr, L. L. (2017). Characteristics of school districts that participate in rigorous national educational evaluations. Journal of Research on Educational Effectiveness, 10(1), 168-206.

Tipton, E. (2013). Improving generalizations from experiments using propensity score subclassification: Assumptions, properties, and contexts. Journal of Educational and Behavioral Statistics, 38(3), 239-266.

Tipton, E. (2014). Stratified sampling using cluster analysis: A sample selection strategy for improved generalizations from experiments. Evaluation Review, 37(2), 109-139.
