Panel Paper: Testing Whether Nonexperimental Comparison Group Methods Can Replicate Experimental Impact Estimates for Charter Schools

Saturday, November 10, 2012 : 11:15 AM
Washington (Sheraton Baltimore City Center Hotel)


Kenneth Fortson, Natalya Verbitsky-Savitz, Emma Kopa and Phil Gleason, Mathematica Policy Research


Randomized controlled trials (RCTs) are widely considered to be the gold standard for evaluating the impacts of a social program. When an RCT is infeasible, researchers often estimate program impacts by comparing the outcomes of program participants with those of a nonexperimental comparison group, adjusting for observable differences between the two groups. Nonexperimental comparison group methods can produce unbiased estimates if their underlying assumptions hold, but those assumptions are usually not testable in practice. Prior studies generally find that nonexperimental designs fail to produce unbiased estimates. However, these studies have been criticized for using only limited pre-intervention data, for measuring outcomes and covariates inconsistently across research groups, or for drawing comparison groups from dissimilar populations. The present study was designed to address these challenges. We test the validity of two classes of comparison group approaches (regression modeling and matching) by comparing nonexperimental impact estimates from these methods with an experimental benchmark. The analysis uses data from an experimental evaluation of charter schools, together with comparison data on other students in the same school districts during the baseline period. We find that using pre-intervention baseline data that strongly predict the key outcome measures considerably reduces, but might not completely eliminate, bias. Regression-based nonexperimental impact estimates are significantly different from the experimental impact estimates, though the magnitude of the difference is modest. In this study, matching estimators perform slightly better than estimators that rely on parametric assumptions: the matching estimators generate impact estimates that are not significantly different from the experimental estimates. However, the matching and regression-based estimates are not greatly different from one another.
These findings are robust to restrictions on the comparison group, to the model specifications employed, and to the data assumed to be available.
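The within-study comparison logic can be illustrated with a minimal Python sketch on simulated data. Everything in it is a hypothetical assumption, not the authors' data, specification, or estimators: the data-generating process, the effect size, the unobserved `motivation` term, the use of an admissions lottery as the source of random assignment, and the helper names `simulate`, `regression_estimate`, and `matching_estimate`. Lottery winners and losers supply the experimental benchmark. Winners are then compared with a pool of non-applicants in three ways: unadjusted, with regression adjustment on a baseline score, and with one-to-one nearest-neighbor matching on that score. Because applicants in this toy setup also differ on an unobserved trait, adjustment shrinks the gap to the benchmark without fully closing it.

```python
import bisect
import random
import statistics

TRUE_EFFECT = 0.10  # hypothetical charter effect, in test-score SD units


def simulate(seed=7, n_applicants=10_000, n_pool=20_000):
    """Hypothetical data: lottery applicants plus a pool of non-applicants."""
    rng = random.Random(seed)
    applicants, pool = [], []
    for _ in range(n_applicants):
        x = rng.gauss(0.2, 1.0)            # baseline score: applicants score higher
        motivation = rng.gauss(0.05, 0.3)  # unobserved; also higher among applicants
        won = rng.random() < 0.5           # assumed random assignment via lottery
        y = 0.8 * x + motivation + TRUE_EFFECT * won + rng.gauss(0.0, 0.5)
        applicants.append((x, y, won))
    for _ in range(n_pool):
        x = rng.gauss(0.0, 1.0)
        motivation = rng.gauss(0.0, 0.3)
        y = 0.8 * x + motivation + rng.gauss(0.0, 0.5)
        pool.append((x, y))
    return applicants, pool


def residualize(values, xs):
    """Residuals from a simple OLS regression of values on xs (with intercept)."""
    mx, mv = statistics.fmean(xs), statistics.fmean(values)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (v - mv) for x, v in zip(xs, values)) / sxx
    return [v - mv - slope * (x - mx) for v, x in zip(values, xs)]


def regression_estimate(treated, comparison):
    """Treatment coefficient from OLS of y on [1, T, x], via Frisch-Waugh-Lovell."""
    xs = [x for x, _ in treated] + [x for x, _ in comparison]
    ys = [y for _, y in treated] + [y for _, y in comparison]
    ts = [1.0] * len(treated) + [0.0] * len(comparison)
    rt, ry = residualize(ts, xs), residualize(ys, xs)
    return sum(a * b for a, b in zip(rt, ry)) / sum(a * a for a in rt)


def matching_estimate(treated, comparison):
    """One-to-one nearest-neighbor matching on x, with replacement (effect on treated)."""
    comp = sorted(comparison)
    comp_x = [x for x, _ in comp]
    diffs = []
    for x, y in treated:
        i = bisect.bisect_left(comp_x, x)
        # The nearest comparison unit sits at index i - 1 or i of the sorted list.
        nearest = min((j for j in (i - 1, i) if 0 <= j < len(comp)),
                      key=lambda j: abs(comp_x[j] - x))
        diffs.append(y - comp[nearest][1])
    return statistics.fmean(diffs)


applicants, pool = simulate()
winners = [(x, y) for x, y, won in applicants if won]
losers = [(x, y) for x, y, won in applicants if not won]

# Experimental benchmark: simple difference in means between lottery winners and losers.
benchmark = statistics.fmean([y for _, y in winners]) - statistics.fmean([y for _, y in losers])
# Nonexperimental estimates: winners versus the non-applicant comparison pool.
naive = statistics.fmean([y for _, y in winners]) - statistics.fmean([y for _, y in pool])
regression = regression_estimate(winners, pool)
matching = matching_estimate(winners, pool)

for name, est in [("experimental", benchmark), ("unadjusted", naive),
                  ("regression", regression), ("matching", matching)]:
    print(f"{name:>12}: {est:.3f}  (gap vs. benchmark: {est - benchmark:+.3f})")
```

The regression helper relies on the Frisch-Waugh-Lovell result: residualizing both the treatment indicator and the outcome on the baseline score, then regressing one set of residuals on the other, gives the same treatment coefficient as a full OLS fit with an intercept, treatment indicator, and baseline score. Matching is done with replacement, so it estimates the effect on the treated (here, the lottery winners).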