Panel Paper:
Extensions of Within-Study Comparison Approaches to Investigate the Generalizability of Causal Inferences Across Study Sites
The current work extends the multisite variant of the within-study comparison (WSC) approach to investigate the bias that arises when a non-experimental counterfactual is used for cases randomly assigned to control. It asks: how accurately can we infer how the control group at a given site would have performed, had it been assigned to treatment, using outcomes from individuals randomly assigned to treatment at the other sites? The experimental estimate available at the target site serves as the benchmark for gauging the accuracy of the non-experimental approach. This application of WSC thus assesses how accurately outcomes from other sites generalize to the impact at a given site.
The current work has three main foci. The first is the development of the WSC framework to address questions of external validity. We show that average absolute bias, from using a non-experimental comparison to infer counterfactual performance for controls at a site, can be decomposed into three terms: the difference between the sites being compared in the average performance of controls (Bias 1, which reflects the effects of confounders and is the quantity of interest in traditional WSC studies), the difference between them in average program impact (Bias 2, which reflects imbalance on moderators of impact), and the covariance between these two biases. Second, we develop an approach to summarizing bias when each site of a multisite trial yields its own estimate of bias. We propose the square root of average squared bias as an alternative to the often-used average of the absolute value of bias, because it allows the three quantities described above to be summarized separately. Third, we apply the methodology to results from two multisite trials in education: the Tennessee STAR Class Size Reduction Experiment and a randomized trial of the Alabama Math Science and Technology Initiative (AMSTI). We found Bias 1 to be larger than Bias 2. For example, in the AMSTI study, Bias 1 and Bias 2 were .42 and .10 standard deviation units, respectively, before covariate adjustment, and .07 and .08 after adjusting for main and moderating effects of covariates. In addition, a negative covariance between the biases led to a total average absolute bias of .04 standard deviations with covariate adjustments.
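The decomposition described above can be illustrated with a minimal sketch. It is one reading of the abstract, not the authors' estimator: it assumes equal weighting of sites, takes site-level means as given, and ignores sampling error. The function names and the four-site numbers are hypothetical and chosen only for illustration. For each site, the counterfactual is approximated by the average over all other sites, so the total bias at a site is the sum of Bias 1 (difference in control means) and Bias 2 (difference in impacts). Average squared bias then splits into the two average squared components plus a cross-product term.

```python
import numpy as np


def leave_one_out_biases(control_means, impacts):
    """Site-level Bias 1 and Bias 2 under an equal-weight, leave-one-site-out comparison.

    Assumption (not from the source): each site's comparison value is the
    unweighted mean of the same quantity over all other sites.
    """
    c = np.asarray(control_means, dtype=float)
    d = np.asarray(impacts, dtype=float)
    n = c.size
    c_other = (c.sum() - c) / (n - 1)  # mean control outcome at the other sites
    d_other = (d.sum() - d) / (n - 1)  # mean impact at the other sites
    bias1 = c_other - c  # difference in average control performance
    bias2 = d_other - d  # difference in average program impact
    return bias1, bias2


def decompose_mean_squared_bias(bias1, bias2):
    """Split average squared total bias into two squared terms and a cross term."""
    total = bias1 + bias2
    return {
        "ms_bias1": np.mean(bias1 ** 2),
        "ms_bias2": np.mean(bias2 ** 2),
        "cross": 2.0 * np.mean(bias1 * bias2),  # covariance term (biases average to zero here)
        "ms_total": np.mean(total ** 2),
    }


# Hypothetical site-level values in standard deviation units (made up for illustration):
# sites with lower control means have larger impacts, so the cross term is negative.
control_means = [0.0, 0.2, 0.4, 0.6]
impacts = [0.5, 0.4, 0.3, 0.2]

b1, b2 = leave_one_out_biases(control_means, impacts)
parts = decompose_mean_squared_bias(b1, b2)

rms_bias1 = np.sqrt(parts["ms_bias1"])
rms_bias2 = np.sqrt(parts["ms_bias2"])
rms_total = np.sqrt(parts["ms_total"])
print(f"RMS Bias 1: {rms_bias1:.3f}, RMS Bias 2: {rms_bias2:.3f}, RMS total: {rms_total:.3f}")
```

In this made-up example the two biases partly offset each other, so the root mean squared total bias is smaller than the root mean squared Bias 1. This is the same qualitative pattern as the negative covariance reported for AMSTI, though the numbers here are not the study's.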
We conclude by discussing implications for assessing the accuracy of different causal quantities (i.e., intent-to-treat [ITT] versus treatment-on-the-treated [TOT] effects), and how lessons learned from decades of WSC research may be applied to the methods investigated in the current work.