Panel:
Using Within-Study Comparison Approaches to Examine Systematic Variation and Generalization of Treatment Effects
(Tools of Analysis: Methods, Data, Informatics and Research Design)
The earliest within-study comparison (WSC) designs used data from job training evaluations to compare results from a non-experimental study with those from an experimental benchmark that shared the same treatment group. Results from these early WSCs had a profound influence on research practice and priorities. The Office of Management and Budget cited them in its 2004 recommendation that federal agencies use randomized experiments to evaluate program impacts, cautioning against the use of “comparison group studies” that “often lead to erroneous conclusions” (OMB, 2004). In recent years, WSCs have continued to examine the contexts and conditions under which non-experimental methods perform well, or fail to, in field settings. Researchers have also applied the WSC design to learn about systematic variation in, and generalization of, treatment effects. Here, a randomized controlled trial (RCT) serves as the benchmark for assessing the performance of a non-experimental method intended to examine treatment effect heterogeneity or to generalize treatment effects to out-of-sample groups.
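The shared-treatment-group logic described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names and outcome values are hypothetical, and both estimates are raw mean differences, whereas an actual WSC would typically apply matching or covariate adjustment to the non-experimental comparison group.

```python
from statistics import mean


def mean_difference(treated, comparison):
    """Unadjusted effect estimate: mean treated outcome minus mean comparison outcome."""
    return mean(treated) - mean(comparison)


def wsc_bias(treated, randomized_control, nonexperimental_comparison):
    """Compare a non-experimental estimate with an experimental benchmark.

    Both estimates use the same treatment group; only the comparison
    group differs. The gap between them is read as the bias of the
    non-experimental approach relative to the benchmark.
    """
    benchmark = mean_difference(treated, randomized_control)
    nonexperimental = mean_difference(treated, nonexperimental_comparison)
    return {
        "benchmark": benchmark,
        "nonexperimental": nonexperimental,
        "bias": nonexperimental - benchmark,
    }


# Hypothetical outcome data, made up purely for illustration.
treated = [12.0, 15.0, 11.0, 14.0]
randomized_control = [10.0, 11.0, 9.0, 10.0]
nonexperimental_comparison = [8.0, 9.0, 7.0, 8.0]

result = wsc_bias(treated, randomized_control, nonexperimental_comparison)
print(result)
```

In this made-up example the non-experimental comparison group has lower outcomes than the randomized control group, so the non-experimental estimate overstates the benchmark effect.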
This panel highlights recent methodological advances in using WSCs to learn about systematic variation and generalized treatment effects. In the first paper, Kaitlin Anderson and Patrick Wolf present results from a WSC that uses data from a large-scale private school voucher lottery to evaluate the performance of propensity score matching and of observational approaches using control variables. In the second paper, Andrew Jaciw discusses how WSCs may be used to identify and assess the generalization of treatment effects in a multi-site RCT. In the third paper, Laura Peck and her colleagues (Stephen Bell, Eleanor Harvill, and Shawn Moulton) also use a WSC design with a multi-site RCT, in their case to examine how treatment effects differ across treatment enhancements and implementations. Finally, Vivian Wong and Peter Steiner extend the WSC approach to consider the theory and analysis of replication studies more generally. They examine the stringent assumptions required for assessing correspondence between results from two study designs, and demonstrate the statistical properties of common metrics used to determine that correspondence.
Together, these papers present recent methodological advances in within-study comparisons, used both to evaluate the performance of non-experimental methods and, importantly, to understand heterogeneity and the generalization of treatment effects.