Panel: Using Within-Study Comparison Approaches to Examine Systematic Variation and Generalization of Treatment Effects
(Tools of Analysis: Methods, Data, Informatics and Research Design)

Thursday, November 2, 2017: 3:30 PM-5:00 PM
McCormick (Hyatt Regency Chicago)


Panel Organizer: Vivian C. Wong, University of Virginia
Panel Chair: Coady Wing, Indiana University
Discussant: Austin Nichols, Abt Associates, Inc.


Evaluating School Vouchers: Evidence from a Within-Study Comparison
Kaitlin Anderson, Michigan State University and Patrick J. Wolf, University of Arkansas



A Three-Armed, Multi-Site Evaluation Design’s Potential for Within-Study Comparison and Policy Learning
Laura Peck, Eleanor L. Harvill and Douglas Walton, Abt Associates, Inc.



Assessing Correspondence in (Design)-Replication Studies
Vivian C. Wong, University of Virginia and Peter Steiner, University of Wisconsin-Madison


Over the last three decades, the emerging field of within-study comparisons (WSCs), also called design replication studies, has allowed researchers to evaluate the performance of non-experimental designs and design features in field settings. In the traditional WSC design, treatment effects from a randomized controlled trial (RCT) are compared to those produced by a non-experimental analysis of the same target population. The non-experiment may be a quasi-experimental design, such as a regression-discontinuity or interrupted time series design, or an observational study that relies on matching methods, standard regression adjustments, or difference-in-differences methods. The goals of a WSC are to determine whether the non-experiment can replicate results from the randomized experiment (which provides the causal benchmark estimate), and to identify the contexts and conditions under which these methods work in practice.
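To make the traditional WSC design concrete, here is a minimal simulation sketch in Python. The data-generating process, variable names, and parameter values are illustrative assumptions, not taken from any of the panel papers: a randomized arm supplies the causal benchmark, while a self-selected observational arm is analyzed both naively and with regression adjustment for the confounder.

    # Minimal within-study comparison sketch (illustrative assumptions only).
    import numpy as np

    rng = np.random.default_rng(seed=0)
    n = 5000
    true_effect = 2.0

    # Baseline covariate that drives both treatment take-up and the outcome.
    x = rng.normal(size=n)

    # Experimental arm: random assignment yields the causal benchmark.
    t_rct = rng.integers(0, 2, size=n)
    y_rct = true_effect * t_rct + 1.5 * x + rng.normal(size=n)
    ate_rct = y_rct[t_rct == 1].mean() - y_rct[t_rct == 0].mean()

    # Non-experimental arm: units select into treatment as a function of x.
    t_obs = rng.binomial(1, 1 / (1 + np.exp(-x)))
    y_obs = true_effect * t_obs + 1.5 * x + rng.normal(size=n)

    # Naive contrast is biased because x confounds treatment and outcome.
    ate_naive = y_obs[t_obs == 1].mean() - y_obs[t_obs == 0].mean()

    # Regression adjustment for x, one common non-experimental correction.
    design = np.column_stack([np.ones(n), t_obs, x])
    coef, *_ = np.linalg.lstsq(design, y_obs, rcond=None)
    ate_adjusted = coef[1]

    print(f"RCT benchmark estimate: {ate_rct:.2f}")
    print(f"Naive non-experimental: {ate_naive:.2f}")
    print(f"Regression-adjusted:    {ate_adjusted:.2f}")

In an actual WSC, both estimates would come from the same target population, and the substantive question is how closely the adjusted non-experimental estimate reproduces the RCT benchmark.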

The earliest WSC designs used data from job training evaluations to compare results from a non-experimental study with those from an experimental benchmark that shared the same treatment group. Results from early WSCs had a profound influence on research practice and priorities. The Office of Management and Budget cited results from early WSCs in its 2004 recommendation that federal agencies use randomized experiments to evaluate program impacts, cautioning against the use of “comparison group studies” that “often lead to erroneous conclusions” (OMB, 2004). In recent years, WSCs have continued to examine the contexts and conditions under which non-experimental methods perform well (or fail to perform) in field settings. Researchers have also applied the WSC design to learn about systematic variation and generalization of treatment effects. Here, the RCT is used as a benchmark for assessing the performance of a non-experimental method intended to examine treatment effect heterogeneity or to generalize treatment effects to out-of-sample groups.

This panel highlights recent methodological advances in using WSCs to learn about systematic variation and generalization of treatment effects. In the first paper, Kaitlin Anderson and Patrick Wolf present results from a WSC that takes advantage of data from a large-scale private school voucher lottery to evaluate the performance of propensity score matching and observational approaches that rely on control variables. In the second paper, Andrew Jaciw discusses how WSCs may be used to identify and assess the generalization of treatment effects in a multi-site RCT. In the third paper, Laura Peck and her colleagues (Eleanor Harvill and Douglas Walton) also use a multi-site RCT within a WSC design to examine how treatment effects differ across treatment enhancements and implementations. Finally, Vivian Wong and Peter Steiner extend the WSC approach to the theory and analysis of replication studies more generally. They examine the stringent assumptions required for assessing correspondence between two studies, and they demonstrate the statistical properties of common metrics used to determine correspondence in study results.
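As a small illustration of the kind of correspondence metric the final paper examines, the hedged sketch below tests whether the difference between two independent effect estimates is statistically distinguishable from zero. The estimates, standard errors, and the helper function name are hypothetical, not taken from the paper.

    # Illustrative correspondence check between two independent estimates
    # (all numbers are made up; difference_test is a hypothetical helper).
    import math

    def difference_test(est_a, se_a, est_b, se_b):
        """Two-sided z-test of the difference between independent estimates."""
        diff = est_a - est_b
        se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
        z = diff / se_diff
        # Two-sided p-value from the standard normal distribution.
        p_value = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
        return diff, se_diff, p_value

    # Example: an RCT benchmark (2.1, SE 0.15) vs. a non-experimental
    # estimate (1.8, SE 0.25) from the same target population.
    diff, se, p = difference_test(2.1, 0.15, 1.8, 0.25)
    print(f"difference = {diff:.2f} (SE = {se:.2f}), two-sided p = {p:.3f}")

A failure to reject here is weak evidence of correspondence on its own, since it may simply reflect low statistical power; the paper's point is that such metrics have subtle statistical properties.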

Together, these papers offer researchers the most up-to-date methodological advances in within-study comparisons, both for evaluating the performance of non-experimental methods and, importantly, for understanding heterogeneity and the generalization of treatment effects.