Improving the Performance of Quasi-Experimental Designs: Bias and Precision in Evidence-Based Policy Studies
(Tools of Analysis: Methods, Data, Informatics and Research Design)
Thursday, November 12, 2015: 8:30 AM-10:00 AM
Orchid B (Hyatt Regency Miami)
Panel Organizers: Kelly Hallberg, University of Chicago
Panel Chairs: Rachel Garrett, American Institutes for Research
Discussants: Elizabeth Stuart, Johns Hopkins University
Evidence-based policy depends on knowing which programs are successful. RCTs are the “gold standard” for answering this question. By randomly assigning units to treatment conditions, RCTs ensure that the treatment and control groups are equivalent in expectation (Rubin, 1974). However, RCTs are not always politically or ethically feasible to implement, and they may entail trade-offs in generalizability because participants in an RCT are rarely randomly sampled from a broader population of interest (Shadish, Cook, & Campbell, 2002). When RCTs are not feasible, applied researchers must rely on quasi-experimental research designs, but they can do so only if these designs produce trustworthy estimates of causal effects with sufficient precision.
Recent empirical work on the performance of quasi-experimental designs aims to give applied researchers guidance on the conditions under which these designs are likely to yield unbiased and sufficiently precise estimates of causal effects. Within-study comparisons (WSCs) use data from randomized controlled trials (RCTs) to examine the conditions under which quasi-experiments reproduce experimental results. A WSC empirically estimates the extent to which a given observational study reproduces the result of an RCT when both share the same treatment group (for example, see Cook, Shadish, & Wong, 2008; Glazerman, Levy, & Myers, 2002; LaLonde, 1986). Advances in design science help researchers assess the implications that design decisions, such as modeling approaches, methods for identifying comparison cases, and clustering, have for statistical power to detect effects (Raudenbush, 1997; Schochet, 2005; Bloom, Richburg-Hayes, & Black, 2007; Raudenbush, Martinez, & Spybrook, 2011).
The three papers included in this session contribute to this growing literature. The first paper assesses the performance of different modeling and matching approaches used in comparative interrupted time series designs, drawing on evidence from two WSCs. The second paper assesses the implications of these design decisions for statistical power. The final paper wrestles with one of the most persistent issues in this literature: how to assess the correspondence between quasi-experimental and experimental estimates in WSCs.