
Panel: Improving the Performance of Quasi-Experimental Designs: Bias and Precision in Evidence-Based Policy Studies
(Tools of Analysis: Methods, Data, Informatics and Research Design)

Thursday, November 12, 2015: 8:30 AM-10:00 AM
Orchid B (Hyatt Regency Miami)

*Names in bold indicate Presenter

Panel Organizer: Kelly Hallberg, University of Chicago
Panel Chair: Rachel Garrett, American Institutes for Research
Discussant: Elizabeth Stuart, Johns Hopkins University


The Use of Propensity Score Methods for Addressing Attrition in Longitudinal Studies: Practical Guidance and Applications for Evaluating Early Childhood Interventions
Irma Arteaga, University of Missouri, Judy Temple, University of Minnesota Humphrey School of Public Affairs and Arthur J. Reynolds, Institute of Child Development, University of Minnesota



Assessing the Validity of Comparative Interrupted Time Series Designs and Practice: Lessons from Two Within-Study Comparisons
Kelly Hallberg (University of Chicago), Ryan T. Williams and Andrew P. Swanlund (American Institutes for Research)



Methods for Assessing Correspondence in Non-Experimental and Benchmark Results in Within-Study Comparison Designs: Results from an Evaluation of Repeated Measures Approaches
Vivian C. Wong (University of Virginia), Peter Steiner (University of Wisconsin - Madison) and Kate Miller-Bains (University of Virginia)


Evidence-based policy is contingent on knowing which programs are successful. RCTs are the "gold standard" for answering these questions: by randomly assigning units to treatment conditions, RCTs ensure that the treatment and control groups are equivalent in expectation (Rubin, 1974). However, RCTs are not always politically or ethically feasible to implement, and they may entail trade-offs in generalizability because participants in an RCT are rarely randomly sampled from a broader population of interest (Shadish, Cook, & Campbell, 2002). When RCTs are not feasible, applied researchers must rely on quasi-experimental research designs, but they can do so only when these designs produce trustworthy estimates of causal effects that can be measured with sufficient precision.

Recent empirical work on the performance of quasi-experimental designs focuses on providing guidance to applied researchers on the conditions under which quasi-experimental approaches are likely to yield unbiased and sufficiently precise estimates of causal effects. Within-study comparisons (WSCs) have been conducted using data from RCTs to examine the conditions under which quasi-experiments reproduce experimental results. WSC studies empirically estimate the extent to which a given observational study reproduces the result of an RCT when both share the same treatment group (see, for example, Cook, Shadish, & Wong, 2008; Glazerman, Levy, & Myers, 2002; LaLonde, 1986). In parallel, advances in design science help researchers assess the implications that design decisions, such as the modeling approach, the method for identifying comparison cases, and the treatment of clustering, have for the statistical power to detect effects (Raudenbush, 1997; Schochet, 2005; Bloom, Richburg-Hayes, & Black, 2007; Raudenbush, Martinez, & Spybrook, 2011).

The three papers included in this session contribute to this growing literature. The first paper assesses the performance of different modeling and matching approaches used in comparative interrupted time series designs, drawing on evidence from two WSCs. The second paper assesses the implications of these design decisions for statistical power. The final paper wrestles with one of the most persistent issues in this literature: how to assess the correspondence between quasi-experimental and experimental estimates in WSCs.
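The within-study comparison logic that runs through the session can be sketched in miniature. The Python simulation below is a minimal illustration, not an implementation of any panel paper: the sample sizes, data-generating parameters, and the simple OLS pretest adjustment are all hypothetical stand-ins for the richer matching, propensity score, and comparative interrupted time series approaches the papers examine. It builds an experimental benchmark and a quasi-experimental estimate that share the same treatment group, then reports their difference with a standard error as one simple correspondence criterion.

```python
# Minimal within-study comparison (WSC) sketch. All sample sizes and
# data-generating parameters are hypothetical illustrations; nothing
# here is taken from the panel papers.
import numpy as np

rng = np.random.default_rng(2015)
n = 2000            # observations per group (hypothetical)
true_effect = 0.25  # true treatment effect in the simulated world

# Treatment group and randomized control group (the RCT benchmark arm).
# The outcome depends on a pretest score plus noise.
treat_pre = rng.normal(0.0, 1.0, n)
control_pre = rng.normal(0.0, 1.0, n)
treat_y = 0.6 * treat_pre + true_effect + rng.normal(0.0, 1.0, n)
control_y = 0.6 * control_pre + rng.normal(0.0, 1.0, n)

# Nonrandom comparison group with systematically lower pretest scores,
# mimicking the selection bias a quasi-experiment must remove.
comp_pre = rng.normal(-0.5, 1.0, n)
comp_y = 0.6 * comp_pre + rng.normal(0.0, 1.0, n)

# 1) Experimental benchmark: simple difference in means across the RCT arms.
exp_est = treat_y.mean() - control_y.mean()
exp_se = np.sqrt(treat_y.var(ddof=1) / n + control_y.var(ddof=1) / n)

# 2) Quasi-experimental estimate: the same treatment group compared with the
#    nonrandom group, adjusting for the pretest by OLS (a stand-in for the
#    matching and modeling choices the papers study).
X = np.column_stack([
    np.ones(2 * n),                              # intercept
    np.concatenate([np.ones(n), np.zeros(n)]),   # treatment indicator
    np.concatenate([treat_pre, comp_pre]),       # pretest covariate
])
y = np.concatenate([treat_y, comp_y])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (len(y) - X.shape[1])
cov_beta = sigma2 * np.linalg.inv(X.T @ X)
qe_est, qe_se = beta[1], np.sqrt(cov_beta[1, 1])

# 3) Correspondence: the estimated bias is the gap between the two estimates,
#    with a standard error that (as a simplification) treats them as independent.
bias = qe_est - exp_est
bias_se = np.sqrt(exp_se**2 + qe_se**2)
print(f"experimental benchmark: {exp_est:.3f} (SE {exp_se:.3f})")
print(f"quasi-experimental:     {qe_est:.3f} (SE {qe_se:.3f})")
print(f"estimated bias:         {bias:.3f} (SE {bias_se:.3f})")
```

In this toy setup the difference-with-standard-error summary is only one possible correspondence measure, and treating the two estimates as independent ignores their shared treatment group; how best to judge correspondence between non-experimental and benchmark results is exactly the question the third paper takes up.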