Assessing the Validity of Comparative Interrupted Time Series Designs and Practice: Lessons from Two within Study Comparisons
Comparative interrupted time series (CITS) design usage may be on the rise, but little is known about the conditions under which the design supports causal inference in practice. A modest but growing body of work examines the validity of ITS analyses through within-study comparisons (WSCs), which compare their estimates to those from experimental designs. Results from several recent WSCs provide some reason for optimism about the performance of CITS: they have shown that CITS can produce results very similar to those from a randomized controlled trial (RCT) (Schneeweiss, Maclure, Carleton, Glynn, & Avorn, 2004; Fretheim, Soumerai, Zhang, Oxman, & Ross-Degnan, 2013; Somers, Zhu, Jacob, & Bloom, 2013; St. Clair, Cook, & Hallberg, 2014; St. Clair, Hallberg, & Cook, under review). However, in some cases this correspondence depends on modeling choices made by the researcher as well as on the stability of the pretreatment trend (St. Clair, Cook, & Hallberg, 2014; St. Clair, Hallberg, & Cook, under review).
Applied researchers face two primary analytic decisions when implementing a CITS design: (1) how to model the pretreatment trend and (2) how to select a comparison group. This study draws on data from two empirical within-study comparisons to examine the implications of these decisions. The first WSC draws on data from an RCT studying the effects of an online mathematics program; the second, on an RCT examining the effect of a whole-school reform model. Using these datasets, we assess the ability of CITS to reproduce RCT results and calculate the degree of bias remaining after implementing three modeling approaches: (1) baseline mean; (2) baseline slopes; and (3) year fixed effects. In addition, we examine the performance of these three methods when paired with a comparison group identified in four ways: (1) using all available non-treatment cases; (2) matching on pretreatment measures of the outcome; (3) identifying geographically local matches; and (4) implementing a hybrid approach that combines matching on pretreatment measures of the outcome with local matching.
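To make the three modeling approaches concrete, the sketch below fits each of them by ordinary least squares on simulated panel data. This is an illustration only, not the authors' actual specifications: the data-generating process, sample sizes, and a known treatment effect of 2.0 are all assumptions introduced for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated panel: 40 treatment and 40 comparison units,
# observed for 5 pre-treatment and 3 post-treatment years (assumed sizes).
n_units, pre, post = 80, 5, 3
years = np.arange(pre + post)
treat = np.repeat([1, 0], n_units // 2)

# Assumed data-generating process: a shared linear trend plus a
# treatment effect of 2.0 in the post-treatment years, plus noise.
effect = 2.0
y = (1.0 * years[None, :]
     + effect * treat[:, None] * (years >= pre)
     + rng.normal(0, 1, (n_units, pre + post)))

# Reshape to long format.
unit = np.repeat(np.arange(n_units), pre + post)
t = np.tile(years, n_units).astype(float)
d = treat[unit].astype(float)
p = (t >= pre).astype(float)
yl = y.ravel()

def ols(X, y):
    """Least-squares coefficient estimates."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# (1) Baseline-mean model: a difference-in-means pre/post comparison
#     (the coefficient on d*p is the treatment-effect estimate).
X1 = np.column_stack([np.ones_like(yl), d, p, d * p])
b1 = ols(X1, yl)

# (2) Baseline-slopes model: adds group-specific linear pre-trends,
#     so the d*p coefficient is net of differential trends.
X2 = np.column_stack([np.ones_like(yl), d, t, d * t, p, d * p])
b2 = ols(X2, yl)

# (3) Year-fixed-effects model: year dummies absorb common year shocks
#     in place of a parametric time trend.
Yd = (t[:, None] == years[1:]).astype(float)
X3 = np.column_stack([np.ones_like(yl), d, Yd, d * p])
b3 = ols(X3, yl)

print(f"baseline-mean estimate:     {b1[3]:.2f}")
print(f"baseline-slopes estimate:   {b2[5]:.2f}")
print(f"year-fixed-effects estimate:{b3[-1]:.2f}")
```

Because the simulated groups share a common trend, all three estimators recover an effect near 2.0 here; the approaches diverge when pretreatment trends differ between groups, which is exactly the sensitivity the abstract describes.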