Panel Paper: Evaluating the Performance of Interrupted Time Series and Difference in Differences Approaches in Replicating Experimental Benchmark Results

Thursday, November 7, 2013 : 9:45 AM
DuPont Ballroom H (Washington Marriott)


Kevin McConeghy1, Peter Steiner2, Coady Wing1 and Vivian C. Wong3, (1)University of Illinois, Chicago, (2)University of Wisconsin, Madison, (3)University of Virginia
Quasi-experimental research designs that exploit temporal variation in a treatment condition of interest are very common in the empirical social sciences. In particular, research designs based on an interrupted time series (ITS), a difference-in-differences (DID) comparison, and various generalizations of these methods are often employed to answer questions about the effects of changes in the parameters of social programs and regulations. Researchers often justify these methods by arguing that the analysis avoids or accounts for various threats to internal validity that arise in non-experimental settings, and a variety of techniques and tests have been developed to address common concerns. Of course, most threats can never be completely ruled out, and the performance of many of these techniques and tests in applied settings is not well understood.
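
To fix notation, a standard single-group ITS specification and a canonical two-group DID specification (our own notation; the abstract does not state the exact models) can be written as

    Y_t    = \beta_0 + \beta_1 t + \beta_2 P_t + \beta_3 (t - t_0) P_t + \varepsilon_t
    Y_{it} = \alpha + \gamma T_i + \lambda P_t + \delta (T_i \cdot P_t) + \varepsilon_{it}

where P_t indicates post-intervention periods, t_0 is the intervention time, T_i indicates membership in the treated group, \beta_2 and \beta_3 capture the post-intervention level and slope changes, and \delta is the DID estimate of the treatment effect.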

A growing literature uses the method of within-study comparison to evaluate the performance of quasi-experimental methods relative to benchmarks provided by a randomized experiment. There have been several within-study comparisons of matching methods and regression discontinuity designs. In contrast, there are very few within-study comparisons of the kinds of ITS and DID approaches that are so common in the applied literature. In this paper, we help address this deficiency by conducting a within-study comparison of ITS and DID methods using experimental data from the Cash and Counseling Demonstration Project, which studied the effects of a “consumer-directed” care program on the health and expenditure patterns of disabled Medicaid enrollees. The original study was conducted in three states (Arkansas, New Jersey, and Florida), randomly assigned people to treatment and control groups, and followed each participant for 12 months before and 12 months after the intervention.

We used the experimental data to conduct several within-study comparisons. We created a simple ITS design within each of the three states by deleting the control group information and estimating treatment effects in a regression framework that allowed for intercept and slope changes after the introduction of the treatment. Next, we studied standard and flexible versions of the DID approach by combining the treatment group data from one state with the control group data from the other states, simulating a setting in which a policy change occurred in one state but not in the others. Finally, we studied a more elaborate approach in which matching methods were used to construct an out-of-state control group before applying the DID methods. In each case, we evaluated the method's performance in reproducing the benchmark estimates from the randomized experiment. We also studied the effectiveness of several tests and strategies for ruling out known threats to validity.
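
The estimating equations described above might be implemented roughly as follows. This is a minimal sketch in Python rather than the authors' actual analysis; the input file and column names (spend, month, post, treated) are hypothetical placeholders.

    # Minimal sketch of the ITS and DID regressions described above.
    # The data file and column names are hypothetical.
    import pandas as pd
    import statsmodels.formula.api as smf

    # One row per enrollee-month:
    #   spend   - Medicaid expenditure in the month
    #   month   - time in months relative to the intervention (-12 ... 11)
    #   post    - 1 for post-intervention months, 0 otherwise
    #   treated - 1 for the (simulated) policy-change state's treatment group
    df = pd.read_csv("cash_counseling_panel.csv")

    # Single-state ITS: treatment-group data only, with a level shift (post)
    # and a slope change (post:month) after the intervention.
    its = smf.ols("spend ~ month + post + post:month",
                  data=df[df["treated"] == 1]).fit()

    # Standard DID: treatment group from one state pooled with control
    # groups from the other states; treated:post is the effect estimate.
    did = smf.ols("spend ~ treated + post + treated:post", data=df).fit()

    print(its.summary())
    print(did.summary())

A more flexible DID variant would replace the single post indicator with period-specific interactions, and the matching step would restrict the out-of-state control sample before fitting the same model; the abstract does not specify those details, so they are omitted from the sketch.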