Statistical Power for Short, Comparative Interrupted Time Series Designs with Aggregated Data

Swanlund, Andrew P.

In designing studies, applied researchers must be able to assess whether their study will be sensitive enough to detect the effects they are purporting to study. A robust literature has developed to explore the design sensitivity of randomized controlled trials. Methodologists have detailed the factors that influence the power of a given study to detect effects (e.g. Raudenbush, 1997; Schochet, 2005; Bloom, Richburg-Hayes, & Black, 2007; Raudenbush, Martinez, & Spybrook, 2011), and have provided empirical parameters to guide power calculations (e.g. Hedges & Hedberg, 2007; Hill, Bloom, Black, & Lipsey, 2008).

Less work has been done on design sensitivity for comparative interrupted time series (CITS) designs. Bloom (1999; 2003) and Dong and Maynard (2012) have done the most extensive work on power in CITS to date. They derive power equations for CITS that incorporate student-level data (using three-level hierarchical linear models) for a baseline linear-trend model, with fixed individual follow-up deviations from the pre-interruption trend line. This paper will build on this work in several ways. We will focus on CITS using aggregate data rather than individual-level data and extend power calculations to incorporate additional modeling approaches. In addition, we will demonstrate the impact of autocorrelation on the precision of impact estimates. Finally, we will provide estimates of what power for aggregate CITS models will look like in practice, using parameters gathered from real-world longitudinal educational achievement data.

In recent years, longitudinal, aggregate student achievement scores have become increasingly available on public education department websites. Jacob, Goddard, and Kim (2013) explored the usefulness of such aggregate data and found them to be valid for studying educational policy. However, using school-level data rather than student-level data has implications for the distribution of variance and thus for statistical power. Our presentation will show statistical power formulae for aggregate CITS designs and extend the discussion of power to additional CITS models. The current literature on power in CITS focuses on one CITS modeling approach (the baseline linear-trend model); however, applied researchers use a variety of approaches to model pre-intervention outcomes in CITS (including the baseline mean model, the nonlinear-trend model, and the year and school fixed effects model). The proposed study will extend power calculations to each of these approaches.

Autocorrelation (also called serial correlation) is known to influence both precision and internal validity for time series models. Yet the current work on power in CITS does not account for autocorrelation. To study the impact of autocorrelation on CITS power, we intend to derive equations which do and do not account for autocorrelation.
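To illustrate why autocorrelation matters for precision, the following minimal sketch computes how much the variance of a simple time-series mean is inflated when errors follow a first-order autoregressive, AR(1), process. This is not the derivation proposed in this paper: the AR(1) assumption, the function name `ar1_variance_inflation`, and the example values of `rho` are illustrative choices only. It relies on the standard result that, for a stationary series of T periods with lag-k correlation rho^k, the variance of the mean equals the independent-errors variance multiplied by 1 + 2 * sum over k of (1 - k/T) * rho^k.

```python
def ar1_variance_inflation(rho: float, n_periods: int) -> float:
    """Ratio of Var(series mean) under AR(1) errors to Var(series mean)
    under independent errors, for a stationary series of n_periods points.

    Assumption (illustrative only): the lag-k correlation is rho ** k.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    if n_periods < 1:
        raise ValueError("n_periods must be at least 1")
    # Each lag k contributes with weight (1 - k / T), because there are
    # T - k pairs of observations that are k periods apart.
    weighted_lags = sum(
        (1 - k / n_periods) * rho ** k for k in range(1, n_periods)
    )
    return 1.0 + 2.0 * weighted_lags


if __name__ == "__main__":
    # Hypothetical values: a short baseline of 6 periods.
    for rho in (0.0, 0.2, 0.4, 0.6):
        inflation = ar1_variance_inflation(rho, n_periods=6)
        print(f"rho={rho:.1f}  variance inflation={inflation:.3f}")
```

With positive autocorrelation the ratio exceeds 1, so standard errors that ignore serial correlation would understate the uncertainty of a baseline mean and overstate power.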

Finally, applied researchers often want to examine the statistical power of a given CITS design or assess the trade-offs between different approaches to implementing CITS before they have access to study data. To do this, researchers must make assumptions about a variety of study parameters. The proposed study will draw on existing longitudinal school-level data to provide empirically based estimates to inform these decisions.
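As a rough illustration of how assumed parameters translate into a power assessment, the sketch below computes power and the minimum detectable effect multiplier for a two-sided test, given an assumed standard error of the impact estimate. It uses a normal approximation and takes the standard error as an input; it does not reproduce the CITS-specific variance formulae this paper proposes to derive. The function names and default values (`alpha=0.05`, `power=0.80`) are assumptions for illustration, and a short series with few degrees of freedom would call for a t-distribution instead.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def power_two_sided(effect: float, std_error: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test for a true effect.

    Assumption (illustrative only): the impact estimate is normally
    distributed with a known standard error.
    """
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    ratio = effect / std_error
    # Probability of rejecting in the upper tail plus the lower tail.
    return _STD_NORMAL.cdf(ratio - z_crit) + _STD_NORMAL.cdf(-ratio - z_crit)


def mdes_multiplier(alpha: float = 0.05, power: float = 0.80) -> float:
    """Multiplier M such that the minimum detectable effect is
    M * std_error. This uses the usual one-tail simplification, which
    ignores the negligible rejection probability in the opposite tail."""
    return _STD_NORMAL.inv_cdf(1 - alpha / 2) + _STD_NORMAL.inv_cdf(power)


if __name__ == "__main__":
    se = 0.08  # hypothetical standard error, in effect-size units
    m = mdes_multiplier()
    print(f"MDES multiplier: {m:.3f}")
    print(f"MDES at se={se}: {m * se:.3f}")
    print(f"Power for an effect of 0.20: {power_two_sided(0.20, se):.3f}")
```

Under these assumptions, any design choice that changes the standard error (number of schools, number of baseline periods, the pre-intervention model, or autocorrelation) changes the minimum detectable effect proportionally.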

Association for Public Policy Analysis & Management

Panel Paper: Statistical Power for Short, Comparative Interrupted Time Series Designs with Aggregated Data