Evidence shows that treatment effects vary significantly across study sites (e.g., Weiss, Bloom, & Brock, 2014; Weiss et al., 2017). To interpret this variation, it is important to explore its sources, and multisite moderation analysis can be used for this purpose (Bloom & Spybrook, 2017; Weiss, Bloom, & Brock, 2014). In planning such a study, a key design consideration is the sample size necessary to achieve adequate statistical power (the probability of detecting the main and moderator effects). Tools exist for power analyses of the main effects in multisite randomized trials (MRTs) (e.g., Borenstein & Hedges, 2012; Dong & Maynard, 2013; Konstantopoulos, 2008; Raudenbush, Spybrook, Congdon, Liu, & Martinez, 2011) and for power analyses of moderator effects in two- and three-level cluster randomized trials (CRTs) (e.g., Dong, Kelcey, & Spybrook, 2017; Spybrook, Kelcey, & Dong, 2016). Although Raudenbush and Liu (2000) provided a simple framework for power analysis of moderator effects in two-level MRTs, and Bloom and Spybrook (2017) derived formulas for the minimum detectable effect size difference (MDESD) for a binary site-level moderator in two- and three-level MRTs, there is no comprehensive statistical tool for power analyses of moderator effects in three-level MRTs. It is still unclear how the intraclass correlations at Levels 2 and 3, the covariates, the sample size allocations and moderators at different levels, and the treatment effect variation/heterogeneity coefficients affect the statistical power to detect moderator effects in three-level MRTs. Given the increasing use of three-level multisite studies in program evaluation, statistical tools and software for power analyses of moderator effects at different levels would enhance researchers' capacity to design rigorous studies that answer research questions about treatment effect variation.
To address this gap, in this paper we propose to develop a statistical and empirical framework for designing three-level multisite moderation studies. We derive formulas for the statistical power and the minimum detectable effect size difference (MDESD), with confidence intervals, for moderator effects in three-level MRTs for both binary and continuous moderators at Levels 1, 2, and 3, and implement the formulas in the existing freely available software PowerUp!-Moderator (Dong, Kelcey, Spybrook, & Maynard, 2017). We demonstrate the use of PowerUp!-Moderator with examples.
In summary, the results of this paper will substantially expand the scope and enhance the quality of evidence generated through multisite moderation studies in program evaluation.