Building Evaluation Capacity through Design: An Investigation of Statistical Power for Planning Impact Evaluations with Student and Teacher Outcomes

Zhang, Qi

Impact evaluations that use a cluster randomized trial (CRT) are commonly conducted to assess the efficacy of educational interventions, which are often implemented at the school level. In studies that use schools as the unit of random assignment, teachers are typically nested within schools and students within teachers. Moreover, the outcomes of interest are often measured at both the student and the teacher level. For instance, CRTs designed to evaluate teacher professional development (PD) programs seek to determine the effect of PD on teacher content knowledge and teacher practice, as well as its impact on student achievement. Because the design parameters used to calculate statistical power, such as intraclass correlation coefficients (ICCs) and the proportions of outcome variance explained by covariates (R² values), tend to differ between teacher and student outcomes, a single a priori power analysis is generally not sufficient to determine whether a study is adequately powered to detect effects on both teachers and students. Researchers therefore need to conduct two power analyses, one for teacher outcomes and one for student outcomes, and set sample size requirements that satisfy both, because a design that is adequately powered to detect a meaningful impact at the student level is not necessarily powered to detect a meaningful impact at the teacher level, and vice versa.
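As a minimal sketch of why two analyses are needed, the code below computes the minimum detectable effect size (MDES) for the same school-randomized design twice: once for a student outcome (students nested in teachers nested in schools, a three-level CRT) and once for a teacher outcome (teachers nested in schools, a two-level CRT). It uses the standard closed-form MDES expressions for CRTs with treatment assigned at the school level, with a Cornish-Fisher approximation to the t quantiles so that only the Python standard library is needed. All ICC and R² values, the balanced allocation (P = 0.5), and the single school-level covariate (g = 1) are hypothetical assumptions chosen for illustration, not estimates from this study; the function names are likewise illustrative.

```python
from statistics import NormalDist


def t_quantile(p: float, df: int) -> float:
    """Approximate Student-t quantile via a Cornish-Fisher expansion
    around the standard normal quantile (accurate for moderate df)."""
    x = NormalDist().inv_cdf(p)
    g1 = (x**3 + x) / 4
    g2 = (5 * x**5 + 16 * x**3 + 3 * x) / 96
    g3 = (3 * x**7 + 19 * x**5 + 17 * x**3 - 15 * x) / 384
    g4 = (79 * x**9 + 776 * x**7 + 1482 * x**5 - 1920 * x**3 - 945 * x) / 92160
    return x + g1 / df + g2 / df**2 + g3 / df**3 + g4 / df**4


def multiplier(df: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """MDES multiplier for a two-tailed test: t_{1-alpha/2, df} + t_{power, df}."""
    return t_quantile(1 - alpha / 2, df) + t_quantile(power, df)


def mdes_student(K, J, n, rho3, rho2, r2_3=0.0, r2_2=0.0, r2_1=0.0, P=0.5, g=1):
    """Three-level CRT: n students per teacher, J teachers per school, K schools.
    Schools are randomized, so df = K - g - 2 (g = school-level covariates)."""
    pq = P * (1 - P)
    var = (rho3 * (1 - r2_3) / (pq * K)
           + rho2 * (1 - r2_2) / (pq * K * J)
           + (1 - rho3 - rho2) * (1 - r2_1) / (pq * K * J * n))
    return multiplier(K - g - 2) * var ** 0.5


def mdes_teacher(K, J, rho, r2_2=0.0, r2_1=0.0, P=0.5, g=1):
    """Two-level CRT: J teachers per school, K schools; df = K - g - 2."""
    pq = P * (1 - P)
    var = rho * (1 - r2_2) / (pq * K) + (1 - rho) * (1 - r2_1) / (pq * K * J)
    return multiplier(K - g - 2) * var ** 0.5


# Same sample for both outcomes (40 schools, 5 teachers per school,
# 25 students per teacher), but hypothetical, level-specific ICCs and R^2 values.
mdes_s = mdes_student(K=40, J=5, n=25, rho3=0.15, rho2=0.05,
                      r2_3=0.60, r2_2=0.20, r2_1=0.40)
mdes_t = mdes_teacher(K=40, J=5, rho=0.20, r2_2=0.30, r2_1=0.10)
print(f"student MDES = {mdes_s:.2f}, teacher MDES = {mdes_t:.2f}")
```

Under these assumed inputs the teacher MDES is roughly twice the student MDES, even though the schools are identical: the teacher analysis has far fewer units per school and, here, a larger ICC. Whether a given design is adequate therefore depends on comparing each MDES with the effect size plausibly expected for that outcome.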

This proposal examines design considerations for studies that seek to evaluate the effectiveness of educational interventions for both teachers and students within a single study. Specifically, it incorporates new empirical work on design parameters for planning CRTs with student and teacher outcomes to inform the design of CRTs evaluating K-12 science education interventions. The goals are to estimate the statistical power of these studies, to examine how well the student and teacher power analyses align when a study estimates effects on both, and to identify considerations that can inform future planning to maximize the efficiency of study designs.

Our initial results suggest that studies including at least 40 schools, 5 teachers per school, and 25 students per teacher may be able to detect a meaningful effect of the intervention on both students and teachers. This is possible because a meta-analysis of the effects of educational interventions for science teachers found larger effect sizes for teacher outcomes than for student outcomes. Studies with fewer than 40 schools may be able to detect meaningful effects on students but may have difficulty doing the same for teachers. This presentation will extend the power calculations to other teacher outcomes and to varying numbers of teachers per school, and will discuss the implications of these results for planning CRTs that examine intervention effects on both students and teachers. It will also provide additional analyses for other subjects, such as mathematics and reading.
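To illustrate how the student and teacher analyses can be aligned when choosing a sample size, the sketch below holds 5 teachers per school and 25 students per teacher fixed and searches for the smallest number of schools at which each analysis reaches 80% power; the study must then recruit the larger of the two. The ICCs, R² values, and target effect sizes (0.25 for students, 0.45 for teachers) are hypothetical placeholders, not values from this study or from the meta-analysis it cites, and power is computed with a normal approximation that is slightly optimistic for small numbers of schools. Function and constant names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Hypothetical design parameters for illustration only (not study estimates).
STUDENT = dict(rho3=0.15, rho2=0.05, r2_3=0.60, r2_2=0.20, r2_1=0.40)
TEACHER = dict(rho=0.20, r2_2=0.30, r2_1=0.10)


def power_normal(delta, se, alpha=0.05):
    """Approximate two-tailed power with a normal reference distribution.
    Ignores the t correction, so it is slightly optimistic when K is small."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(delta / se - z_crit)


def se_student(K, J, n, rho3, rho2, r2_3, r2_2, r2_1, P=0.5):
    """Standardized SE of the impact: students in teachers in schools."""
    pq = P * (1 - P)
    var = (rho3 * (1 - r2_3) / (pq * K)
           + rho2 * (1 - r2_2) / (pq * K * J)
           + (1 - rho3 - rho2) * (1 - r2_1) / (pq * K * J * n))
    return var ** 0.5


def se_teacher(K, J, rho, r2_2, r2_1, P=0.5):
    """Standardized SE of the impact: teachers in schools."""
    pq = P * (1 - P)
    var = rho * (1 - r2_2) / (pq * K) + (1 - rho) * (1 - r2_1) / (pq * K * J)
    return var ** 0.5


def min_schools(se_fn, delta, target=0.80, k_min=4, k_max=500):
    """Smallest number of schools K whose power for effect size delta reaches target."""
    for K in range(k_min, k_max + 1):
        if power_normal(delta, se_fn(K)) >= target:
            return K
    return None


J, n = 5, 25  # teachers per school, students per teacher
k_student = min_schools(lambda K: se_student(K, J, n, **STUDENT), delta=0.25)
k_teacher = min_schools(lambda K: se_teacher(K, J, **TEACHER), delta=0.45)
k_both = max(k_student, k_teacher)  # both analyses must be adequately powered
print(k_student, k_teacher, k_both)
```

With these made-up inputs the student analysis alone is satisfied with fewer schools than the teacher analysis, so the teacher outcome determines the required sample. Because every variance term shrinks in proportion to 1/K when J and n are fixed, power rises monotonically with the number of schools, which is what makes the simple linear search valid; with different design parameters or target effect sizes, the binding outcome can switch.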

Association for Public Policy Analysis & Management

Poster Paper: Building Evaluation Capacity through Design: An Investigation of Statistical Power for Planning Impact Evaluations with Student and Teacher Outcomes