Panel Paper: Measuring Classroom Quality for Accountability Purposes: How Dynamic and Variable Are Assessments of Preschool Teachers over the Course of the Year?

Friday, November 8, 2019
Plaza Building: Concourse Level, Plaza Court 7 (Sheraton Denver Downtown)


Olivia Healy (1), Kathryn E. Gonzalez (2), Luke Miratrix (2) and Terri Sabol (1); (1) Northwestern University, (2) Harvard University


Early childhood education accountability policies, like state-level Quality Rating and Improvement Systems (QRIS) and the Head Start Designation Renewal System (DRS), increasingly rely on observational measures of classroom quality to monitor and evaluate programs. These systems use observations of classrooms, conducted for different programs at different times during the school year, to measure average classroom quality. The assumption underlying this approach is that assessments conducted in a single week (or over a few days) reflect the quality of instruction in that classroom throughout the year. The present study tests this assumption by exploring (1) trends in observed quality, as measured by the Classroom Assessment Scoring System (CLASS™; Pianta, La Paro & Hamre, 2008), over the course of the year within preschool classrooms, and (2) the amount of within- vs. between-teacher variability in CLASS scores, taking into account trends in quality over time. Based on our findings, we quantify how likely early childhood programs are to be misclassified under the Head Start DRS due to growth and variability in CLASS scores. We conclude with guidance on how to better design accountability policies that rely on such measures.

We leverage rich data on preschool teachers and their observed teaching quality, assessed at multiple time points throughout the preschool year. The data are drawn from the second phase of the National Center for Research on Early Childhood Education (NCRECE) Teacher Professional Development Study. During this phase, preschool teachers in nine socioeconomically diverse cities across the U.S. were randomly assigned to bi-weekly professional development coaching sessions or a control condition. All teachers regularly recorded and uploaded videos of their classroom teaching for the study. Video submissions were then double-coded by certified raters and scored according to the CLASS™.

Our general analytic approach relies on multilevel piecewise linear regressions to model trends in CLASS scores among 305 teachers over the preschool year. We estimate trends separately for teachers receiving professional development and those operating under business as usual. Next, we explore the extent of within-teacher variability in CLASS scores after accounting for trends in scores over time. Finally, we provide a series of back-of-the-envelope calculations to quantify the implications of trends and variability in CLASS scores for accountability evaluations of Head Start program quality that rely on single-week observations of classrooms.
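The modeling approach described above can be sketched in code. The snippet below is an illustrative example only, not the study's actual specification: it simulates data, fits a two-level model with a random teacher intercept and piecewise time slopes (assuming, for illustration, a single knot at mid-year), and computes the intraclass correlation (ICC) as the share of total score variance attributable to between-teacher differences. The knot location, variances, and slopes are assumed values.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate illustrative data: 305 teachers, 8 observation windows each
rng = np.random.default_rng(0)
n_teachers, n_obs = 305, 8
teacher = np.repeat(np.arange(n_teachers), n_obs)
week = np.tile(np.linspace(0, 30, n_obs), n_teachers)

# Piecewise time coding with one assumed knot at week 15
knot_week = 15
seg1 = np.minimum(week, knot_week)        # slope segment before the knot
seg2 = np.maximum(week - knot_week, 0.0)  # slope segment after the knot

# Stable between-teacher differences plus occasion-level noise
teacher_effect = rng.normal(0, 0.5, n_teachers)[teacher]
score = (3.0 - 0.01 * seg1 + 0.03 * seg2
         + teacher_effect
         + rng.normal(0, 0.7, n_teachers * n_obs))

df = pd.DataFrame({"score": score, "seg1": seg1,
                   "seg2": seg2, "teacher": teacher})

# Random intercept for teacher; fixed piecewise slopes for time trends
result = smf.mixedlm("score ~ seg1 + seg2", df,
                     groups=df["teacher"]).fit()

# ICC = between-teacher variance / (between + within variance)
sigma_b2 = result.cov_re.iloc[0, 0]  # between-teacher variance
sigma_w2 = result.scale              # residual (within-teacher) variance
icc = sigma_b2 / (sigma_b2 + sigma_w2)
print(round(icc, 2))
```

Because the piecewise slopes are in the fixed part of the model, the residual variance here is already net of the time trend, so this ICC corresponds to the "after accounting for trends" quantity discussed in the results.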

Our multilevel models show substantial, statistically significant growth in teachers' Instructional Support CLASS scores over the winter months, both for teachers receiving targeted professional development and for those who did not. We also document small, statistically significant declines in teachers' Emotional Support scores during the fall months. Beyond these trends, there is a substantial amount of within-teacher variability in observed quality. Across two CLASS domains, the ICCs range from 0.32 to 0.33, and they do not change substantially after accounting for trends in CLASS scores over the year. For Instructional Support, the ICC is much lower, 0.18, but rises to 0.38 after accounting for documented trends in quality over time. Future analyses will quantify the consequences of the documented CLASS score trends and variability under the Head Start DRS accountability policy.
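The shift in the Instructional Support ICC reported above can be illustrated with simple arithmetic: holding the between-teacher variance fixed, removing the trend-driven portion of the within-teacher variance raises the ICC. The variance values below are assumed for illustration, chosen only so the ratios reproduce the reported 0.18 and 0.38; they are not the study's estimates.

```python
# Illustrative variance components (assumed, not the study's estimates)
sigma_b = 0.18            # between-teacher variance
sigma_w_raw = 0.82        # within-teacher variance, trend included
sigma_w_detrended = 0.29  # within-teacher variance after removing trends

# ICC = between-teacher variance / total variance
icc_raw = sigma_b / (sigma_b + sigma_w_raw)
icc_detrended = sigma_b / (sigma_b + sigma_w_detrended)

print(round(icc_raw, 2))        # -> 0.18
print(round(icc_detrended, 2))  # -> 0.38
```

The point of the exercise: a low raw ICC does not necessarily mean teachers are indistinguishable; it may instead reflect systematic within-year change that a single-week observation cannot capture.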