Classroom Context and Measured Teacher Performance: What Do Teacher Observation Scores Really Measure?

Steinberg, Matthew

Despite the intense focus on the use of student test scores to gauge teacher performance, the majority of our nation’s teachers receive annual evaluation ratings based primarily on classroom observations (Steinberg & Donaldson, in press). These observation-based performance measures aim to capture teachers’ instructional practice and their ability to structure and maintain high-functioning classroom environments. However, little is known about the ways that classroom context—the settings in which teachers work and the students that they teach—shapes measures of teacher effectiveness based on classroom observations. Given the widespread adoption of high-stakes evaluation systems that rely heavily on classroom observations, it is critical that we have a clearer understanding of how the composition of teachers’ classrooms influences their observation scores.

In this study, we focus on the relationship between the characteristics of a teacher’s class and that teacher’s measured performance. Using data from the Measures of Effective Teaching (MET) study, we leverage within-teacher variation in classroom composition to estimate the influence that student composition and classroom context have on measured teacher effectiveness, based on the Danielson Framework for Teaching (FFT) observation protocol. This approach allows us to control for fixed teacher quality and so better isolate the idiosyncratic influence of classroom context. We further assess whether measured teacher performance is more or less sensitive to classroom context when considering specific aspects of classroom management and instruction, and the extent to which the influence of classroom context differs between classroom generalists and subject specialists. Moreover, we address the nonrandom sorting of teachers to classes, which has been shown elsewhere to bias measures of teacher effectiveness (Rothstein, 2009, 2010). To do so, we leverage the randomization of teachers to classes in the second year of the MET study. Some classes fully complied with their random teacher assignment, other classes complied only partially, and a third group of teachers was nonrandomly assigned to classes (full noncompliers).
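
The within-teacher comparison described above can be illustrated with a teacher fixed-effects regression. This is a minimal sketch and not necessarily the exact specification estimated in the paper; all symbols are illustrative assumptions. Here j indexes teachers, c indexes the classes a teacher is observed with, \bar{A}_{jc} is the class's mean prior-year achievement, \mathbf{X}_{jc} collects other class characteristics, and \mu_j is a teacher fixed effect that absorbs time-invariant teacher quality.

```latex
\begin{align}
\mathrm{FFT}_{jc} &= \beta\,\bar{A}_{jc} + \mathbf{X}_{jc}'\boldsymbol{\gamma} + \mu_j + \varepsilon_{jc} \\
\mathrm{FFT}_{jc} - \overline{\mathrm{FFT}}_{j} &= \beta\,(\bar{A}_{jc} - \bar{A}_{j}) + (\mathbf{X}_{jc} - \bar{\mathbf{X}}_{j})'\boldsymbol{\gamma} + (\varepsilon_{jc} - \bar{\varepsilon}_{j})
\end{align}
```

The second line subtracts each teacher's own mean from the first, which removes \mu_j. Under this sketch, \beta is identified only from differences among classes taught by the same teacher, so teachers observed with a single class contribute nothing to the estimate.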

We find that teacher performance, as measured by classroom observation, is significantly influenced by the context in which teachers work. In particular, students’ prior-year (i.e., incoming) achievement is positively related to a teacher’s measured performance on the FFT. Incoming achievement is more strongly associated with observed English language arts (ELA) instruction than with math instruction, and the association is concentrated among subject specialists, who work with multiple classes of students within a single school day, rather than among classroom generalists, who work with the same class of students all day, every day. Further, incoming student achievement significantly influences aspects of measured teacher performance that capture teachers’ interactions with their students, while measures of core instructional practices are not prone to this influence. Finally, we offer evidence that the intentional matching of teachers to classrooms exacerbates the influence of incoming student achievement on measured teacher performance among ELA teachers. Taken together, these results suggest that, in the context of newly implemented teacher evaluation systems, greater caution must be exercised when making high-stakes personnel decisions based largely on teachers’ classroom observation scores.

Association for Public Policy Analysis & Management

Panel Paper: Classroom Context and Measured Teacher Performance: What Do Teacher Observation Scores Really Measure?