Panel Paper: A Comparison of Teacher Observation Instruments

Thursday, November 3, 2016: 10:00 AM
Columbia 4 (Washington Hilton)

*Names in bold indicate Presenter

Brian Gill, Megan Shoji, Thomas Coen and Kate Place, Mathematica Policy Research


Many states and districts are considering adopting commercially available observation instruments for the professional practice component of their evaluation systems. Yet little data are available to help them choose among the available instruments or determine which dimensions of instruction merit the most emphasis. This study seeks to inform district and state decisions about the selection and use of five widely used teacher observation instruments: the Classroom Assessment Scoring System (CLASS), the Framework for Teaching (FFT), the Protocol for Language Arts Teaching Observations (PLATO), the Mathematical Quality of Instruction (MQI), and the UTeach Observation Protocol (UTOP).

First, the study uses content analysis of instrument text to assess the major similarities and differences in the dimensions of instruction rated by the five observation instruments. Results indicate that seven of ten dimensions of instructional practice are common to all five instruments, demonstrating substantial conceptual overlap among them. However, the instruments differ in how they measure practice within each of the ten dimensions. The FFT may offer a more comprehensive assessment of instruction than the other instruments because, on average, it provides the greatest coverage of elements within a given dimension.
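As a rough illustration of how element coverage within a dimension can be compared, the Python sketch below computes the share of a dimension's elements that each instrument addresses. The instrument-to-element mappings shown are hypothetical placeholders, not the study's actual coding results.

```python
# Illustrative sketch only: the element sets below are hypothetical placeholders,
# not the study's coding of CLASS, FFT, or PLATO.

# Elements each instrument covers within one example dimension of instruction.
element_coverage = {
    "FFT":   {"questioning", "discussion", "feedback", "pacing"},
    "CLASS": {"questioning", "feedback"},
    "PLATO": {"discussion", "feedback"},
}

# Union of all elements identified for this dimension across instruments.
all_elements = set().union(*element_coverage.values())

# Share of the dimension's elements covered by each instrument.
for instrument, elements in element_coverage.items():
    share = len(elements) / len(all_elements)
    print(f"{instrument}: covers {share:.0%} of elements in this dimension")
```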

Second, the study uses existing data on teachers of 4th- through 9th-grade math and English language arts (ELA) from the Gates Foundation's Measures of Effective Teaching (MET) project (Kane and Staiger 2012) to examine whether, across different instruments, observation ratings for some dimensions of instruction consistently show stronger correlations with teachers' value-added scores than others. All seven dimensions of instruction with scores available in the MET data were significantly, if modestly, associated with teachers' value-added scores. Classroom management was the dimension most consistently and strongly related to value-added scores across instruments, subjects, and grades.
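To show the kind of calculation this analysis involves, the sketch below correlates dimension-level observation ratings with value-added scores by subject. The file name and column names are hypothetical, and the study's actual estimation approach is more elaborate than a simple Pearson correlation.

```python
# Illustrative sketch only: "met_teacher_scores.csv" and all column names are
# hypothetical; they stand in for a teacher-level extract of the MET data.
import pandas as pd

df = pd.read_csv("met_teacher_scores.csv")  # one row per teacher

dimensions = ["classroom_management", "instructional_dialogue", "feedback"]

# Correlate each dimension's average observation rating with the teacher's
# value-added score, separately by subject (math vs. ELA).
for subject, group in df.groupby("subject"):
    for dim in dimensions:
        r = group[dim].corr(group["value_added"])
        print(f"{subject} - {dim}: r = {r:.2f}")
```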

Third, the study capitalizes on the random assignment of teachers to groups of students in the second year of the MET study to test the extent to which the characteristics of students in the classroom affect instrument ratings, and whether scores on some instruments, or for some dimensions of instruction, are more influenced by student characteristics than others. The findings suggest that ratings are more susceptible to classroom composition in ELA instruction than in math instruction; when using the FFT rather than the CLASS, PLATO, or MQI; and when considering the share of nonwhite students in the class rather than the share of low-income students or class-average achievement. For two of the three instruments used to score ELA instruction (FFT and CLASS), teaching more nonwhite students reduced teachers' observation scores; a similar effect was observed for one instrument (FFT) when teaching lower-achieving students. There was no evidence that classroom composition affected PLATO scores, and little evidence of effects in math classes.
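The composition analysis can be illustrated with a regression of observation scores on the characteristics of the randomly assigned class. The sketch below is a hypothetical simplification: the data file and column names are invented, and it omits the controls and design features of the actual MET-based analysis.

```python
# Illustrative sketch only: "met_randomized_classrooms.csv" and all column names
# are hypothetical stand-ins for a classroom-level extract of the MET data.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("met_randomized_classrooms.csv")

# Regress a teacher's FFT score on the composition of the randomly assigned
# class: share nonwhite, share low-income, and class-average prior achievement.
model = smf.ols(
    "fft_score ~ share_nonwhite + share_low_income + mean_prior_achievement",
    data=df,
).fit()
print(model.summary())
```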

The paper discusses implications for state and district selection of observation instruments.