Panel Paper:
Observational Evaluation of Teachers: Measuring More Than We Bargained for?
Using secondary data from the Measures of Effective Teaching (MET) project, we make progress in disentangling differences in teacher quality from inequities in observational ratings. Funded by the Bill & Melinda Gates Foundation, the MET project collected teacher and student administrative records data, survey data, and observational data during the 2009-10 and 2010-11 academic years. We examined math and English language arts teachers in grades 4-9 who worked in five large urban school districts in the United States.
This study extends prior literature by using within-teacher comparisons and direct controls for teaching quality to disentangle differences in teacher quality from inequities in observational ratings. We use teacher-by-year fixed effects to test whether a teacher receives lower ratings in classrooms with more marginalized students than the same teacher receives, in the same year, in classrooms with fewer marginalized students. In alternative models, we also include classroom-level value-added model (VAM) scores to examine whether ratings continue to vary by student demographics even after controlling for these measures of classroom-specific teaching quality. Because teacher and student sociodemographic characteristics are often related, we also test whether previously observed relationships between observational ratings and teacher characteristics are explained by student characteristics. In separate analyses, we focus on the subset of teachers who were randomly assigned to classrooms to investigate whether nonrandom sorting of teachers to students explains the relationships we observe between observational evaluations and classroom and teacher characteristics.
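To make the teacher-by-year fixed effects comparison concrete, the following is a minimal sketch of a within estimator with a single regressor. It is an illustration under stated assumptions, not the specification used in the study: the function name `within_slope`, the tuple layout, and the data values are all hypothetical, and the actual models presumably include further controls and appropriate standard errors.

```python
from collections import defaultdict


def within_slope(rows):
    """Slope of rating on classroom composition using only variation
    within teacher-by-year cells (hypothetical helper for illustration).

    rows: iterable of (teacher_id, year, share_marginalized, rating),
    one tuple per classroom.
    """
    # Group classrooms by teacher-by-year cell.
    cells = defaultdict(list)
    for teacher, year, share, rating in rows:
        cells[(teacher, year)].append((share, rating))

    num = 0.0
    den = 0.0
    for obs in cells.values():
        # Demeaning within the cell removes anything constant for a
        # teacher in a given year, which is what the fixed effect absorbs.
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            num += (x - mean_x) * (y - mean_y)
            den += (x - mean_x) ** 2

    if den == 0.0:
        raise ValueError("no within-cell variation in classroom composition")
    return num / den


# Made-up numbers for illustration only; these are NOT MET project data.
classrooms = [
    ("t1", 2010, 0.2, 3.0),
    ("t1", 2010, 0.8, 2.4),
    ("t2", 2010, 0.1, 2.0),
    ("t2", 2010, 0.5, 1.6),
    ("t3", 2011, 0.6, 3.5),  # a single classroom contributes no within-cell variation
]
slope = within_slope(classrooms)
```

In this sketch, a teacher observed in only one classroom in a given year adds nothing to the estimate, which is why a design of this kind relies on teachers who are rated in multiple classrooms within the same year.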
Our findings contribute to growing evidence that observational ratings seem to measure factors outside of a teacher’s performance or control, including the gender of the teacher and the student population assigned to the teacher. Specifically, the results show that men receive lower ratings, on average, than women. Though prior evidence suggests Black teachers receive lower ratings than White teachers, we demonstrate that this is largely explained by differences in classroom composition. Moreover, we provide the strongest evidence to date that teachers in classrooms with high concentrations of Black, Hispanic, male, and low-performing students receive significantly lower observation ratings and that these differences are unlikely to be due to actual differences in teacher quality or teacher-student sorting. Consistent with Whitehurst et al. (2014), the main policy implication of our study is that districts and states should consider ways to adjust for classroom characteristics when using observational rubrics to evaluate teachers.