Panel Paper: The Role of School and District Implementation in Subjective Teacher Evaluations

Thursday, November 2, 2017
Comiskey (Hyatt Regency Chicago)

James Cowan (1), Dan Goldhaber (1, 2), and Roddy Theobald (1); (1) American Institutes for Research, (2) University of Washington


ESSA did not include requirements that state teacher evaluation systems include student test score measures, a change in policy from the NCLB waivers of the Obama Administration. This shift will likely increase the prominence of classroom observations and other subjective forms of teacher evaluation. Yet much less is known about the properties of these evaluations than of value-added measures. Analyses of classroom observations from the Measures of Effective Teaching Project indicate that these measures may be influenced by classroom composition (Campbell & Ronfeldt, 2016; Gill et al., 2016; Steinberg & Garrett, 2016). However, classroom composition is only one potential source of systematic variation in performance evaluations. Beyond differences in the population of students they serve, schools and districts may also differ in the frequency of evaluation, the rigor of rater training, or the standards to which they hold teachers. All of these factors may systematically affect certain populations of teachers, such as those working in disadvantaged schools (Steinberg & Jiang, 2016). In this study, we assess the role of schools and districts in subjective teacher evaluations using data from Massachusetts.

The Massachusetts Educator Evaluation framework ensures that local administrators have a significant role in determining teachers’ summative ratings. The framework establishes the evidence that evaluators consider, but it does not provide a rubric for translating observed performance into a final rating. Instead, it explicitly calls for evaluators to use their professional judgment when assessing teachers. This local flexibility in structuring evaluation systems is consistent with how many states have designed their teacher evaluation policies (Doherty & Jacobs, 2015). However, it complicates district- or state-wide uses of evaluation data for high-stakes purposes, such as compensation or licensure. Because some portion of the variation in teacher performance ratings is likely related to local implementation, it is not clear that comparisons of teachers across schools or districts would yield consistent rankings of teacher performance. In fact, districts differ considerably in their use of the highest and lowest rating categories.

We investigate the empirical importance of schools and districts using results from Massachusetts’ evaluation system. Using teacher fixed-effects models that compare performance ratings earned by the same teacher in different years, we replicate prior work showing relationships between average student achievement and teacher evaluations. Variation in average achievement across schools and districts is not predictive of teacher performance ratings. We then document variation in the sensitivity of performance ratings to value-added measures of teacher effectiveness across districts. We regress performance ratings on value added and teacher experience using mixed-effects generalized linear models. We find heterogeneity in the relationship between teacher value added and performance ratings. Thus, the probability that high- (or low-) performing teachers, as measured by value added, receive high (or low) evaluations differs considerably across school districts. Our findings have implications for statewide policies that attach high stakes to performance evaluations.
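
To make the within-teacher comparison concrete, the following is a minimal sketch of a teacher fixed-effects (within) estimator. It is an illustration under stated assumptions, not the authors' specification: the function name `within_slope`, the tuple layout, and the toy numbers are all hypothetical, and ratings are treated as a numeric score even though summative ratings are categorical. Demeaning each teacher's observations removes anything constant about that teacher, so the slope reflects only year-to-year changes for the same teacher.

```python
from collections import defaultdict


def within_slope(records):
    """Within-teacher slope of rating on average class achievement.

    records: iterable of (teacher_id, achievement, rating) tuples, one per
    teacher-year. All names here are hypothetical; this is not the
    authors' model.
    """
    by_teacher = defaultdict(list)
    for teacher, achievement, rating in records:
        by_teacher[teacher].append((achievement, rating))

    sxy = 0.0  # sum of within-teacher cross-products
    sxx = 0.0  # sum of within-teacher squared deviations in achievement
    for obs in by_teacher.values():
        # Subtracting teacher-specific means absorbs the teacher fixed effect.
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2

    if sxx == 0:
        raise ValueError("no within-teacher variation in achievement")
    return sxy / sxx


# Made-up toy data: two teachers observed in two years each.
toy = [
    ("A", 0.0, 2.0), ("A", 1.0, 2.5),
    ("B", 2.0, 3.0), ("B", 3.0, 3.5),
]
slope = within_slope(toy)
```

In the toy data, teacher B has higher ratings and higher-achieving classes than teacher A in every year, but that between-teacher gap does not enter the estimate; only each teacher's own change across years does.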