Panel Paper: Revisiting the Widget Effect: Teacher Evaluation Reforms and Distribution of Teacher Effectiveness Ratings

Thursday, November 2, 2017
Comiskey (Hyatt Regency Chicago)

Allison Gilmour, Vanderbilt University, and Matthew Kraft, Brown University


In 2009, The New Teacher Project (TNTP) characterized the failure of U.S. public education to recognize and respond to differences in teacher effectiveness as “the Widget Effect.” In this paper, we revisit the Widget Effect by examining the degree to which new teacher evaluation systems differentiate among teachers. We ask: Have teacher evaluation reforms resulted in meaningful variation in teacher performance ratings? Does the distribution of teacher performance ratings better reflect perceptions of the true distribution of teacher effectiveness? And, if not, why have evaluation reforms not resulted in greater differentiation in ratings?

We examine these questions with quantitative and qualitative data. We begin by presenting data on the distribution of teacher evaluation ratings across states that have implemented teacher evaluation reforms. We complement these state-level data with a case study of the distribution of teacher evaluation ratings in one large urban school district. We leverage original survey data linked to evaluation records to compare evaluators’ perceptions of the true distribution of teacher effectiveness in their schools with both their predictions of what the ratings distribution would be and the actual end-of-year ratings. We then discuss findings from in-depth interviews with a random sample of principals in the district that help explain why differences existed among evaluators’ perceptions, their predictions, and the actual distribution of teacher effectiveness ratings. Together, these data provide new insights into improving the quality of the teacher workforce through teacher evaluation reforms.

We find that in most states the percentage of teachers rated Unsatisfactory remains less than 1%, although in some states 3% or more of teachers were rated in categories below Proficient. The full distributions of ratings vary widely across states, with 0.7% to 26% of teachers rated below Proficient and 3% to 62% rated above Proficient.

Our district case study examines the degree to which evaluators’ perceptions of the effectiveness of teachers in their schools aligned with the actual performance ratings they assigned. On average, evaluators estimated that 27.8 percent of all teachers in their schools were performing at a level below Proficient. This estimate is more than four times the percentage of teachers who were actually rated below Proficient.

In-depth interviews with principals provide several explanations for why so few teachers receive below Proficient ratings. Lack of time was the reason principals cited most frequently for not giving a teacher a low rating. Because evaluators are held to a higher standard of evidence and must provide follow-up support for teachers rated below Proficient, they face strong incentives to rate very few teachers at that level. Several principals reported that they factored in teachers’ potential when assigning an evaluation rating. Others touched on the difficulty of conversations with teachers whom they rated below Proficient. Some principals remained doubtful that the time and effort spent navigating the dismissal process would result in removing a teacher and finding a better replacement. These findings exemplify Lipsky’s (1980) theory of street-level bureaucracy, in which policies are ultimately determined by the people who implement them rather than by the policymakers who design them.