Evaluation for Teacher Development: Exploring the Relationship between Features of Teacher Evaluation Systems and Teacher Improvement

Papay, John

Over the past decade, policymakers have transformed the process of evaluating teachers in our nation’s public schools, seeing improved educator evaluation as a central means of school improvement (McGuinn, 2012; Donaldson & Papay, 2015; Donaldson & Steinberg, 2016; Kraft & Gilmour, 2017). These reforms have largely pursued two distinct (and often competing) goals: (1) holding teachers accountable and (2) improving instructional practice through feedback and support (Papay, 2012; Darling-Hammond, Wise, & Pease, 1983; Hill & Grossman, 2013). These goals have not received equal attention in the policy and research communities. For example, although nearly all states explicitly list professional learning as a goal of evaluation reform, implementation has tended to focus more heavily on ensuring that teachers receive accurate ratings.

The lack of attention to evaluation’s developmental features in the policy arena mirrors the research community’s focus on evaluation as an accountability tool. The lion’s share of quantitative research has explored the statistical properties of, and relationships between, different teacher effectiveness measures (e.g., Kane, McCaffrey, Miller & Staiger, 2012; Gates Foundation, 2010; Hill et al., 2012). By and large, this literature suggests that these systems produce only limited variation in teacher performance ratings (e.g., Kraft & Gilmour, 2017), which has led analysts to suggest that the substantial investment made in new evaluation systems over the past decade has not paid off (Dynarski, 2016). However, this body of research has shifted attention away from the second purpose of evaluation: providing guidance and feedback to support teacher development. There is strong evidence that rigorous evaluation systems can improve teachers’ skills and, as a result, boost student achievement (Taylor & Tyler, 2012; Steinberg & Sartain, 2015; Dee & Wyckoff, 2015). Yet we know relatively little about the specific features of teacher evaluation that lead to improved teacher effectiveness (Donaldson et al., 2014; Firestone et al., 2013).

In this paper, we explore how specific features of the evaluation system, as experienced by teachers in a given school, relate to improvements in teacher effectiveness. We use linked teacher-student data from the state of Tennessee from 2011 to 2015. Critically, we observe detailed microdata on teachers’ observations, including performance ratings on 19 skill-area indicators for most teachers in the state in most years.

We propose an approach to documenting the developmental effectiveness of such features using administrative data rather than stakeholder surveys. For example, we develop measures of the quality and quantity of instructional feedback that teachers receive using data derived from the evaluation system itself, rather than surveys of teachers about their opinions of such feedback. We then examine how the within-teacher returns to experience vary by teachers’ exposure to these features (e.g., Papay & Kraft, 2015; Kraft & Papay, 2014). We document that teachers who experience more “robust” evaluation improve at faster rates than those who do not. This finding highlights the central role that evaluation can play in teacher development.
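As a rough illustration of what "within-teacher returns to experience that vary by exposure" can mean in practice, the following Python sketch fits a teacher fixed-effects regression with an experience-by-exposure interaction on simulated data. It is not the specification used in this paper: the function name `within_teacher_interaction`, the single time-invariant binary exposure indicator, the linear experience term, and all simulated values are assumptions made for illustration, and the sketch omits year effects and other controls that an actual analysis would need.

```python
import numpy as np


def within_teacher_interaction(teacher_id, experience, exposure, outcome):
    """Hypothetical helper: estimate
        outcome_it = a_i + b1 * exp_it + b2 * (exp_it * exposure_i) + e_it
    by demeaning every variable within teacher (teacher fixed effects).
    The exposure main effect is absorbed by a_i when exposure is time-invariant.
    """
    teacher_id = np.asarray(teacher_id)
    experience = np.asarray(experience, dtype=float)
    exposure = np.asarray(exposure, dtype=float)
    outcome = np.asarray(outcome, dtype=float)

    # Map teacher ids to consecutive integers 0..n-1 for group means.
    _, idx = np.unique(teacher_id, return_inverse=True)
    idx = idx.ravel()
    counts = np.bincount(idx)

    def demean(v):
        group_means = np.bincount(idx, weights=v) / counts
        return v - group_means[idx]

    # Regressors: experience and its interaction with exposure.
    X = np.column_stack([demean(experience), demean(experience * exposure)])
    y = demean(outcome)

    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return {
        "experience": float(coef[0]),
        "experience_x_exposure": float(coef[1]),
    }


# Simulated panel (all values are made up for illustration).
rng = np.random.default_rng(0)
n_teachers, n_years = 200, 5
teacher_id = np.repeat(np.arange(n_teachers), n_years)
experience = np.tile(np.arange(n_years, dtype=float), n_teachers)
exposure = np.repeat(rng.integers(0, 2, n_teachers), n_years).astype(float)
teacher_effect = np.repeat(rng.normal(0.0, 0.1, n_teachers), n_years)
outcome = (
    teacher_effect
    + 0.02 * experience                 # baseline return to experience
    + 0.03 * experience * exposure      # extra return under "robust" evaluation
    + rng.normal(0.0, 0.05, n_teachers * n_years)
)

est = within_teacher_interaction(teacher_id, experience, exposure, outcome)
print(est)
```

In this setup, a positive coefficient on the interaction term corresponds to exposed teachers improving faster with experience than unexposed teachers; the simulated data are built so that this is the case.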

Association for Public Policy Analysis & Management

Panel Paper: Evaluation for Teacher Development: Exploring the Relationship between Features of Teacher Evaluation Systems and Teacher Improvement