Panel Paper: Examining High and Low Value-Added Mathematics Instruction: Can Expert Observers Tell the Difference?

Thursday, November 7, 2013 : 10:25 AM
3015 Madison (Washington Marriott)


Heather Hill1, Claire C Gogolen1, Erica Litke1, Andrea Humez1, David Blazar2, Douglas Corey3, Johanna Barmore1, Mark Chin1 and Mary Beisiegel4, (1)Harvard University, (2)Harvard Graduate School of Education, (3)Brigham Young University, (4)Oregon State University
The question of how to measure effective teachers and teaching has long been of interest to policymakers and school leaders (Fenstermacher & Richardson, 2005; Peterson, 2000; Stodolsky, 1988). While recent policy initiatives have focused on the use of value-added measures (VAM) to evaluate practitioners for accountability purposes, there is a much longer tradition of using observations of practice to make such determinations (Brophy & Good, 1986; Cooley & Leinhardt, 1983; Hill, Ball, Sleep, & Lewis, 2007). However, empirical evidence suggests these two indicators often identify different sets of teachers as effective. For example, the Measures of Effective Teaching project (2012) finds low correlations between teachers’ VAM scores and the quality of their instruction as measured by observational metrics. Studies with the explicit intent of identifying differences in instruction between high- and low-scoring VAM teachers (Grossman et al., in press; Stronge, Ward, & Grant, 2011) have also failed to uncover stark differences between classrooms.

In this study, we use value-added scores and video data in order to mount an exploratory study of high- and low-VAM teachers’ instruction. Specifically, we seek to answer two research questions: First, can expert observers of mathematics instruction distinguish between high- and low-VAM teachers solely by observing their instruction? Second, what instructional practices, if any, consistently characterize high but not low-VAM teacher classrooms?

To answer these questions, we use data generated by 250 fourth- and fifth-grade math teachers and their students in four large public school districts. After ranking teachers within districts using a value-added model that includes both student covariates and data from three separate cohorts (years) of students, an independent analyst identified the three top- and three bottom-scoring teachers from each district for further analysis. Observers blind to teacher VAM scores then viewed three to seven videotaped lessons from each teacher and qualitatively described and analyzed instruction, noting both mathematical and pedagogical strengths and weaknesses in a series of memos and group discussions. Though these observers were trained raters for the Mathematical Quality of Instruction instrument, they were not formally scoring lessons for the current analysis. Instead, they examined the videos to identify any practices that may have contributed to a teacher’s high or low value-added rank. For each teacher, they also predicted the level (high or low) of the teacher’s value-added score.

Preliminary analyses indicate that a teacher’s value-added rank was often not obvious to this team of expert observers. Although several teachers with extreme VAM scores had noticeably strong or weak lessons, most of the instruction observed was of more moderate quality. In roughly 35% of cases, the observation group incorrectly guessed the teacher’s VAM performance. A second stage of the analysis will search the written memos for practices or classroom attributes present in high but not low value-added teachers’ classrooms. A third stage will explore, through written case studies, teachers for whom VAM rank and classroom observations diverged.