Panel Paper: How Well Do Teacher Observations Predict Value-Added? Exploring Variability Across Districts

Thursday, November 7, 2013 : 10:05 AM
3015 Madison (Washington Marriott)


Pam Grossman1, Julie Cohen1, Matthew Ronfeldt2, Lindsay Brown1, Kathleen Lynch3 and Mark Chin3, (1)Stanford University, (2)University of Michigan, (3)Harvard University
School districts have rapidly adopted value-added models as a mechanism for measuring teachers’ effectiveness (Chetty, Friedman, & Rockoff, 2012). Yet to date, the evidence is mixed on how value-added scores relate to expert observers’ ratings of classroom instruction, with different studies returning markedly different correlations (Hill, 2009). In one view, an important reason for this variability may be the sensitivity of the observational instrument to the test used to generate teacher VAM scores. When observational instruments and student assessments are more aligned, higher correlations may result. In another view, however, differences in VAM-by-observation score correlations may simply be stochastic, the result of unreliably estimated indicators or weak underlying correlations.    

To shed light on this issue, we ask: Do observational instruments predict teachers’ value-added equally well across different state tests and district/state contexts? And to what extent are differences in these correlations a function of the match between the observation instrument and the tested content?

We use data from the Gates Foundation-funded Measures of Effective Teaching (MET) project (N = 1,333), a study of elementary and middle school teachers in six large public school districts, and from a smaller study (N = 250) of fourth- and fifth-grade mathematics teachers in four large public school districts. We will examine the ELA and mathematics tests used in these ten districts, using released items to code each test for its level of intellectual challenge (lower to higher) as well as for content-specific indicators (e.g., applied problem solving, making inferences from text); we will also code each test for alignment with the observational metrics described below. Using the value-added models preferred by MET analysts for both datasets, we will correlate value-added scores with scores of instructional quality from two instruments designed to detect more complex forms of classroom instruction: the Protocol for Language Arts Teaching Observation (PLATO; Grossman, Loeb, Cohen, Hammerness, Wyckoff, Boyd, & Lankford, 2010) and the Mathematical Quality of Instruction (MQI; Hill, Blunk, Charalambous, Lewis, Phelps, Sleep, & Ball, 2008). We will conduct these analyses by district and determine whether differences in the by-district relationships reflect differences in the match between the observation instrument and the test used to generate teacher VAM scores.
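
To make the by-district analysis concrete, the minimal sketch below shows one way to compute teacher-level correlations between value-added and observation scores within each district. It is an illustration only, not the authors’ analysis code: the file name (teacher_scores.csv) and column names (district, vam_score, obs_score) are assumptions for this example.

```python
# Hypothetical sketch of a by-district correlation analysis.
# Assumes one row per teacher with columns: district, vam_score, obs_score.
import pandas as pd
from scipy.stats import pearsonr


def correlations_by_district(df: pd.DataFrame) -> pd.DataFrame:
    """Correlate teacher value-added scores with observation scores within each district."""
    rows = []
    for district, group in df.groupby("district"):
        r, p = pearsonr(group["vam_score"], group["obs_score"])
        rows.append({"district": district, "n_teachers": len(group), "r": r, "p": p})
    return pd.DataFrame(rows)


# Example usage (placeholder file name):
# df = pd.read_csv("teacher_scores.csv")
# print(correlations_by_district(df))
```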

Early results indicate that estimates of the relationship between teachers’ value-added scores and the observed quality of their classroom instruction differ considerably by district. In mathematics, observational instruments better predict student outcomes in districts whose tests focus more on conceptual knowledge than on procedural skills; in ELA, they better predict student outcomes in districts whose tests require constructed responses. These patterns suggest that the variability in correlations noted above may be a function of the match between the content of the observation instrument and the student assessments. If this pattern holds in our full analysis, we would recommend that districts examine their observation instruments and student assessments for alignment.