Panel Paper: Toward Improving Measures of Teacher Effectiveness: Identifying Invalid Responses in Student Surveys of Teacher Practice

Thursday, November 6, 2014 : 10:15 AM
Aztec (Convention Center)


Ryan Balch, Vanderbilt University and Joseph Robinson-Cimpian, University of Illinois, Urbana-Champaign
Though there is widespread evidence that teachers matter, measuring teacher effectiveness remains challenging. It can be argued that student feedback is an important component of any teacher evaluation system, as students have the most contact with teachers and are the direct consumers of a teacher’s service (Kane & Staiger, 2012). As student surveys are increasingly used in teacher evaluation systems (e.g., Memphis, Pittsburgh), teachers commonly voice the concern that students may not take the survey seriously or may even use the survey as a means of retribution against a teacher. These concerns are important to address, particularly when student surveys are used in a high-stakes context for teachers. It is therefore helpful to identify methods of flagging, and in some cases eliminating, problematic surveys.

This paper introduces and compares multiple screening techniques for identifying, and possibly eliminating, problematic surveys. The five techniques that we compare are (1) identifying patterns of outlier answers based on differences from classroom averages; (2) asking students directly about their level of honesty (e.g., Cornell et al., 2012); (3) requiring a minimum standard deviation in answer choices (e.g., flagging as problematic surveys with all “never”s as responses); (4) testing for incongruent answer choices (i.e., predicting a student’s response to an item on the basis of his/her responses to other items, and then flagging responses that have a low probability of selection); and (5) identifying mischievous responders (i.e., students who find it “funny” to claim to be something they are not, such as a student who falsely claims to be in 6th grade; Robinson-Cimpian, in press).
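For illustration, two of these screens can be sketched in a few lines of Python. This is a hypothetical implementation: the function names, response coding (Likert items coded 1 = “never” through 5 = “always”), and thresholds are our own assumptions, not those used in the paper.

```python
import statistics

def flag_low_variation(responses, min_sd=0.5):
    """Technique (3) sketch: flag a survey whose answers show almost no
    variation, e.g., a straight line of all "never" responses.
    `min_sd` is an illustrative threshold, not the paper's."""
    return statistics.pstdev(responses) < min_sd

def flag_outliers(classroom, z_threshold=2.5):
    """Technique (1) sketch: flag surveys whose mean response sits far
    from the classroom average, measured in classroom standard
    deviations. Returns one True/False flag per student survey."""
    means = [statistics.mean(survey) for survey in classroom]
    mu = statistics.mean(means)
    sd = statistics.pstdev(means)
    if sd == 0:
        # No variation across students in this classroom: nothing to flag.
        return [False] * len(classroom)
    return [abs(m - mu) / sd > z_threshold for m in means]
```

In practice the thresholds would be tuned to class size, since the maximum attainable z-score in a small classroom is bounded by the number of respondents.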

Data for the investigation come from two sources. The first source is student-survey data from a 2011 pilot in seven districts in Georgia with more than 12,000 student responses. The second source is data from Baltimore City Schools collected in a 2013 pilot of student surveys with over 70,000 student responses.

Each screening technique is applied to the data, with an analysis of students who are identified by only one technique as well as those who are flagged by multiple techniques. Further, we compare a teacher’s aggregate student-survey-based evaluations to other measures of teacher evaluation (e.g., value-added scores, principal observations) before and after removing potentially invalid student responses. Thus, the screening techniques are compared not only to each other but also to external measures.
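The before-and-after comparison can be sketched as follows. This is a minimal, hypothetical helper, assuming each survey carries a flag from some combination of the screens above; the data layout and names are our own, not the paper's.

```python
import statistics

def teacher_aggregate(surveys, drop_flagged=False):
    """Return a teacher's aggregate score: the mean of per-student survey
    means, optionally dropping responses flagged as potentially invalid.
    `surveys` is a list of (responses, flagged) pairs for one teacher."""
    kept = [statistics.mean(responses)
            for responses, flagged in surveys
            if not (drop_flagged and flagged)]
    return statistics.mean(kept) if kept else None
```

Running this twice per teacher, once with `drop_flagged=False` and once with `drop_flagged=True`, yields the two aggregates whose relationships to external measures (e.g., value-added scores) can then be compared.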

Through introducing and comparing several different techniques to improve the validity of student-survey-based teacher evaluations, this research contributes to the broader literature on improving evaluations of teacher effectiveness.