Panel Paper: Teacher Evaluation and Discipline Referrals

Saturday, November 9, 2019
Plaza Building: Concourse Level, Governor's Square 17 (Sheraton Denver Downtown)


David D. Liebowitz, Lorna Porter and Dylan Bragg, University of Oregon


In a span of seven years, nearly all U.S. states adopted high-stakes teacher evaluation policies. These policies aim to improve the quality of teacher performance through frequent observation and feedback. We investigate a potential unintended consequence of their introduction: changes in how teachers respond to students’ classroom behavior.

We estimate the causal impact of the implementation of high-stakes teacher evaluation policies on the frequency with which students are the subject of an Office Disciplinary Referral (ODR) from their classroom teacher. We hypothesize that, in the 45 states that implemented major reforms to their evaluation systems between 2009 and 2016, the increased scrutiny experienced by teachers may have made some more likely to remove students from their classrooms for perceived misbehavior. Our primary data source is the School-Wide Information System (SWIS). These data include records of each educator-recorded behavioral infraction in approximately 6,000 schools from 2006-07 to 2017-18. We leverage Steinberg and Donaldson’s (2016) tally of evaluation reforms, extended by Kraft et al. (2019), to fit a two-way fixed effects difference-in-differences model that estimates the impact of high-stakes evaluation policy reform on ODRs. Our first difference is the change in the rate of classroom-based subjective ODRs in locales that experienced the teacher evaluation policy reform. Our second difference is the change in the rate of these ODRs in locales that did not experience (or had not yet experienced) the change. Standard difference-in-differences estimates based on state policy variation struggle to account for endogenous differences across states. As an improvement, we employ triple-difference estimates in which our third difference is the change in the rate of objective and/or non-classroom-based ODRs. Because these types of infractions occur within the same schools and presumably are not influenced by changes in teacher evaluation policy (e.g., students are no more or less likely to bring a knife to school under pre- or post-treatment conditions), we argue that our triple-difference estimates are not biased by state- or district-level policy differences.
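
One way to write down a triple-difference model of the kind described above is sketched here. This is an illustrative specification under our own notational assumptions, not necessarily the model the authors estimate: $Y_{kist}$ is the ODR rate of type $k$ in school $i$, state $s$, and year $t$; $D_{st}$ equals one once state $s$ has implemented its evaluation reform; $S_k$ equals one for classroom-based subjective ODRs; $\mu_{ki}$ and $\lambda_{kt}$ are school-by-type and year-by-type fixed effects.

```latex
Y_{kist} = \beta \,\bigl(D_{st} \times S_k\bigr) + \theta \, D_{st}
         + \mu_{ki} + \lambda_{kt} + \varepsilon_{kist}
```

Under these assumptions, $\theta$ absorbs any change in objective or non-classroom ODRs that coincides with the reform, and $\beta$ is the triple-difference estimate: the additional post-reform change in subjective, classroom-based ODRs.
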
To further test this approach, we estimate models in which our third difference comes from restricted-use Civil Rights Data Collection measures of suspension rates, which we similarly argue should not be influenced by changes in teacher evaluation policies. As robustness checks, we test for differences in disciplinary referral trends before policy implementation and examine the Goodman-Bacon (2018) decomposition weights on early- versus late-timing comparisons.
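
The arithmetic behind a triple difference can be illustrated with a toy two-period example. The sketch below uses made-up ODR rates (not SWIS data) and hypothetical labels and function names. It only shows how a change common to both ODR types is differenced out; it is not the regression-based estimator described above.

```python
# Hypothetical mean ODR rates (referrals per 100 students).
# Every number here is invented for illustration; none comes from SWIS.
rates = {
    # (group, odr_type, period): rate
    ("reform", "subjective", "pre"): 10.0,
    ("reform", "subjective", "post"): 13.0,
    ("reform", "objective", "pre"): 4.0,
    ("reform", "objective", "post"): 5.0,
    ("comparison", "subjective", "pre"): 9.0,
    ("comparison", "subjective", "post"): 10.0,
    ("comparison", "objective", "pre"): 4.0,
    ("comparison", "objective", "post"): 4.5,
}


def change(group, odr_type):
    """Post-minus-pre change in the ODR rate for one group and ODR type."""
    return rates[(group, odr_type, "post")] - rates[(group, odr_type, "pre")]


def diff_in_diff(odr_type):
    """Change in reform locales minus change in comparison locales."""
    return change("reform", odr_type) - change("comparison", odr_type)


def triple_diff():
    """Difference-in-differences for subjective ODRs minus that for objective ODRs."""
    return diff_in_diff("subjective") - diff_in_diff("objective")


if __name__ == "__main__":
    print("DD, subjective ODRs:", diff_in_diff("subjective"))
    print("DD, objective ODRs: ", diff_in_diff("objective"))
    print("Triple difference:  ", triple_diff())
```

In this invented example, the difference-in-differences for objective ODRs stands in for whatever else changed in reform locales at the same time, and subtracting it leaves the portion of the change specific to subjective, classroom-based referrals.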

We conclude with an exploratory analysis of the potential for school leadership actions to moderate the effect of high-stakes evaluation on discipline outcomes. SWIS schools receive externally validated ratings on the quality of implementation of their Positive Behavioral Interventions and Supports (PBIS) systems. We model the extent to which teachers in schools with effective systems of behavior support, as captured by these ratings, are affected by the introduction of high-stakes teacher evaluation policies. Our design is pre-registered in the Registry of Efficacy and Effectiveness Studies (REES #1748). We have just received the SWIS data, so we are unable to share results yet, but we anticipate having a complete paper in time for the conference.