Panel Paper: Does the Inclusion of Growth Measures in School Accountability Systems Mitigate Triage Behavior?

Saturday, November 4, 2017
Columbian (Hyatt Regency Chicago)


Emily C. Kern, Vanderbilt University

Critics have long expressed concern that using proficiency rates to measure school performance, the main metric under No Child Left Behind (NCLB), incentivizes schools to focus resources on students closest to the proficiency line (the so-called "bubble" students). If schools do this while diverting resources away from the lowest- and highest-performing students, they are engaging in the strategic behavior known as educational triage. Because this behavior threatens the equity principles of federal education policy, many have called for accountability systems to include growth measures that motivate schools to behave more equitably.

Yet the extent to which schools engage in triage is unclear. Quantitative research yields mixed results: some studies show bubble students gaining more on end-of-year tests than students on the tails (supporting the triage theory), while others show similarly sized gains for low-performing students and no negative effect for the highest-performing students. A number of qualitative studies, which describe in detail how schools use student data to identify bubble students and then explicitly target resources to them, may shed light on why the quantitative results are mixed. In those studies, the teachers engaging in triage used results from local benchmark assessments to identify bubble students, information not typically included in the administrative datasets used in quantitative analyses. Quantitative research may therefore fail to accurately identify the students that schools themselves would consider to be on the bubble.

This research investigates whether schools engaged in triage, using data from a large urban district that implemented a benchmark and data-sharing policy from 2009-10 through 2013-14. The district administered math and reading benchmarks three times per year to students in grades 3-8, used those data to project student performance on end-of-year state tests, and shared that information with schools. If these schools engaged in triage, they likely used these benchmark data to do so. The analysis exploits a feature of the district benchmark and state test data that divides the test score distribution into four groups and assigns a label to each group. Although the labels provide no information beyond the test score itself, schools may use them as shortcuts for identifying the lowest- and highest-performing students. Because of the way the labels were assigned, the analysis uses a regression discontinuity design, an econometric technique viewed as an especially strong method for drawing causal inferences from observational data under these circumstances.
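The regression discontinuity logic described above can be sketched with simulated data. Everything here is an illustrative assumption, not the district's actual data or the paper's specification: the cutoff, bandwidth, sample size, and the size of the "label effect" are invented for the sketch, and the estimator is a simple local linear fit on each side of the cutoff.

```python
import numpy as np

def rd_estimate(running, outcome, cutoff=0.0, bandwidth=5.0):
    """Sharp RD sketch: fit a separate linear trend on each side of the
    cutoff within the bandwidth; the estimated effect is the jump in the
    predicted outcome at the cutoff."""
    left = (running >= cutoff - bandwidth) & (running < cutoff)
    right = (running >= cutoff) & (running <= cutoff + bandwidth)
    coef_left = np.polyfit(running[left], outcome[left], 1)
    coef_right = np.polyfit(running[right], outcome[right], 1)
    return np.polyval(coef_right, cutoff) - np.polyval(coef_left, cutoff)

# Simulated benchmark scores around a hypothetical label cutoff at 0.
rng = np.random.default_rng(0)
score = rng.uniform(-10, 10, 2000)
# Assume the end-of-year outcome jumps by 2 points for students whose
# benchmark score places them just above the label cutoff.
outcome = 50 + 0.5 * score + 2.0 * (score >= 0) + rng.normal(0, 1, 2000)
effect = rd_estimate(score, outcome)
print(f"estimated jump at cutoff: {effect:.2f}")
```

Because students just below and just above a label cutoff are nearly identical on the underlying score, any systematic jump in their later outcomes can be attributed to the label itself, which is the identifying idea the paragraph above relies on.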

During the period of this benchmark policy, the state accountability system shifted from NCLB's focus on proficiency levels to a waiver system that added a growth measure to school performance metrics, so that schools were evaluated on both proficiency and growth. This natural experiment allows the analysis to investigate whether redesigned accountability systems can mitigate unintended behaviors, such that students who were theoretically neglected before now receive more support.

The specific research questions are: (1) To what extent do labels assigned to students' benchmark performance influence their end-of-year test scores? and (2) To what degree does this relationship differ under the waiver accountability system?