Panel Paper: Using Student Test Scores to Measure Principal Performance

Friday, November 9, 2012 : 8:40 AM
Salon A (Radisson Plaza Lord Baltimore Hotel)


Jason A. Grissom1, Susanna Loeb2 and Demetra Kalogrides2, (1)Vanderbilt University, (2)Stanford University

Researchers have devoted substantial attention to the use of student test score data to measure teacher performance. Recently, policymakers have shown interest in using student achievement data to measure the contributions of school administrators as well, as evidenced by changes to principal evaluation policies in states like Florida and Louisiana. Yet the use of test scores to measure principal effects has received little attention, and the degree to which research on the measurement of teacher value-added can be applied directly to principals is unclear.

To address this gap in the literature, this paper investigates the capacity of longitudinal achievement data to uncover principal effects. Building on prior research, it describes three conceptually distinct approaches to using longitudinal student achievement data to capture the contributions of principals to test score growth: (1) a school effectiveness model that attributes average school growth to the principal; (2) a relative school effectiveness model that separates principal effects from long-run school effects; and (3) a school improvement model that explicitly allows principal effects to take multiple years to become evident. The paper discusses the different understandings of how principals affect schools implicit in each of these approaches and the potential tradeoffs involved in measuring performance using one approach over another.
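The three approaches can be sketched in stylized form as follows. The notation here is illustrative, not the authors' exact specifications: $A_{it}$ is student $i$'s achievement in year $t$ (with prior achievement among the controls $X_{it}$), $p(i,t)$ and $s(i,t)$ index the student's principal and school, $\delta_p$ is the principal effect, and $\phi_s$ a long-run school effect.

```latex
\begin{align}
\text{(1) School effectiveness:} \quad
  & A_{it} = \beta X_{it} + \delta_{p(i,t)} + \varepsilon_{it} \\
\text{(2) Relative school effectiveness:} \quad
  & A_{it} = \beta X_{it} + \phi_{s(i,t)} + \delta_{p(i,t)} + \varepsilon_{it} \\
\text{(3) School improvement:} \quad
  & A_{it} = \beta X_{it} + \delta_{p(i,t)} \cdot f\!\left(\text{tenure}_{p,t}\right) + \varepsilon_{it}
\end{align}
```

In (1) all school-average growth during a principal's tenure is attributed to the principal; in (2) the school fixed effect $\phi_s$ absorbs persistent school quality, so $\delta_p$ captures only deviations from the school's long-run trajectory; in (3) the tenure function $f(\cdot)$ lets a principal's contribution phase in over several years rather than appear immediately.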

In the empirical portion of the paper, we use longitudinal data from Miami-Dade County Public Schools—the nation’s fourth-largest school district—to run variants (e.g., random vs. fixed effects, with and without student fixed effects) of each of the three approaches and compare the results. All three approaches find significant variation in principal effects on schools. However, this exercise makes clear that the choice of model is fundamentally important: across modeling approaches, correlations among principals’ estimated contributions are sometimes very small or even negative. In fact, nearly a third of principals in the highest quartile of performance estimated by the school effectiveness approach are in the lowest quartile when performance is calculated via the school improvement approach.

In a step typically not feasible in value-added studies, we next examine the correlations between model predictions and non-test score-based measures of principal performance, such as district performance evaluations or parent climate surveys. We find that these metrics correlate most consistently with the school effectiveness models—whose theoretical properties are the least satisfying—but infrequently with the other models, suggesting that these alternative measures may be better measures of how the school itself is performing than of the principal’s particular contribution.
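The cross-model comparison described above reduces to two computations: a correlation between two sets of principal effect estimates, and a quartile reclassification rate. A minimal sketch of that comparison, using simulated estimates (the names, the weak correlation, and the data are all hypothetical, chosen only to mimic the paper's pattern of divergence):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200  # hypothetical number of principals

# Simulated principal-effect estimates from two of the approaches;
# the weak relationship is an assumption for illustration, not the paper's data.
eff_school = rng.normal(size=n)                      # school effectiveness model
eff_improve = 0.1 * eff_school + rng.normal(size=n)  # school improvement model

df = pd.DataFrame({"school_eff": eff_school, "improvement": eff_improve})

# Pearson correlation between the two sets of estimates
corr = df["school_eff"].corr(df["improvement"])

# Assign performance quartiles under each model (1 = lowest, 4 = highest)
df["q_school"] = pd.qcut(df["school_eff"], 4, labels=[1, 2, 3, 4])
df["q_improve"] = pd.qcut(df["improvement"], 4, labels=[1, 2, 3, 4])

# Share of top-quartile principals under the school effectiveness model
# who fall in the bottom quartile under the improvement model
top = df[df["q_school"] == 4]
reclass_rate = (top["q_improve"] == 1).mean()

print(f"correlation: {corr:.2f}; top-to-bottom reclassification: {reclass_rate:.2f}")
```

With real estimates in place of the simulated columns, the same two statistics reproduce the comparisons reported in the paper.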

These results have a variety of implications for policy researchers as well as for state and district policymakers seeking to use student test scores for principal evaluation and compensation. Most importantly, the results show that the choice of model matters: low correlations and high reclassification rates across modeling approaches demonstrate that principals’ performance evaluations can be quite sensitive to empirical specification. Until further research clarifies which approach best isolates principals’ contributions, states and districts should exercise caution in using student test scores for high-stakes decisions concerning principals.