Panel Paper: Measuring Teachers' Effectiveness: A Report from Phase 3 of Pennsylvania's Pilot of the Framework for Teaching

Friday, November 13, 2015 : 8:50 AM
Tequesta (Hyatt Regency Miami)

Stephen Lipscomb, Jeffrey Terziev and Duncan Chaplin, Mathematica Policy Research
Pennsylvania is among many states implementing new tools to evaluate teachers. Under recent Pennsylvania law, half of a teacher's annual evaluation rating must be based on a measure in which a supervisor judges the quality of the teacher's professional practice. For this purpose, the Pennsylvania Department of Education (PDE) is using a widely used classroom observation tool called the Framework for Teaching (FFT). The FFT specifies 22 teaching practices, known as components, and a teacher is rated in one of four performance categories on each. The components are grouped into four domains of professional practice: planning and preparation, the classroom environment, instruction, and professional responsibilities. Measures based on student achievement make up the other half of a teacher's annual performance rating.
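To make the rating structure concrete, here is a minimal Python sketch of one teacher's FFT ratings, domain and whole-rubric summary scores, and a half-and-half annual rating. It rests on assumptions not stated in the text: each component is rated on a 1-4 scale, the domains hold 6, 5, 5, and 6 components, summary scores are unweighted means, and both halves of the annual rating are already on a common scale. The ratings, the achievement score of 2.5, and the helpers `domain_scores`, `overall_score`, and `annual_rating` are hypothetical; PDE's actual scoring and conversion rules may differ.

```python
from statistics import mean

# Hypothetical ratings for ONE teacher, for illustration only (not pilot data).
# Assumptions: each component is scored 1-4 (lowest to highest performance
# category), and the four domains hold 6, 5, 5, and 6 components (22 in total).
ratings = {
    "planning and preparation": [3, 3, 2, 3, 3, 3],
    "the classroom environment": [3, 4, 3, 3, 3],
    "instruction": [2, 3, 3, 2, 3],
    "professional responsibilities": [3, 3, 4, 3, 3, 3],
}


def domain_scores(ratings):
    """Assumed domain summary: the unweighted mean of that domain's components."""
    return {domain: mean(scores) for domain, scores in ratings.items()}


def overall_score(ratings):
    """Assumed overall summary: the unweighted mean of all component ratings."""
    return mean(score for scores in ratings.values() for score in scores)


def annual_rating(practice, achievement, practice_weight=0.5):
    """Hypothetical helper: weight the practice measure at one half.

    Assumes both inputs are already on the same scale.
    """
    return practice_weight * practice + (1 - practice_weight) * achievement


n_components = sum(len(scores) for scores in ratings.values())
practice = overall_score(ratings)
print(n_components)                            # 22
print(domain_scores(ratings))
print(round(practice, 3))                      # 2.955
print(round(annual_rating(practice, 2.5), 3))  # 2.727 (achievement score is hypothetical)
```

Keeping the ratings grouped by domain lets the same data yield both domain-level and whole-rubric summaries without re-entering any scores.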

To prepare for statewide implementation of these evaluation measures, PDE conducted the multiphase Pennsylvania Teacher Evaluation Pilot in a subset of districts from 2010-2011 through 2012-2013. The pilot sought to answer three research questions about using the FFT in teacher evaluations:

  • To what extent do FFT scores vary across teachers?
  • To what extent are teachers’ FFT scores internally consistent?
  • To what extent are higher or lower FFT scores indicative of teachers who make larger or smaller contributions to their students’ growth in achievement?

This study addresses these three questions using pilot evaluation scores from 2012-2013 for nearly 7,000 teachers who participated in the third and final phase of the pilot, drawn from more than 250 Pennsylvania school districts. The pilot evaluations were for informational purposes only; they were not used for formal evaluation.

Specifically, we first describe how teachers' scores vary on individual FFT components, on summary scores for each domain, and on a summary score for the entire rubric. Second, we measure internal consistency within and across domains using Cronbach's alpha. Third, we correlate teachers' FFT scores (by component, by domain, and overall) with their estimated contributions to student achievement growth from a value-added model (VAM). The VAM covers all Pennsylvania teachers who have students in tested subjects in grades 4 through 8 and uses statewide longitudinal data on all students in those subjects and grades. Fourth, we compare the findings from these three sets of analyses with those from similar analyses of a smaller, more narrowly defined sample of teachers who participated in phase 2 of the pilot during 2011-2012. Finally, we discuss the policy implications of the results.
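The internal-consistency and correlation analyses can be illustrated with a minimal Python sketch that uses only the standard library. It applies the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score), and computes a Pearson correlation between overall FFT scores and value-added estimates. The toy ratings, the VAM values, the 1-4 scale, the mean-based overall score, and the helper names `cronbach_alpha` and `pearson_r` are all hypothetical. How the study actually produces its VAM estimates, or whether its correlations are adjusted for estimation error, is not described here and may differ.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical toy data for illustration only (not study data):
# rows are teachers, columns are FFT component ratings on an assumed 1-4 scale.
scores = [
    [3, 3, 2, 3],
    [2, 2, 2, 3],
    [4, 3, 4, 4],
    [3, 3, 3, 3],
    [2, 3, 2, 2],
]
# Hypothetical value-added estimates for the same five teachers.
vam = [0.05, -0.10, 0.20, 0.00, -0.05]


def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = len(rows[0])  # number of items (components)
    item_vars = [variance(col) for col in zip(*rows)]  # per-component variance
    total_var = variance([sum(row) for row in rows])   # variance of summed score
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


overall = [mean(row) for row in scores]  # assumed overall score: mean rating
alpha = cronbach_alpha(scores)
r = pearson_r(overall, vam)
print(round(alpha, 3))  # 0.86
print(round(r, 3))
```

With real data, `scores` would hold one row per teacher and one column per component; passing only the columns for a single domain gives a within-domain alpha, and correlating each column or domain mean with `vam` gives component- and domain-level correlations.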