Panel Paper: The Size and Reliability of Teacher Training Effects in Texas

Saturday, November 10, 2012: 4:10 PM
Salon A (Radisson Plaza Lord Baltimore Hotel)


Cynthia Osborne, Paul T. von Hippel, Jane Lincove and Nicholas Mills, University of Texas at Austin

After years of evaluating the effectiveness of individual teachers, policy leaders have recently turned their attention to the educator preparation programs (EPPs) in which teachers are trained. As envisioned by the Obama administration (US Department of Education 2011), EPP evaluations should be based on the effectiveness of each EPP's graduates, with teacher effectiveness measured in terms of student outcomes such as test scores and graduation rates.

While it is easy to understand governments’ interest in EPP evaluation, it is unclear whether EPP accountability will prove to be an effective policy lever. On the one hand, EPPs play an obvious gatekeeping role in our system of education. EPPs’ admissions and graduation standards determine what sorts of people can become teachers, and EPPs’ curricula affect the skills that teachers bring to the classroom. On the other hand, some findings suggest that the program in which a teacher was trained is a very weak predictor of teacher effectiveness. The average differences between teachers recruited from different programs are trivial compared to the differences between the best and worst teachers from the same program (Kane, Rockoff, and Staiger 2006).

We report on an evaluation of the 159 EPPs that are accredited in the state of Texas—including traditional college- and university-based programs as well as online and alternative certification programs run by nonprofits and for-profit corporations. Among other things, the evaluation sought to identify the EPPs whose graduates had high or low value-added scores as estimated from state-required tests. The evaluation focused on teachers in their first, second, and third year after EPP graduation, but data on more experienced teachers were available for comparison.

The statewide evaluation presented challenges that are not encountered in value-added studies that are limited to a single large district. Many Texas EPPs are relatively small, and many of the state’s elementary, middle, and high schools are small as well, particularly in rural areas. New teachers typically work close to the EPP that trained them, so there are limited opportunities to compare graduates of different EPPs within the same school. We estimated EPP effects using a variety of regression, Tobit, and errors-in-variables models that controlled for up to two years of prior test scores, as well as student, classroom, and campus characteristics.
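The core of such a value-added model can be sketched as a regression of current test scores on prior scores plus indicators for the EPP that trained each student's teacher. The sketch below is a deliberately simplified illustration with synthetic data and made-up effect sizes, not the study's actual specification (which also used Tobit and errors-in-variables models and controlled for student, classroom, and campus characteristics):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: students taught by graduates of 3 EPPs, with small
# true EPP effects on standardized test-score gains (illustrative values).
n = 3000
epp = rng.integers(0, 3, n)                    # EPP of each student's teacher
true_effect = np.array([0.00, 0.05, -0.05])    # small, as the evaluation found
prior = rng.normal(0, 1, n)                    # prior-year score (standardized)
score = 0.7 * prior + true_effect[epp] + rng.normal(0, 0.5, n)

# Design matrix: intercept, prior score, and dummies for EPPs 1 and 2
# (EPP 0 is the reference category).
X = np.column_stack([np.ones(n), prior, epp == 1, epp == 2]).astype(float)

# Ordinary least squares; beta = [intercept, prior coef, EPP1 eff, EPP2 eff]
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
print(beta)
```

The estimated EPP coefficients are differences relative to the reference program; in practice they would be reported with standard errors, since small EPPs yield noisy estimates.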

Estimates suggest that most EPP effects are small, and that when an EPP effect appears large, it is often not reliable across grade levels or subjects. For example, an EPP's graduates can appear highly effective in teaching fourth-grade reading but ineffective in teaching fifth-grade reading. EPP effects were more reliable in upper-grade mathematics than in other subjects or grades—yet even there EPP effects uniquely accounted for a small fraction of the variation in student test scores. Our results suggest that value-added evaluation can offer only limited and uncertain feedback to EPPs, and that feedback to EPPs may be a limited and indirect way to improve teacher quality.
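The reliability problem can be illustrated with a small simulation (hypothetical numbers, not results from the study): when true EPP effects are small relative to estimation noise, estimates of the same EPPs' effects in two different grades correlate only weakly, so an EPP that ranks well in one grade may rank poorly in another.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical: 150 EPPs with small true effects, estimated separately in
# two grades, with per-grade estimation noise larger than the effects.
n_epps = 150
true_effects = rng.normal(0, 0.02, n_epps)   # small true EPP effects
noise_sd = 0.06                              # sampling error per grade
grade4_est = true_effects + rng.normal(0, noise_sd, n_epps)
grade5_est = true_effects + rng.normal(0, noise_sd, n_epps)

# The cross-grade correlation of the two estimates approximates the
# reliability of a single grade's estimate; here it is low because
# noise variance dominates true-effect variance.
r = np.corrcoef(grade4_est, grade5_est)[0, 1]
print(round(r, 2))
```

With these assumed variances, the expected correlation is roughly the ratio of true-effect variance to total variance, on the order of 0.1.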