Evaluating Teacher Preparation Using Observational Data

Campbell, Shanyce

Nationally, there is growing interest in evaluating the quality of teacher education programs (TEPs), but little consensus on how best to do this. Motivated by Race to the Top (RTT) incentives, many states have developed systems for evaluating TEPs based upon the average of program completers’ effectiveness at raising student achievement, as measured by value-added models. Among studies linking TEPs to graduates’ value-added, some have found significant and meaningful differences between programs but others have not.

This paper uses one state’s data to investigate the use of graduates’ observation ratings as an alternative to value-added for estimating TEP effectiveness. There are a number of possible advantages to this approach. First, observation scores are available for teachers in all subject areas and grade levels. Second, given that TEPs aim to improve teachers’ instruction, observational evaluations are likely more sensitive to program impacts than more distal measures like student achievement. Third, by providing information on graduates’ performance in different instructional domains (e.g., planning, classroom management), observational measures can guide program improvement in ways that value-added measures cannot. Finally, there are growing calls for using multiple measures, including observational ratings.

This paper is guided by the following questions: (1) Are there differences between TEPs in terms of graduates’ average observation ratings? (2) Do TEP ratings vary by modeling approach? (3) How do program ratings based on observation scores compare with those based on value-added scores?

This paper reports on a secondary analysis of statewide teacher evaluation and administrative data. The sample includes over 9,500 program completers from four cohorts of graduates of all TEPs across the state (2010-2013). To estimate program effects on graduates’ observation ratings, we use hierarchical linear (HLM), school fixed-effects (SFE) and ordinary least squares (OLS) modeling approaches.
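To illustrate why the choice among these approaches can matter, the following is a minimal sketch on synthetic data, not the authors' specification. It compares plain OLS on program indicators with a school fixed-effects specification when graduates sort non-randomly into schools. The function name `program_effects`, the sample sizes, the effect sizes, and the sorting rule are all assumptions made for the example; an HLM variant is omitted because it would require a mixed-model library.

```python
import numpy as np

def program_effects(y, program, n_programs, school=None):
    """Least-squares program effects relative to program 0.

    With school=None this is plain OLS on program dummies; passing school
    IDs adds one dummy per school (a school fixed-effects specification).
    """
    cols = [np.ones(len(y))]                       # intercept
    for p in range(1, n_programs):                 # program 0 is the reference
        cols.append((program == p).astype(float))
    if school is not None:
        for s in np.unique(school)[1:]:            # first school is the reference
            cols.append((school == s).astype(float))
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:n_programs]

# Synthetic data: every number below is an illustrative assumption.
rng = np.random.default_rng(0)
n_programs, n_schools, n_teachers = 5, 40, 4000
true_effect = np.array([0.0, 0.1, 0.2, 0.3, 0.4])
school_effect = np.sort(rng.normal(0.0, 0.5, n_schools))  # schools ordered by effect

program = rng.integers(0, n_programs, n_teachers)
# Non-random sorting: higher-index programs tend to place graduates
# in schools with higher average ratings.
school = np.clip(
    np.rint(program * 8 + rng.normal(0.0, 10.0, n_teachers)), 0, n_schools - 1
).astype(int)
rating = true_effect[program] + school_effect[school] + rng.normal(0.0, 1.0, n_teachers)

ols = program_effects(rating, program, n_programs)
sfe = program_effects(rating, program, n_programs, school=school)

# Largest absolute gap between estimated and true program contrasts.
err_ols = np.max(np.abs(ols - true_effect[1:]))
err_sfe = np.max(np.abs(sfe - true_effect[1:]))
print("OLS estimates:", np.round(ols, 2), "max error:", round(float(err_ols), 2))
print("SFE estimates:", np.round(sfe, 2), "max error:", round(float(err_sfe), 2))
```

In this constructed setting the OLS contrasts absorb school differences, while the fixed-effects contrasts compare graduates working in the same schools. Whether sorting of this kind is present in the state's data is an empirical question the sketch does not address.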

Preliminary results suggest there are significant and meaningful differences between TEPs. Graduates from top-quartile programs outperform graduates from bottom-quartile programs by 0.4 to 0.5 standard deviations on average, comparable to the difference in ratings between first- and second-year teachers. We have begun to investigate whether similar program differences exist when looking at specific instructional domains instead of aggregate observation scores. Program ratings based upon HLM and SFE approaches are highly correlated (r=0.95); however, correlations with OLS estimates are lower (r=0.7 with both SFE and HLM). We are currently examining whether OLS models fail to adjust properly for the kinds of schools in which graduates gain employment. Finally, we find that programs receiving higher rankings based on observational evaluations also tend to receive higher rankings based on value-added; correlations between rankings are significant but weak to moderate in magnitude (between r=0.23 and 0.45).
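As a rough illustration of how two sets of program rankings can be compared, the sketch below computes a Spearman rank correlation from program-level estimates. The abstract does not say which correlation statistic was used, so Spearman is an assumption here, and the four program estimates are made-up numbers rather than results from the study.

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation for score vectors without ties."""
    rank_a = np.argsort(np.argsort(a))   # 0-based rank of each program under measure a
    rank_b = np.argsort(np.argsort(b))   # 0-based rank of each program under measure b
    return float(np.corrcoef(rank_a, rank_b)[0, 1])

# Hypothetical program-level estimates (illustrative only, not the paper's data).
obs_based = np.array([0.3, 0.1, 0.5, 0.2])    # from observation ratings
va_based = np.array([0.25, 0.3, 0.4, 0.1])    # from value-added scores

rho = spearman(obs_based, va_based)
print("rank correlation:", round(rho, 2))
```

A value near 1 would mean the two measures order programs almost identically; values in the weak-to-moderate range mean that many programs change position depending on the measure used.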

This study suggests that observational measures hold promise for evaluating differences between teacher preparation programs. Even so, the finding that program rankings vary across modeling approaches suggests that policymakers should think carefully about which approach to use. Given the weak to moderate correlations with value-added rankings, researchers and policymakers also need to consider why rankings differ across these measures and how they should be reconciled.

Association for Public Policy Analysis & Management

Panel Paper: Evaluating Teacher Preparation Using Observational Data