Seeking Exceptional Teacher Preparation Programs Among Many Noisy Estimates: The Danger of Being Fooled By Randomness

von Hippel, Paul T.

Sixteen states have taken steps to hold teacher preparation programs (TPPs) accountable for their effects on K-12 test scores, yet TPP effects are very challenging to interpret and use in a policy context. We demonstrate several statistical techniques, some of them new, for estimating the effects on K-12 test scores of the approximately 100 TPPs in Texas. The results present several challenges to using the estimates in policy.

First, the true differences between TPPs (the policy signal) are small: a one standard deviation increase in TPP quality predicts just a 0.01 to 0.03 standard deviation increase in student scores. Second, even in a Texas-sized sample, TPP estimates consist primarily of noise, which, if mistaken for signal, can lead to unnecessary, disruptive, and ineffective policy actions, such as closing an average TPP because its noisy estimate makes it appear much worse than it is. Third, when comparing 100 different TPPs (or even 10), the dangers of multiple comparisons can lead us to infer significant differences between TPPs where no true differences exist, which again can lead to needlessly disruptive interventions.
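
The multiple-comparisons danger can be illustrated with a small simulation. The sketch below is not the paper's method: it assumes 100 hypothetical programs whose true effects are all exactly zero, independent standard-normal test statistics, and a Bonferroni correction as one common adjustment. The function names and defaults are illustrative assumptions.

```python
import random
from statistics import NormalDist


def family_wise_error_rate(n_tests, alpha):
    """Chance of at least one false positive among independent null tests."""
    return 1 - (1 - alpha) ** n_tests


def count_false_positives(n_programs=100, alpha=0.05, seed=0):
    """Count 'significant' programs when no program truly differs from average.

    Returns (naive_count, bonferroni_count). All inputs are hypothetical.
    """
    rng = random.Random(seed)
    # With no true differences, each z-statistic is pure noise: N(0, 1).
    z_stats = [rng.gauss(0.0, 1.0) for _ in range(n_programs)]

    norm = NormalDist()
    # Two-sided critical values without and with a Bonferroni correction.
    z_naive = norm.inv_cdf(1 - alpha / 2)
    z_bonferroni = norm.inv_cdf(1 - alpha / (2 * n_programs))

    naive = sum(abs(z) > z_naive for z in z_stats)
    bonferroni = sum(abs(z) > z_bonferroni for z in z_stats)
    return naive, bonferroni


if __name__ == "__main__":
    naive, bonferroni = count_false_positives()
    print("flagged without correction:", naive)
    print("flagged with Bonferroni:", bonferroni)
    # About 0.994: a false alarm is nearly certain with 100 uncorrected tests.
    print(round(family_wise_error_rate(100, 0.05), 3))
```

Under these assumptions, about 5 of 100 average programs would be flagged by chance at the 0.05 level without any correction, which is why uncorrected comparisons invite unwarranted interventions.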

After adjusting our estimates for multiple comparisons and noise, we find that we usually cannot identify with confidence which TPPs are better or worse than average. We do identify one TPP that appears to be inferior, but its effect is so close to average that closing it would have an unnoticeable effect on student test scores. The potential benefits of TPP accountability programs may be too small to balance the risk that they will promote needlessly disruptive policy actions.
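
One standard way to adjust an estimate for noise is to shrink it toward the average in proportion to its reliability. The abstract does not say which adjustment the authors used, so the sketch below is an assumption: it applies a simple empirical-Bayes-style shrinkage, with a hypothetical signal standard deviation of 0.02 (within the 0.01 to 0.03 range reported above) and a hypothetical standard error of 0.05.

```python
def reliability(signal_sd, noise_se):
    """Share of an estimate's variance that reflects true differences."""
    signal_var = signal_sd ** 2
    return signal_var / (signal_var + noise_se ** 2)


def shrink(estimate, signal_sd, noise_se, grand_mean=0.0):
    """Pull a noisy estimate toward the grand mean by its reliability."""
    r = reliability(signal_sd, noise_se)
    return grand_mean + r * (estimate - grand_mean)


if __name__ == "__main__":
    # Hypothetical program that looks 0.10 SD below average before adjustment.
    raw_estimate = -0.10
    print(round(reliability(0.02, 0.05), 3))           # 0.138
    print(round(shrink(raw_estimate, 0.02, 0.05), 3))  # -0.014
```

With these illustrative numbers, only about 14 percent of the estimate's variance is signal, so a program that appears far below average is, after shrinkage, barely distinguishable from average.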

Association for Public Policy Analysis & Management

Panel Paper: Seeking Exceptional Teacher Preparation Programs Among Many Noisy Estimates: The Danger of Being Fooled By Randomness