First, we systematically clarify the set of choices faced by those wishing to construct value-added performance measures and the consequences of those choices. For example, we discuss the consequences of decisions to use panel versus cross-sectional estimates, whether or not to adjust for student demographics, whether to include peer effects, and whether and how to standardize test scores. We evaluate commonly used techniques for estimating teacher effects, such as Empirical Bayes shrinkage, variance correction procedures, hierarchical linear modeling (HLM) and Bayesian approaches, instrumental variables, and models based on simple categorizations, such as the “Colorado Growth Model.” We assess the advantages and disadvantages of each approach and identify common elements across approaches.
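To fix ideas, the following sketch illustrates the basic logic of Empirical Bayes shrinkage as it is typically applied to teacher effects: raw teacher-average residual gains are pulled toward the grand mean in proportion to their estimated reliability. The data frame columns, function name, and the assumed known variance components are purely illustrative; in practice the variance components would themselves be estimated from the data.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one row per student-year: "resid_gain" is the
# achievement gain residualized on prior score and covariates, and
# "teacher_id" identifies the student's teacher. Names are illustrative.
def eb_shrunken_teacher_effects(df, sigma2_teacher, sigma2_student):
    """Shrink raw teacher means toward zero (the grand mean of residualized
    gains) using the reliability weight
    lambda_j = sigma2_teacher / (sigma2_teacher + sigma2_student / n_j)."""
    grouped = df.groupby("teacher_id")["resid_gain"]
    raw_mean = grouped.mean()     # unshrunken teacher-average residual gain
    n_j = grouped.size()          # number of student observations per teacher
    lam = sigma2_teacher / (sigma2_teacher + sigma2_student / n_j)
    return lam * raw_mean         # Empirical Bayes (shrunken) estimates

# Simulated example: 50 teachers with 20 students each.
rng = np.random.default_rng(0)
teachers = np.repeat(np.arange(50), 20)
true_effect = rng.normal(0, 0.15, 50)[teachers]
df = pd.DataFrame({
    "teacher_id": teachers,
    "resid_gain": true_effect + rng.normal(0, 0.5, teachers.size),
})
print(eb_shrunken_teacher_effects(df, sigma2_teacher=0.15**2,
                                  sigma2_student=0.5**2).head())
```

Teachers with few student observations are shrunk most heavily, which is the central trade-off (reduced noise at the cost of bias toward the mean) that the paper weighs against the alternative estimators listed above.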
Second, we provide an overview of the sensitivity of value-added performance estimates to assessment-related issues, including non-classical measurement error arising from item response theory (IRT) scaling, multidimensionality, and floor and ceiling effects. Test characteristics related to item discrimination, difficulty, and guessing parameters can also influence value-added measures. In addition, we discuss the nature of the variation in performance measures that one can observe using different test instruments.
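As a concrete illustration of why IRT-based measurement error is non-classical, the sketch below (a hypothetical example, not code from the paper) evaluates the three-parameter logistic (3PL) response probability and the test information function. Because information varies with ability, the standard error of a scale score depends on where a student sits in the ability distribution, one mechanism behind the floor and ceiling effects noted above. The item-parameter values are invented for illustration.

```python
import numpy as np

def p_correct_3pl(theta, a, b, c):
    """3PL IRT model: probability that a student with ability theta answers
    an item with discrimination a, difficulty b, and pseudo-guessing c."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def test_information(theta, a, b, c):
    """Sum of 3PL item information functions. The inverse square root of
    this quantity approximates the standard error of the ability estimate,
    so measurement error varies with theta rather than being constant."""
    p = p_correct_3pl(theta[:, None], a, b, c)
    q = 1.0 - p
    info = (a**2) * (q / p) * ((p - c) / (1.0 - c))**2
    return info.sum(axis=1)

# Illustrative item bank: items clustered at middling difficulty, so
# information (and hence score precision) falls off at the extremes of theta.
a = np.full(40, 1.2)
b = np.random.default_rng(1).normal(0.0, 0.7, 40)
c = np.full(40, 0.2)
theta_grid = np.linspace(-3, 3, 7)
for th, inf in zip(theta_grid, test_information(theta_grid, a, b, c)):
    print(f"theta={th:+.1f}  SE(theta) approx {1 / np.sqrt(inf):.3f}")
```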
Third, we discuss techniques that have been developed to diagnose threats to the validity of value-added measures. We evaluate the usefulness of statistical tests, such as the Rothstein “falsification test” and others developed in the econometric literature, and point out their limitations. We also discuss and develop other diagnostic techniques for detecting nonrandom teacher assignment, and we examine the consequences of applying different types of value-added models in different assignment contexts.
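To make the logic of the falsification approach concrete, here is a minimal sketch in the spirit of the Rothstein test: if assignment to classrooms is as good as random conditional on the model's controls, students' *future* teacher assignments should not predict their *past* achievement gains, so a rejection of the joint test signals nonrandom sorting. The column names are hypothetical, and this is a deliberately simplified illustration rather than the full test as implemented in the literature, which conditions on prior scores and other covariates.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel with one row per student: "gain_prior" is the
# achievement gain in grade g-1 and "teacher_next" is the grade-g teacher
# to whom the student is later assigned. Names are illustrative.
def rothstein_falsification(df):
    """Regress the prior gain on dummies for the future teacher and jointly
    test their significance with an F-test."""
    unrestricted = smf.ols("gain_prior ~ C(teacher_next)", data=df).fit()
    restricted = smf.ols("gain_prior ~ 1", data=df).fit()
    return unrestricted.compare_f_test(restricted)  # (F, p-value, df_diff)

# Simulated example with random assignment, so the test should not reject
# in expectation.
rng = np.random.default_rng(2)
n = 2000
df = pd.DataFrame({
    "gain_prior": rng.normal(0, 1, n),
    "teacher_next": rng.integers(0, 40, n),
})
f_stat, p_value, df_diff = rothstein_falsification(df)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```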
Fourth, we discuss remaining unknowns in value-added research, that is, problems that researchers have not yet addressed or discussed in depth. For example, we currently know very little about the degree to which students and parents respond to particular teacher assignments, and few models currently in use address this issue.
In the final section of the paper, we provide practical advice for researchers and policy makers in a series of recommendations for computing and using value-added performance measures going forward.