Rather than analyzing the consistency of student test scores across test occurrences, the standard approach used by test vendors is to divide a test taken at a single point in time into parts that are hoped to be parallel. Reliability measured as the consistency (i.e., correlation) of students’ scores across these parts accounts only for the measurement error that results from randomly selecting a set of test items from the relevant population of items.
As Feldt and Brennan (1989) note, this approach “frequently present[s] a biased picture” in that “reported reliability coefficients tend to overstate the trustworthiness of educational measurement, and standard errors underestimate within-person variability.” The problem is that measures based on a single test occurrence ignore potentially important day-to-day differences in student performance.
In this paper we show that there is a credible approach for measuring the overall extent of test measurement error that can be applied in a wide variety of settings. Estimation is straightforward and requires only estimates of the correlations or covariances of test scores in the subject of interest at several points in time (e.g., the correlations between third-, fourth- and fifth-grade math scores for one cohort of students). Student-level test score data are not needed, provided that estimates of the test-score correlations or covariances are available. Our approach generalizes the test-retest framework to allow for growth or decay in the knowledge, skills and abilities of students between test administrations, as well as for variation across tests in the extent of measurement error. Using the estimated test-score covariance or correlation matrix and a few assumptions regarding the structure of student achievement growth, it is possible to estimate the overall extent of test measurement error and to decompose the variance of test scores into the part attributable to real differences in academic achievement and the part attributable to measurement error.
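To illustrate how a correlation matrix alone can identify measurement error, the sketch below works through a simple textbook special case; it is not the estimator developed in this paper. It assumes three test administrations, measurement errors that are uncorrelated with true achievement and with each other, and true achievement that follows a first-order (Markov) process, so that the true-score correlation between the first and third tests equals the product of the two adjacent true-score correlations. Under those assumptions the reliability of the middle test equals r12 * r23 / r13. The function names and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def middle_wave_reliability(r12, r23, r13):
    """Reliability of the middle test under the assumed three-wave Markov model.

    r12, r23, r13 are observed test-score correlations between
    administrations 1-2, 2-3 and 1-3.
    """
    if r13 == 0:
        raise ValueError("r13 must be nonzero")
    return r12 * r23 / r13


def decompose_variance(observed_var, reliability):
    """Split observed score variance into true-score and error components."""
    true_var = reliability * observed_var
    error_var = observed_var - true_var
    return true_var, error_var


def implied_correlations(rel1, rel2, rel3, c12, c23):
    """Observed correlations implied by the assumed model.

    rel1..rel3 are test reliabilities; c12 and c23 are true-score
    correlations between adjacent administrations. The Markov
    assumption gives a 1-3 true-score correlation of c12 * c23.
    """
    r12 = math.sqrt(rel1 * rel2) * c12
    r23 = math.sqrt(rel2 * rel3) * c23
    r13 = math.sqrt(rel1 * rel3) * c12 * c23
    return r12, r23, r13


# Hypothetical values, used only to check that the formula recovers
# the middle-test reliability that generated the correlations.
r12, r23, r13 = implied_correlations(0.80, 0.85, 0.90, c12=0.90, c23=0.95)
rel2 = middle_wave_reliability(r12, r23, r13)
true_var, error_var = decompose_variance(observed_var=100.0, reliability=rel2)
print(f"reliability={rel2:.3f} true_var={true_var:.1f} error_var={error_var:.1f}")
```

In this special case the reliabilities of the first and third tests are not identified without further restrictions, which is one reason additional test administrations or structural assumptions are useful.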