Panel Paper:
Measuring Students’ Social-Emotional Learning Among California’s CORE Districts: An IRT Modeling Approach
California’s CORE districts are the first in the country to field a large-scale panel survey measuring students’ social-emotional learning (SEL) skills. The survey includes 25 items across four domains: self-management, growth mindset, self-efficacy, and social awareness. Since the initial pilot in 2014, roughly 445,000 students in 2014-15 and 430,000 students in 2015-16, spanning grades 3 through 12, have participated in the SEL survey. The SEL measures are incorporated into the school quality indicators for participating districts.
Measuring students’ personal characteristics raises many concerns. Moreover, because large-scale surveys of such characteristics remain scarce, no consensus exists on how best to score and report their outcomes. Given well-known issues such as reliance on student self-reports and missing responses, this paper aims to identify the best approach to modeling and scoring CORE’s SEL survey. The survey is unusual in that the same items are administered to students across a wide range of grades. We therefore also use item response theory (IRT) models to examine whether students in different grades perceive the same items differently.
With a measurement focus, this paper (1) applies three polytomous IRT models – the partial credit model (PCM; Masters, 1982), the generalized partial credit model (GPCM; Muraki, 1992), and the nominal response model (NRM; Bock, 1972) – to each of the four domains and the two most recent administrations of CORE’s SEL survey, both on the overall student sample and on separate samples for each grade level; (2) summarizes items’ psychometric characteristics in each domain; (3) compares item functioning across grade levels; (4) compares student outcomes estimated with the different polytomous IRT models and with the classical approach (i.e., raw mean scores excluding missing responses); (5) recommends approaches to modeling and scaling the SEL survey data; (6) identifies items, by grade, that do not contribute positively to the measurement of each outcome; and (7) discusses policy implications of using SEL measures for educators, administrators, policy makers, and other stakeholders. The three models’ category response functions are sketched below.
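For reference, the standard forms of the three models’ category response functions are as follows (a notational sketch; the parameterizations follow Masters, 1982, Muraki, 1992, and Bock, 1972, and the symbols are generic rather than CORE-specific). For an item i with response categories k = 0, …, m_i and student ability θ:

```latex
% PCM: a common slope fixed at 1; the b_{iv} are step difficulties
P_{ik}(\theta) = \frac{\exp\sum_{v=1}^{k}(\theta - b_{iv})}
                      {\sum_{c=0}^{m_i}\exp\sum_{v=1}^{c}(\theta - b_{iv})},
\qquad \text{(empty sums are 0)}

% GPCM: adds an item-specific discrimination a_i
P_{ik}(\theta) = \frac{\exp\sum_{v=1}^{k} a_i(\theta - b_{iv})}
                      {\sum_{c=0}^{m_i}\exp\sum_{v=1}^{c} a_i(\theta - b_{iv})}

% NRM: each category gets its own slope a_{ik} and intercept c_{ik}
P_{ik}(\theta) = \frac{\exp(a_{ik}\theta + c_{ik})}
                      {\sum_{c=0}^{m_i}\exp(a_{ic}\theta + c_{ic})},
\qquad a_{i0} = c_{i0} = 0 \ \text{for identification}
```

The category boundary discriminations referenced in the findings correspond to the successive slope differences a_{ik} − a_{i,k−1} in the NRM; when these are near zero or disordered, adjacent categories fail to distinguish between students, which is the sense in which an item “does not function well.”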
Findings indicate that the NRM fits the data significantly better than the GPCM, which in turn fits better than the PCM. The NRM category boundary discrimination parameters suggest that a few items, such as those in the growth mindset domain, do not function well, especially among younger students. Students in different grades do perceive the items differently. Overall, IRT modeling offers several advantages over the classical approach: it handles missing responses, recognizes differences in students’ understanding across grades, and weights items appropriately in scoring; the sketch below illustrates the first of these. Policy implications of adopting IRT scoring are also discussed.
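To make the missing-response point concrete, the following minimal sketch (not CORE’s actual scoring code; the item parameters, the GPCM, and the EAP estimator are illustrative assumptions) contrasts classical raw-mean scoring with IRT pattern scoring for two students whose answered responses are identical but one of whom skipped an item:

```python
import numpy as np

# Hypothetical GPCM parameters for a 4-item, 5-category (0-4) domain:
# one slope a_i per item and four step difficulties b_iv per item.
A = np.array([1.2, 0.8, 1.5, 1.0])
B = np.array([[-1.5, -0.5, 0.5, 1.5],
              [-2.0, -1.0, 0.0, 1.0],
              [-1.0,  0.0, 1.0, 2.0],
              [-1.8, -0.6, 0.6, 1.8]])

def gpcm_probs(theta, a, b):
    """Category probabilities P(X = k | theta) for one GPCM item."""
    z = np.concatenate(([0.0], np.cumsum(a * (theta - b))))  # cumulative logits
    z -= z.max()                                             # numerical stability
    p = np.exp(z)
    return p / p.sum()

def eap_score(responses, quad=np.linspace(-4, 4, 81)):
    """EAP estimate of theta under a N(0, 1) prior; missing responses
    (None) simply drop out of the likelihood."""
    prior = np.exp(-0.5 * quad ** 2)
    like = np.ones_like(quad)
    for i, x in enumerate(responses):
        if x is None:
            continue
        like *= np.array([gpcm_probs(t, A[i], B[i])[x] for t in quad])
    post = prior * like
    return float(np.sum(quad * post) / np.sum(post))

def raw_mean(responses):
    """Classical score: mean of answered categories, missing excluded."""
    answered = [x for x in responses if x is not None]
    return sum(answered) / len(answered)

full = [3, 3, 3, 3]          # all four items answered in category 3
part = [3, 3, None, 3]       # same answers, but the third item skipped

print(raw_mean(full), raw_mean(part))    # 3.0 3.0 -- indistinguishable
print(eap_score(full), eap_score(part))  # EAP reflects the missing item
```

The raw means are identical because the skipped item is simply dropped, whereas the EAP likelihood is built only from answered items, so the posterior for the student with a missing response is less informed and shrinks further toward the prior mean. The same machinery also supplies the “proper weights” noted above: items with larger discriminations move the likelihood, and hence the score, more than weakly discriminating items.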