Panel Paper: Measuring Students’ Social-Emotional Learning Among California’s CORE Districts: An IRT Modeling Approach

Thursday, November 2, 2017
San Francisco (Hyatt Regency Chicago)


Robert H. Meyer, Caroline Y. Wang and Andrew B. Rice, Education Analytics


Research continually demonstrates the importance of students' social-emotional learning (SEL) skills, such as growth mindset, self-management, and grit, for future success, including academic achievement, workforce performance, and well-being. Recently, the Every Student Succeeds Act of 2015 boosted interest in these personal characteristics by requiring that state accountability systems include at least one indicator of school quality or student success that is distinct from measures of students' cognitive abilities.

California's CORE districts are the first in the country to initiate a large-scale panel survey measuring students' social-emotional learning skills. The survey includes 25 items spanning four domains: self-management, growth mindset, self-efficacy, and social awareness. Since the initial pilot in 2014, roughly 445,000 students in 2014-15 and 430,000 students in 2015-16, in grades 3 through 12, have participated in the SEL survey. The SEL measures are incorporated into the school quality indicators for participating districts.

The process of measuring students' personal characteristics raises many concerns. Additionally, with the limited availability of large-scale surveys measuring personal characteristics, no consensus exists on how best to score and report these surveys' outcomes. Given the many known issues, such as student self-reporting and missing responses, this paper aims to identify the best approach to model and score CORE's SEL survey. CORE's SEL survey is unique because the same items are administered to a wide range of students in different grades. Therefore, we also strive to understand, via the insights provided by item response theory (IRT) models, whether students from different grades perceive the same items differently.

With a measurement focus, this paper (1) applies three different polytomous IRT models – the partial credit model (PCM; Masters, 1982), the generalized partial credit model (GPCM; Muraki, 1992), and the nominal response model (NRM; Bock, 1972) – to each of the four domains and the two most recent administrations of CORE's SEL survey, both on the overall student sample and on separate samples from each grade level; (2) summarizes items' psychometric characteristics in each domain; (3) compares item functioning across grade levels; (4) compares student outcomes estimated using the different polytomous IRT models and the classical approach (i.e., raw mean scores excluding missing responses); (5) makes suggestions on approaches to modeling and scaling the SEL survey data; (6) identifies items, by grade, that do not contribute positively to measurement of each outcome; and (7) discusses policy implications of using SEL measures among educators, administrators, policy makers, and other stakeholders.
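To make the model comparison concrete, the sketch below computes category response probabilities under the GPCM, with the PCM recovered as the special case in which the discrimination parameter is fixed at 1. This is a minimal illustration of the models' structure, not the estimation procedure used in the paper; the parameter values in the usage note are hypothetical, not estimates from the CORE data.

```python
import numpy as np

def gpcm_probs(theta, a, b):
    """Category probabilities for one GPCM item.

    theta : student ability
    a     : item discrimination
    b     : sequence of m step difficulties (item has m + 1 categories)
    """
    b = np.asarray(b, dtype=float)
    # Numerator exponents: cumulative sums of a*(theta - b_v), with 0 for category 0.
    steps = np.concatenate(([0.0], np.cumsum(a * (theta - b))))
    z = np.exp(steps - steps.max())  # subtract max for numerical stability
    return z / z.sum()

def pcm_probs(theta, b):
    """Partial credit model: the GPCM with discrimination fixed at 1."""
    return gpcm_probs(theta, 1.0, b)
```

For example, with hypothetical step difficulties `[-1.0, 0.0, 1.0]`, `gpcm_probs(0.5, 1.2, [-1.0, 0.0, 1.0])` returns four probabilities (one per response category) that sum to 1, and higher `theta` shifts probability mass toward higher categories. The NRM generalizes further by giving each category its own slope and intercept, which is what lets it flag categories whose boundaries do not discriminate as intended.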

Findings indicate that the NRM fits the data significantly better than the GPCM, which in turn fits better than the PCM. The NRM category boundary discrimination parameters suggest that a few items, such as those in the growth mindset domain, do not function well, especially among younger students. Students in different grades do perceive the items differently. Overall, IRT modeling provides several advantages over the classical approach, including handling missing responses, recognizing differences in students' understanding across grades, and providing proper weights in scoring. Policy implications of adopting IRT scoring are further discussed.
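One of the advantages noted above, principled handling of missing responses, can be sketched with an expected a posteriori (EAP) score: unanswered items simply drop out of the likelihood, and answered items contribute in proportion to their (hypothetical) discriminations, unlike a raw mean that treats all answered items equally. The `gpcm_probs` helper and item parameters here are illustrative assumptions, not the paper's estimated parameters or scoring engine.

```python
import numpy as np

def gpcm_probs(theta, a, b):
    """GPCM category probabilities for one item (see model sketch above)."""
    b = np.asarray(b, dtype=float)
    steps = np.concatenate(([0.0], np.cumsum(a * (theta - b))))
    z = np.exp(steps - steps.max())
    return z / z.sum()

def eap_score(responses, items, n_quad=61):
    """EAP ability estimate under a standard-normal prior.

    responses : list of category indices, with None marking a missing response
    items     : list of (discrimination, step_difficulties) pairs (illustrative)
    """
    quad = np.linspace(-4.0, 4.0, n_quad)            # quadrature grid
    prior = np.exp(-0.5 * quad ** 2)                 # N(0,1) prior, unnormalized
    like = np.ones_like(quad)
    for resp, (a, b) in zip(responses, items):
        if resp is None:
            continue                                  # missing item drops out
        like *= np.array([gpcm_probs(t, a, b)[resp] for t in quad])
    post = prior * like
    return float((quad * post).sum() / post.sum())

# Hypothetical item parameters: (discrimination, step difficulties).
items = [(1.0, [-1.0, 0.0, 1.0]), (1.5, [-0.5, 0.5]), (0.8, [0.0])]
```

A student who skips every item gets the prior mean (0) rather than being excluded, and two students with the same raw mean on their answered items can receive different EAP scores when the items they answered differ in discrimination, which is exactly the weighting behavior the classical approach lacks.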