Developments in Methods for Measuring Noncognitive Skills
In this presentation I review the limitations of Likert scale ratings, which include susceptibility to (a) response-style effects, the tendency to select the extremes, the middle, or the positive end of the response scale independently of the respondent’s underlying standing on the construct being measured; (b) reference-group effects, the dependence of responses on the target of comparison, which varies across respondents and over time; (c) social desirability bias, responding motivated by the desire to present oneself favorably to the assessment administrator; and (d) for ratings by others, halo or horn effects, the tendency of raters to give an overall positive or negative assessment without differentiating among items or dimensions.
I then review research demonstrating the value of several alternatives for noncognitive skills measurement that mitigate some of the disadvantages of Likert scale measurement: (a) forced-choice (or ranking) methods, (b) anchoring vignettes, (c) situational judgment tests, and (d) performance tasks. Item response theory (IRT) methods for scoring forced-choice (or ranking) tests, in which the respondent ranks items from a block of two to four, have been shown to yield better correlations with educational and workforce outcomes than Likert scale measures. Anchoring vignettes for personality and climate constructs have been shown in international large-scale educational assessments to increase comparability and to strengthen predictions of achievement. The validity evidence for situational judgment tests (SJTs) has been documented in many contexts, but reliability has been a challenge, requiring longer assessment time to achieve reliable measurement. New IRT methods for scoring SJTs reduce that time, making SJTs a more viable assessment instrument. Finally, I discuss one performance-based measure of noncognitive constructs, a collaborative problem-solving task, which assesses collaborative skills based on respondent choices and dialogues with collaborators. All these examples are reviewed with sample tasks illustrating the methods and with data demonstrating their potential advantages over traditional rating-scale measurement. I close with a discussion of prospects for using these methods in studies, considering what problem each alternative approach is designed to solve, the strength of evidence that it can solve that problem, and the costs of implementing it.
- Kyllonen-Kell-APPAM-2018-November.pdf (594.6KB)