Panel Paper: Establishing Performance-Level Thresholds for the New Mexico Kindergarten Observation Tool

Thursday, November 7, 2019
Plaza Building: Concourse Level, Governor's Square 12 (Sheraton Denver Downtown)


Katie Dahlke, Samantha Neiman, Rui Yang, Briana Garcia, Andrew P. Swanlund, Ryan T. Williams and Kristin Flanagan, American Institutes for Research


New Mexico has developed the New Mexico Kindergarten Observation Tool (KOT), a multidimensional observational measure of students’ knowledge and skills for use at kindergarten entry. The KOT was developed on the basis of the state’s Preschool Observation Tool, and together the two measures make up the Early Childhood Observation Tool (ECOT). The KOT’s primary purpose is to give kindergarten teachers information about their students’ knowledge and skills at the beginning of the year to inform their curricular and pedagogical decisions.

Purpose/Research Question:

For each of two psychometrically validated domains (General Knowledge & Skills; Academic, and Learning & Social Skills), the study addressed the following two research questions:

  1. What are the appropriate performance-level categories and definitions to use for criteria-based standards?
  2. What are the appropriate KOT cut scores to use for these performance-level categories?

Data and Methods:

Data sources for standard setting included (1) expert ratings of items and (2) data from the 2016 KOT statewide administration, which covered 19,215 children nested within 1,526 teachers, 388 schools, and 108 school districts.

To determine categories and thresholds, the study team convened a panel of eight early childhood education experts. The panel included national experts and local stakeholders who brought diverse perspectives about child development, early childhood contexts, and child subgroups. Over a series of meetings, the panel provided feedback on developing and defining the performance-level thresholds and rated the items against each performance-level category. The categories were defined as developing, demonstrating, and exceeding foundational knowledge and skills at kindergarten entry.

To calculate cut scores, we first took the modal expert rating for each response option of each item and used it to identify that item's transition points. Each KOT item has six rubric rating categories and, therefore, five possible points at which modal ratings can transition from developing to demonstrating and from demonstrating to exceeding. These transition points were then mapped onto logit values for each item, aggregated to the domain level, and transformed into raw sum scores for ease of interpretation.
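The cut-score steps described above can be illustrated with a minimal sketch. It is not the study's implementation. The abstract does not name the measurement model, the aggregation rule, or the tie-breaking rule, so the sketch assumes a partial credit model with five step thresholds per item (the hypothetical `taus`), rubric categories scored 0 to 5, a simple mean of item-level logits as the domain aggregate, and ties in the modal rating resolved to the lower level. It also assumes that each item's lowest category is rated developing and that modal ratings reach both higher levels. All data values are invented for illustration.

```python
from collections import Counter
from math import exp

LEVELS = ["developing", "demonstrating", "exceeding"]


def modal_level(ratings):
    """Index of the most common level; ties go to the lower level (assumption)."""
    counts = Counter(ratings)
    top = max(counts.values())
    return min(LEVELS.index(lvl) for lvl, n in counts.items() if n == top)


def transition_points(option_ratings):
    """First rubric category (0-5) whose modal rating reaches each higher level.

    Assumes the lowest category is 'developing' and both higher levels occur.
    """
    modes = [modal_level(r) for r in option_ratings]
    to_demonstrating = next(k for k, m in enumerate(modes) if m >= 1)
    to_exceeding = next(k for k, m in enumerate(modes) if m >= 2)
    return to_demonstrating, to_exceeding


def expected_item_score(theta, taus):
    """Expected 0-5 score at ability theta under a partial credit model (assumption)."""
    numerators, running = [1.0], 0.0
    for tau in taus:
        running += theta - tau
        numerators.append(exp(running))
    return sum(x * n for x, n in enumerate(numerators)) / sum(numerators)


def domain_cut_scores(items):
    """items: list of (option_ratings, taus). Returns two raw-sum cut scores."""
    demo_logits, exc_logits = [], []
    for option_ratings, taus in items:
        k_demo, k_exc = transition_points(option_ratings)
        # The transition into category k is mapped to step threshold k (assumption).
        demo_logits.append(taus[k_demo - 1])
        exc_logits.append(taus[k_exc - 1])
    cuts = []
    for logits in (demo_logits, exc_logits):
        theta = sum(logits) / len(logits)  # domain-level aggregate (assumed mean)
        cuts.append(sum(expected_item_score(theta, taus) for _, taus in items))
    return tuple(cuts)


def panel(dev, demo, exc):
    """Build one response option's ratings from eight hypothetical panelists."""
    return ["developing"] * dev + ["demonstrating"] * demo + ["exceeding"] * exc


# Hypothetical item: six rubric categories, five made-up step thresholds.
example_item = (
    [panel(8, 0, 0), panel(6, 2, 0), panel(3, 5, 0),
     panel(0, 8, 0), panel(0, 3, 5), panel(0, 0, 8)],
    [-2.0, -1.0, 0.0, 1.0, 2.0],
)
demo_cut, exc_cut = domain_cut_scores([example_item, example_item])
print(transition_points(example_item[0]), round(demo_cut, 2), round(exc_cut, 2))
```

In this toy domain of two identical items, the modal ratings change at the third and fifth rubric categories, and the two returned values are the raw sum scores at which a student would move into demonstrating and exceeding under the stated assumptions.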


Three KOT performance levels were established: developing, demonstrating, and exceeding foundational knowledge and skills at kindergarten entry. Appropriate cut scores were identified to place students into each of these categories. The pace and sequence for developing foundational knowledge and skills vary across children, and for some children kindergarten is their first experience in formalized learning. Thus, students at the developing performance level are still considered ready for kindergarten. Performance levels can, however, promote understanding of how to interpret KOT scores and help determine which students need additional instructional support. Performance-level categories may also help stakeholders recognize classrooms, schools, and districts that have more children in need of additional or enhanced opportunities for learning.