Panel Paper:
Assuring High Quality in Publicly Funded Child Care and Preschool: Replicating Evidence about Two Widely-Used Measures
At the same time, both measures were developed for purposes other than their current high-stakes uses, and it is possible that the rush to expand preschool and improve monitoring led to their adoption despite limited evidence supporting those uses. Indeed, legislation and regulations have tended to call for “reliable and valid” measures. Contemporary measurement theory takes a more nuanced approach: measures are not themselves “reliable and valid”; rather, the body of evidence should be weighed against each potential use of a measure. Decision makers may therefore have to overstep the existing evidence to meet the blanket requirements in current rules and policies.
This paper reports on our efforts to carefully examine the evidence specific to current high-stakes uses. Because replication is important for practice and policy recommendations, we drew on thirteen large-scale secondary datasets, all of which included the ECERS-R and/or the CLASS. The datasets include five cohorts of the Head Start Family and Child Experiences Survey; the Head Start Impact Study; the Early Head Start Research and Evaluation Project; the Fragile Families and Child Wellbeing Study; the Early Childhood Longitudinal Study, Birth Cohort; the Preschool Curriculum Evaluation Research Initiative; Quality Interventions for Early Care and Education; the NCEDL Multistate Study; and Welfare, Children and Families: A Three-City Study.
In the presentation, we will begin by briefly showing our replication of evidence that associations between ECERS-R or CLASS scale scores and child development are often statistically nonsignificant and uniformly small, even when we look across multiple domains of child development and align scale scores more closely with those domains. We will then focus on new results regarding the psychometric properties of each measure that may help explain these small associations. Using Item Response Theory methods, we find consistent evidence across datasets of category “disorder” in the ECERS-R (that is, a higher score on an item does not always reflect higher quality). We pool the datasets to pinpoint more precisely where along the scale such category disorder occurs. We also use factor analyses to replicate a recently identified “bi-factor” structure for the CLASS that differs from the subscale structure written into Head Start policy. We discuss the implications of these findings for evidence-based policymaking, including the need to identify alternative scorings for the ECERS-R (and the new ECERS-3) and to unpack the inferential scoring process of the CLASS.
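To make the idea of category disorder concrete, here is a minimal sketch of one common way it is operationalized in IRT, assuming a partial credit model (PCM) for a single polytomous item such as a 7-point ECERS-R item (scores 1 to 7 recoded 0 to 6). The abstract does not say which IRT model or disorder criterion the authors use, so this is an illustration rather than their method. Under the PCM, category k is more likely than category k-1 only when the latent quality θ exceeds threshold δ_k, so if a later threshold falls below an earlier one, the category between them is never the most probable response at any level of quality. The threshold values and the helper names `category_probs`, `disordered_pairs`, and `modal_categories` are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

# Assumes a partial credit model (PCM). Thresholds used here are
# hypothetical illustrations, not estimates from any ECERS-R dataset.


def category_probs(theta: float, thresholds: Sequence[float]) -> List[float]:
    """PCM category probabilities for one item scored 0..m (m = len(thresholds)).

    The log-numerator for category k is sum_{j<=k} (theta - delta_j);
    category 0 has an empty sum (log-numerator 0).
    """
    logits = [0.0]
    for delta in thresholds:
        logits.append(logits[-1] + (theta - delta))
    top = max(logits)  # subtract the max before exponentiating, for stability
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def disordered_pairs(thresholds: Sequence[float]) -> List[Tuple[int, int]]:
    """Return 1-based (j, j+1) pairs where threshold j+1 falls below threshold j."""
    return [
        (j + 1, j + 2)
        for j in range(len(thresholds) - 1)
        if thresholds[j + 1] < thresholds[j]
    ]


def modal_categories(
    thresholds: Sequence[float], lo: float = -6.0, hi: float = 6.0, steps: int = 1200
) -> Set[int]:
    """Categories that are the most probable response somewhere on a theta grid."""
    modal = set()
    for i in range(steps + 1):
        theta = lo + (hi - lo) * i / steps
        probs = category_probs(theta, thresholds)
        modal.add(max(range(len(probs)), key=probs.__getitem__))
    return modal


if __name__ == "__main__":
    # Hypothetical thresholds for a 7-category item.
    ordered = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
    disordered = [-2.0, 0.0, -1.0, 1.0, 2.0, 3.0]  # threshold 3 below threshold 2

    print(disordered_pairs(ordered))     # []
    print(disordered_pairs(disordered))  # [(2, 3)]
    print(sorted(set(range(7)) - modal_categories(ordered)))     # []
    print(sorted(set(range(7)) - modal_categories(disordered)))  # [2]
```

With the disordered thresholds, category 2 would need θ > 0 to beat category 1 and θ < -1 to beat category 3, which is impossible, so it is never modal. In practice the thresholds would come from fitting a PCM or a similar model to observed item responses (for example, pooled across datasets), and their standard errors matter when judging whether an apparent reversal is real.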