Two empirical within-study comparisons will be used to examine the extent to which an observational study that uses pretest and proxy pretest measures can produce trustworthy causal estimates. For the first within-study comparison, the experimental data come from a large-scale cluster randomized controlled trial (RCT) of the effect of the state of Indiana’s benchmark assessment system on student achievement, as measured by the state’s annual Indiana Statewide Testing for Educational Progress-Plus (ISTEP+). Fifty-nine K-8 schools volunteered to implement the formative assessment system in the 2008-09 school year. Thirty-five of these schools were randomly assigned to implement the state’s formative assessment system, and the remaining 24 schools were assigned to the control condition. The within-study comparison will create a non-experimental comparison group drawing on data from students in K-8 schools statewide.
In the second within-study comparison, experimental data are drawn from twenty-five high schools in Washington State that participated in an RCT examining the effect of a new science curriculum during the 2009-10 and 2010-11 school years. Schools assigned to the treatment condition received BSCS Science: An Inquiry Approach, a set of year-long, comprehensive curricular materials, and teachers in these schools participated in a seven-day professional development (PD) program. The PD included a four-day summer session and three follow-up days of training spread across the school year. The comparison schools continued with the science curriculum and PD in place prior to the study. The quasi-experimental comparison group will be drawn from high schools in the state of Washington that did not participate in the RCT.
In both cases, the data available for selecting a matched comparison group include multiple waves of true and proxy pretest measures at both the school and student levels. The study will examine whether using all of these covariates allows the experimental benchmark to be replicated, and will estimate the relative bias reduction associated with one and two waves of true and proxy pretest data.
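As an illustration of how relative bias reduction might be quantified in a within-study comparison, the sketch below treats the RCT estimate as the benchmark and expresses the bias remaining after covariate adjustment as a share of the bias in an unadjusted comparison. This is one common convention, not necessarily the one the study will use; the function names and all effect sizes are hypothetical and chosen only for the example.

```python
def bias(nonexperimental_estimate: float, experimental_benchmark: float) -> float:
    """Bias of an observational estimate, taking the RCT estimate as the benchmark."""
    return nonexperimental_estimate - experimental_benchmark


def percent_bias_reduction(
    unadjusted_estimate: float,
    adjusted_estimate: float,
    experimental_benchmark: float,
) -> float:
    """Percent of the initial (unadjusted) bias removed by covariate adjustment."""
    initial = abs(bias(unadjusted_estimate, experimental_benchmark))
    remaining = abs(bias(adjusted_estimate, experimental_benchmark))
    if initial == 0:
        raise ValueError("Unadjusted estimate already equals the benchmark.")
    return 100.0 * (1.0 - remaining / initial)


# Hypothetical effect sizes in standard deviation units (not study results).
benchmark = 0.10          # experimental (RCT) estimate
unadjusted = 0.30         # observational estimate with no pretest adjustment
one_wave = 0.15           # adjusted using one wave of pretest measures
two_waves = 0.12          # adjusted using two waves of pretest measures

reduction_one_wave = percent_bias_reduction(unadjusted, one_wave, benchmark)
reduction_two_waves = percent_bias_reduction(unadjusted, two_waves, benchmark)
print(f"One wave:  {reduction_one_wave:.1f}% of initial bias removed")
print(f"Two waves: {reduction_two_waves:.1f}% of initial bias removed")
```

Comparing the two percentages shows how much additional bias reduction a second wave of pretest data buys under these assumed values.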