Poster Paper: Replication of Large Scale RCT Results with Regression Discontinuity: The Case of the Reading Recovery i3 Scale-up

Thursday, November 8, 2018
Exhibit Hall C - Exhibit Level (Marriott Wardman Park)


Henry May, Akisha Jones Sarfo and Alyssa R. Englert, University of Delaware


As part of the 2010 economic stimulus, a $55M Investing in Innovation (i3) Scale-Up grant funded the expansion of Reading Recovery in more than 1,400 schools and provided targeted literacy assistance to over 80,000 students. Results from a randomized controlled trial (RCT) involving over 8,000 of these students in more than 1,200 schools produced rigorous evidence of large immediate impacts of Reading Recovery on students’ reading achievement in first grade (ES between .88 and .90 SDs; Sirinides, Gray & May, 2018). However, the delayed treatment design of the RCT made it impossible to estimate long-term impacts, because the control group students received the intervention during the second half of the school year.

Because of this limitation of the RCT, a parallel regression discontinuity (RD) design was implemented within a separate randomly selected sample of Reading Recovery schools. Over 20,000 students in 1,200 i3 schools, plus 50,000 students in 3,500 non-i3 schools, participated in the RD study over all four cohort years. The RD design used cutoff-based assignment established by pre-intervention test scores on the Observation Survey of Early Literacy (OS; Clay, 2005). To examine the ability of the RD design to replicate the short-term results observed in the RCT, we estimated first grade impacts for both i3 and non-i3 schools implementing the RD design, and we compared these results to the i3 RCT results across four cohorts. Using multilevel statistical models, with students nested within each participating school, we compared the performance of students above and below the cutoff score. Model fit and potential misspecification were assessed graphically via scatterplots and spline curves and also by testing for an interaction between pretest scores and the treatment assignment variable. Assumptions of linearity in the RD analyses were further assessed by testing polynomial terms and by imposing various restrictions on the bandwidth around the cutoff score (WWC, 2017). RD impact estimates ranged from .65 to .78 SDs among i3 schools and from .81 to .84 SDs among non-i3 schools. These RD estimates are remarkably similar to the RCT estimates (ES between .88 and .90 SDs) and were highly consistent under all model robustness checks.
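The logic of the RD comparison can be illustrated with a minimal sketch, which is not the study's analysis code. It fits a single-level least-squares model with a treatment indicator, the cutoff-centered pretest, and their interaction, and it optionally restricts the sample to a bandwidth around the cutoff. Several things here are assumptions made for illustration: the data are simulated, the function names (`simulate`, `rd_estimate`) and parameter values are hypothetical, students scoring below the cutoff are assumed to receive the treatment, and the nesting of students within schools used in the multilevel models is omitted.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Ordinary least squares coefficients via the normal equations."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def rd_estimate(pretest, outcome, cutoff, bandwidth=None):
    """Estimated jump in the outcome at the cutoff (sharp RD).

    Assumption: students scoring below the cutoff are treated.
    """
    X, y = [], []
    for p, o in zip(pretest, outcome):
        c = p - cutoff  # center the pretest at the cutoff
        if bandwidth is not None and abs(c) > bandwidth:
            continue  # keep only observations near the cutoff
        t = 1.0 if c < 0 else 0.0
        # Columns: intercept, treatment, pretest, treatment x pretest.
        X.append([1.0, t, c, t * c])
        y.append(o)
    return ols(X, y)[1]  # coefficient on the treatment indicator


def simulate(n, effect, cutoff, seed):
    """Simulated standardized scores with a known treatment effect."""
    rng = random.Random(seed)
    pretest, outcome = [], []
    for _ in range(n):
        p = rng.gauss(0.0, 1.0)
        t = 1.0 if p < cutoff else 0.0
        pretest.append(p)
        outcome.append(0.5 * p + effect * t + rng.gauss(0.0, 0.5))
    return pretest, outcome


# Hypothetical values chosen only to exercise the functions.
pretest, outcome = simulate(n=5000, effect=0.8, cutoff=-0.5, seed=1)
full = rd_estimate(pretest, outcome, cutoff=-0.5)
narrow = rd_estimate(pretest, outcome, cutoff=-0.5, bandwidth=1.0)
print(round(full, 2), round(narrow, 2))
```

Comparing the full-sample and restricted-bandwidth estimates mirrors, in simplified form, the kind of robustness check described above; a faithful replication would also model school-level clustering and test polynomial terms.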

The similarity of impact estimates between the RCT and RD designs further establishes the utility of the RD design in causal impact evaluations. RD designs can be used to obtain statistically robust estimates of program effectiveness in situations where an RCT is not feasible or appropriate. For example, an RD design may alleviate ethical concerns regarding who does and does not receive the treatment, particularly among at-risk students who are often critically in need of the additional supports offered by some interventions. Reducing this controversial element of RCT impact studies gives researchers more opportunity to conduct evaluations in politically charged educational agencies or in those lacking the additional resources needed to participate in an RCT.