The purpose of this paper is to assess the performance of sequential matching procedures in cases where researchers must choose comparison cases prospectively. We use data from two sources with experimental benchmark results. In the first set of analyses, we use data from the Tennessee Class Size Experiment to first match schools on one of three covariate sets: (1) school composition covariates, (2) geographic distance and urbanicity covariates, and (3) school composition, geographic distance, and urbanicity covariates combined. We then compare the resulting treatment effect estimates with those from the experimental benchmark to assess how well the matching procedure performs when only aggregate school-level covariates are used. Next, we refine our school-level matches by using Mahalanobis distance or propensity score matching to choose students from within the matched comparison schools. The goal here is to assess the extent to which the quality of our matches can be improved when student-level data are used. In the second set of analyses, we use data from a large-scale cluster randomized controlled trial (RCT) of Indiana schools and the Indiana statewide longitudinal data system. We expect that many future applications of sequential matching will draw upon rich covariate information (including pretest scores) from newly launched statewide longitudinal data systems. Thus, the issue here is how well sequential matching procedures perform when evaluators have access to new school- and student-level data systems with rich covariate information.
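The two-stage logic described above (match schools first, then choose students within the matched comparison schools) can be illustrated with a minimal sketch. It assumes nearest-neighbor Mahalanobis matching with replacement at both stages. The function names, argument names, and array layout are hypothetical and are not taken from the paper, which may use different matching rules or propensity scores instead.

```python
import numpy as np


def mahalanobis_nearest(treated, pool):
    """For each treated row, return the index of the nearest pool row
    by squared Mahalanobis distance (matching with replacement)."""
    treated = np.asarray(treated, dtype=float)
    pool = np.asarray(pool, dtype=float)
    stacked = np.vstack([treated, pool])
    # Covariance is estimated on treated and pool units together (an assumption).
    # The pseudo-inverse guards against a singular covariance matrix.
    cov = np.atleast_2d(np.cov(stacked, rowvar=False))
    cov_inv = np.linalg.pinv(cov)
    diff = treated[:, None, :] - pool[None, :, :]
    dist2 = np.einsum("tpi,ij,tpj->tp", diff, cov_inv, diff)
    return dist2.argmin(axis=1)


def sequential_match(t_school_x, c_school_x,
                     t_student_x, t_student_school,
                     c_student_x, c_student_school):
    """Stage 1: match each treated school to a comparison school.
    Stage 2: match each treated student to a student in the comparison
    school chosen for that student's school.

    Assumes every matched comparison school has at least one student."""
    t_student_x = np.asarray(t_student_x, dtype=float)
    c_student_x = np.asarray(c_student_x, dtype=float)
    t_student_school = np.asarray(t_student_school)
    c_student_school = np.asarray(c_student_school)

    school_match = mahalanobis_nearest(t_school_x, c_school_x)
    student_match = np.full(len(t_student_x), -1, dtype=int)
    for t_school, c_school in enumerate(school_match):
        t_idx = np.flatnonzero(t_student_school == t_school)
        c_idx = np.flatnonzero(c_student_school == c_school)
        if len(t_idx) == 0:
            continue  # treated school has no student records
        local = mahalanobis_nearest(t_student_x[t_idx], c_student_x[c_idx])
        student_match[t_idx] = c_idx[local]
    return school_match, student_match
```

Here `school_match[i]` is the row of the comparison school chosen for treated school `i`, and `student_match[j]` is the row of the comparison student chosen for treated student `j`. Restricting the second stage to students in the matched school is what makes the procedure sequential rather than a single pooled student-level match.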
Our paper discusses recommendations for how researchers should choose comparison schools and students when random assignment is not possible, and suggests covariates that are likely to be the most important in sequential matching. We also discuss the types of matching procedures that may be used for sequential matching, and diagnostic tests that can be applied to check for overlap and balance. Finally, we show that although Wilde and Hollister (2007) used Project STAR data to conclude that propensity score matches could not replicate experimental benchmark results, our study indicates that the quality of matches can be improved when important school, geographic, and pretest covariates are applied.
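As one illustration of a balance diagnostic, the sketch below computes the standardized mean difference for each covariate between treated units and their matched comparisons. This is a commonly used check and is offered only as an assumption about what such a diagnostic might look like. The paper does not specify which tests it applies, and the function name is hypothetical.

```python
import numpy as np


def standardized_mean_difference(treated, comparison):
    """Per-covariate difference in means divided by the pooled standard
    deviation. Rows are units and columns are covariates. Values near
    zero suggest the two groups are balanced on that covariate."""
    treated = np.asarray(treated, dtype=float)
    comparison = np.asarray(comparison, dtype=float)
    # Pooled SD: square root of the average of the two sample variances.
    pooled_sd = np.sqrt(
        (treated.var(axis=0, ddof=1) + comparison.var(axis=0, ddof=1)) / 2.0
    )
    return (treated.mean(axis=0) - comparison.mean(axis=0)) / pooled_sd
```

In practice the function would be applied to the covariates of the matched samples at both the school and the student level, with any cutoff for acceptable imbalance chosen by the analyst.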