Panel Paper: Making Case Studies More Credible: Matching, Machine Learning and Model Selection

Thursday, November 2, 2017
Dusable (Hyatt Regency Chicago)


Valentine Gilbert (1), Aaron Chalfin (2) and Zubin Jelveh (1); (1) University of Chicago Crime Lab, (2) University of Pennsylvania


Many city, state, and national policy initiatives affect a small number of non-randomly chosen units. How best to evaluate these policy shocks is a difficult but important question. The econometric tools most often employed choose comparison groups by matching treatment and control units on pre-intervention data.

In this paper, we present a new paradigm for evaluating small-sample interventions. The econometric tools most commonly used for such evaluations – difference-in-differences and its most recent variant, the method of synthetic controls – rely on the assumption of parallel trends and construct a counterfactual by closely matching treatment and control units on pre-intervention data. The closer the match, the more credible the identification is presumed to be.
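For concreteness, the counterfactual construction alluded to here can be written in the standard synthetic control notation (a sketch in our own notation, not the authors'): with one treated unit, J untreated donors, and T_0 pre-intervention periods,

```latex
% Standard synthetic control sketch: X_1 and X_0 collect pre-treatment data
% for the treated unit and the donors; Delta^J is the simplex of non-negative
% weights summing to one.
\hat{Y}_{1t}^{N} = \sum_{j=2}^{J+1} w_j^{*}\, Y_{jt}, \qquad
w^{*} = \arg\min_{w \in \Delta^{J}} \bigl\lVert X_1 - X_0 w \bigr\rVert, \qquad
\hat{\tau}_{1t} = Y_{1t} - \hat{Y}_{1t}^{N} \quad \text{for } t > T_0 .
```

The weights are chosen purely to reproduce the treated unit's pre-intervention data; the concern developed below is that optimizing this pre-treatment fit can reward noise as much as signal.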

We argue that matching on pre-treatment information is not necessary for identifying a credible counterfactual in a case study. A particular concern is that by matching as closely as possible on pre-treatment trends, researchers can easily “overfit” the model to the available data, constructing a control group on the basis of noise rather than true signal. The result is that models will be underpowered to detect effects, a first-order problem in the case study context. An even more pressing issue is that matching on pre-treatment trends is not the most principled way to construct a control group. Ideally the control group should be constructed to maximize the quality of the counterfactual. We note that the quality of the counterfactual is testable under potentially mild assumptions.

To evaluate the quality of a candidate control group, we generate a model to predict the effect of treatment among the available control sites, recognizing that the effect of treatment on these units is zero by construction. To address the model selection problem, we note that we wish to select the model that minimizes the mean squared prediction error (MSPE) of the counterfactual for the treated unit in the post-intervention period. However, we cannot observe this quantity directly. We therefore propose using the MSPE for the post-intervention trends of untreated units as a proxy for the MSPE of the treatment counterfactual and adopting it as the primary model selection criterion. This criterion limits the role of researcher discretion, guards against overfitting the data, and provides a clear and principled way to compare the credibility of candidate models. Its suitability rests on the assumption that the model that best predicts the counterfactual trends of the untreated units will also best predict the counterfactual trends of the treated units.
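A minimal sketch of this proxy criterion, assuming a panel of untreated-unit outcomes and a hypothetical fit/predict interface for candidate models (the function and argument names below are placeholders, not the authors' implementation):

```python
import numpy as np

def placebo_mspe(model, Y_control, pre_periods):
    """Proxy MSPE: hold out each untreated unit in turn, fit the candidate
    model on the remaining donors' pre-period data, and score its
    post-period predictions. Because these units are untreated, the true
    effect is zero, so post-period error measures counterfactual quality."""
    n_units, _ = Y_control.shape
    errors = []
    for j in range(n_units):
        donors = np.delete(Y_control, j, axis=0)   # leave unit j out
        target = Y_control[j]
        # `fit`/`predict` are an assumed interface: any procedure mapping
        # donor outcomes to a predicted trajectory for the held-out unit.
        model.fit(donors[:, :pre_periods], target[:pre_periods])
        pred_post = model.predict(donors[:, pre_periods:])
        errors.append(np.mean((target[pre_periods:] - pred_post) ** 2))
    return np.mean(errors)

# Model selection: pick the candidate with the smallest proxy MSPE, e.g.
# best_model = min(candidate_models, key=lambda m: placebo_mspe(m, Y_control, T0))
```

The key design choice is that candidates are compared on held-out post-period prediction of untreated units rather than on in-sample pre-treatment fit, which is what guards against overfitting.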

We use simulated data to demonstrate that the proposed model selection criterion outperforms existing criteria. We then propose a new method for drawing causal inferences from case studies based on an adaptation of the Super Learner ensemble prediction algorithm, which finds the combination of candidate models that minimizes the MSPE. We apply the proposed method to several datasets and find that it reduces the MSPE of the post-treatment trends of the untreated units by between 25% and 55% relative to synthetic control methods.
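Super Learner combines candidate learners through a convex weighting chosen to minimize held-out risk. A rough sketch of that weighting step under the same placeholder setup, applied to the proxy MSPE on untreated units (our illustration, not the authors' code):

```python
import numpy as np
from scipy.optimize import minimize

def ensemble_weights(pred_matrix, y_true):
    """Find convex weights over candidate models' predictions that minimize
    squared error against the post-period outcomes of untreated units.
    pred_matrix: (n_models, n_obs) stacked predictions; y_true: (n_obs,)."""
    n_models = pred_matrix.shape[0]

    def mspe(w):
        return np.mean((y_true - w @ pred_matrix) ** 2)

    w0 = np.full(n_models, 1.0 / n_models)          # start from equal weights
    res = minimize(
        mspe, w0, method="SLSQP",
        bounds=[(0.0, 1.0)] * n_models,             # non-negative weights
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],  # sum to one
    )
    return res.x

# The counterfactual for the treated unit would then be the weighted
# combination of the candidate models' predictions under these weights.
```

Restricting the weights to the simplex mirrors the convex-combination structure of both Super Learner and the synthetic control estimator; the difference is that the weights here are tuned to the proxy MSPE rather than to pre-treatment fit.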