Panel Paper: Not so Conservative after All: Exact Matching and Attenuation Bias in Randomized Experiments

Thursday, November 2, 2017
Dusable (Hyatt Regency Chicago)


Sarah Tahamont, University of Maryland, Zubin Jelveh, University of Chicago Crime Lab, Shi Yan, Arizona State University, Aaron Chalfin, University of Pennsylvania, and Benjamin Hansen, University of Oregon


The increasing availability of administrative data has led to a particularly exciting innovation in public policy research – the “low-cost” randomized trial. By linking together administrative datasets, researchers can test the effect of an intervention on a host of outcomes from domains as diverse as criminal justice, education, and even health. Scholars are lauded for compiling and combining data from multiple sources, but little attention is paid to how those sources are actually linked together. Indeed, the description of this process provided in most research papers is typically limited to a footnote – if that.

When a unique identifier is available, linking can, in some cases, be trivial. However, efforts to link data from an experimental intervention to administrative records that track outcomes of interest often require matching datasets without a common unique identifier. Demographic characteristics like name and date of birth are used to match datasets, but errors in matching are inevitable. When scholars are interested in matching to identify outcomes, there is often no prior about what the match rate should be, rendering it difficult to diagnose match quality. For example, if researchers are interested in evaluating whether an employment program reduces the likelihood of arrest, observations that match to the arrest file are considered “arrested” and observations that do not match are considered “not arrested.” This issue is not limited to arrest; it applies to any context in which there is no prior for how many records should match (e.g., hospital utilization, college matriculation, program completion). In order to minimize errors, researchers will often use “exact matching” (retaining an individual only if their name and date of birth match exactly in two or more datasets) to ensure that speculative matches do not introduce errors into the dataset that will be used to evaluate the intervention.
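To make the exact-matching setup concrete, here is a minimal sketch of deriving a binary “arrested” outcome by merging an experiment file against an arrest file on name and date of birth. The data and column names are illustrative assumptions, not the paper's.

```python
import pandas as pd

# Hypothetical files: each carries name and date of birth, but no shared unique identifier.
experiment = pd.DataFrame({
    "first_name": ["ana", "bo", "cam"],
    "last_name":  ["diaz", "lee", "roe"],
    "dob":        ["1990-01-02", "1985-06-15", "1979-11-30"],
    "treated":    [1, 0, 1],
})
arrests = pd.DataFrame({
    "first_name": ["ana", "bo"],
    "last_name":  ["diaz", "leee"],   # a single typo breaks the exact match
    "dob":        ["1990-01-02", "1985-06-15"],
})

# Exact matching: a record counts as "arrested" only if name and DOB agree character for character.
merged = experiment.merge(
    arrests.assign(arrested=1),
    on=["first_name", "last_name", "dob"],
    how="left",
)
merged["arrested"] = merged["arrested"].fillna(0).astype(int)
print(merged[["first_name", "treated", "arrested"]])
# "bo lee" is truly arrested but coded 0: a false negative induced by the strict match.
```

A single transcription error in any field drops the record from the merge, which is exactly how a strict match rule converts true events into false negatives.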

We argue that this “conservative” approach, while seemingly logical, is not optimal and can lead to attenuated estimates of treatment effects and, therefore, to Type II errors. For rare outcomes and small sample sizes, exact matching is particularly problematic. How can this be? As it turns out, stringent character-match requirements minimize false positive matches but maximize false negative matches, which often results in higher total error and, therefore, more attenuated estimates. By contrast, matches performed using machine-learning algorithms (probabilistic matching) tend to minimize total error by allowing for some flexibility in the match.
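A standard misclassification identity captures this intuition; the following is a sketch of the logic, not necessarily the paper's own derivation. Let $Y$ be the true outcome, $Y^{*}$ the matched (observed) outcome, and $T$ the treatment indicator, and suppose matching errors are unrelated to treatment, with false-negative rate $\alpha = \Pr(Y^{*}=0 \mid Y=1)$ and false-positive rate $\beta = \Pr(Y^{*}=1 \mid Y=0)$. Then

$$
E[Y^{*} \mid T] = (1-\alpha)\Pr(Y=1 \mid T) + \beta \Pr(Y=0 \mid T),
$$

so the observed treatment–control contrast is

$$
E[Y^{*} \mid T=1] - E[Y^{*} \mid T=0] = (1-\alpha-\beta)\bigl[\Pr(Y=1 \mid T=1) - \Pr(Y=1 \mid T=0)\bigr].
$$

The estimated effect is shrunk by the factor $(1-\alpha-\beta)$: exact matching drives $\beta$ toward zero, but it can raise $\alpha$ enough that the shrinkage, and hence the total error, gets worse.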

In the paper, we derive an analytic result for the consequences of matching error on treatment effect estimation and then provide simulation results to show how the problem varies across different combinations of relevant inputs: total error rate, base rate, and sample size. We proceed with an empirical example that shows the difference between “conservative” naïve matching strategies and matching using a machine-learning algorithm. We conclude on an optimistic note by showing that matches derived using machine learning can mitigate the consequences of attenuated estimates.
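The flavor of such a simulation can be sketched as follows. The parameter values and the two error profiles being compared are illustrative assumptions, not the paper's actual design or results.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n=2000, base_rate=0.10, effect=-0.03,
             false_neg=0.20, false_pos=0.01, reps=1000):
    """Average estimated treatment effect when the matched outcome
    misses true events (false_neg) and adds spurious ones (false_pos)."""
    estimates = []
    for _ in range(reps):
        treated = rng.integers(0, 2, size=n)
        p = base_rate + effect * treated          # true arrest probability
        y = rng.binomial(1, p)                    # true outcome
        # Matching error: some true arrests fail to link, some non-arrests link spuriously.
        y_obs = np.where(y == 1,
                         rng.binomial(1, 1 - false_neg, size=n),
                         rng.binomial(1, false_pos, size=n))
        estimates.append(y_obs[treated == 1].mean() - y_obs[treated == 0].mean())
    return np.mean(estimates)

# Strict "exact" matching: no false positives, many false negatives.
print(simulate(false_neg=0.20, false_pos=0.00))   # roughly 0.8 * effect
# Looser probabilistic matching: a few false positives, far fewer false negatives.
print(simulate(false_neg=0.05, false_pos=0.01))   # closer to the true effect
```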