Poster Paper: Linking Records Across Administrative Datasets: Can GIS Geocoding Help?

Thursday, November 3, 2016
Columbia Ballroom (Washington Hilton)


Randall Juras, Abt Associates, Inc.


Analysis of large administrative datasets offers a promising avenue for conducting low-cost evaluations. Such analyses often require linking records across two or more datasets, which is challenging when the datasets share no common identifier. I explore whether low-cost GIS geocoding software, which converts address information into latitude and longitude coordinates, can facilitate record linkage. In particular, I compare the effectiveness of deterministic and probabilistic matching strategies, with and without GIS information, for linking worksite-level records across two administrative datasets. In the low-cost deterministic framework, I find that exact matching on (full or partial) geocoded latitude and longitude coordinates offers a substantial improvement over exact matching on name and/or street address. However, both strategies result in error rates that would be unacceptably high for many evaluation purposes. Probabilistic matching, with or without geocoding, significantly outperforms deterministic matching in all cases but is considerably more expensive. Incorporating geocoded coordinates into probabilistic matching offers little improvement overall, but may help reduce false-negative rates when very low false-positive rates are desired.
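
The two families of strategies compared above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the method used in the study: the sample worksite records, the reading of "partial" coordinates as coordinates rounded to four decimal places, the string-similarity thresholds, and the agreement probabilities in `FIELD_PARAMS` (a simplified Fellegi-Sunter-style weighting) are all hypothetical.

```python
from difflib import SequenceMatcher
from math import log2

# Hypothetical worksite records; lat/lon stand in for geocoder output.
dataset_a = [
    {"id": "A1", "name": "Acme Manufacturing Inc", "address": "100 Main St",
     "lat": 38.916512, "lon": -77.045230},
    {"id": "A2", "name": "Bolt Hardware", "address": "22 Oak Avenue",
     "lat": 38.901100, "lon": -77.030400},
]
dataset_b = [
    {"id": "B1", "name": "ACME Mfg.", "address": "100 Main Street",
     "lat": 38.916498, "lon": -77.045241},
    {"id": "B2", "name": "Cedar Cafe", "address": "9 Pine Rd",
     "lat": 38.880000, "lon": -77.100000},
]


def name_address_key(rec):
    """Exact-match key on lower-cased name and street address."""
    return (rec["name"].lower(), rec["address"].lower())


def geo_key(rec, decimals=4):
    """Exact-match key on 'partial' coordinates (assumed here to mean rounded)."""
    return (round(rec["lat"], decimals), round(rec["lon"], decimals))


def deterministic_match(a_recs, b_recs, key):
    """Link every pair of records whose keys are exactly equal."""
    index = {}
    for b in b_recs:
        index.setdefault(key(b), []).append(b["id"])
    return [(a["id"], b_id) for a in a_recs for b_id in index.get(key(a), [])]


# Assumed (m, u) values per field: m = P(agree | true match),
# u = P(agree | non-match). Illustrative only, not estimated from data.
FIELD_PARAMS = {"name": (0.90, 0.05), "address": (0.85, 0.02), "geo": (0.95, 0.01)}


def similar(x, y, threshold):
    """Treat two strings as agreeing if their similarity ratio meets the threshold."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio() >= threshold


def match_score(a, b, use_geo=True):
    """Sum of log-likelihood-ratio weights over the compared fields."""
    agreement = {
        "name": similar(a["name"], b["name"], 0.6),
        "address": similar(a["address"], b["address"], 0.8),
    }
    if use_geo:
        agreement["geo"] = geo_key(a) == geo_key(b)
    score = 0.0
    for field, agrees in agreement.items():
        m, u = FIELD_PARAMS[field]
        score += log2(m / u) if agrees else log2((1 - m) / (1 - u))
    return score


print(deterministic_match(dataset_a, dataset_b, name_address_key))
print(deterministic_match(dataset_a, dataset_b, geo_key))
print(match_score(dataset_a[0], dataset_b[0]), match_score(dataset_a[0], dataset_b[1]))
```

In this toy data, exact matching on name and address misses the A1/B1 pair because of spelling variation, while the rounded-coordinate key links it. Rounding is a crude choice: two nearby points can still fall on opposite sides of a rounding boundary, and distinct worksites at one address share a key. In a probabilistic setup, candidate pairs would be accepted or rejected by comparing `match_score` against a cutoff, and raising that cutoff trades false positives for false negatives.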