Poster Paper:
An Evaluation of Demographic Matching of Criminal History Records in New York State
In addition to the challenges inherent in the demographic information of almost any large database (name misspellings, name changes, nicknames, transposition of first and last names and of numerals, and other typographical errors), criminal records databases must also contend with the incentive of arrestees to provide aliases and other false information. The demographic information associated with arrest records in New York State is whatever was provided to law enforcement personnel at the time of arrest, and that information is linked to an individual identifier (state ID) using fingerprint records.
Typically, criminal history data for a given group of individuals are obtained from official sources through a name-based search rather than a fingerprint-based one (Hinton, 2002; Wright, 1999). The goal of the search is usually to locate an individual’s state ID in the state criminal record database, using a set of demographic information (e.g., name, DOB, SSN, gender, race, ethnicity) as the input. This task can be particularly challenging because there are often multiple combinations of demographic information associated with a given state ID. As a result, name-based searches leave considerable room for error.
Erroneous attribution of criminal records in individual-level data sets has substantial implications for the results of recidivism research. The present study estimates the extent to which the process of matching criminal records to individual-level data sets affects recidivism results. Using a known sample of 10,000 arrests in New York State, obtained from fingerprint cards, we implemented various methods designed to match individuals to their criminal history data. We tested deterministic matching processes that require exact matches in certain fields (e.g., Geerken, 1988; 1994), and probabilistic searches that conduct a fuzzy search of names and return the result with the highest matching probability. The hit rate and false positive rate changed dramatically when different sets of search criteria were implemented, which demonstrates the potential magnitude of accuracy problems in name-based searches.
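The contrast between the two families of matching methods can be illustrated with a minimal sketch. It is not the study's procedure: the records, the queries, the field choices, and the 0.8 threshold are all hypothetical, and the string-similarity score from Python's standard difflib module merely stands in for a true matching probability.

```python
from difflib import SequenceMatcher

# Hypothetical database: one state ID can carry several name variants.
RECORDS = [
    {"state_id": "A1", "name": "john smith", "dob": "1980-01-02"},
    {"state_id": "A1", "name": "jon smyth", "dob": "1980-01-02"},
    {"state_id": "B2", "name": "maria lopez", "dob": "1975-06-30"},
]

# Hypothetical queries with the true state ID known (None = not in database).
QUERIES = [
    {"name": "john smith", "dob": "1980-01-02", "true_id": "A1"},
    {"name": "jhon smith", "dob": "1980-01-02", "true_id": "A1"},  # transposed letters
    {"name": "mario lopez", "dob": "1990-03-15", "true_id": None},  # different person
]

def deterministic_match(query, records):
    """Return a state ID only when name and DOB both match exactly."""
    for rec in records:
        if rec["name"] == query["name"] and rec["dob"] == query["dob"]:
            return rec["state_id"]
    return None

def fuzzy_match(query, records, threshold=0.8):
    """Return the state ID of the most similar name if it clears the threshold."""
    best_id, best_score = None, 0.0
    for rec in records:
        score = SequenceMatcher(None, query["name"], rec["name"]).ratio()
        if score > best_score:
            best_id, best_score = rec["state_id"], score
    return best_id if best_score >= threshold else None

def evaluate(matcher, queries, records):
    """Return (hit rate, false positive rate) over all queries."""
    results = [(matcher(q, records), q["true_id"]) for q in queries]
    hits = sum(1 for found, truth in results if found is not None and found == truth)
    false_pos = sum(1 for found, truth in results if found is not None and found != truth)
    return hits / len(queries), false_pos / len(queries)

if __name__ == "__main__":
    print("deterministic:", evaluate(deterministic_match, QUERIES, RECORDS))
    print("fuzzy:        ", evaluate(fuzzy_match, QUERIES, RECORDS))
```

On this toy data the exact-match rule misses the query with transposed letters, while the fuzzy rule recovers it but also links an unrelated person to an existing state ID. This is the kind of trade-off between hit rate and false positive rate that different search criteria can produce.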