In this paper, I develop methods for improving the accuracy of a statistical match using a unique linked dataset developed by the Census Bureau. The Census Bureau has enabled record linkages between the Current Population Survey (CPS) and federal individual income tax returns by creating an identifier for records in each data source that is unique to each individual. The linked data combine the advantages of administrative and survey data, but because access to that dataset is restricted, a statistical match of tax and survey data is generally still necessary.
One way to improve a statistical match is to match tax returns to likely filers in the survey data. Using information found in the CPS, I model the probability that an individual files a return, updating the methodology proposed in Cilke (1998). I organize individuals in the CPS into tax units based on their marital status and whether they can claim any dependents. Units that contain no one who matches to a primary or secondary filer in the tax return data are defined as non-filers. I then model the likelihood that a unit filed as a function of demographic characteristics, sources of income, whether there was earned income, and whether any transfers were received. The predicted probability of filing is incorporated into an algorithm for statistically matching tax return data to CPS-based units.
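As a minimal sketch of how a predicted filing probability might enter a match, the Python below scores each CPS unit with a logistic function and then assigns returns to the most likely filers first. Everything here is an illustrative assumption rather than the paper's algorithm: the coefficient values, the field names (`has_earnings`, `received_transfers`, `age_65_plus`, `married`, `income`, `agi`, `joint`), and the greedy nearest-income rule within a filing-status cell are all hypothetical.

```python
import math

# Hypothetical coefficients for illustration only; not estimates from the paper.
COEF = {
    "intercept": -1.0,
    "has_earnings": 2.5,
    "received_transfers": -0.8,
    "age_65_plus": -0.5,
}


def filing_probability(unit, coef=COEF):
    """Logistic score: P(file) = 1 / (1 + exp(-(b0 + sum_k b_k * x_k)))."""
    z = coef["intercept"]
    for name, beta in coef.items():
        if name != "intercept":
            z += beta * unit[name]
    return 1.0 / (1.0 + math.exp(-z))


def match_returns(cps_units, tax_returns, coef=COEF):
    """Greedy match: units with the highest predicted filing probability
    choose first, each taking the unused return with the closest income
    in the same filing-status cell. Unmatched units are treated as non-filers.
    """
    ranked = sorted(
        cps_units, key=lambda u: filing_probability(u, coef), reverse=True
    )
    available = list(tax_returns)
    matches = {}
    for unit in ranked:
        candidates = [r for r in available if r["joint"] == unit["married"]]
        if not candidates:
            continue  # no return left in this cell: unit stays a non-filer
        best = min(candidates, key=lambda r: abs(r["agi"] - unit["income"]))
        matches[unit["id"]] = best["id"]
        available.remove(best)
    return matches


# Toy data: three single CPS units competing for two single returns.
units = [
    {"id": "A", "married": False, "income": 32000,
     "has_earnings": 1, "received_transfers": 0, "age_65_plus": 0},
    {"id": "B", "married": False, "income": 11000,
     "has_earnings": 0, "received_transfers": 1, "age_65_plus": 1},
    {"id": "C", "married": False, "income": 15000,
     "has_earnings": 1, "received_transfers": 1, "age_65_plus": 0},
]
returns = [
    {"id": "R1", "joint": False, "agi": 30000},
    {"id": "R2", "joint": False, "agi": 12000},
]
result = match_returns(units, returns)
```

In this toy example, unit B has the income closest to return R2, but its low predicted filing probability places it last in the ranking, so R2 goes to unit C and B is left as a non-filer. A match on income alone would have assigned R2 to B.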
I evaluate whether including the predicted likelihood of filing improves the accuracy of a statistical match by comparing characteristics of non-filers and filers defined in three ways: as observed in the linked data, from a statistical match without predicted filing, and from a statistical match with predicted filing. The demographic and income characteristics of filers and non-filers from the linked data provide benchmarks for results from a statistical match.
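The comparison described above can be sketched as follows, assuming each unit carries one filer indicator per definition. The flag names (`filer_linked`, `filer_match_plain`, `filer_match_pred`), the toy values, and the choice of absolute differences in group means as the accuracy measure are hypothetical; the paper may use other characteristics and other metrics.

```python
from statistics import mean


def group_means(units, flag, var):
    """Mean of `var` for filers and non-filers under the indicator `flag`."""
    out = {}
    for label, value in (("filers", True), ("non_filers", False)):
        vals = [u[var] for u in units if u[flag] == value]
        out[label] = mean(vals) if vals else None
    return out


def gap_to_benchmark(units, flag, var, benchmark_flag="filer_linked"):
    """Absolute difference between group means under `flag` and under the
    linked-data benchmark indicator."""
    bench = group_means(units, benchmark_flag, var)
    cand = group_means(units, flag, var)
    return {
        g: abs(cand[g] - bench[g])
        for g in bench
        if cand[g] is not None and bench[g] is not None
    }


# Toy data: income in thousands, with three alternative filer definitions.
toy_units = [
    {"income": 10, "filer_linked": False,
     "filer_match_plain": True, "filer_match_pred": False},
    {"income": 20, "filer_linked": True,
     "filer_match_plain": False, "filer_match_pred": True},
    {"income": 30, "filer_linked": True,
     "filer_match_plain": True, "filer_match_pred": True},
    {"income": 40, "filer_linked": True,
     "filer_match_plain": True, "filer_match_pred": True},
]
gap_plain = gap_to_benchmark(toy_units, "filer_match_plain", "income")
gap_pred = gap_to_benchmark(toy_units, "filer_match_pred", "income")
```

A smaller gap for a given definition means the filers and non-filers it identifies look more like those observed in the linked data on that characteristic.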
I also consider refining the statistical match by aligning the filing status and number of dependents inferred from CPS information with what a tax unit claims on its return. These can differ for several reasons: the CPS and tax returns refer to different points in time, a taxpayer may intentionally misreport filing status or the number of dependents to minimize tax liability, and the CPS may not contain enough information to determine whether someone qualifies as a dependent. Results from statistically matching CPS-based tax units to tax returns with different demographic characteristics are also evaluated against the linked-data benchmark.