Imputing Top-Coded Income Data in Longitudinal Surveys

Tan, Li

The incomes of top earners are typically top-coded in survey data to protect individuals’ identities. Common imputation methods for recovering top-coded income values have several limitations when applied to longitudinal data. I show that the quality of imputed income values for top earners in longitudinal surveys can be improved significantly by incorporating information from multiple time periods. Moreover, I introduce an innovative Bayesian imputation method that further improves imputation quality. Using a sample of individuals whose incomes are pseudo top-coded (i.e., the exact income figures are available but temporarily masked), I show that the Bayesian imputation method reduces the root mean squared error (RMSE) of imputed income values by 19 to 50 percent relative to standard approaches in the literature. After documenting this improvement in performance, I illustrate the benefits of the Bayesian method for investigating multi-year income inequality.
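
As a rough illustration of the pseudo top-coding evaluation described above, the following Python sketch masks known incomes above a threshold, imputes them, and scores the imputation by RMSE against the true values. It is not the paper's Bayesian method: the imputation shown is a single-period Pareto-tail fit, used here only as a stand-in for a standard approach. All function names, the threshold and cutoff values, and the simulated incomes are assumptions made for this example.

```python
import math
import random


def pseudo_top_code(incomes, threshold):
    """Censor incomes at the threshold; the true values are kept elsewhere for scoring."""
    return [min(y, threshold) for y in incomes]


def fit_pareto_alpha(censored, cutoff, threshold):
    """Censored maximum-likelihood estimate of the Pareto shape parameter.

    Uses observations at or above `cutoff`; values at `threshold` are
    treated as right-censored.
    """
    uncensored = [y for y in censored if cutoff <= y < threshold]
    n_censored = sum(1 for y in censored if y >= threshold)
    log_sum = sum(math.log(y / cutoff) for y in uncensored)
    log_sum += n_censored * math.log(threshold / cutoff)
    return len(uncensored) / log_sum


def impute_top_coded(censored, cutoff, threshold):
    """Replace each top-coded value with the fitted Pareto conditional mean."""
    alpha = fit_pareto_alpha(censored, cutoff, threshold)
    if alpha <= 1:
        raise ValueError("Pareto mean is undefined for alpha <= 1")
    tail_mean = threshold * alpha / (alpha - 1)
    return [tail_mean if y >= threshold else y for y in censored]


def rmse_top_coded(true_incomes, imputed, threshold):
    """RMSE computed only over the observations that were pseudo top-coded."""
    errors = [(m - y) ** 2 for y, m in zip(true_incomes, imputed) if y >= threshold]
    return math.sqrt(sum(errors) / len(errors))


# Simulated incomes (hypothetical): Pareto-distributed with shape 2.5.
random.seed(0)
true_incomes = [20_000 * random.paretovariate(2.5) for _ in range(20_000)]
threshold, cutoff = 100_000, 50_000

censored = pseudo_top_code(true_incomes, threshold)
imputed = impute_top_coded(censored, cutoff, threshold)

print("RMSE, Pareto imputation:", rmse_top_coded(true_incomes, imputed, threshold))
print("RMSE, top-code left as is:", rmse_top_coded(true_incomes, censored, threshold))
```

Because the true incomes are retained, any alternative imputation (for example, one that pools several survey waves) could be scored with the same `rmse_top_coded` function and compared on equal terms.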

Association for Public Policy Analysis & Management

Panel Paper: Imputing Top-Coded Income Data in Longitudinal Surveys