Panel Paper: Sampling with Administrative Records in the National Survey of Children’s Health

Saturday, November 4, 2017
McCormick (Hyatt Regency Chicago)


Scott Albrecht, Jason Fields and Keith Finlay, U.S. Census Bureau


In 2015, the Census Bureau became the data collection agent for the National Survey of Children’s Health. The survey requires samples of children that are representative at both the national and state levels. Its two-phase design first uses a screener to identify households with children, and then subsamples a single reference child in each such household to be the subject of the topical questions.
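The second phase can be sketched as follows. This is a minimal illustration, not the survey's actual procedure: it assumes every rostered child in a screened household is equally likely to be chosen, and the `Household` record and `select_reference_child` function are hypothetical. Under that assumption, the selected child's within-household weight factor is simply the number of children, k.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Household:
    """Hypothetical screener result: the children rostered at one address."""
    household_id: str
    children: List[str]


def select_reference_child(
    hh: Household, rng: random.Random
) -> Tuple[Optional[str], float]:
    """Phase 2: choose one reference child in a screened household.

    Assumes equal-probability selection among the k rostered children, so
    each child is chosen with conditional probability 1/k and the selected
    child's within-household weight factor is k.
    """
    k = len(hh.children)
    if k == 0:
        # Phase 1 found no children: no topical interview is conducted.
        return None, 0.0
    child = rng.choice(hh.children)
    return child, float(k)


rng = random.Random(2016)  # fixed seed so the selection is reproducible
hh = Household("H001", ["child_a", "child_b", "child_c"])
child, weight = select_reference_child(hh, rng)
print(child, weight)  # weight is 3.0; which child is chosen depends on the seed
```

A real design could assign unequal selection probabilities (for example, by child age); in that case the weight factor becomes the inverse of each child's selection probability rather than k.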

In its first year of full production (2016), the Census Bureau drew its sample from addresses in its Master Address File (MAF). To ensure an adequate sample of completed interviews, the Bureau sought to increase sampling efficiency by augmenting the sample frame with a child-present flag built from a variety of administrative records. Tax, program participation, and other administrative records were used to link children to specific addresses in the MAF, dividing the frame into a flagged stratum (stratum 1) and an unflagged stratum (stratum 2). The augmented sample frame was then tested against the most recent year of American Community Survey (ACS) microdata. This ACS audit had two goals: to measure the child-present rate in each stratum, and to understand how households with children differ in observable ways across the strata. The difference in child-present rates between the strata helped determine how heavily the flagged stratum was oversampled. The oversampling rate varied across states, but on average, flagged households were six times more likely to be selected than unflagged households. The flag performed as predicted: flagged households were approximately ten times more likely to report children than unflagged households (74.9% versus 7.4%).
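The trade-off behind the oversampling rate can be illustrated with a short worked calculation. This is a hedged sketch, not the Bureau's allocation method: the child-present rates (74.9% and 7.4%) and the average sampling ratio of six come from the abstract, but the flagged share of the frame (25%) is a hypothetical value, and using Kish's unequal-weighting design effect to measure the cost of oversampling is an assumption made here for illustration. The calculation ignores clustering, nonresponse, and state-level variation.

```python
def sample_yield(flag_share: float, p_flag: float, p_unflag: float, k: float) -> float:
    """Expected share of sampled addresses with children when the flagged
    stratum is sampled at k times the unflagged rate."""
    sampled_flag = k * flag_share
    sampled_unflag = 1.0 - flag_share
    return (sampled_flag * p_flag + sampled_unflag * p_unflag) / (sampled_flag + sampled_unflag)


def kish_deff(flag_share: float, p_flag: float, p_unflag: float, k: float) -> float:
    """Kish design effect from unequal weights among sampled child households.

    Relative weights are 1/k (flagged) and 1 (unflagged); n1 and n2 are
    proportional to the number of sampled child households in each stratum.
    """
    n1, w1 = k * flag_share * p_flag, 1.0 / k
    n2, w2 = (1.0 - flag_share) * p_unflag, 1.0
    n = n1 + n2
    return n * (n1 * w1**2 + n2 * w2**2) / (n1 * w1 + n2 * w2) ** 2


P_FLAG, P_UNFLAG = 0.749, 0.074  # 2016 ACS audit rates reported in the abstract
FLAG_SHARE = 0.25                # hypothetical share of the frame that is flagged

print(round(P_FLAG / P_UNFLAG, 2))                              # 10.12
print(round(sample_yield(FLAG_SHARE, P_FLAG, P_UNFLAG, 1), 3))  # 0.243 (no oversampling)
print(round(sample_yield(FLAG_SHARE, P_FLAG, P_UNFLAG, 6), 3))  # 0.524
print(round(kish_deff(FLAG_SHARE, P_FLAG, P_UNFLAG, 6), 2))     # 1.73
```

Under these assumed numbers, oversampling the flagged stratum at six times the unflagged rate roughly doubles the share of sampled addresses that contain children. The unequal weights it introduces give back part of that gain, which is why a larger gap in child-present rates generally justifies a heavier oversample.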

In the second year of collection (2017), the Census Bureau will incorporate a modeling and optimization approach to integrate additional administrative records into the sample frame. The current plan uses three strata. The first stratum mirrors the child-present flag from 2016. The remaining addresses (stratum 2 in 2016) are divided into strata 2a and 2b using a regression model, fit on a sample of ACS microdata, that predicts child presence from a broad set of characteristics in administrative data. The estimated parameters from this model are then used to predict child presence for every address in the MAF. The child-likely stratum (2a) contains the households with the highest predicted probability of child presence; its size is minimized (to maximize sampling efficiency) subject to state-specific child coverage constraints. The third stratum (2b) contains the households with the lowest predicted probability of child presence and is not sampled.
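The split of the remaining addresses into strata 2a and 2b can be sketched as a per-state selection problem. This minimal sketch takes the predicted probabilities as given (the regression step itself is not shown) and makes two assumptions not stated in the abstract: "coverage" is measured as the expected share of child-present addresses in a state (the sum of predicted probabilities) that falls in strata 1 and 2a, and each state's coverage target is a single hypothetical fraction. The record layout and the `assign_strata` function are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record: (address_id, state, in_stratum_1, predicted_child_probability)
Address = Tuple[str, str, bool, float]


def assign_strata(addresses: List[Address], coverage_target: Dict[str, float]) -> Dict[str, str]:
    """Assign each address to stratum '1', '2a', or '2b'.

    Coverage is measured here (an assumption) as the expected share of
    child-present addresses in a state that falls in strata 1 and 2a.
    Stratum 1 is kept as is; unflagged addresses are added to 2a in
    descending order of predicted probability until the state's target
    is met. Taking the highest probabilities first reaches the target
    with the fewest 2a addresses.
    """
    by_state: Dict[str, List[Address]] = defaultdict(list)
    for addr in addresses:
        by_state[addr[1]].append(addr)

    strata: Dict[str, str] = {}
    for state, rows in by_state.items():
        total = sum(p for _, _, _, p in rows)
        needed = coverage_target[state] * total
        covered = 0.0
        unflagged = []
        for addr_id, _, flagged, p in rows:
            if flagged:
                strata[addr_id] = "1"
                covered += p
            else:
                unflagged.append((addr_id, p))
        unflagged.sort(key=lambda item: item[1], reverse=True)
        for addr_id, p in unflagged:
            if covered < needed:
                strata[addr_id] = "2a"  # child-likely: sampled
                covered += p
            else:
                strata[addr_id] = "2b"  # lowest probability: not sampled
    return strata


frame = [
    ("a1", "IL", True, 0.80), ("a2", "IL", False, 0.50), ("a3", "IL", False, 0.30),
    ("a4", "IL", False, 0.05), ("a5", "IL", False, 0.05),
    ("b1", "WI", True, 0.90), ("b2", "WI", False, 0.20), ("b3", "WI", False, 0.10),
]
print(assign_strata(frame, {"IL": 0.90, "WI": 0.80}))
# {'a1': '1', 'a2': '2a', 'a3': '2a', 'a4': '2b', 'a5': '2b', 'b1': '1', 'b2': '2a', 'b3': '2b'}
```

Because each state's constraint involves only that state's addresses, the minimization separates by state, and the greedy ordering is enough under this coverage definition. A different definition (for example, coverage measured against ACS-based child counts) would change the threshold but not the overall structure.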