Measuring Educational Attainment As a Continuous Variable: A New Database (1970-2010)

Jorda, Vanesa

In this paper, we introduce a new comprehensive data set on educational attainment and educational inequality measures for 142 countries over the period 1970 to 2010. Most previous attempts to measure educational attainment have treated education as a categorical variable, whose mean is computed as a weighted average of the official duration of each cycle, with attainment rates as weights, thus omitting differences in educational achievement within levels of education. This aggregation into groups may result in a loss of information and therefore introduce a potential source of measurement error. To overcome this potential source of bias, we explore a more nuanced alternative for estimating educational attainment, one that considers the continuous nature of education.
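To make the discrete calculation concrete, the following is a minimal sketch of a share-weighted average of assigned years of schooling. The function name, the four attainment categories, the population shares, and the years assigned to each category are all hypothetical illustrations, not values from the data set.

```python
def mean_years_discrete(shares, years):
    """Mean years of schooling as a share-weighted average of assigned years."""
    if len(shares) != len(years):
        raise ValueError("shares and years must have the same length")
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("attainment shares must sum to 1")
    return sum(s * y for s, y in zip(shares, years))


# Hypothetical population shares for four categories (assumed for illustration):
# no schooling, incomplete primary, complete primary, complete secondary.
shares = [0.1, 0.3, 0.4, 0.2]

# Assumed official durations: 6 years of primary, 12 years through secondary.
# The years assigned to "incomplete primary" are an assumption in both cases.
years_half = [0, 3, 6, 12]  # incomplete primary counted as half the cycle
years_one = [0, 1, 6, 12]   # incomplete primary counted as a single year

mean_half = mean_years_discrete(shares, years_half)
mean_one = mean_years_discrete(shares, years_one)
```

With these made-up inputs the mean is roughly 5.7 years under the first assumption and roughly 5.1 under the second, which shows how the choice of years assigned to an incomplete level can move the discrete estimate.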

We argue in this paper that a flexible parametric specification may be the most appropriate functional form to estimate educational attainment; more specifically, we propose to employ the generalized gamma (GG) distribution to model the time that individuals attend school until they complete the educational cycle or decide to drop out. This “continuous approach” allows us to impose more plausible assumptions about the distribution of years of schooling within each level of education, and to take into account the right censoring of the data in the estimation, thus leading to less biased estimates of educational attainment and inequality.
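As a sketch of what such a specification could look like, the block below writes the generalized gamma density in one common parametrization (scale a, shape parameters d and p), together with a standard right-censored log-likelihood. The parametrization, the symbols, and the censoring indicator \delta_i are assumptions made for illustration; the paper's own notation and its treatment of grouped attainment data may differ.

```latex
% Generalized gamma density for years of schooling x > 0
% (assumed parametrization: scale a > 0, shapes d > 0 and p > 0)
f(x; a, d, p) = \frac{p}{a^{d}\,\Gamma(d/p)}\, x^{d-1}
                \exp\!\left[-\left(\frac{x}{a}\right)^{p}\right]

% Distribution function, with \gamma(\cdot,\cdot) the lower incomplete gamma function
F(x; a, d, p) = \frac{\gamma\!\left(d/p,\; (x/a)^{p}\right)}{\Gamma(d/p)}

% Log-likelihood with right censoring:
% \delta_i = 1 if schooling x_i is fully observed,
% \delta_i = 0 if only x_i as a lower bound is known
\ell(a, d, p) = \sum_{i:\,\delta_i = 1} \log f(x_i; a, d, p)
              + \sum_{i:\,\delta_i = 0} \log\!\left[1 - F(x_i; a, d, p)\right]

% Mean years of schooling implied by the fitted parameters
\mathbb{E}[X] = a\, \frac{\Gamma\!\left((d+1)/p\right)}{\Gamma(d/p)}
```

Under this parametrization, setting p = 1 gives the gamma distribution and setting d = p gives the Weibull, which is one sense in which the family is flexible.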

The main contribution of this paper resides, therefore, in the development of a new database of educational attainment and inequality measures of educational outcomes, taking into account the distribution of years of schooling within each level of education. After describing our methodology and presenting the main features of our new data set, we compare our educational attainment estimates with those based on a discrete approach. We find that average educational outcomes estimated using the discrete approach, such as those of Barro and Lee (2013), are considerably higher than our estimates. In addition, although the correlation between both data sets is fairly high, we find that our series may provide more accurate estimates than those relying on the discrete approach when estimating an aggregate production function using the Mincerian approach to human capital. We draw similar conclusions when comparing our educational inequality measures with those provided by Wales et al. (2012), although the correlation between these series in first differences is substantially lower.

In sum, we show that the “discrete approach” figures seem to be extremely sensitive to the assumptions made about the number of years of schooling assigned to the incomplete levels. In contrast, the methodology used in this paper avoids relying on such assumptions about the unknown parts of the distribution. The resulting series may help to improve our understanding of the role of education in different socio-economic aspects, such as quality of life and human capital formation.

Association for Public Policy Analysis & Management

Panel Paper: Measuring Educational Attainment As a Continuous Variable: A New Database (1970-2010)