The paper estimates the difference in performance using a two-step process. First, it models school choice, using a logistic regression to estimate the probability that an individual student will select a public school. Second, it uses an OLS regression to estimate the effect of student and family variables, socioeconomic factors, and municipal-level factors on students’ scores in mathematics and language. The OLS estimation includes a dummy variable equal to 1 if the student attended a public school and 0 otherwise, as well as the predicted probabilities resulting from the first-step estimation. The paper estimates standard errors at the municipal level, and alternative estimations include models with municipal and school fixed effects.
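The two-step process can be illustrated with a minimal sketch. It is not the paper's actual specification: the data are simulated, the function names (`fit_logit`, `two_step`, `simulate`) and all coefficient values are hypothetical, and the municipal-level standard errors and fixed effects are left out. It assumes only NumPy, and it uses the same covariates in both steps.

```python
import numpy as np


def fit_logit(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson. X must contain a constant column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # current fitted probabilities
        grad = X.T @ (y - p)                      # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def two_step(score, public, covariates):
    """Step 1: logit for public-school choice. Step 2: OLS of the test score on the
    public-school dummy, the predicted probability, and the covariates."""
    n = len(score)
    # Step 1: probability that each student selects a public school.
    Z = np.column_stack([np.ones(n), covariates])
    gamma = fit_logit(Z, public)
    p_hat = 1.0 / (1.0 + np.exp(-Z @ gamma))
    # Step 2: OLS with columns [constant, public dummy, predicted probability, covariates].
    X = np.column_stack([np.ones(n), public, p_hat, covariates])
    coef, *_ = np.linalg.lstsq(X, score, rcond=None)
    return coef, p_hat


def simulate(n, seed=0):
    """Generate synthetic data (hypothetical values, not the Colombian test data)."""
    rng = np.random.default_rng(seed)
    covariates = rng.normal(size=(n, 2))          # stand-ins for family/socioeconomic factors
    prob_public = 1.0 / (1.0 + np.exp(-(0.5 - 1.0 * covariates[:, 0])))
    public = (rng.uniform(size=n) < prob_public).astype(float)
    score = (50.0 + 3.0 * covariates[:, 0] + 2.0 * covariates[:, 1]
             - 1.5 * public + rng.normal(scale=5.0, size=n))
    return score, public, covariates


score, public, covariates = simulate(5000)
coef, p_hat = two_step(score, public, covariates)
print("coefficient on the public-school dummy:", coef[1])
```

In this sketch, `coef[1]` is the estimated public-school difference in scores after conditioning on the covariates and the first-step predicted probability. A fuller version would add standard errors computed at the municipal level and the fixed-effects variants mentioned above.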
The paper uses a dataset that includes test scores for all Colombian students who took the test during 2013 (n=575,823). In addition to the test scores, the dataset includes family information (such as parents’ educational attainment, parents’ occupations, family income, and household characteristics and assets) as well as a Colombian economic stratification measure. It also includes some basic information about the school, such as instruction time and the school’s schedule, among others. All of these factors are included in both steps of the estimation. Sources of omitted variable bias are discussed, including those related to school characteristics.