Panel Paper: Optimizing Prediction for Policy Analysis Using Bayesian Model Averaging: Applications to Large-Scale Educational Assessments

Friday, November 9, 2018
Marriott Balcony A - Mezz Level (Marriott Wardman Park)


David Kaplan, University of Wisconsin, Madison

The distinctive feature that separates Bayesian statistical inference from its frequentist counterpart is its focus on describing and modeling all forms of uncertainty. The primary focus of uncertainty within the Bayesian framework concerns background knowledge about model parameters. In the Bayesian framework, all unknowns are described by probability distributions. Because parameters are, by definition, unknown, Bayesian inference encodes background knowledge about them in the form of prior distributions.

The Bayesian framework also recognizes that statistical models possess uncertainty insofar as a particular model is typically chosen based on prior knowledge of the problem at hand and the variables used in previous models. This form of uncertainty often goes unnoticed, and the impact of this uncertainty can be quite profound. The current approach to addressing model uncertainty from a Bayesian point of view lies in the method of Bayesian model averaging.
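For readers who want the idea in symbols, the standard formulation of Bayesian model averaging can be sketched as follows. The notation is an assumption introduced here, not taken from the paper: $y$ denotes the observed data, $\tilde{y}$ a future observation, $M_1, \dots, M_K$ the candidate models, and $\theta_k$ the parameters of model $M_k$. The averaged predictive distribution weights each model's prediction by its posterior model probability.

```latex
% Posterior predictive distribution averaged over K candidate models
p(\tilde{y} \mid y) = \sum_{k=1}^{K} p(\tilde{y} \mid M_k, y)\, p(M_k \mid y)

% Posterior model probability of model M_k
p(M_k \mid y) = \frac{p(y \mid M_k)\, p(M_k)}{\sum_{l=1}^{K} p(y \mid M_l)\, p(M_l)}

% Marginal likelihood of the data under model M_k
p(y \mid M_k) = \int p(y \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, d\theta_k
```

Under this formulation, no single model is selected; each contributes to the prediction in proportion to its support from the data and the prior.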

Bayesian model averaging has had a long history of theoretical developments and practical applications (e.g., Clyde, 1999; Clyde & George, 2004; Hoeting, Madigan, Raftery, & Volinsky, 1999; Madigan & Raftery, 1994; Raftery, Madigan, & Hoeting, 1997). Bayesian model averaging has been applied to a variety of domains such as economics (e.g., Fernandez, Ley, & Steele, 2001), bioinformatics of gene expression (e.g., Yeung, Bumgarner, & Raftery, 2005), and weather forecasting (e.g., Sloughter, Gneiting, & Raftery, 2013).

This paper illustrates the predictive performance of Bayesian model averaging applied to statistical models commonly used in the analysis of large-scale educational assessments. The primary motivation for this work lies in the use of these assessments to monitor trends over time in educational outcomes. Of particular interest are the United Nations Sustainable Development Goal 4, which focuses on quality education for all, and Goal 4.6, which focuses on reducing the global gender gap in literacy and numeracy. Developing optimal predictive models allows researchers and policy makers to assess progress across countries and to forecast movement toward that goal. Here, Bayesian model averaging yields models optimized in terms of predictive performance. In other words, Bayesian model averaging yields models that show better out-of-sample predictive performance than any other model in their class.

This paper applies Bayesian model averaging to linear regression and growth curve models, using data from the OECD Program for International Student Assessment (PISA) and the IEA Trends in International Mathematics and Science Study (TIMSS). Predictive accuracy is examined using a 90% prediction coverage interval as well as the log-score rule for continuous probabilistic forecasts (in the case of regression and SEM), and the Brier score (in the case of logistic regression). The results show that Bayesian model averaging yields better predictive performance than traditional Bayesian and frequentist procedures according to the scoring rules. The paper closes with a discussion of the implications for educational evaluation and policy analysis.
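As an illustration of the three accuracy measures named above, the following is a minimal Python sketch, not the paper's implementation. It assumes each continuous forecast is summarized by a Gaussian predictive distribution with a mean and standard deviation, which the paper does not state. The function names `log_score`, `brier_score`, and `coverage` are hypothetical and chosen only for this example.

```python
import math
from statistics import NormalDist


def log_score(y, mean, sd):
    """Mean log predictive density of the observations (higher is better).

    Assumes Gaussian predictive distributions; this is an illustrative
    simplification, not the paper's stated method.
    """
    total = sum(math.log(NormalDist(m, s).pdf(obs))
                for obs, m, s in zip(y, mean, sd))
    return total / len(y)


def brier_score(outcomes, probs):
    """Mean squared gap between forecast probability and 0/1 outcome (lower is better)."""
    return sum((p - o) ** 2 for o, p in zip(outcomes, probs)) / len(outcomes)


def coverage(y, mean, sd, level=0.90):
    """Share of observations inside the central `level` Gaussian prediction interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 for a 90% interval
    hits = sum(abs(obs - m) <= z * s for obs, m, s in zip(y, mean, sd))
    return hits / len(y)


if __name__ == "__main__":
    # Toy numbers invented for illustration only; they are not results from the paper.
    y_obs = [0.0, 1.0, 3.0]
    pred_mean = [0.0, 0.0, 0.0]
    pred_sd = [1.0, 1.0, 1.0]
    print("log score:", log_score(y_obs, pred_mean, pred_sd))
    print("coverage:", coverage(y_obs, pred_mean, pred_sd))
    print("Brier score:", brier_score([1, 0], [0.8, 0.3]))
```

A well-calibrated 90% interval should contain roughly 90% of held-out observations, so coverage is read against the nominal level, while the log score and Brier score are compared across competing models on the same data.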
