Panel Paper:
How Do Machine Learning Algorithms Perform in Predicting Hospital Choices? Evidence from Changing Environments
In this paper, we evaluate the performance of hospital demand models -- econometric and machine learning -- after major changes in the choice environment. To do this, we use a set of natural disasters that closed one or more hospitals but left the majority of the surrounding area relatively undisturbed. These "shocks" exogenously altered consumers' choice sets, creating a benchmark -- patients' actual choices in the post-disaster period -- against which to assess the performance of different predictive models calibrated on pre-disaster data. Our main prediction criterion is the fraction of actual choices that we correctly predict using the highest estimated probability as the predicted choice. By comparing the different models' predictions to actual post-disaster choices, we are able to gauge predictive performance when the choice environment has changed.
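The prediction criterion above can be sketched as a top-1 accuracy computation. This is an illustrative implementation only; the function name, the toy probabilities, and the hospital indices are hypothetical and not taken from the paper.

```python
import numpy as np

def top1_accuracy(pred_probs, actual_choices):
    """Fraction of patients whose actual hospital matches the
    model's highest-estimated-probability hospital.

    pred_probs: (n_patients, n_hospitals) array of choice probabilities.
    actual_choices: length-n_patients sequence of chosen hospital indices.
    """
    predicted = np.argmax(pred_probs, axis=1)  # highest-probability hospital
    return float(np.mean(predicted == np.asarray(actual_choices)))

# Toy example: 4 patients choosing among 3 hospitals.
probs = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
    [0.5, 0.3, 0.2],
])
actual = [0, 1, 1, 0]  # hospital each patient actually chose

print(top1_accuracy(probs, actual))  # 0.75: 3 of 4 patients predicted correctly
```

In the paper's setting, `pred_probs` would come from a model estimated on pre-disaster data, with the destroyed hospital removed from the post-disaster choice set.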
We present results comparing examples of two classes of machine learning algorithms -- grouping and regularization -- relative to a benchmark econometric choice model akin to those used in recent academic work. We find that the gradient boosting and random forest methods estimated on pre-disaster data generally outperform all other approaches at predicting patient choice after a disaster has closed a hospital. Averaging across all six experiments, the random forest, gradient boosting, and regularization models all correctly predict 46% of choices. By contrast, the benchmark econometric model correctly predicts 40% of choices, while assigning all choices to the highest share hospital in the destroyed hospital's service area correctly predicts 29% of choices.
While we consistently find that the machine learning methods perform best at prediction on average, their relative performance deteriorates for patients who were more likely to have had a major change in their choice set. On average, the advantage of the machine learning methods over the benchmark conditional logit shrinks for patients who would otherwise have been more likely to use the destroyed hospital. These results indicate that as data grow sparser, there may be greater scope to complement the data with the researcher's prior domain knowledge about model specification. Therefore, even when the goal is prediction, it may be desirable to supplement a machine learning model with a standard parametric one.
Full Paper:
- ml_0619.pdf (1578.4KB)