Panel Paper: How Do Machine Learning Algorithms Perform in Predicting Hospital Choices? Evidence from Changing Environments

Thursday, November 7, 2019
I.M. Pei Tower: Terrace Level, Terrace (Sheraton Denver Downtown)


Devesh Raval, Ted Rosenbaum and Nathan Wilson, Federal Trade Commission


The proliferation of rich consumer-level datasets has led to the rise of the "algorithmic modeling culture," in which analysts treat the statistical model as a "black box" and predict choices using algorithms trained on existing data. To date, evaluations of algorithmic prediction have mostly focused on settings where individuals face the same choices over time. However, evaluating policy questions often involves modeling a substantial shift in the choice environment. For example, a health insurance reform may change the set of insurance products that consumers can buy, or a merger may alter the products available in the marketplace. For such questions, it is less obvious whether machine learning methods can usefully be applied.

In this paper, we evaluate the performance of hospital demand models -- econometric and machine learning -- after major changes in the choice environment. To do this, we use a set of natural disasters that closed one or more hospitals but left the majority of the surrounding area relatively undisturbed. These "shocks" exogenously altered consumers' choice sets, creating a benchmark -- patients' actual choices in the post-disaster period -- against which to assess the performance of predictive models calibrated on pre-disaster data. Our main prediction criterion is the fraction of actual choices that we correctly predict when the hospital with the highest estimated choice probability is taken as the predicted choice. By comparing each model's predictions to actual post-disaster choices, we can gauge predictive performance when the choice environment has changed.
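For concreteness, a minimal sketch of this criterion in Python, assuming we already have a matrix of estimated choice probabilities over the post-disaster choice set (all variable names below are hypothetical placeholders, not the paper's code):

import numpy as np

def top1_accuracy(prob, actual):
    # prob:   (n_patients, n_hospitals) estimated choice probabilities,
    #         defined over the post-disaster choice set (destroyed
    #         hospital excluded or zeroed out).
    # actual: (n_patients,) indices of the hospitals actually chosen
    #         in the post-disaster period.
    predicted = prob.argmax(axis=1)  # highest-probability hospital
    return float((predicted == actual).mean())

# Toy example: 3 patients, 4 hospitals; the third prediction misses.
prob = np.array([[0.1, 0.6, 0.2, 0.1],
                 [0.5, 0.2, 0.2, 0.1],
                 [0.2, 0.2, 0.5, 0.1]])
actual = np.array([1, 0, 3])
print(top1_accuracy(prob, actual))  # 0.666...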

We present results for examples of two classes of machine learning algorithms -- grouping and regularization -- alongside a benchmark econometric choice model akin to those used in recent academic work. We find that the gradient boosting and random forest methods estimated on pre-disaster data generally outperform all other approaches at predicting patient choice after a disaster has closed a hospital. Averaging across all six experiments, the random forest, gradient boosting, and regularization models each correctly predict 46% of choices. By contrast, the benchmark econometric model correctly predicts 40% of choices, while assigning all choices to the highest-share hospital in the destroyed hospital's service area correctly predicts only 29%.
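As a rough illustration of the exercise (synthetic data and off-the-shelf classifiers standing in for the paper's actual data, features, and specifications), one could fit models on pre-disaster records and score top-1 accuracy on post-disaster choices with the destroyed hospital removed from the choice set:

import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_pre, n_post, n_hospitals = 2000, 500, 6
destroyed = 3  # index of the hospital the disaster closes

# Hypothetical covariates: distance to each hospital, age, severity.
def draw(n):
    dist = rng.exponential(5.0, size=(n, n_hospitals))
    X = np.column_stack([dist,
                         rng.integers(18, 90, size=n),  # age
                         rng.random(n)])                # severity
    y = dist.argmin(axis=1)  # toy choice rule: pick the nearest hospital
    return X, y

X_pre, y_pre = draw(n_pre)
X_post, y_post = draw(n_post)

# Post-disaster, patients who would have used the destroyed hospital
# go to their next-nearest option instead.
mask = y_post == destroyed
d = X_post[mask, :n_hospitals].copy()
d[:, destroyed] = np.inf
y_post[mask] = d.argmin(axis=1)

models = {
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "multinomial logit (stand-in benchmark)": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    model.fit(X_pre, y_pre)
    prob = model.predict_proba(X_post)
    # Drop the destroyed hospital from the choice set; renormalizing
    # over the surviving hospitals would not change the argmax.
    prob[:, list(model.classes_).index(destroyed)] = 0.0
    predicted = model.classes_[prob.argmax(axis=1)]
    print(f"{name}: top-1 accuracy = {(predicted == y_post).mean():.2f}")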

While the machine learning methods consistently perform best at prediction on average, their relative performance deteriorates for patients who were more likely to have had a major change in their choice set. In particular, the machine learning methods' advantage over the benchmark conditional logit shrinks for patients who would otherwise have been more likely to use the destroyed hospital. These results indicate that as data grow sparser, there may be greater scope to complement the data with the researcher's prior domain knowledge about model specification. Even when the goal is prediction, then, it may be desirable to supplement a machine learning model with a standard parametric one.
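The stratification behind this finding could be sketched as follows (an assumed procedure for illustration, not necessarily the authors' exact one): bin post-disaster patients by the pre-disaster model's probability that they would have chosen the destroyed hospital, then compare prediction accuracy within bins.

import numpy as np

def accuracy_by_affinity(p_destroyed, correct, n_bins=4):
    # p_destroyed: model-implied probability (from the pre-disaster model)
    #              that each patient would have chosen the destroyed hospital.
    # correct:     boolean array; did the post-disaster prediction match
    #              the patient's actual choice?
    edges = np.quantile(p_destroyed, np.linspace(0, 1, n_bins + 1))
    which = np.clip(np.digitize(p_destroyed, edges[1:-1]), 0, n_bins - 1)
    # A declining accuracy profile across bins would mirror the
    # deterioration described above.
    return [correct[which == b].mean() for b in range(n_bins)]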
