Panel Paper: Case Studies and Trends in the Influence of Data Science in Policy and Governance

Saturday, November 10, 2018
Wilson C - Mezz Level (Marriott Wardman Park)


Alex C. Engler, University of Chicago; Urban Institute


Policy analysis is becoming increasingly data-driven, and a growing set of data science methods is now relevant to building knowledge about the world around us. Examples of this trend are easy to find. This talk discusses a range of cutting-edge applications of data science in policy, as well as their implications for ethics and teaching.

In responding to natural disasters, the USGS, FEMA, and DHS employ remote (e.g., drones), networked, and participatory (e.g., citizens' cell phones) sensors to target and coordinate the deployment of resources. Satellite imagery is generating information where none existed before: tracking the spread of disease (used to fight Ebola and monitor malaria outbreaks), building previously unavailable measures of poverty, and following international conflicts and their consequences, such as the flight of the Rohingya. Meanwhile, the ethical and prudent application of machine learning is helping to fight human trafficking and to reduce bias in judicial decisions. Text data, such as that from Twitter, can augment surveys in exploring social and political opinions.

As the private-sector field of data science races to develop new methods and technologies, policy research needs to tread carefully. The predictions of machine learning models are not inherently fair and need to be rigorously evaluated with tools that are only now being developed. Statistical practices such as reliance on p-values need to be fundamentally reconsidered for massive datasets, in which even trivially small effects can reach statistical significance. These and other issues warrant rigorous evaluation and discussion to develop new norms, and several of these open questions will be covered.
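As a minimal illustration of why p-values lose their usefulness at scale, the sketch below holds a substantively negligible effect fixed and computes the two-sided p-value of a simple two-sample z-test as the sample size grows. The function name `two_sample_z_pvalue`, the assumed effect of 0.01 standard deviations, the unit-variance assumption, and the chosen sample sizes are all hypothetical and are not drawn from the talk itself.

```python
import math


def two_sample_z_pvalue(effect_size: float, n_per_group: int) -> float:
    """Two-sided p-value for a difference in means of `effect_size`
    standard deviations between two equal-sized, unit-variance groups.

    Illustrative assumption: the true difference is observed exactly,
    so only the sample size changes between calls.
    """
    # Standard error of a difference in means with unit variance per group
    standard_error = math.sqrt(2.0 / n_per_group)
    z = effect_size / standard_error
    # Two-sided normal tail probability: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    effect = 0.01  # a negligible difference: 1% of a standard deviation
    for n in (100, 10_000, 100_000, 1_000_000):
        p = two_sample_z_pvalue(effect, n)
        print(f"n per group = {n:>9,}  p-value = {p:.3g}")
```

With the same tiny effect, the p-value falls from roughly 0.94 at 100 observations per group to below 0.05 at 100,000 and far below it at one million. This is one reason large-sample work tends to emphasize effect sizes and confidence intervals rather than significance thresholds alone.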

Further, these changes warrant a reconsideration of the traditional educational approaches of public policy schools. While causal inference methods remain a critical component of data-driven research, purely mathematical economics has become far less valuable than data science skills. The next generation of policy researchers needs to learn open-source languages, such as R and Python, that not only expand the methods available to them but also enable reproducible research and literate programming. As new forms of data become more available, students will need expertise in natural language processing and image processing. And as datasets grow in size (especially those from private-sector sources), large-scale analytical tools and distributed computing warrant attention as well.