Challenges in the Data Collection and Analysis of Big Data in the Public Sector
In this paper, we explore a set of issues in the public sector's use of big data for operations and research. These issues are primarily methodological, but they also stem from the particular intersection of methods, operational context, and the sourcing of big data. Our analysis focuses on four areas of concern. First, most big data can be characterized as “digital exhaust,” that is, data generated by commercial and public entities “as a by-product of other activities” (Manyika et al., 2011, p. 1). Although cheap and increasingly accessible, digital exhaust is not collected to the standards expected for academic research, or even for high-quality evidence-based management. We will examine the early evidence on the consequences of relying on digital exhaust, drawing particularly on the case of Google Flu Trends (Lazer et al., 2014).
Second, many sources of big data are nominally “public” but were not readily accessible before the Internet era. This raises substantial concerns about the public's expectations of privacy and about the legitimacy of using public but formerly inaccessible data without some form of enhanced consent. We will briefly consider the case of real estate tract data and how big data allows governments to compile “dossiers” on individuals that may circumvent both constitutional and statutory limits on search and seizure.
Third, while our tools make it increasingly easy to link disparate data sources, less thought has been given to how easily individuals can be identified once those sources are linked, even when each source has been deidentified on its own. These concerns raise thorny questions of research and managerial ethics, which we explore with particular reference to studies demonstrating how readily identity can be inferred from linked data sets.
Finally, because big data often originates from Internet sources, researchers and public managers must carefully consider the potential for bias that is implicit in the demographics of Internet users and Internet content creators.
The paper will conclude with a set of preliminary principles that public sector practitioners and public affairs researchers should consider when acquiring, analyzing, and acting upon big data.