Panel Paper: Comparing Manual Qualitative Coding to Natural Language Processing: A Case Study of Financial Decision-Making

Friday, November 8, 2019
Plaza Building: Concourse Level, Governor's Square 15 (Sheraton Denver Downtown)

Anna Jefferson, Siobhan Mills, Xi Xi and Meaghan Hunt, Abt Associates, Inc.

There is tremendous interest in the applications and value of machine learning for qualitative research, but few resources exist to assess how these new techniques compare to established methods. Machine learning and natural language processing (NLP) have most often been deployed in text analysis projects at scales not previously possible (for example, analyses of scraped social media data). What is less well understood is the role that NLP may play in more traditional qualitative research projects, such as interview or focus group studies. An emerging body of literature in the social sciences directly compares manual and machine learning methods (Guetterman et al. 2018; Nelson et al. 2017; Laurer et al. 2018), but public policy research and evaluation are notably absent from it. This paper contributes to the methodological literature on the benefits, requirements, and outcomes of manual compared to machine learning techniques, extending it into the public policy realm and helping broaden the analytic perspectives of evaluators.

Our case study presents a re-analysis of 32 focus group transcripts about financial decision-making. The original analysis was completed by a team of six qualitative researchers who manually coded each statement from a focus group respondent, matched with that respondent's answers to closed-ended survey questions. The original analysis identified key themes within four research areas (credit reports and scores, auto purchases, financial rules of thumb, and comparison shopping for financial products). Within each of the four research areas, we identified five to eight topics and associated sub-topics that became the basis for behavior-based consumer segments. The NLP re-analysis will be completed by four researchers with varying levels of programming expertise and will use several machine learning techniques (including unsupervised and supervised methods). We will model topics found in the dataset and evaluate the results against the manual coding in terms of NLP's benefits, challenges, results (including similarity and depth), and the tradeoffs between approaches. In addition to comparing the outcomes of the analyses, we will present information on the level of effort each method requires and the skills it demands of analysts. Together, the substantive results of the re-analysis and the project management knowledge gained will help policy researchers and evaluators better understand how to select qualitative analysis methods and how to manage projects that use both types of analytic methods.
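
As an illustration only, the kind of similarity comparison described above could be sketched as follows. This is not the authors' method: the abstract does not say which agreement measures will be used, so Cohen's kappa (for supervised predictions that share the manual code set) and cluster purity (for unsupervised topics, whose labels are arbitrary) are assumptions chosen for the sketch. The function names, the code labels, and the eight example statements are all hypothetical.

```python
from collections import Counter, defaultdict


def cohen_kappa(manual, predicted):
    """Chance-corrected agreement between two equal-length label sequences."""
    if len(manual) != len(predicted) or not manual:
        raise ValueError("label sequences must be non-empty and the same length")
    n = len(manual)
    # Share of statements on which the two codings agree exactly.
    observed = sum(m == p for m, p in zip(manual, predicted)) / n
    # Agreement expected by chance, from each coding's label frequencies.
    manual_freq = Counter(manual)
    predicted_freq = Counter(predicted)
    expected = sum(
        manual_freq[label] * predicted_freq[label] for label in manual_freq
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both codings use one identical label throughout
    return (observed - expected) / (1 - expected)


def topic_purity(manual, topics):
    """Share of statements whose manual code is the majority code of their topic."""
    if len(manual) != len(topics) or not manual:
        raise ValueError("label sequences must be non-empty and the same length")
    codes_by_topic = defaultdict(Counter)
    for code, topic in zip(manual, topics):
        codes_by_topic[topic][code] += 1
    majority_total = sum(max(c.values()) for c in codes_by_topic.values())
    return majority_total / len(manual)


# Hypothetical data: one manual code per focus group statement.
manual_codes = ["credit", "credit", "auto", "auto",
                "rules", "rules", "shopping", "shopping"]
# Hypothetical output of a supervised classifier trained on the manual codes.
supervised_codes = ["credit", "credit", "auto", "credit",
                    "rules", "rules", "shopping", "auto"]
# Hypothetical output of an unsupervised topic model (arbitrary topic ids).
machine_topics = [0, 0, 1, 0, 2, 2, 3, 1]

kappa = cohen_kappa(manual_codes, supervised_codes)
purity = topic_purity(manual_codes, machine_topics)
print(f"kappa={kappa:.3f} purity={purity:.3f}")
```

Kappa applies only when both codings use the same label set, which is why the unsupervised topics are scored with purity instead. Neither measure captures the depth of the coding, which would still need qualitative review.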