Panel Paper: Using Machine Learning to Collect Vaccine Misinformation on Twitter

Monday, July 29, 2019
40.047C - Level 0 (Universitat Pompeu Fabra)

Christine Chen, Pardee RAND Graduate School


Vaccine misinformation on social media is considered a contributor to vaccine hesitancy in many pockets of the world, which has led to outbreaks of vaccine-preventable diseases. To combat such misinformation, stakeholders need a more complete picture of how it is produced, disseminated, and consumed on social media. That picture cannot be assembled without an efficient and effective data collection mechanism. This study explores the feasibility of using supervised machine learning to collect vaccine misinformation on Twitter. Archived Twitter streaming data from 2017 were queried using vaccine-related keywords. From the matching tweets, 1,000 were randomly selected and manually coded by two raters: first for whether the content was vaccine-relevant, and then for whether it was credible, supported by evidence, and propaganda-like. Various machine learning models were trained and evaluated using accuracy, precision, recall, and the area under the precision-recall curve (AUC). Of the tweets, 25.5% were coded as not credible, 35.3% as not supported by evidence, and 31.1% as propaganda, with Cohen's kappa values of 0.73, 0.74, and 0.55, respectively. The models achieved the highest accuracy (0.798; SD 0.013) on the credibility criterion but the best precision-recall trade-off (AUC 0.756; SD 0.046) on the evidence criterion. Across criteria, the support vector machine had one of the highest accuracy scores (0.786; SD 0.016) and the largest AUC of all models (0.723; SD 0.067). These results support the feasibility of the approach. Potential improvements include larger training samples, additional data features, alternative data representations, and more sophisticated models.
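The agreement and evaluation measures named in the abstract can be illustrated with a short sketch. It is not the study's code. The function names are hypothetical, labels are assumed to be binary with 1 marking the misinformation class (for example "not credible"), and the precision-recall area is estimated here with average precision. That is one common estimator, and the abstract does not say which one the study used. Training the classifiers themselves (such as the support vector machine) is outside the scope of this sketch.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items both raters coded identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label proportions.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


def accuracy_precision_recall(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Accuracy, precision and recall, treating `positive` as the target class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("labels and predictions must be non-empty and aligned")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def average_precision(
    y_true: Sequence[int], scores: Sequence[float], positive: int = 1
) -> float:
    """Area under the precision-recall curve, estimated as average precision.

    Items are ranked by descending score; ties keep input order (a simplification).
    """
    n_pos = sum(t == positive for t in y_true)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive item")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == positive:
            hits += 1
            total += hits / rank  # precision at each rank where recall increases
    return total / n_pos


if __name__ == "__main__":
    # Toy labels, not data from the study: 1 = misinformation class.
    a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
    b = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
    print(round(cohen_kappa(a, b), 3))  # 0.583
    print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))  # 0.8333333333333333
```

In the toy example, the two raters agree on 8 of 10 items. Each rater labels 4 items as positive, which gives a chance agreement of (4·4 + 6·6)/100 = 0.52. Kappa is therefore (0.80 − 0.52)/(1 − 0.52) ≈ 0.58. Kappa rewards agreement only beyond what the raters' label frequencies would produce by chance. This is why it is reported alongside the raw coding percentages.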