Topic Modeling for E-petition Analysis
Given the growing volume of e-petitions, the major obstacle to studying their content is the lack of an efficient analytical method. Traditionally, such content has been coded manually against a taxonomy. Unfortunately, manual coding is expensive and labor-intensive, and it cannot practically scale to large volumes of text. For these reasons, topic modeling has emerged as a promising complement to manual analysis. Unlike manual coding, topic modeling does not require a taxonomy. Instead, it estimates topics, and the keywords that reflect them, from the distribution of words in the texts. Because it is automated, topic modeling can reduce the cost of content analysis and minimize the manual labor required.
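To make the idea of estimating topics from word distributions concrete, the following is a minimal sketch of one common topic model, latent Dirichlet allocation (LDA), fitted by collapsed Gibbs sampling. The abstract does not name the algorithm used, so LDA is an assumption here. The function names, hyperparameters, and four toy documents are hypothetical and are not drawn from the petition data.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-topic word counts.

    alpha and beta are Dirichlet smoothing values (illustrative defaults).
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                       # all tokens in topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Weight each topic by how common it is in this document
                # and how strongly it favors this word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word


def top_keywords(topic_word, n=3):
    """Return up to n most frequent words per topic, skipping zero counts."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


# Toy, hand-written token lists standing in for preprocessed petitions.
docs = [
    "gun control law background check gun".split(),
    "gun law ban assault weapon".split(),
    "school education student loan debt".split(),
    "student loan education funding school".split(),
]
topic_word = fit_lda(docs, num_topics=2)
keywords = top_keywords(topic_word, n=3)
for k, words in enumerate(keywords):
    print(f"topic {k}: {', '.join(words)}")
```

No taxonomy is supplied at any point: the only inputs are the tokenized texts and the number of topics. In practice a maintained library would likely be preferable to this hand-rolled sampler, and a real corpus would need tokenization and stop-word removal first.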
We applied topic modeling to the 3,292 petitions that appeared on the We the People (WtP) website between September 22, 2011 and November 25, 2014, and produced 40 topics, each represented by its keywords. Topics derived from these petitions included healthcare, education, guns, elections, gender, animal protection, and marijuana policy. To assess how well topic models generalize across e-petitions, we repeated the process on two subsets of the same petitions and found similar results with small variations. About 30 to 40 percent of topics were identical across the two subsets, including obama, economy, military, sentence, national holiday, and medical research. Another 30 to 40 percent shared similar themes but reflected idiosyncratic perspectives or events. For example, the first subset included a human rights topic represented by the keywords rights, human, political, freedom, civil, and violations, whereas the second subset included two human rights topics tied to distinct events: the protests in Hong Kong and the Israeli-Palestinian conflict. Similarly, the first subset showed a gun policy topic, whereas the second included a topic on police use of guns. Beyond these shared topics, the first subset highlighted health care, National Security Agency surveillance, and Bowe Bergdahl, whereas the second highlighted the Islamic State of Iraq and Syria, Michael Brown, and Ebola. Comparison with exogenous events shows that these topics correspond meaningfully to social events of the period. We conclude that topic modeling is an efficient first step in using e-petitions to estimate the public's policy suggestions and opinions at a specific time.
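The abstract does not say how topics from the two subsets were judged identical or similar. One plausible way to sketch such a comparison is to measure keyword overlap with Jaccard similarity and label each topic by its best match in the other subset. Everything below is illustrative: the thresholds and the function names are assumptions, and only the first human rights keyword list comes from the text above; all other keyword lists are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two keyword lists, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_topics(topics_a, topics_b, identical=0.8, similar=0.3):
    """Label each topic in topics_a by its best keyword overlap with topics_b.

    The two thresholds are arbitrary illustrative cut-offs.
    """
    labels = {}
    for name, words in topics_a.items():
        best = max((jaccard(words, other) for other in topics_b.values()), default=0.0)
        if best >= identical:
            labels[name] = "identical"
        elif best >= similar:
            labels[name] = "similar"
        else:
            labels[name] = "distinct"
    return labels


subset_a = {
    # Keywords reported in the abstract for the first subset.
    "human rights": ["rights", "human", "political", "freedom", "civil", "violations"],
    # Hypothetical keyword lists for illustration only.
    "economy": ["economy", "jobs", "tax", "budget"],
    "health care": ["health", "care", "insurance", "coverage"],
}
subset_b = {
    # All keyword lists in this subset are hypothetical.
    "economy": ["economy", "jobs", "tax", "budget"],
    "hong kong protest": ["rights", "human", "freedom", "hong", "kong", "protest"],
    "ebola": ["ebola", "virus", "travel", "outbreak"],
}

labels = classify_topics(subset_a, subset_b)
print(labels)
# {'human rights': 'similar', 'economy': 'identical', 'health care': 'distinct'}
```

Counting the labels then gives the share of identical, similar, and subset-specific topics. Keyword overlap is a crude proxy, so a manual reading of the matched topics would still be needed to confirm that they refer to the same theme.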