ArXiv Preprint
Background: Misinformation spread through social media is a growing problem,
and the emergence of COVID-19 has caused an explosion in new activity and
renewed focus on the resulting threat to public health. Given this increased
visibility, in-depth analysis of COVID-19 misinformation spread is critical to
understanding the evolution of ideas with potential negative public health
impact.
Methods: Using a curated data set of COVID-19 tweets (N ~120 million tweets)
spanning late January to early May 2020, we applied methods including regular
expression filtering, supervised machine learning, sentiment analysis,
geospatial analysis, and dynamic topic modeling to trace the spread of
misinformation and to characterize novel features of COVID-19 conspiracy
theories.
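As a rough illustration of the regular expression filtering step, the sketch below flags tweets that match keyword patterns for candidate topics. The topic labels, patterns, and example tweets are hypothetical placeholders chosen for illustration; they are not the expressions or data used in the study.

```python
import re

# Hypothetical topic patterns (assumptions for illustration only;
# the study's actual regular expressions are not listed in this abstract).
PATTERNS = {
    "5g": re.compile(r"\b5g\b", re.IGNORECASE),
    "lab_origin": re.compile(
        r"\b(bio\s?weapon|lab[- ]?(made|grown|leak))\b", re.IGNORECASE
    ),
}


def match_topics(text):
    """Return the sorted list of topic labels whose pattern matches the text."""
    return sorted(
        label for label, pattern in PATTERNS.items() if pattern.search(text)
    )


# Made-up example tweets, not drawn from the data set.
tweets = [
    "5G towers are spreading the virus",
    "Wash your hands and stay home",
    "They say it was a lab-made bioweapon",
]

# Keep only tweets that match at least one topic pattern.
flagged = [(t, match_topics(t)) for t in tweets if match_topics(t)]
```

A filter like this would typically serve only as a first pass to narrow a large corpus before labeling and supervised classification.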
Results: Random forest models for four major misinformation topics provided
mixed results, with narrowly defined conspiracy theories achieving F1 scores of
0.804 and 0.857, while broader theories performed measurably worse, with
scores of 0.654 and 0.347. Despite this, analysis using model-labeled data was
beneficial for increasing the proportion of data matching misinformation
indicators. We were able to identify distinct increases in negative sentiment,
theory-specific trends in geospatial spread, and the evolution of conspiracy
theory topics and subtopics over time.
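For readers unfamiliar with the metric reported above, the F1 score is the harmonic mean of precision and recall for the positive class. The following is a minimal sketch of that computation for binary labels; the function name and the toy labels are assumptions for illustration and do not reproduce the study's evaluation code or data.

```python
def f1_score(y_true, y_pred):
    """Compute the F1 score for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No true positives: precision and recall are zero or undefined,
        # so this sketch returns 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (made up): 1 = tweet matches the misinformation topic.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
score = f1_score(y_true, y_pred)
```

Because F1 ignores true negatives, it is a common choice when the positive class is rare, as is usual when searching a large corpus for a specific topic.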
Conclusions: COVID-19-related conspiracy theories show that history
frequently repeats itself, with the same conspiracy theories being recycled for
new situations. We use a combination of supervised learning, unsupervised
learning, and natural language processing techniques to examine how theories
evolved over the first four months of the COVID-19 outbreak and how they
intertwine, and to hypothesize about more effective public health
messaging to combat misinformation in online spaces.
Dax Gerts, Courtney D. Shelley, Nidhi Parikh, Travis Pitts, Chrysm Watson Ross, Geoffrey Fairchild, Nidia Yadria Vaquera Chavez, Ashlynn R. Daughton
2020-12-14