"Thought I'd Share First": An Analysis of COVID-19 Conspiracy Theories and Misinformation Spread on Twitter

Background: Misinformation spread through social media is a growing problem, and the emergence of COVID-19 has caused an explosion in new activity and renewed focus on the resulting threat to public health. Given this increased visibility, in-depth analysis of COVID-19 misinformation spread is critical to understanding the evolution of ideas with potential negative public health impact. Methods: Using a curated data set of COVID-19 tweets (N ~120 million tweets) spanning late January to early May 2020, we applied methods including regular expression filtering, supervised machine learning, sentiment analysis, geospatial analysis, and dynamic topic modeling to trace the spread of misinformation and to characterize novel features of COVID-19 conspiracy theories. Results: Random forest models for four major misinformation topics provided mixed results, with narrowly-defined conspiracy theories achieving F1 scores of 0.804 and 0.857, while more broad theories performed measurably worse, with scores of 0.654 and 0.347. Despite this, analysis using model-labeled data was beneficial for increasing the proportion of data matching misinformation indicators. We were able to identify distinct increases in negative sentiment, theory-specific trends in geospatial spread, and the evolution of conspiracy theory topics and subtopics over time. Conclusions: COVID-19 related conspiracy theories show that history frequently repeats itself, with the same conspiracy theories being recycled for new situations. We use a combination of supervised learning, unsupervised learning, and natural language processing techniques to look at the evolution of theories over the first four months of the COVID-19 outbreak, how these theories intertwine, and to hypothesize on more effective public health messaging to combat misinformation in online spaces.


Introduction
On December 31, 2019, the World Health Organization (WHO) was made aware of a cluster of cases of 'viral pneumonia' of unknown origin in Wuhan, Hubei Province, China [ 1 ]. The WHO reported this cluster via Twitter on January 4, 2020 [ 2 ]. On January 19, the WHO Western Pacific Regional Office tweeted evidence of human-to-human transmission [ 3 ]. The first US case was reported the next day. Five days later, on January 26, 2020, GreatGameIndia published the article, "Coronavirus Bioweapon-How China Stole Coronavirus From Canada And Weaponized It," which claimed that coronavirus was leaked into China from a Canadian lab [ 4 ]. The article was reposted the same day on the website ZeroHedge with the title, "Did China Steal the Coronavirus and Weaponize It?" [ 5 ]. The story quickly went viral [ 6 ].
Misinformation surrounding pandemics is not unique to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes coronavirus disease 2019 (COVID-19). At least as far back as the Russian flu of 1889, pandemic spread of misinformation has been concomitant with disease spread [ 7 ]. People are susceptible to misinformation, defined as claims of fact that are either demonstrably false based on authoritative sources or unverifiable based on current knowledge [ 8 ], when trust in those authoritative sources is low, which can occur when officials provide conflicting information and guidance [ 9 ]. Misinformation must also be differentiated from misperception, an incorrect understanding or interpretation of facts, regardless of the truth of those facts [ 10 ]. Nyhan and Reifler (2010) differentiate between those who are uninformed and those who are misinformed; the former are open to corrective information because they are aware of their lack of knowledge, while the latter may only become more convinced of their misperceptions when presented with corrective information [ 10 ].
Misinformation also includes conspiracy theories, which posit explanations of events or circumstances based primarily on a conspiracy, that is, a small group of powerful people acting in secret to their own benefit at the expense of another group [ 11 ]. While some conspiracies, such as Watergate or the Tuskegee experiments, may eventually be found to be true, the vast majority of conspiracy theories are not true and their spread can undermine public health efforts [ 12 ]. Some conspiracy theories may be better classified as disinformation, false or misleading information that is intentionally passed to a target group [ 13 ]. Disinformation is a form of "black propaganda," in which the true source of information is concealed [ 14 ].
The COVID-19 outbreak has left many people isolated within their homes, turning to social media for news and to connect with others. This leaves them especially vulnerable to believing and sharing conspiracy theories [ 15 ]. This study examines four oft-repeated and long-lived conspiracy theories surrounding COVID-19: the virus is man-made and was released from a laboratory; Bill Gates or the Gates Foundation created or patented the virus; a COVID-19 vaccine will contain a tracking microchip or be otherwise harmful; and 5G technology is somehow associated with the disease. None of these conspiracy theories are unique, nor are they entirely distinct. Many theories overlap in specific ideas or end results, and many recycle a verifiable "grain of truth" combined with a previously circulating conspiracy message.
5G Cell Towers Spread Coronavirus. Cellular carriers began limited roll-out of 5G cellular service in 2018, with wider coverage available by the end of 2020 [ 16 ]. Coverage requires the installation of new cellular towers, which are much smaller than existing 4G towers and need to be placed an average of 500 feet apart [ 17 ]. These new towers were already the source of a more general conspiracy theory that the signal degrades the human immune system and that these dangers are being "covered up" by "powerful forces in the telecommunications industry" [ 18 ]. The theme of wireless technology causing immune damage in humans was previously seen with 2G, 3G, 4G, and WiFi roll-outs [ 18 ], and even the 1889 Russian flu was purportedly caused by the then-new technology of the electric light [ 7 ]. The COVID-related 5G conspiracy theory began in the first week of January, and might not have evolved from a fringe view into a trending hashtag had it not been shared by fake news websites whose sole aim is spreading conspiracy theories on Twitter, or by people aiming to denounce the theory [ 19 ].
Bill Gates/Gates Foundation. Conspiracy theories often "are about accusing powerful people of doing terrible things" [J. Uscinski quoted in 20 ]. The Bill & Melinda Gates Foundation is arguably the biggest philanthropic venture ever attempted, targeting global health through "creative capitalism" solutions [ 21 ] that largely bypass existing health systems and local needs [ 22 ]. The variety of projects associated with the Bill & Melinda Gates Foundation has proven fertile ground for the development of conspiracy theories, ranging from misinterpretations of a "patent on COVID-19" [ 23 ] to incorporation of vaccine-averse concerns. For example, the Foundation funded research at the Massachusetts Institute of Technology to develop injectable invisible ink to serve as a permanent record of vaccination for use in developing countries [ 24,25 ]. This technology was announced in December 2019, the same month that SARS-CoV-2 emerged in Wuhan, China, and has resulted in a conspiracy theory suggesting the COVID vaccine would microchip individuals with the goal of population control [ 23 ].
Laboratory Origins. In the 1980s the Soviet KGB and the East German Ministry of State Security ("Stasi") created a disinformation campaign known as "Operation Infektion" to promote the idea that HIV was the result of an escaped virus created by the US Army in Ft. Detrick, Maryland [ 26 ]. This conspiracy theory was the result of a real conspiracy between Soviet Bloc countries to tarnish the reputation of the US. This story was purposefully planted throughout the world, which perhaps explains why associations between HIV and other infectious diseases consistently re-emerge, including polio [ 27 ], Ebola [ 28 ], and COVID-19.
The COVID-related HIV conspiracy theory began on January 31, 2020 with the preprint publication of "Uncanny similarity of unique inserts in the 2019-nCoV spike protein to HIV-1 gp120 and Gag" [ 29 ]. The preprint was quickly shared on Twitter by Anand Ranganathan, a molecular biologist with over 200,000 followers, who lent the conspiracy theory the legitimacy of his scientific reputation by tweeting, "Oh my god. Indian scientists have just found HIV (AIDS) virus-like insertions in the 2019-nCoV virus that are not found in any other coronavirus. They hint at the possibility that this Chinese virus was designed…" [ 30 ]. Within two hours Ross Douthat, a prominent New York Times opinion columnist, retweeted Ranganathan to his 140,000+ followers [ 31 ], further legitimizing the theory through a reputable news outlet and greatly furthering the story's reach outside the scientific community [ 32 ]. Three days after the preprint's initial release, the original paper was retracted [ 29 ] after many scientists discredited the team's overly simplistic methodology. Laboratory origins theories have since become something of a political weapon, with US President Donald Trump claiming to have evidence of a Chinese laboratory origin [ 33 ], prompting a Twitter response from a Chinese government account [ 34 ] that was flagged by Twitter as misinformation [ 35 ]. Both of these events signal a transformation from misinformation to disinformation, opening up the potential for government-sponsored propaganda.
Additional laboratory-related conspiracy theories quickly emerged, such as that the virus was created to achieve global population reduction, or to impose quarantines, travel bans, and martial law, all of which were previously seen during the 2014 Ebola outbreak [ 28 ] and the 2015-2016 Zika outbreak [ 36,37 ].
Vaccines. In addition to conspiracy theories surrounding HIV similarities and laboratory origins, the 2014 Ebola outbreak also spawned conspiracy theories surrounding the CDC's supposed patent of the virus and ability to profit from vaccines while conspiring with American pharmaceutical companies [ 28 ]. People have feared vaccines since the smallpox vaccine was first introduced in England in 1796, and conspiracy theories quickly followed [ 38 ]. In addition to the genuine conspiracy to disseminate HIV disinformation described above, conspiracy theories surrounding HIV included the ineffectiveness of antiretroviral treatments [ 39 ] and that a cure existed but was being withheld by government and pharmaceutical companies [ 40 ]. Vaccine-related social media articles in 2019 still consist of debunked associations with autism and general mistrust of government and the pharmaceutical industry, with most of those who share articles being relatively knowledge-deficient and vaccine-averse compared to non-sharers [ 41 ]. Despite representing fewer users, anti-vaccine Facebook pages are more numerous than pro-vaccine pages; are more varied in discussion interests; provide more opportunities for engagement; and are more geographically diverse, ranging from local to global focus [ 42 ]. Johnson et al. (2020) predict that users expressing vaccine-averse opinion on Facebook will overtake pro-vaccine opinion within the next decade due to having many more interlinked pages with rapidly growing follower bases [ 42 ].
With the above framing in mind, this paper offers the following contributions to the literature. First, we use a combination of regular expression-based filtering and supervised learning to identify tweets associated with the four conspiracy theories described above and to separate misinformation from other COVID-related tweets. We additionally explore the sentiment of tweets and the geospatial properties of misinformation tweets. We use an unsupervised learning approach, dynamic topic modeling, to explore the changes in word importance among topics within each theory. All of this is done using a large corpus of Twitter data (120 million initial tweets, of which 1.8 million passed our initial regular expression filtering step). Lastly, we contextualize these findings within the rich, interdisciplinary literature on misinformation and health-related content on social media.
Health officials too often fail in crafting effective messaging campaigns because they target what they want to promote rather than addressing the recipients' existing misperceptions [ 43 ]. Misinformation can spread rapidly and without clear direction, as evidenced by one tweet (anonymized for privacy) we uncovered while conducting this research, which said "[Article on 5g/coronavirus connection]... Haven't vetted, thought I'd share first." An understanding of the appearance, transmission, and evolution of COVID-19 conspiracy theories can allow public health officials to better craft outreach messaging, and adjust those messages if public perceptions measurably shift. This study aims to demonstrate that identifying and characterizing the most common and long-lived COVID-related conspiracy theories using Twitter data is possible, even when these messages shift in content and tone over time.

Related Work
In the past year, substantial work has emerged investigating the onslaught of misinformation related to COVID-19. Multiple studies have found that misinformation is common, both on social media [ 44-46 ] and in the web pages returned for common COVID-19 queries at the beginning of the pandemic [ 47 ]. One survey-based study even found that scientists and clinicians think scientific journals without rigorous enough review processes have contributed to misinformation [ 48 ].
A few studies, though sparse, have focused on social media data. In general, work until this point has focused on quantifying the amount of misinformation present in online spaces. One study found that original tweets more often presented false information than evidence-based information, but evidence-based information was more often retweeted [ 46 ]. Similarly, Singh et al. found that during the first three months of the outbreak the volume of misinformation tweets was small compared to the overall conversation [ 49 ]. Broniatowski et al. conducted a much larger analysis of Twitter data and found that the amount of data related to COVID-19 dwarfed that of other health-related content, and that proportionally more of the data was from credible websites compared to other health-related datasets, with the caveat that it is difficult to assess website credibility [ 45 ].
Separately, some researchers have attempted to characterize how many people believe misinformation and the characteristics of those likely to believe misinformation. Some evidence suggests the proportion of the general population that believes COVID-19 conspiracy theories could be quite high. One nationally representative study in the US found that some myths (e.g., that the virus was created or spread on purpose) were believed by over 30% of respondents [ 50 ]. Another study across several countries found that those who believe misinformation are more likely to get information from social media or have a self-perceived minority status [ 51 ]. At the same time, the study found characteristics like "trusting scientists" and getting information from the WHO had a negative relationship with belief in misinformation (see [ 51 ] for a full list of covariates considered).
This work builds on prior work by delving into specific conspiracy theories on Twitter from January until May 2020. We use a combination of supervised learning (to identify relevant tweets and analyze sentiment) and unsupervised learning (to characterize the evolution of themes within theories) to understand changes in conspiracy theories over time in a novel fashion. To our knowledge this is the first paper to do a deep analysis of these specific conspiracy theories, or to do so using these methods.

Data

Twitter Data
The Twitter data used for this study is derived from Chen et al. 2020 [ 52 ], who constructed and made publicly available a COVID-19 Twitter dataset by filtering for known COVID-19 keywords and significant Twitter accounts. Due to limitations in the Twitter API, this data is a 1% sample of tweets that include the keywords, which we rehydrated for use in this study. While the data repository constructed by Chen et al. continues to grow based on lists of known COVID-19 keywords and significant Twitter accounts, the analysis for this paper includes approximately 120 million tweets from January 21, 2020 to May 8, 2020 (see Figure 1). Tweets collected were primarily in English, but also include a variety of other languages [ 52 ].

NewsGuard
Theories listed in NewsGuard's "Special Report: COVID-19 Myths" [ 23 ] were investigated and used to identify the four major conspiracy theories listed in the introduction. NewsGuard provides thorough evaluations of thousands of websites based on criteria including funding transparency, journalistic integrity, and editorial track record. Since the emergence of COVID-19, NewsGuard has also provided a summary of major myths and conspiracy theories associated with the pandemic, earliest documented claims, major events that caused significant spread, and detailed reports of major sources of COVID-19 misinformation.

Filtering & Supervised Classification
We first filtered the data into four datasets using regular expressions (see Figure 2) to increase the number of relevant tweets in the category of interest [ 53-57 ]. The four datasets are hereafter referred to using the following terms:
• 5G: 5G technology is somehow associated with COVID-19.
• Gates: Bill Gates or the Gates Foundation funded, patented, or otherwise economically benefited from the virus.
• Lab: The COVID-19 virus is man-made or bioengineered and was released (intentionally or accidentally) from a laboratory.
• Vax: A COVID-19 vaccine would be harmful in a way not supported by science (e.g., it could contain a microchip).
Within each keyword-filtered conspiracy theory dataset, we randomly sampled 750 tweets to use for training. Two authors coded each set of tweets and established agreement by jointly coding a subset of tweets (see Table 1). Any tweet promoting or engaging with misinformation, even to refute it, was labeled COVID misinformation. This was done with the rationale that tweeting about misinformation, even in the context of a correction, increases the audience exposed to that misinformation. Inter-rater analysis found relatively high agreement and reasonable Kappa scores (mean 0.759, Table 1). However, the effort demonstrated the difficulty of reliably identifying misinformation; there were many cases where oblique references and jokes fell in a gray area which raters labeled "Uncertain" (~6.1% of coded tweets). To account for this without removing data (and thus shrinking the available training data), we collapsed "Uncertain" labels into either COVID misinformation or not COVID misinformation and identified which grouping generated higher inter-rater agreement. Results are shown in Table 1.
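The agreement check described above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes the reported Kappa is Cohen's kappa for two raters, and the label strings, the example codes, and the helper names `cohen_kappa` and `collapse` are all hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)


def collapse(labels, uncertain_as):
    """Fold every 'uncertain' code into the chosen class."""
    return [uncertain_as if lab == "uncertain" else lab for lab in labels]


# Hypothetical codes from two raters for eight jointly coded tweets.
rater_a = ["misinfo", "misinfo", "not", "uncertain",
           "not", "misinfo", "not", "uncertain"]
rater_b = ["misinfo", "not", "not", "misinfo",
           "not", "misinfo", "uncertain", "not"]

# Try both groupings and see which one yields higher agreement.
kappa_by_grouping = {
    target: cohen_kappa(collapse(rater_a, target), collapse(rater_b, target))
    for target in ("misinfo", "not")
}
best_grouping = max(kappa_by_grouping, key=kappa_by_grouping.get)
```

On these made-up codes, folding "uncertain" into the non-misinformation class gives the higher kappa; with real coding data the preferred grouping could differ per theory.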

Table 1. Training data creation inter-rater results.
Tweets were tokenized and both URLs and stop words were removed. Unigrams and bigrams were used as features in a document-term matrix and the most sparse (< 0.05% populated) terms were removed. Additionally, we added boolean features describing relationships to domains identified by NewsGuard as sources of misinformation. Features included (1) a tweet originating from a misinformation-identified domain, (2) a tweet replying to an originating tweet, (3) a tweet retweeting an originating tweet, or (4) a tweet that was otherwise linked (e.g., replying to a retweet of a tweet from a misinformation source). As noted elsewhere, only English tweets were used in this analysis.
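The text-feature step can be illustrated with a small standard-library sketch. It is an assumption-laden approximation, not the authors' pipeline: the stop-word list is a tiny placeholder, the tokenizer regex and function names are hypothetical, bigrams are formed after stop-word removal and joined with an underscore, and "< 0.05% populated" is interpreted as a term appearing in fewer than 0.05% of tweets.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "is", "of", "and", "to", "in"}  # placeholder list
URL_PATTERN = re.compile(r"https?://\S+")
TOKEN_PATTERN = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase, strip URLs, split into tokens, and drop stop words."""
    text = URL_PATTERN.sub(" ", text.lower())
    return [tok for tok in TOKEN_PATTERN.findall(text) if tok not in STOP_WORDS]


def ngram_features(tokens):
    """Unigrams plus adjacent-token bigrams joined with an underscore."""
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def document_term_matrix(texts, min_doc_fraction=0.0005):
    """Build a count matrix, dropping terms present in too few documents."""
    docs = [Counter(ngram_features(tokenize(t))) for t in texts]
    # Document frequency: number of documents in which each term appears.
    doc_freq = Counter(term for doc in docs for term in doc)
    vocab = sorted(
        term for term, df in doc_freq.items()
        if df / len(docs) >= min_doc_fraction
    )
    matrix = [[doc[term] for term in vocab] for doc in docs]
    return vocab, matrix


# Hypothetical tweets; a high threshold is used so the toy corpus shows pruning.
texts = [
    "Bill Gates patent https://example.com/x",
    "Gates patent claim",
    "5G towers",
]
vocab, matrix = document_term_matrix(texts, min_doc_fraction=0.5)
```

The boolean NewsGuard-domain features described above would be appended as extra columns; they are omitted here because they depend on tweet metadata not shown in the text.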
The data were partitioned into a two-thirds/one-third training-test split. Data were sampled such that the training data had an equal sample distribution (50% misinformation, 50% non-misinformation). The testing data used the remaining available data and were thus an uneven sample distribution.
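One way such a balanced split could be drawn is sketched below. The exact sampling procedure is not given in the text, so the details here are assumptions: the function name is hypothetical, the per-class training quota is capped at two-thirds of the smallest class so that some minority examples remain for testing, and every index not drawn for training goes to the test set.

```python
import random


def balanced_train_test_split(labels, train_fraction=2 / 3, seed=0):
    """Return (train_indices, test_indices) with equal classes in training."""
    rng = random.Random(seed)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    smallest = min(len(indices) for indices in by_class.values())
    # Per-class quota: an equal share of the training budget, capped so the
    # smallest class is never used up entirely by the training set.
    per_class = min(
        round(len(labels) * train_fraction) // len(by_class),
        round(smallest * train_fraction),
    )
    train = []
    for indices in by_class.values():
        train.extend(rng.sample(indices, per_class))
    train_set = set(train)
    # Everything left over forms the (unevenly distributed) test set.
    test = [i for i in range(len(labels)) if i not in train_set]
    return sorted(train), test


# Hypothetical labels: 30 misinformation (1) and 60 non-misinformation (0).
labels = [1] * 30 + [0] * 60
train_idx, test_idx = balanced_train_test_split(labels)
```

With these toy labels the training set holds 20 tweets per class and the test set keeps the remaining 10 misinformation and 40 non-misinformation tweets, mirroring the uneven test distribution described above.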
Classifiers were built using random forest models with 150 trees, up to 25 terminal nodes, and 25 variables randomly sampled at each split. We used an active learning approach such that after each run of the random forest classifier, the calculated posterior entropy was used to select the three unlabeled tweets that caused the most uncertainty in the model. These were then hand-labeled by an author and applied to the next run of the model. Additionally, for each hand-labeled tweet, highly similar tweets (string similarity >= 0.95) were identified and given the same label. The models which performed the best (measured by F1 score) were used to assign labels to the full dataset.
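The selection and propagation steps of this active learning loop can be sketched independently of the classifier. The sketch assumes a binary task in which the model outputs a posterior probability of misinformation for each unlabeled tweet; the tweet identifiers, probabilities, and function names are hypothetical, and `difflib.SequenceMatcher` is a stand-in because the text does not say which string similarity measure was used.

```python
import math
from difflib import SequenceMatcher


def binary_entropy(p):
    """Shannon entropy (in bits) of a two-class posterior with P(class 1) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def most_uncertain(probabilities, k=3):
    """Pick the k tweet ids whose predicted posterior has the highest entropy."""
    ranked = sorted(
        probabilities,
        key=lambda tid: binary_entropy(probabilities[tid]),
        reverse=True,
    )
    return ranked[:k]


def propagate_label(labeled_text, label, unlabeled, threshold=0.95):
    """Give the same label to unlabeled tweets that are near-duplicates."""
    return {
        tid: label
        for tid, text in unlabeled.items()
        if SequenceMatcher(None, labeled_text, text).ratio() >= threshold
    }


# Hypothetical posteriors P(misinformation) from one classifier run.
posteriors = {"t1": 0.97, "t2": 0.52, "t3": 0.10, "t4": 0.45, "t5": 0.61}
to_label = most_uncertain(posteriors, k=3)

# Hypothetical near-duplicate propagation after an author labels one tweet.
unlabeled = {
    "u1": "5G towers spread coronavirus, wake up people!!",
    "u2": "Vaccines are safe",
}
propagated = propagate_label(
    "5G towers spread coronavirus, wake up people!", "misinfo", unlabeled
)
```

Tweets with posteriors nearest 0.5 are selected first, which is why `t2`, `t4`, and `t5` are chosen here while the confidently scored `t1` and `t3` are left alone.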


Analysis

Sentiment Analysis
Two well-documented sentiment dictionaries were used to label tokenized tweets. The first, AFINN [ 58 ], provided an integer score ranging from -5 (negative sentiment) to +5 (positive sentiment) for each word in the dictionary. The second dictionary, NRC [ 59 ], was used to tag words with categories of emotion, providing labels for eight emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, and trust) in addition to an overall "positive" or "negative" sentiment. For each tweet, aggregate sentiment metrics were calculated, including the sum of integer scores and the counts for each emotion label. We then compared the sentiment for each classified dataset over time.
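The per-tweet aggregation can be sketched with two small lookup tables. The dictionaries below are toy stand-ins with invented entries, not values copied from AFINN or NRC, and the function name `score_tweet` is hypothetical; the sketch only shows how a summed valence score and per-emotion counts are derived from tokens.

```python
from collections import Counter

# Toy valence lexicon (AFINN-style integers); entries are illustrative only.
TOY_VALENCE = {"hoax": -2, "kill": -3, "safe": 1, "hope": 2, "fear": -2}

# Toy emotion lexicon (NRC-style tags); entries are illustrative only.
TOY_EMOTIONS = {
    "kill": {"fear", "anger", "negative"},
    "hope": {"anticipation", "joy", "positive"},
    "fear": {"fear", "negative"},
    "hoax": {"anger", "negative"},
}


def score_tweet(tokens):
    """Return (net valence score, Counter of emotion tags) for one tweet."""
    # Words missing from the lexicon contribute nothing to either metric.
    net = sum(TOY_VALENCE.get(tok, 0) for tok in tokens)
    emotions = Counter()
    for tok in tokens:
        emotions.update(TOY_EMOTIONS.get(tok, ()))
    return net, emotions


# Hypothetical tokenized tweet.
net_score, emotion_counts = score_tweet(["vaccine", "hoax", "will", "kill"])
```

Averaging these per-tweet values by day and by model-assigned label would give the kind of time series compared in the sentiment results.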

Geospatial Analysis
The OpenStreetMap geocoding service Nominatim [ 60 ] was used to match Twitter's non-standardized, user-provided location data with geographic coordinates where possible, and was able to locate approximately 50 percent of tweets. A graduated symbol map approach [ 61 ] was used to identify geospatial clusters by grouping located tweets into ordered quantitative classes, with a maximum cluster radius commensurate with the average size of a major metropolitan area. Finally, each class was assigned a graduated symbol to visualize clusters from smallest to largest.

Dynamic Topic Modeling
Dynamic topic modeling was used to characterize themes and analyze temporal changes in word importance [ 62 ] by dividing tweets into weekly time slices based on the time they were generated. The set of topics at each time slice is then assumed to evolve from the set of topics at the previous time slice using a state space model. The result is an evolving probability distribution of words for each topic that shows how certain words become more or less important over time for the same topic. Traditional topic models, such as Latent Dirichlet Allocation [ 63 ], assume that all of the documents (which are here equivalent to tweets) are drawn exchangeably from the same topic distribution, irrespective of the time when they were generated. However, a set of documents generated at different times may reflect evolving topics.
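The weekly slicing step can be sketched as below. The code is a hypothetical illustration: the function names and example tweets are invented, and weeks are counted from the first day of the collection window. The resulting list of per-week document counts is the kind of input some dynamic topic model implementations expect (for example, the `time_slice` argument of gensim's `LdaSeqModel`), although the text does not state which implementation was used.

```python
from collections import Counter
from datetime import date


def week_index(day, start):
    """Zero-based index of the seven-day slice that contains `day`."""
    return (day - start).days // 7


def weekly_slices(dated_docs, start):
    """Sort documents by date and count how many fall in each weekly slice."""
    ordered = sorted(dated_docs, key=lambda pair: pair[0])
    counts = Counter(week_index(day, start) for day, _ in ordered)
    n_slices = max(counts) + 1
    # Weeks with no tweets get an explicit zero so slice positions stay aligned.
    return ordered, [counts.get(i, 0) for i in range(n_slices)]


# Hypothetical (date, text) pairs; the start date matches the collection window.
start = date(2020, 1, 21)
docs = [
    (date(2020, 2, 11), "gates patent pirbright"),
    (date(2020, 1, 21), "event simulation predicted"),
    (date(2020, 1, 28), "lab escaped evidence"),
    (date(2020, 1, 27), "5g towers"),
]
ordered_docs, slice_sizes = weekly_slices(docs, start)
```

Some implementations may reject empty slices, in which case zero-count weeks would need to be merged with a neighboring slice before fitting.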
Dynamic topic models were trained for each conspiracy theory, with the number of topics ranging from two to five. Small numbers of topics were chosen because these tweets were already classified to be relevant for individual misinformation topics, and because our goal was to identify potential sub-topics that evolved over time. The optimal number of topics was assessed qualitatively by reviewing topic modeling results.

Filtering & Supervised Classification
Filtering using regular expressions reduced our initial collection of tweets from approximately 120 million to roughly 1.8 million unique tweets across all four conspiracy theories. The relative volume of tweets in each dataset is shown in Figure 3, with the number of tweets appearing in multiple datasets visualized by edge thickness. All datasets showed some degree of overlapping tweets between categories, defined as the proportion of tweets appearing in more than one category, with "Gates" showing the most overlap and "5G" showing the least. "5G" additionally had a low volume of tweets compared to other theories.
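How a single tweet can land in more than one filtered dataset is easiest to see in code. The patterns below are hypothetical and far simpler than the expressions the study used (which are given in its Figure 2 and not reproduced here); the function names and example tweets are likewise invented for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, simplified patterns; the study's actual expressions differ.
PATTERNS = {
    "5G": re.compile(r"\b5g\b", re.IGNORECASE),
    "Gates": re.compile(r"\bgates\b", re.IGNORECASE),
    "Lab": re.compile(r"\b(?:bioweapon|lab(?:oratory)?)\b", re.IGNORECASE),
    "Vax": re.compile(r"\b(?:vaccin\w*|microchip\w*)\b", re.IGNORECASE),
}


def assign_theories(text):
    """Return the set of theory datasets whose pattern matches the tweet."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}


def overlap_counts(texts):
    """Count tweets shared by each pair of theory datasets."""
    pair_counts = Counter()
    for text in texts:
        for pair in combinations(sorted(assign_theories(text)), 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical tweets.
tweets = [
    "Gates wants a microchip vaccine",
    "5G towers",
    "lab leak and Gates patent",
    "stay home",
]
overlaps = overlap_counts(tweets)
```

Pairwise counts of this kind correspond to the edge thickness described for Figure 3; tweets that match no pattern, like the last example, are dropped by the filter.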

Table 2. Results of regular expression filtering step. Number of tweets per conspiracy theory and overlap between multiple categories.
Model performance metrics for each theory are presented in Table 3. For the theories "5G", "Gates", and "Lab" the balance of misinformation to non-misinformation was roughly equal in the training data. For the vaccine-averse dataset ("Vax"), the split was unbalanced, with only ~18% of labeled training data identified as COVID misinformation. These models were then used on the regular-expression filtered datasets to improve our confidence in the downstream analyses. Figure 3 shows the breakdown of each theory by model-assigned label and what proportion of tweets was composed of COVID misinformation.

Analysis

Sentiment Analysis
Figure 4 shows "Gates"-related tweets by net sentiment score over time. The range in sentiment is significantly greater for COVID misinformation, with tweets more consistently at net -10 sentiment or less, especially in April and May. See appendix for additional figures related to other conspiracy theories. Figure 5 shows the daily average sentiment of tweets in each conspiracy theory subset across eight emotions and general "negative" or "positive" sentiment. While tweets related to "5G" conspiracies are similar between misinformation and non-misinformation, there are clear differences in the other three conspiracy theories. In general, tweets classified as misinformation tend to rate higher on negative sentiment, "fear", "anger" and "disgust" compared to tweets classified as not misinformation.

Geospatial Analysis
Figures 6 and 7 show examples of geospatial analyses for different misinformation theories, with replies and retweets removed.See appendix for additional geospatial renderings of remaining conspiracy topics.
A large percentage of tweets corresponded with major population centers including London, Los Angeles, New York City, Paris, and Washington DC, as expected. However, additional patterns indicate locations with higher engagement with the conspiracy theory. For example, Figure 6 shows discussion of the "5G" conspiracy theory was widespread throughout Europe, where this theory was especially prominent. Of note, because this study used English regular expression patterns, results are skewed to English-speaking locations.

Dynamic Topic Modeling
For each conspiracy theory dataset, dynamic topic modeling was used to identify two to five potential subtopics and understand their evolution over time. The optimal model was assessed qualitatively by reviewing the results. Models with two topics led to the best results (qualitatively coherent topics with the least overlap) for the Gates, 5G, and laboratory origin theories, while the model with three topics was qualitatively the best for the vaccine theory. Results for the Bill and Melinda Gates theory are discussed here, with the remaining theories visualized in the multimedia appendix.
The Bill and Melinda Gates theory was composed of two topics, distinguishable both in relative volume of subtopic tweets over time (Figure 8) and in evolution of important words over time (Figure 9). Both topics showed peaks of increased Twitter discussion in mid-January to mid-February, with reemergence in April. The initial peaks in Topic 1 corresponded to high weighting of the words "predicted", "kill_65m", "event", and "simulation", while the later spike in April showed higher weights in words like "fauci" and "buttar". The model identified a second topic that referred to several conspiracy theories about Bill Gates, coronavirus, and vaccines. This second topic initially focused on theories about the origins of the virus, with highly weighted words like "pirbright" and "patent". In late April, higher weight is given to the words "kennedy", "jr", and "fauci."
The vaccine theory showed high weighting for the word "Bakker" in Topic 1 and a brief increase in the word "microchip" in early April within Topic 2. A linguistic shift in referring to the virus was also observable within the vaccine theory, with "coronavirus" highly weighted until mid-March, when "COVID" became more frequently used.
In the laboratory origin theory, words like "biosafety", "biowarfare", "warned", and "laboratory" were more highly weighted early in the outbreak. Words like "escaped", "evidence", and "originated" became more strongly weighted as the theory evolved over time. Overlap was seen between the laboratory theory and the Gates theory, with words such as "kill", "kill_65m", and "kill_forget". In addition, we observed terms related to other, more distant theories, such as "ebola" in Topic 2 in mid-January and terms related to Jeffrey Epstein and associated conspiracy theories surrounding his death ("epstein", "forget_epstein") [ 64 ].
The 5G theory showed high weighting of words and phrases like "conspiracy" and "conspiracy theory" in Topic 1 from late March through early April.

Principal Findings
The best performing random forest models were for the "5G" and "Lab" theories, with F1 scores of 0.804 and 0.857, respectively. While results for the "Gates" theory were weaker, "Vax" was the most problematic, with an F1 score of 0.347. This is likely due to the imbalanced nature of the dataset; many of the tweets we labeled were associated with scientifically valid news. In most cases, the addition of active learning produced at least minor improvements over the base models. Despite the variance in model performance, this approach was intended to increase the proportion of tweets pertaining to each conspiracy theory, not to provide perfect classification, so results were deemed adequate for downstream analysis.
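For readers less familiar with the metric, the F1 score is the harmonic mean of precision and recall, and it drops quickly when a rare positive class attracts many false positives. The sketch below uses invented confusion-matrix counts purely for illustration; they are not the study's results, and the function name is hypothetical.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall from confusion-matrix counts."""
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented counts for an imbalanced test set: few true misinformation tweets,
# many non-misinformation tweets wrongly flagged as misinformation.
imbalanced_f1 = f1_score(true_pos=30, false_pos=90, false_neg=20)

# Invented counts for a roughly balanced test set with similar recall.
balanced_f1 = f1_score(true_pos=60, false_pos=20, false_neg=40)
```

Both invented cases have a recall of 0.6, but precision falls from 0.75 to 0.25 in the imbalanced case, which is enough to pull F1 from about 0.67 down to about 0.35.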
Within our labeled data, a slight majority of tweets were COVID-19 misinformation in all conspiracy theories aside from the vaccine-averse dataset. However, when looking at overlap between the vaccine-averse dataset and the other datasets (see Figure 3), misinformation made up at least half of the overlap in each case, with significant connections between the "Lab", "Gates", and "Vax" conspiracy theories.
Across the conspiracy theories analyzed, this study showed a variety of factors to be consistent. When considering sentiment, the upper bounds (positive sentiment) were similar between misinformation and non-misinformation, but the data consistently showed that misinformation corresponded to a greater range in negative sentiment. We observe a similar trend when considering sentiment categorically. Tweets classified as misinformation consistently averaged higher scores in the categories of "negative", "fear", "anger", and "sadness". These differences were the smallest when comparing misinformation and non-misinformation in the 5G data, despite the fact that this classifier was quite high performing. This could be a result of intense political polarization surrounding the rollout of 5G in Europe, even when discussed outside the context of COVID-19. Geospatial features proved useful in characterizing misinformation, not only in verifying intuitive trends in the data (high volume in population centers) but also in identifying spread in unexpected communities (conspiracy theories gaining adoption in Africa).
Dynamic topic modeling often identified specific conspiracy theories as they went from fringe discussion to larger topics. Early Gates terms of "predicted", "kill_65m", "event", and "simulation" all refer to the simulation of a novel zoonotic coronavirus outbreak at Event 201, a global pandemic exercise co-hosted by the Johns Hopkins Center for Health Security, the World Economic Forum, and the Bill and Melinda Gates Foundation [ 65 ]. The simulation predicted a disease outbreak would spread to multiple countries and result in 65 million deaths. These terms were also observed within the first vaccine subtopic. The later Gates spike in April identified the words "fauci" and "buttar", which correspond to news coverage that Dr. Rashid Buttar stated that coronavirus was manufactured to hurt the economy and that Dr. Fauci and Gates were using the pandemic to drive their hidden agendas [ 66 ]. Topic 2, which identified early terms of "pirbright" and "patent", corresponds to theories that the Gates Foundation funded and/or patented the virus through the Pirbright Institute, a UK-based company. Later terms identified were "kennedy", "jr", and "fauci," corresponding to claims from Robert Kennedy Jr. that a coronavirus vaccine would personally benefit Dr. Fauci and/or Bill Gates.
The vaccine theory identified "bakker" in Topic 1. This refers to the tele-evangelist Jim Bakker, who promoted theories about colloidal silver as a cure for COVID-19 on his show and who was given a cease-and-desist letter from the federal government as a result [ 67 ]. A brief increase in the word "microchip" occurs a few weeks after the Bill Gates "Ask Me Anything" (AMA) on Reddit [ 68 ], which increased the popularity of theories that the Gates Foundation was attempting to use vaccines to microchip individuals for population control [ 24 ].
Laboratory origin words such as "biosafety", "biowarfare", "warned", and "laboratory" suggest that people were discussing a malicious laboratory release [ 69 ], whereas later words of "escaped", "evidence", and "originated" correspond to theories tending to focus on an accidental release of the virus from a laboratory [ 7 ] (Multimedia Appendix figure 11, topic 2). The laboratory theory also saw a high weight placed on the word "ebola," another disease outbreak that sparked conspiracy theories surrounding bioengineering and laboratory origins [ 12 ]. The laboratory origins topic also captured belief in unrelated conspiracy theories surrounding Jeffrey Epstein. This is consistent with studies which have shown that those who believe in a conspiracy theory are more likely to also believe in others, or are prone to conspiratorial thinking broadly [ 70,71 ].
The 5G theory identified "conspiracy" and "conspiracy theory" in Topic 1, suggesting an increase in people tweeting corrections. This is consistent with Ahmed 2020, which found that approximately 32% of tweets analyzed from March 27, 2020 to May 4, 2020 were criticizing the theory [ 24 ].
Within the dynamic topic modeling results, similar patterns repeat across theories. First, we see consistent evidence of overlap in conspiracy theories. Gates-related terms were seen outside the Gates theory, namely within laboratory origin theories and vaccine theories. The Gates theories are in many ways the quintessential conspiracy theory, identifying a very small group of powerful and wealthy people acting for their own benefit at the expense of the larger population [ 11 ].
Second, we repeatedly see conspiracy theories evolving in response to real-world events as they occur. This is especially evident in the 5G theory, where related hashtags trended on Twitter in the United Kingdom and resulted in suspected arson attacks on 5G towers [ 72 ]. We see this pattern in other theories as well. In the Gates conspiracy theory, we were able to tie the overlap between Gates and the vaccine-averse community to his March "Ask Me Anything" on Reddit, which highlighted Gates-funded research at the Massachusetts Institute of Technology to develop injectable invisible ink to serve as permanent records of vaccination for use in developing countries [ 24,25 ]. This association morphed into a conspiracy theory suggesting the COVID-19 vaccine would secretly microchip individuals for population control [ 24 ]. Further, within the laboratory origin theory, we see evolution from a theory that focuses on the malicious release of COVID [ 69 ] to a more passive and accidental release, and the corresponding connections to political leaders in the United States. Terms like "Trump" and "Pompeo" (the US Secretary of State) gained importance in early May when Pompeo suggested there was evidence linking the release of COVID to a laboratory in Wuhan [ 73 ].

Limitations
One of the difficulties of this work was the active nature of the subject matter. Not only has COVID-19 misinformation continued to spread past our May data cutoff, but emergent conspiracy theories and topics are still being traced back to the early months of the year. For instance, our research into claims of a "laboratory origin" for the virus focused on popular conspiracy theories around a Chinese lab in Wuhan, a Canadian lab, and Fort Detrick in the U.S. However, at the time of writing, two additional theories were identified which have gained traction, one that the virus originated from the French Pasteur Institute and another that it originated in a lab at the University of North Carolina [ 23 ].
Further, our exclusive use of Twitter data fails to capture the entirety of the spread of misinformation. Social media platforms have faced significant challenges in identifying and containing the spread of misinformation throughout the course of the COVID-19 pandemic [ 8 ]. While Twitter data can be used to identify links from other platforms, without a means to analyze the source content, the ability to determine when misinformation crosses platforms to and from Twitter is limited.
Twitter users are also known to be a demographically biased sample of the US population [ 74-76 ]. In 2012, Twitter users represented 16% of internet users, and were more likely to be aged 18-29 years, African-American, and urban [ 77 ]. By 2015, Twitter users represented 23% of all internet users, including 30% of internet users under age 50 [ 76 ]. Minority groups are over-represented in some geographic regions and under-represented in others, and geographic regions are also differentially represented, with much of the Midwest significantly under-represented [ 75 ]. As technological barriers to internet access decline, user demographics have shifted, with users becoming increasingly educated and wealthy over time [ 76 ].
Lastly, while we are aware that our study population consisting of a 1% Twitter sampling subset of coronavirus-related keywords and hashtags is not generalizable to the English-speaking U.S. population as a whole, the goal of this study is not to achieve generalizability, but rather to achieve internal validity by accurately categorizing sentiment and describing misinformation patterns within the Twitter population, defined by active Tweeters from January 21, 2020 to May 8, 2020.

Conclusions
We found that an approach combining regular expression filtering and machine learning classification can be used to isolate tweets about four known conspiracy theories. Using this approach, we increased the proportion of tweets relevant to each conspiracy theory, which allowed for downstream sentiment analysis, unsupervised learning, and geospatial analysis with minimal manual components. Combining these methods allowed for a rich description of the evolution of each conspiracy theory over the course of the COVID-19 outbreak from January to May 2020. We show common patterns of theory evolution and conspiracy theory intersection over this time period, and describe how real-world events impact the trajectory of conspiracy theories in online spaces.
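As a minimal sketch of the regular expression filtering stage only, the following Python example splits a handful of toy tweets into the four theory subsets. The patterns, the `filter_by_theory` helper, and the sample tweets are hypothetical illustrations and are not the expressions or data used in this study; in the full approach, a classifier trained on labeled examples would then be applied to each subset.

```python
import re

# Hypothetical patterns for illustration only; the study's actual
# regular expressions are not reproduced here.
THEORY_PATTERNS = {
    "5G": re.compile(r"\b5g\b", re.IGNORECASE),
    "Gates": re.compile(r"\bgates\b", re.IGNORECASE),
    "Lab": re.compile(r"\b(lab|laboratory|bioweapon)\b", re.IGNORECASE),
    "Vax": re.compile(r"\b(vaccine|vaccines|vax)\b", re.IGNORECASE),
}


def filter_by_theory(tweets):
    """Map each theory name to the list of tweets matching its pattern.

    A tweet may land in more than one subset, which mirrors the overlap
    between theories discussed in the text.
    """
    subsets = {name: [] for name in THEORY_PATTERNS}
    for text in tweets:
        for name, pattern in THEORY_PATTERNS.items():
            if pattern.search(text):
                subsets[name].append(text)
    return subsets


# Toy inputs (invented for this sketch, not drawn from the data set).
sample_tweets = [
    "5G towers are spreading the virus",
    "Wash your hands and stay home",
    "Bill Gates predicted this at Event 201",
    "The virus escaped from a laboratory",
    "Gates wants a vaccine with a microchip",
]

subsets = filter_by_theory(sample_tweets)
for name, matched in subsets.items():
    print(name, len(matched))
```

Keyword matching of this kind is deliberately broad: it cannot tell a tweet promoting a theory from one mentioning the same terms for other reasons, which is why a supervised classification step follows it.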
This approach is valuable for characterizing misinformation that poses concerns to public health, and it is generalizable to other types of misinformation spread in the health domain and beyond. An ability to assess conspiracy theories before they become widespread would allow public health professionals to craft effective messaging to preempt misperceptions rather than to combat established false beliefs.

Future Work
The difficulty in identifying new conspiracy theories, the time sensitivity of their development, and the intensely manual nature of creating labeled data have shown the limitations imposed by relying on a traditional approach to identification using surveys or hand-labeled training data sets. Future efforts would benefit from an increased focus on unsupervised methods for identification of new misinformation topics. While verification of any new approach would likely still be dependent on hand-labeled data, the relatively short half-life of "expert rated" data when applied to misinformation would suggest the need for unsupervised validation methods as well.
In addition, the methodology described in this paper did not distinguish between tweets promoting misinformation and those refuting it in some way. Prior work has identified both rumor-correcting and rumor-promoting tweets during crises using Twitter data [ 78 ]. To promote a more complete view of how misinformation spreads and evolves, future work would benefit from adopting this approach.

Figure 1. Volume of Twitter data & important events. Twitter data were collected from January 21, 2020 to May 8, 2020. On average, we were able to rehydrate approximately one million tweets per day. Important events in the outbreak are noted for context.

Figure 2. Tweet filtering flow. Three filtering stages were applied to increase the proportion of COVID-19 misinformation tweets in the data: (1) the Chen et al. [ 52 ] dataset used keywords and known accounts to provide a large sample of COVID-19 Twitter data, (2) regular expressions were used to create four conspiracy theory datasets ("5G", "Gates", "Lab", and "Vax"), and (3) machine learning classifiers were built for each dataset to automatically identify tweets within each category of COVID misinformation.

Figure 4. Sentiment comparison for "Gates" data by label. Color indicates the number of tweets found on that date with corresponding net sentiment score.

Figure 5. Sentiment comparison for each theory by classification. Red lines correspond to COVID misinformation and blue lines correspond to tweets labeled not COVID misinformation. Figure 5 shows the daily average sentiment of tweets in each conspiracy theory subset across eight emotions and general "negative" or "positive" sentiment. While tweets related to "5G" conspiracies are similar between misinformation and non-misinformation, there are clear differences in the other three conspiracy theories. In general, tweets classified as misinformation tend to rate higher on negative sentiment, "fear", "anger" and "disgust" compared to tweets classified as not misinformation.

Figure 7. Geospatial rendering of tweets about Gates mandatory vaccine conspiracy theory.

Figure 8. Topic distribution (number of tweets) over time for 2-topic dynamic topic model of tweets related to Bill and Melinda Gates. Number of tweets (y-axis) assigned to each topic in the 2-topic dynamic topic model, over time (x-axis). Tweets belonging to topic 1 are more common in the conversation in January, while topic 2 becomes more prominent in the spring. Additionally, there are distinct peaks showing the popularity of tweets related to this conspiracy theory overall.

Figure 9. Word cloud and topic evolution for Topics 1 and 2 for tweets related to Bill and Melinda Gates. Top Panel: Word evolution, the change in word importance over time, for tweets related to Bill and Melinda Gates. The x-axis represents time while the y-axis shows important words. Color represents the importance of words, with darker color denoting higher importance. Bottom Panel: Word clouds for each topic. Word size corresponds to word weight (higher weighted words appear larger).

Figure 2. Sentiment comparison for "Lab" data by label. Color indicates the number of tweets found on that date with corresponding net sentiment score.

Figure 3. Sentiment comparison for "Vax" data by label. Color indicates the number of tweets found on that date with corresponding net sentiment score.

Figure 9: Word cloud and topic evolution for topics of tweets related to 5G technology. Top Panel: Word evolution (change in word importance) over time for tweets. Here, the x-axis represents time, the y-axis shows important words, and the color represents importance of words with darker color denoting higher importance. Bottom Panel: Word clouds for each topic. Word size corresponds to word weight (larger words have a higher weight).

Figure 10: Word cloud and topic evolution for topics of tweets related to laboratory origins of the virus. Top Panel: Word evolution (change in word importance) over time for tweets. Here, the x-axis represents time, the y-axis shows important words, and the color represents importance of words with darker color denoting higher importance. Bottom Panel: Word clouds for each topic. Word size corresponds to word weight (larger words have a higher weight).

Figure 11. Word cloud and topic evolution for topics of tweets related to vaccine discussion. Top Panel: Word evolution (change in word importance) over time for tweets. Here, the x-axis represents time, the y-axis shows important words, and the color represents importance of words with darker color denoting higher importance. Bottom Panel: Word clouds for each topic. Word size corresponds to word weight (larger words have a higher weight).

Dataset volume and overlap by theory.
Node size indicates total number of tweets discussing each conspiracy theory, while edge thickness corresponds to the number of tweets discussing any pair of conspiracy theories simultaneously.Color represents the proportion of tweets falling under the associated label.