Published on 20.10.20 in Vol 6, No 4 (2020): Oct-Dec
Preprints (earlier versions) of this paper are available at http://preprints.jmir.org/preprint/21340, first published Jun 11, 2020.
Social Media as an Early Proxy for Social Distancing Indicated by the COVID-19 Reproduction Number: Observational Study
Background: The magnitude and time course of the COVID-19 epidemic in the United States depend on early interventions to reduce the basic reproductive number to below 1. It is imperative, then, to develop methods to actively assess where quarantine measures such as social distancing may be deficient and suppress those potential resurgence nodes as early as possible.
Objective: We ask if social media is an early indicator of public social distancing measures in the United States by investigating its correlation with the time-varying reproduction number (Rt) as compared to social mobility estimates reported from Google and Apple Maps.
Methods: In this observational study, the estimated Rt was obtained for the period between March 5 and April 5, 2020, using the EpiEstim package. Social media activity was assessed using queries of “social distancing” or “#socialdistancing” on Google Trends, Instagram, and Twitter, with social mobility assessed using Apple and Google Maps data. Cross-correlations were performed between Rt and social media activity or mobility for the United States. We used Pearson correlations and the coefficient of determination (ρ²) with significance set at P<.05.
Results: Negative correlations were found between Google search interest for “social distancing” and Rt in the United States (P<.001), and between search interest and state-specific Rt for 9 states with the highest COVID-19 cases (P<.001); most states experienced a delay varying between 3-8 days before reaching significance. A negative correlation was seen at a 4-day delay from the start of the Instagram hashtag “#socialdistancing” and at 6 days for Twitter (P<.001). Significant correlations between Rt and social media manifest earlier in time compared to social mobility measures from Google and Apple Maps, with peaks at –6 and –4 days. Meanwhile, changes in social mobility correlated best with Rt at –2 days and +1 day for workplace and grocery/pharmacy, respectively.
Conclusions: Our study demonstrates the potential use of Google Trends, Instagram, and Twitter as epidemiological tools in the assessment of social distancing measures in the United States during the early course of the COVID-19 pandemic. Their correlation and earlier rise and peak in correlative strength with Rt when compared to social mobility may provide proactive insight into whether social distancing efforts are sufficiently enacted. Whether this proves valuable in the creation of more accurate assessments of the early epidemic course is uncertain due to limitations. These limitations include the use of a biased sample that is internet literate with internet access, which may covary with socioeconomic status, education, geography, and age, and the use of subtotal social media mentions of social distancing. Future studies should focus on investigating how social media reactions change during the course of the epidemic, as well as the conversion of social media behavior to actual physical behavior.
JMIR Public Health Surveill 2020;6(4):e21340
Public health measures are the epicenter of global efforts to combat the COVID-19 pandemic. The premise of these measures converges on a central notion: decreasing the basic reproductive number (R0) of the novel coronavirus below 1 to suppress transmission. With an R0 value below 1, the virus can no longer sustainably propagate from one person to another, eventually halting its spread [ ]. The most championed of these efforts is the idea of “social distancing,” or the practice of distancing yourself from others to reduce respiratory droplet transmission, the primary mode of transmission for COVID-19 [ ]. However, social distancing has not been without consequence, particularly for socioeconomic health. Several macroeconomic reports exploring the supply and demand shock of COVID-19 suggest that its effects may rival those of the 1918 Spanish Flu and the Great Depression [ ].
Transmission of COVID-19 was first detected in the United States in February 2020, and by mid-March, all 50 states and four US territories had reported cases of COVID-19. The total number of confirmed cases continued to rise exponentially before this trend was broken in early April. In an effort to slow transmission, several states implemented strict lockdowns, curfews, and business restrictions [ ]. New York Governor Cuomo declared a state of emergency on March 7, and New York State implemented one of the first large-scale lockdowns of schools, temples, and other large gathering places in New Rochelle. This further extended to include stay-at-home orders in other areas of New York, California, and Illinois. Restrictions on businesses deemed nonessential were eventually implemented in more than 40 states [ ].
In response, decreases across several economic sectors have been witnessed, leading to financial strain on American households. Over 10 million unemployment claims were filed in the 2 weeks ending on March 28, 2020; for reference, the previous peak was at 695,000 claims in October 1982. National-level interventions such as mandated paid time off and a historic US $2 trillion stimulus package (Coronavirus Aid, Relief, and Economic Security Act) were used to mitigate the broad impact of COVID-19 [ ].
Early intervention is ideal for the mitigation of a pandemic’s socioeconomic and health costs, but such potential is often a post hoc discovery. A more practical approach is active scrutiny and revision of the implemented measures, ideally in the early phases of the pandemic’s course. Recent efforts have attempted to quantify social distancing efforts using Google or Apple Maps’ user activity [ , ]. Although these tools accurately reflect social behavior at a point in time, we hypothesize that Google Trends and social media yield earlier actionable insight that can help control the pandemic’s trajectory.
Google Trends and social media (eg, Instagram and Twitter) are used extensively in the scientific literature and have been validated against external reference data sets in numerous public health and health surveillance studies [- ]. With an estimated 35% and 27% of all US citizens using Instagram and Twitter, respectively, on a regular basis and 89.7% of digital users searching on Google, these avenues remain the most practical tools for study [ - ]. Likewise, studies during this pandemic are investigating the utility of social media in the dissemination of preventive health information [ - ], and Twitter recently provided full access for prospective social media data tracking for COVID-19 research. Despite this, their use as epidemiological tools in the assessment of social behavior in early epidemic courses remains to be determined.
In this study, we investigate the use of Google Trends, Instagram, and Twitter as tools for the evaluation of social distancing measures by the public in the early epidemic phase. We first highlight a correlation between social distancing measures as captured by social media and national and state-specific time-varying reproduction number (Rt), an epidemiological estimate of R0 throughout an epidemic. We then compare the correlation of these social media avenues with Rt to the correlation of Google and Apple Maps’ user activity with Rt. We focused on the top nine affected states as of the time of writing, April 10, 2020. We collected the most recent social media data using Google Trends, Twitter, and Instagram, and used the updated confirmed cases compiled by the Centers for Disease Control and Prevention COVID-19 Case Data and Johns Hopkins Coronavirus Resource Center [, ].
We used Google Trends, Instagram, and Twitter. In addition to their established use in the scientific literature, we also focused on Instagram and Twitter because their user demographics overlap significantly with those of the public-facing jobs [- ] most likely to be affected by social distancing. Furthermore, a poll conducted by the Morning Consult between March 27 and 30, 2020, reported that 88% of Americans aged 30-54 years were practicing social distancing to some extent [ , ], an age range closely resembling Instagram and Twitter’s median ages of 34 and 40, respectively.
The choice to include only the top nine states by COVID-19 incidence was made because lower incidence states yielded insufficient social media and incidence data. When the analysis was run on the bottom nine states by COVID-19 incidence, the results displayed erratic patterns of social distancing search interest with no clear peak and days with no data, suggesting low search volumes; additionally, the Rt displayed large error margins and could not be calculated continuously over the study period.
Google Trends records billions of data points from search terms entered by the public. It then compares the summative search volume of each search query (defined as the exact term entered into Google’s search bar) to the day of highest search volume to yield a search volume index (SVI) score of 0-100. An SVI is assigned to each day and represents that day’s relative search frequency. Google Trends contains a geo-filtering feature that restricts search data to the United States or, more granularly, to specific states.
Google Trends data for the search query “Social Distancing” was collected on April 10, 2020, for March 1, 2020, through April 10, 2020.
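The peak-relative scaling described above can be illustrated with a minimal Python sketch. It assumes raw daily counts are available, which Google Trends does not expose, and the function name `search_volume_index` is hypothetical. The real index also involves sampling and normalization steps that are not described in this text, so this shows only the scaling idea.

```python
def search_volume_index(daily_volumes):
    """Scale daily search volumes so that the busiest day scores 100.

    daily_volumes: non-negative counts, one per day (hypothetical input;
    Google Trends itself only returns the already-scaled index).
    """
    peak = max(daily_volumes)
    if peak <= 0:
        # No searches on any day: every day gets an index of 0.
        return [0 for _ in daily_volumes]
    return [round(100 * volume / peak) for volume in daily_volumes]


# Example: the second day is the peak, so it maps to 100.
example_svi = search_volume_index([5, 20, 10])
```

Because each day is expressed relative to the peak within the queried window, SVI values from different queries or regions are not directly comparable in absolute terms.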
Instagram and Twitter
Instagram and Twitter are social networking platforms that can be accessed on a phone app or internet website. As of 2018, there are 107 million Instagram users in the United States. Similarly, as of January 2020, 59 million Twitter users are American, comprising the largest percentage of Twitter’s user base. Together, these social networking services capture a large percentage of the American population [, ].
Unamo search algorithms were used to capture the historical frequency of mentions for the hashtag “#socialdistancing” in the United States on Twitter and Instagram between March 1 and April 10, 2020.
Calculation of Rt
R0 is the number of individuals infected by a single infected individual during his or her entire infectious period in a population that is entirely susceptible. It is expressed in terms of the following parameters: κ is the rate at which an exposed individual becomes infectious, β is the probability that a susceptible individual becomes infected upon interaction with an infected individual, λ is the birth rate of susceptible individuals, μ is the per capita natural death rate, and γ is the per capita recovery rate.
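One standard expression that is consistent with these symbols is sketched below. It assumes a mass-action SEIR model with births and natural deaths, which is an assumption because the model is not named in the text; under that model β acts as a transmission coefficient and the disease-free susceptible population is λ/μ.

```latex
% Assumed SEIR model with vital dynamics (not stated in the text):
%   dS/dt = \lambda - \beta S I - \mu S
%   dE/dt = \beta S I - (\kappa + \mu) E
%   dI/dt = \kappa E - (\gamma + \mu) I
% Disease-free equilibrium: S_0 = \lambda / \mu
R_0 = \frac{\kappa}{\kappa + \mu} \cdot \frac{\beta \lambda}{\mu \, (\gamma + \mu)}
```

The first factor is the probability that an exposed individual survives to become infectious, and the second is the number of new infections produced over the mean infectious period 1/(γ + μ).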
The R0 for COVID-19 has varied in value from 1.4 to as high as 11.1 reported from some communities in China and Singapore [, ].
The Rt is an epidemiological estimate of R0 calculated using two variables: (1) the daily incidence of acute respiratory illness onset and (2) the distribution of the serial interval (time interval between symptoms onset in a case and in their infector).
The daily incidence of COVID-19 in the United States was obtained from estimations of symptom onset provided by the Centers for Disease Control and Prevention COVID-19 Case Data, which contains data up to April 5, 2020. The statewide incidence rate is based on confirmed cases obtained from the Johns Hopkins Coronavirus Resource Center. The serial interval was obtained using available parametric data computed previously for the initial outbreak of COVID-19 [ ].
We used the R statistical software (R Foundation for Statistical Computing) along with the EpiEstim package to calculate the Rt using the aforementioned parameters for the period of March 5 to April 5, 2020. Rt for the United States and top nine states by confirmed COVID-19 cases was derived for this time period. For subsequent calculations, we included data after the onset of at least 100 confirmed cases in each state, as the Rt prior to that had standard deviations in excess of 0.5.
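The estimation step can be illustrated with a minimal Python sketch of the renewal-equation approach that underlies EpiEstim. This is not the EpiEstim code and not necessarily the configuration used in this study: the function names, the 7-day sliding window, the gamma prior (shape 1, scale 5), and the three-value serial interval weights in the example are all assumptions made for illustration.

```python
import math


def total_infectiousness(incidence, si_weights):
    """Lambda_t = sum over s >= 1 of I[t - s] * w[s], where si_weights[s - 1] = w[s]."""
    pressure = []
    for t in range(len(incidence)):
        total = 0.0
        for s, weight in enumerate(si_weights, start=1):
            if t - s >= 0:
                total += incidence[t - s] * weight
        pressure.append(total)
    return pressure


def estimate_rt(incidence, si_weights, window=7, prior_shape=1.0, prior_scale=5.0):
    """Return {day index: (posterior mean, posterior SD)} of Rt.

    With a gamma prior on Rt and Poisson-distributed incidence, the posterior
    over a window is gamma with
        shape = prior_shape + sum of cases in the window
        rate  = 1 / prior_scale + sum of Lambda_t in the window
    """
    pressure = total_infectiousness(incidence, si_weights)
    estimates = {}
    for end in range(window, len(incidence)):
        start = end - window + 1
        shape = prior_shape + sum(incidence[start:end + 1])
        rate = 1.0 / prior_scale + sum(pressure[start:end + 1])
        estimates[end] = (shape / rate, math.sqrt(shape) / rate)
    return estimates


# Hypothetical inputs: flat incidence and made-up serial interval weights.
flat = estimate_rt([100] * 30, [0.2, 0.5, 0.3])
```

With flat incidence the estimate settles near 1, as expected when each case replaces itself. The posterior SD returned alongside the mean is the kind of quantity that could be compared against a cutoff such as the 0.5 mentioned above.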
The Google Trends SVI for “social distancing” was then independently compared to Rt for the top nine affected states (New York, California, Pennsylvania, Massachusetts, New Jersey, Florida, Louisiana, Michigan, and Illinois) and the United States as a whole. Analyses were performed using Pearson correlations with significance set at P<.05 and then plotted on logarithmic graphs. Correlations were obtained using raw data and after varying periods of time delay between the Google SVI or social media mentions and changes in Rt.
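A delayed Pearson correlation of this kind can be sketched in Python as follows. The function names and the toy series are hypothetical, and the sketch computes only the coefficient; the P value reported in the study would need a separate significance test that is not shown here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def delayed_correlation(signal, rt, delay):
    """Correlate signal on day t with Rt on day t + delay (delay >= 0 days)."""
    if delay < 0:
        raise ValueError("delay must be non-negative")
    if delay == 0:
        return pearson_r(signal, rt)
    # Drop the last `delay` days of the signal and the first `delay` days of Rt
    # so that each remaining signal value is paired with Rt `delay` days later.
    return pearson_r(signal[:-delay], rt[delay:])


# Toy data: Rt mirrors the search signal two days later.
toy_signal = [1, 3, 2, 5, 4, 6, 7, 8]
toy_rt = [0, 0, 9, 7, 8, 5, 6, 4]
toy_r = delayed_correlation(toy_signal, toy_rt, 2)
```

In the toy data Rt falls exactly as the signal rises two days earlier, so the delayed coefficient is at the negative extreme; real series would give weaker values.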
Cross-correlations for the relationship between Rt and measures of social distancing in the United States were performed, using available data based on Google Maps tracking that measures changes in percent social mobility. These data were available for separate locations, including grocery and pharmacy stores, recreation and retail stores, and workplaces. In addition, the cross-correlations between Rt and “#socialdistancing” mentions on Instagram and Twitter were also performed. The coefficient of determination (ρ²) was calculated and graphed, which represents the strength of the correlation at different time delays between Rt and each of the social mobility and social media measures. The peak of the coefficient of determination for each of these measures was tabulated along with the delay for which the greatest strength of relationship was found.
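The search for the delay with the greatest strength of relationship can be sketched as a scan over candidate lags. The sign convention here is an assumption: lag k pairs the measure on day t + k with Rt on day t, so a negative lag means the measure precedes Rt. Function names and the toy series are hypothetical.

```python
import math


def _pearson_or_none(x, y):
    """Pearson r, or None when either series has no variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def r_squared_by_lag(measure, rt, lags, min_pairs=3):
    """Map each lag to the squared correlation between measure[t + lag] and rt[t]."""
    n = min(len(measure), len(rt))
    scores = {}
    for lag in lags:
        pairs = [(measure[t + lag], rt[t]) for t in range(n) if 0 <= t + lag < n]
        if len(pairs) < min_pairs:
            continue  # too few overlapping days to correlate
        xs, ys = zip(*pairs)
        r = _pearson_or_none(xs, ys)
        if r is not None:
            scores[lag] = r * r
    return scores


def peak_lag(measure, rt, lags):
    """Return (lag, r squared) at which the squared correlation is largest."""
    scores = r_squared_by_lag(measure, rt, lags)
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy data: Rt mirrors the measure four days later.
toy_measure = [1, 3, 2, 5, 4, 6, 7, 8, 3, 9]
toy_rt = [5, 5, 5, 5, 9, 7, 8, 5, 6, 4]
best_lag, best_r2 = peak_lag(toy_measure, toy_rt, range(-7, 4))
```

Squaring the coefficient discards its sign, so the peak identifies the delay of strongest association whether the underlying correlation is positive or negative.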
The estimated Rt is shown for the period of February 28 to April 5, 2020, calculated from the number of COVID-19 cases by symptom onset, with a mean serial interval of 3.96 (SD 4.75) days. The shaded error bands are equal to 1 SD of the estimated Rt for each date.
Significant negative correlations were found between the Google SVI for the search query “social distancing” and the Rt between the dates of March 5 and April 5, 2020, in the United States (P<.001). The relationship between estimated Rt and Google SVI is visualized graphically in panel a. The strength of the correlation reached a peak at a 4-day delay from the start of the searches when considering all cases in the United States, with a Pearson correlation coefficient of –0.72 (P<.001).
There was a total of 376,067 “#socialdistancing” mentions on Instagram and 6470 on Twitter in the studied time period. The increase in “#socialdistancing” mentions on Twitter and Instagram predates the appearance of a decrease in Rt seen in panels b and c. The relationship between Rt and Instagram mentions (panel c) is significant and strongest at a 4-day delay (P<.001) from the start of Instagram hashtag “#socialdistancing” mentions. Significance for Twitter is seen only at a 6-day delay (P<.001).
When evaluated by state, New York, New Jersey, Massachusetts, Michigan, Pennsylvania, California, Louisiana, Illinois, and Florida all showed significant negative correlations between “social distancing” SVI and the state-specific Rt (P<.001). These correlations reached peak significance at different delay periods. Rt for some states such as Massachusetts experienced an early correlation with increasing searches for “social distancing.” Other states such as New York and Louisiana experienced a larger time delay from the start of Google searches to a decrease in Rt at 6 and 8 days, respectively. Most states experienced a delay of 3-8 days before reaching peak significance.
Significant correlations between Rt and social media appear to manifest themselves earlier in time when compared to social mobility measures, with peaks at –6 and –4 days for the relationship between Rt and Twitter and Instagram mentions, respectively (P<.001). Social mobility correlated best with Rt at –2 days and +1 day for workplace and grocery/pharmacy, respectively.
The relationship between Rt and social media or social mobility (P<.001) reaches its strongest point at different delay periods, tabulated below. The increase in social media mentions predates the decrease in Rt the earliest, with a lag time of 4-6 days. Social mobility data also predate the decrease in Rt, although at later times of 0-3 days. The table also shows the strength of correlation between each of the measures and Rt, represented by ρ². The strongest correlations are between social mobility data and Rt, with comparatively lower correlations between social media and Rt. Google Trends, however, shows a ρ² comparable with that of data from Apple Maps but not Google Maps, which exhibits the strongest correlations for all domains except parks.
|Data set|ρ²|Lag (days)|
In our study, we found that increased social distancing mentions on social media correlated with reduced US Rt, with Google Trends correlating with reduced state-specific Rt as well. We also found that the correlation varied when social distancing mentions or search queries were lagged by a few days; this effect depended on the state and social media platform. The discrepancy between Instagram and Twitter in the delay to peak correlation strength is interesting because the reach of Instagram in the United States is much greater, indicating a possible time-sensitive influence on behavior imparted by user reach. Why the delay periods differed between states is unclear but may be partly explained by the unequal implementation of top-down public health interventions.
Instagram and Twitter mentions of “#socialdistancing” correlated earlier with reduced COVID-19 Rt in the United States than did social mobility measures from Google and Apple Maps. Interestingly, Twitter showed the earliest correlation with Rt but also had the lowest coefficient of determination. This finding may be explained by the fact that Twitter reaches the smallest user base compared to Instagram or Google Maps. Social media in general exhibited a weaker correlation with Rt. This is expected since social mobility measures directly relate to the density of people congregating in an area, whereas social media is an indirect measure of social distancing and likely represents a smaller proportion of the population. Nonetheless, these findings support our hypothesis that social media may serve as an earlier indicator of future social behavior.
The finding that lagged social distancing efforts, as captured by social media, correlate significantly with reductions in Rt suggests a predictive role for social media. This is consistent with the interpretation that Google Trends, Instagram, and Twitter model the dissemination of information that may lead to individual decisions to undergo social distancing. Although the strength of the correlation for social media was found to be weaker than that for social mobility, the value was in the relationship of the correlation to time. Furthermore, the strength of the correlation may improve with subsequent studies using more accurate measures that relate social distancing in the media to actual social distancing behavior in the public.
An additional interpretation for the significance found in lagging social media mentions is that the delayed drop in Rt is also consistent with the expectation that social distancing is a method of primary prevention, as early practice prevents a future increase in Rt. It is tempting to consider whether these effects depend on the incubation period for COVID-19. About 50% of infected individuals show symptoms by 5 days, and 97.5% by 12 days [, ]. Our study shows that all 9 states exhibited a significantly reduced Rt with an 8-day lag period for social distancing search interest, supporting a quarantine time frame that confidently covers the upper limit of the incubation period. In contrast, quarantine times closer to the median incubation period of COVID-19 may be insufficient, as only 30% of the states showed significant reductions in Rt when the lag period was shorter than the median incubation period. A parallel can be drawn from these findings, albeit speculatively: there may also be a threshold in this pandemic’s trajectory in the United States before which a termination of social distancing efforts may be too early.
Whether this proves valuable in the creation of more accurate assessments of the early epidemic course is uncertain due to limitations. Limitations of this study are inherent to the use of Google Trends, Instagram, and Twitter because they are indirect measures of public behavior. The data represent a subtotal of mentions on Instagram and Twitter, and the study period is short and falls during the early course of the epidemic, when testing for and reporting of COVID-19 were imperfect. Additionally, we focused on only the top nine states by incidence; although this was an effort to reduce false-positive findings from unreliable low-incidence states, it does introduce barriers to generalizing results to other states. Furthermore, social media may represent a biased sample of those who are internet literate and have internet access, which may effectively covary with socioeconomic status, education, geography, and age.
Our study demonstrates the potential utility of Google Trends, Instagram, and Twitter as epidemiological tools in the assessment of social distancing measures in the United States during the early course of the COVID-19 pandemic. Their correlation and earlier rise and peak in correlative strength with Rt when compared to social mobility may provide proactive insight into whether social distancing efforts are sufficiently enacted. Whether these findings translate to the hypothesized clinical value is uncertain due to limitations. Although social media remains a candidate to gauge the success of this containment measure in the early epidemic period, future studies should investigate how social media reactions change during the course of the epidemic and whether these correlation patterns with Rt persist.
Conflicts of Interest
- Pan A, Liu L, Wang C, Guo H, Hao X, Wang Q, et al. Association of public health interventions with the epidemiology of the COVID-19 outbreak in Wuhan, China. JAMA 2020 May 19;323(19):1915-1923.
- Liu Y, Gayle AA, Wilder-Smith A, Rocklöv J. The reproductive number of COVID-19 is higher compared to SARS coronavirus. J Travel Med 2020 Mar 13;27(2).
- Delen D, Eryarsoy E, Davazdahemami B. No place like home: cross-national data analysis of the efficacy of social distancing during the COVID-19 pandemic. JMIR Public Health Surveill 2020 May 28;6(2):e19862.
- Athreya KB, Mather R, Mustre-del-Río J, Sanchez JM. COVID-19 and households’ financial distress: part 2: the spread of COVID-19 and (financial) pre-existing conditions. Federal Reserve Bank of Richmond. 2020 Mar 30. URL: https://www.richmondfed.org/publications/research/ [accessed 2020-04-05]
- CDC COVID-19 Response Team. Geographic differences in COVID-19 cases, deaths, and incidence - United States, February 12-April 7, 2020. MMWR Morb Mortal Wkly Rep 2020 Apr 17;69(15):465-471.
- Lee JC, Mervosh S, Avila Y, Harvey B, Matthews AL, Gamio L, et al. See how all 50 states are reopening (and closing again). The New York Times. 2020 Jun. URL: https://www.nytimes.com/interactive/2020/us/states-reopen-map-coronavirus.html [accessed 2020-08-01]
- Cutler D. How will COVID-19 affect the health care economy? JAMA Network. 2020 Apr 09. URL: https://jamanetwork.com/channels/health-forum/fullarticle/2764547 [accessed 2020-04-27]
- H.R.74 - Grant's Law 116th Congress (2019-2020). Library of Congress. 2020 Jan 03. URL: https://www.congress.gov/bill/116th-congress/house-bill/74
- Walensky RP, Del Rio C. From mitigation to containment of the COVID-19 pandemic: putting the SARS-CoV-2 genie back in the bottle. JAMA 2020 May 19;323(19):1889-1890.
- Mobility trends reports. Apple. URL: https://www.apple.com/covid19/mobility [accessed 2020-04-15]
- COVID-19 community mobility reports. Google. URL: https://www.google.com/covid19/mobility [accessed 2020-04-15]
- Park JH, Christman MP, Linos E, Rieder EA. Dermatology on Instagram: an analysis of hashtags. J Drugs Dermatol 2018 Apr 01;17(4):482-484.
- Dorfman R, Vaca E, Mahmood E, Fine N, Schierle C. Plastic surgery-related hashtag utilization on Instagram: implications for education and marketing. Aesthet Surg J 2018 Feb 15;38(3):332-338.
- Sinnenberg L, Buttenheim AM, Padrez K, Mancheno C, Ungar L, Merchant RM. Twitter as a tool for health research: a systematic review. Am J Public Health 2017 Jan;107(1):e1-e8.
- Bloom R, Amber KT, Hu S, Kirsner R. Google search trends and skin cancer: evaluating the US population's interest in skin cancer and its association with melanoma outcomes. JAMA Dermatol 2015 Aug;151(8):903-905.
- Solano P, Ustulin M, Pizzorno E, Vichi M, Pompili M, Serafini G, et al. A Google-based approach for monitoring suicide risk. Psychiatry Res 2016 Dec 30;246:581-586.
- Moccia M, Palladino R, Falco A, Saccà F, Lanzillo R, Brescia Morra V. Google Trends: new evidence for seasonality of multiple sclerosis. J Neurol Neurosurg Psychiatry 2016 Sep;87(9):1028-1029.
- Lu Y, Zhang L. Social media WeChat infers the development trend of COVID-19. J Infect 2020 Jul;81(1):e82-e83.
- Wojcik S, Hughes A. Sizing up Twitter users. Pew Research Center. 2019 Apr 24. URL: https://www.pewresearch.org/internet/2019/04/24/sizing-up-twitter-users/ [accessed 2020-04-12]
- Clement J. Percentage of U.S. adults who use Instagram as of February 2019, by age group. Statista. 2019 Aug. URL: https://www.statista.com/statistics/246199/share-of-us-internet-users-who-use-instagram-by-age-group/ [accessed 2020-04-15]
- Simonsen L, Gog JR, Olson D, Viboud C. Infectious disease surveillance in the big data era: towards faster and locally relevant systems. J Infect Dis 2016 Dec 01;214(suppl_4):S380-S385.
- Merchant RM, Lurie N. Social media and emergency preparedness in response to novel coronavirus. JAMA 2020 May 26;323(20):2011-2012.
- Al-Dmour H, Masa'deh R, Salman A, Abuhashesh M, Al-Dmour R. Influence of social media platforms on public health protection against the COVID-19 pandemic via the mediating effects of public health awareness and behavioral changes: integrated model. J Med Internet Res 2020 Aug 19;22(8):e19996.
- Previous U.S. COVID-19 case data. Centers for Disease Control and Prevention. 2020. URL: https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/previouscases.html [accessed 2020-04-22]
- Gardner L. Public health: mapping COVID-19. Johns Hopkins Center for Systems Science and Engineering. 2020 Jan 23. URL: https://systems.jhu.edu/research/public-health/ncov/ [accessed 2020-04-26]
- Clement J. Worldwide desktop market share of leading search engines from January 2010 to July 2020. Statista. 2020. URL: https://www.statista.com/statistics/216573/worldwide-market-share-of-search-engines/ [accessed 2020-04-15]
- Unamo. URL: https://unamo.com/ [accessed 2020-04-15]
- Fox S, Rainie L. The web at 25 in the U.S. Pew Research Center. 2014 Feb 27. URL: http://www.pewinternet.org/2014/02/27/the-web-at-25-in-the-u-s/ [accessed 2020-04-12]
- Stepleton K. Unemployment insurance weekly claims. US Department of Labor. 2020 Oct 08. URL: https://www.dol.gov/ui/data.pdf [accessed 2020-04-28]
- Du Z, Xu X, Wu Y, Wang L, Cowling BJ, Meyers LA. Serial interval of COVID-19 among publicly reported confirmed cases. Emerg Infect Dis 2020 Jun;26(6):1341-1343.
- Lauer SA, Grantz KH, Bi Q, Jones FK, Zheng Q, Meredith HR, et al. The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: estimation and application. Ann Intern Med 2020 May 05;172(9):577-582.
|Rt: time-varying reproduction number|
|R0: basic reproductive number|
|SVI: search volume index|
Edited by T Sanchez; submitted 11.06.20; peer-reviewed by L Sinnenberg, K Bosh, C Campos-Castillo; comments to author 23.07.20; revised version received 20.08.20; accepted 16.09.20; published 20.10.20
©Joseph Younis, Harvy Freitag, Jeremy S Ruthberg, Jonathan P Romanes, Craig Nielsen, Neil Mehta. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 20.10.2020.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on http://publichealth.jmir.org, as well as this copyright and license information must be included.