This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on http://publichealth.jmir.org, as well as this copyright and license information must be included.
Infodemiology can offer practical and feasible health research applications through the practice of studying information available on the Web. Google Trends provides publicly accessible information regarding search behaviors in a population, which may be studied and used for health campaign evaluation and disease monitoring. Additional studies examining the use and effectiveness of Google Trends for these purposes remain warranted.
The objective of our study was to explore the use of infodemiology in the context of health campaign evaluation and chronic disease monitoring. We hypothesized, first, that the launch of a campaign would be followed by an increase in information seeking behavior on the Web and, second, that increasing and decreasing disease patterns in a population would be associated with search activity patterns. This study examined 4 different diseases: human immunodeficiency virus (HIV) infection, stroke, colorectal cancer, and marijuana use.
Using Google Trends, relative search volume data were collected for the period from February 2004 to January 2015. Campaign information and disease statistics were obtained from governmental publications. Search activity trends were graphed and assessed against disease trends and the campaign interval. Pearson product-moment correlation statistics and joinpoint methodology analyses were used to determine significance.
Disease patterns and online activity across all 4 diseases were significantly correlated: HIV infection (
The use of infoveillance shows promise as an alternative and inexpensive solution to disease surveillance and health campaign evaluation. Further research is needed to understand Google Trends as a valid and reliable tool for health research.
With an increasing number of people using the World Wide Web, their activities generate “big data” and provide meaningful research in infodemiology, which is the study of patterns and determinants of information on the Web or in a population with the purpose to inform public health and public policy [
Infoveillance has proven to be successful in predicting infectious disease outbreaks, spawning the development of Google Flu Trends [
With the growing digital era, many people now turn to the Web to learn about disease symptoms, diagnosis, and treatments, such as various cancers [
Second, infoveillance has also been used to monitor and track the success of marketing campaigns measured by the generation of interest and activity of a population observed on the Internet [
Thus, the aim of this observational study was to explore the applications of infoveillance in information seeking behavior for human immunodeficiency virus (HIV) and acquired immunodeficiency syndrome (AIDS), stroke, colorectal cancer, and marijuana use. Using Google Trends, these search activities were assessed against census data and public health campaigns to examine their relationships. To the best of our knowledge, these diseases have not previously been studied in this way, particularly in the Canadian context. We first hypothesized that search trends are associated with the disease patterns of a population. Second, we hypothesized that the launch of a health campaign stimulates Google search activity, as reflected in Google Trends. The outcomes of this research provide new insights for public health professionals and contribute to further understanding of infodemiology in health research.
In this retrospective study, search activity on Google was examined for colorectal cancer, HIV and AIDS, stroke, and marijuana use from 2004 to 2015 in Canada. Google Trends provided relative search volumes (RSVs) for particular search queries. First, to study the effectiveness of infoveillance in chronic disease monitoring, the search activities were compared against disease prevalence on an annual basis between 2004 and 2015. Second, to investigate the application of infoveillance in health campaign evaluation, the levels of search activity before, during, and after the campaign were analyzed.
Health campaigns were sought through peer-reviewed sources and gray literature. The search was narrowed to identify campaigns in Canada and campaigns implemented after 2004 because Google Trends provides its data only after this time point. Health campaigns through any medium, such as televised advertisements, program delivery, or pamphlet distribution, were considered in the assessment. However, the minimum data elements required for this study were the campaign duration, frequency, and location. Health campaigns that met these criteria were then screened based on their disease focus. Preference was given to diseases that had not yet been reviewed in current literature, as well as chronic diseases in order to examine the infodemiology applications for chronic disease monitoring. Thus, health campaigns were chosen based on their transparency and availability of public information, which subsequently dictated the 4 diseases studied in this paper.
As a result, the campaigns studied were the “ColonCancerCheck” campaign led by the Government of Ontario, the “End HIV Stigma” campaign led by the Positive Living Society of British Columbia (formerly known as the British Columbia Persons With AIDS Society), the “Anti-Marijuana” campaign led by the Government of Canada, and the “Make Health Last” campaign led by the Heart and Stroke Foundation. Because these campaigns differ in purpose, duration, and delivery channels, comparing them may help identify components of a successful campaign that lead to increased information seeking behaviors.
In 2012, Google Search accounted for 78% of the global market share among all search engines [
Summary of health campaigns.
Campaign features (by disease interest) | Colon cancer | HIVa and AIDSb | Marijuana use | Stroke |
Organization | Government of Ontario | Positive Living Society of British Columbia | Government of Canada | Heart and Stroke Foundation |
Campaign name | “ColonCancerCheck” | “End HIV Stigma” | “Anti-Marijuana” | “Make Health Last” |
Purpose | Increase colon cancer screening practices [ | Reduce stigma surrounding HIV [ | Educate about the negative health consequences of marijuana among adolescents [ | Increase awareness of stroke [ |
Delivery channels | Health care provider referrals as well as television advertisements, radio announcements, newspaper advertisements, and pamphlets across Ontario [ | 30-second public service announcements shown on 40 participating radio and television stations in British Columbia [ | Television, Web-based advertisements, and social media [ | Canadian Broadcasting Corporation (CBC) platforms including CBC television, CBC networks, CBC Player, regional stations, and digital banner [ |
Campaign period | April 2008 to September 2008 [ | July 2006 to July 2007 [ | October 2014 to December 2014 [ | February 2013 to May 2013 [ |
Duration | 6 months | 12 months | 3 months | 4 months |
aHIV: human immunodeficiency virus.
bAIDS: acquired immunodeficiency syndrome.
List of search terms.
Search query filters | Colorectal cancer | HIVa/AIDSb | Marijuana use | Stroke |
Search terms | Colorectal cancer + colorectal diagnosis + colorectal screening + colorectal cancer screening + colon cancer + colon cancer symptoms | Hiv + aids + human immunodeficiency virus + acquired immunodeficiency virus + hiv symptoms + hiv diagnosis + aids symptoms + aids diagnosis + hiv contraction | Marijuana use + drug abuse + marijuana side effects + marijuana effects + effects of marijuana + drug use + drug addiction | Stroke + stroke symptoms + stroke onset |
Geographic locations studied | Canada and Ontario | Canada and British Columbia | Canada and Ontario | Canada and Ontario |
Period of data collection | February 2004 to December 2015 | February 2004 to December 2015 | February 2004 to December 2015 | February 2004 to December 2015 |
aHIV: human immunodeficiency virus.
bAIDS: acquired immunodeficiency syndrome.
Statistics on the diseases were searched in both open peer-reviewed journals and gray literature sources. Data reported on the disease prevalence were preferred over disease incidence; however, if prevalence data were not available, then disease incidence data were still used. The disease incidence and prevalence data were obtained from Statistics Canada and other governmental publications for the period 2004 to 2014.
Data trends for each disease were graphed together to compare disease search activity and disease prevalence or incidence. Disease monitoring was first assessed via visual inspection to identify patterns in the data. A subsequent Pearson correlation analysis was conducted in IBM SPSS Statistics 23 (IBM Corporation) to test for statistically significant associations. The mean annual search activity was plotted against the annual disease rates to identify any correlations. This procedure was repeated for all 4 diseases.
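As an illustration of the correlation step, the following minimal Python sketch averages RSV values within each year and computes the Pearson coefficient against annual disease rates. It is not the SPSS procedure used in the study. The function names `annual_means` and `pearson_r` and all numbers are hypothetical, and no P value is computed.

```python
from math import sqrt
from statistics import mean


def annual_means(rsv_by_year):
    """Collapse {year: [rsv, ...]} into {year: mean rsv}."""
    return {year: mean(values) for year, values in rsv_by_year.items()}


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two equal-length series with at least 3 points")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


# Hypothetical illustrative values, NOT data from the study.
rsv_samples = {
    2004: [40, 42, 44],
    2005: [50, 52, 54],
    2006: [58, 60, 62],
    2007: [71, 70, 69],
}
disease_rate = {2004: 10.0, 2005: 12.0, 2006: 14.0, 2007: 16.0}

rsv_annual = annual_means(rsv_samples)
# Only years present in both series can be paired.
years = sorted(set(rsv_annual) & set(disease_rate))
r = pearson_r([rsv_annual[y] for y in years], [disease_rate[y] for y in years])
print(f"Pearson r over {len(years)} years: {r:.3f}")
```

With only about a decade of annual points per disease, a coefficient computed this way rests on a small sample, so a significance test is needed before interpreting it.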
To evaluate the public health campaign effect on search activity, the joinpoint methodology was used [
The Pearson correlations tested the relationship between RSV of the disease and the prevalence or incidence of the disease in a given time period for the 4 diseases studied in this paper. In general, the relationship between these 2 variables was significant for each of the 4 diseases. From 2004 to 2010, there was a negative correlation between colorectal cancer incidence and search activity (
Web-based search activity and incidence trends for colorectal cancer. RSV: relative search volume.
Web-based search activity and prevalence trends for human immunodeficiency virus. RSV: relative search volume.
Web-based search activity and incidence trends for marijuana use. RSV: relative search volume.
Web-based search activity and incidence trends for stroke. RSV: relative search volume.
The RSVs of the 20 weeks before the campaign, during the campaign period, and 20 weeks after the campaign were graphed to visually inspect them for increases and decreases in search activity caused by the implementation of the campaign (examples shown in
Weekly Web-based search activity for colorectal cancer before, during, and after the campaign period. Highlighted section depicts the campaign duration.
Weekly Web-based search activity for human immunodeficiency virus before, during, and after the campaign period. Highlighted section depicts the campaign duration.
Weekly Web-based search activity for marijuana use before, during, and after the campaign period. Highlighted section depicts the campaign duration.
Weekly Web-based search activity for stroke before, during, and after the campaign period. Highlighted section depicts the campaign duration.
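The before, during, and after windows shown in these figures can be sketched as a simple date filter over a weekly RSV series. This is a minimal illustration only: the function name `split_campaign_windows`, the synthetic series, and the exact start and end days are assumptions, since the campaign periods are reported by month.

```python
from datetime import date, timedelta
from statistics import mean


def split_campaign_windows(weekly_rsv, start, end, pad_weeks=20):
    """Split (week_start_date, rsv) pairs into before, during, and after windows.

    'before' covers the pad_weeks weeks preceding the campaign start,
    'during' covers the campaign itself (inclusive of both dates), and
    'after' covers the pad_weeks weeks following the campaign end.
    """
    pad = timedelta(weeks=pad_weeks)
    before = [(d, v) for d, v in weekly_rsv if start - pad <= d < start]
    during = [(d, v) for d, v in weekly_rsv if start <= d <= end]
    after = [(d, v) for d, v in weekly_rsv if end < d <= end + pad]
    return before, during, after


# Synthetic weekly series with arbitrary values, NOT data from the study.
first_week = date(2007, 10, 7)
weekly_rsv = [(first_week + timedelta(weeks=k), 40 + (k % 5)) for k in range(80)]

# Assumed exact days for a campaign reported as April 2008 to September 2008.
campaign_start = date(2008, 4, 1)
campaign_end = date(2008, 9, 30)

before, during, after = split_campaign_windows(weekly_rsv, campaign_start, campaign_end)
for label, window in (("before", before), ("during", during), ("after", after)):
    values = [v for _, v in window]
    print(f"{label}: {len(values)} weeks, mean RSV {mean(values):.1f}")
```

Comparing window means in this way gives only a descriptive summary. It does not test whether a change in trend coincides with the campaign.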
Joinpoint analysis for the periods 20 weeks before, during, and 20 weeks after the campaign.
Statistical outputs | Colorectal cancer | HIVa | Marijuana use | Stroke |
Segment 1, weeks | 1-17 | 1-99 | 1-16 | 1-14 |
Slope, RSVb/week (95% CI) | −1.21 (−2.7 to 0.3) | −0.46 (−0.6 to −0.3) | −2.72 (−4.1 to −1.3) | 0.10 (−1.0 to 1.2) |
P valuec | .11 | <.001 | <.001 | .85 |
Segment 2, weeks | 17-21 | — | 16-28 | 14-17 |
Slope, RSV/week (95% CI) | 17.46 (−2.8 to 41.9) | — | 4.93 (2.7 to 7.3) | 23.55 (−1.0 to 54.3) |
P value | .09 | — | <.001 | .06 |
Segment 3, weeks | 21-34 | — | 28-33 | 17-20 |
Slope, RSV/week (95% CI) | −4.01 (−6.2 to −1.8) | — | −6.52 (−14.4 to 2.1) | −11.89 (−29.4 to 10.0) |
P value | <.001 | — | .13 | .26 |
Segment 4, weeks | 34-68 | — | 33-55 | 20-54 |
Slope, RSV/week (95% CI) | −0.19 (−0.7 to 0.3) | — | 1.19 (0.4 to 2.0) | −1.03 (−1.3 to 0.8) |
P value | .44 | — | .006 | <.001 |
aHIV: human immunodeficiency virus.
bRSV: relative search volume.
cStatistical significance was defined as
Joinpoint analysis of the 4 diseases studied. The highlighted area shows the period the campaign was in effect. HIV: human immunodeficiency virus.
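To illustrate how segment slopes of this kind arise, the following minimal Python sketch fits a continuous piecewise-linear trend with exactly one joinpoint by grid search over candidate weeks. It is a simplified stand-in, not the joinpoint procedure used in the study: it fixes the number of joinpoints at one and does not perform model selection or compute confidence intervals or P values. The function names and the data are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_one_joinpoint(weeks, rsv, min_segment=3):
    """Least-squares fit of rsv = b0 + b1*week + b2*max(0, week - k).

    Every observed week that leaves at least min_segment points on each
    side (counting the joinpoint itself) is tried as the joinpoint k, and
    the candidate with the smallest sum of squared errors is returned.
    """
    best = None
    n = len(weeks)
    for i in range(min_segment - 1, n - min_segment + 1):
        k = weeks[i]
        rows = [[1.0, float(x), max(0.0, x - k)] for x in weeks]
        # Normal equations (X^T X) beta = X^T y for the 3-column design.
        xtx = [[sum(r[p] * r[q] for r in rows) for q in range(3)] for p in range(3)]
        xty = [sum(r[p] * y for r, y in zip(rows, rsv)) for p in range(3)]
        b0, b1, b2 = solve_linear(xtx, xty)
        sse = sum((y - (b0 + b1 * r[1] + b2 * r[2])) ** 2 for r, y in zip(rows, rsv))
        if best is None or sse < best["sse"]:
            best = {"joinpoint": k, "slope_before": b1,
                    "slope_after": b1 + b2, "sse": sse}
    return best


# Synthetic series, NOT study data: slope -1 up to week 10, slope +5 afterwards.
weeks = list(range(1, 21))
rsv = [50 - w if w <= 10 else 40 + 5 * (w - 10) for w in weeks]

fit = fit_one_joinpoint(weeks, rsv)
print(fit["joinpoint"], round(fit["slope_before"], 2), round(fit["slope_after"], 2))
```

The hinge term `max(0, week - k)` keeps the two segments connected at the joinpoint, so `slope_before` and `slope_after` correspond to the per-segment slopes (RSV/week) reported in the table.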
Both visual inspection and joinpoint analysis showed a positive correlation between health campaigns and people’s search behaviors for the “ColonCancerCheck” and “Anti-Marijuana” campaigns. Although these associations were moderate, the results support previous studies, suggesting that infoveillance can measure the success of a campaign in driving information seeking behaviors in a population [
Another difference that may have implications on the findings is the length of the campaign period. Other studies examined more impulsive interventions and their immediate influence on online activity. Noar et al [
Despite past successes of linking Web-based search activity and infectious disease outbreaks, little is known about the relationship between Web-based search activity and chronic diseases. In this study, the disease rates of colorectal cancer, HIV infection, marijuana use, and stroke showed significant correlation with Google search activity. Although disease trends have not been studied yet in literature, these findings are consistent with other related studies. In one study, the state-specific variance in stroke prevalence was shown to be related to the search query data of the specific state [
Limitations of the study include the limited availability of information about the campaigns and of disease statistics. Publicly available data on the prevalence of HIV, stroke, marijuana use, and colorectal cancer were limited. Ideally, disease prevalence data for the entire 2004 to 2015 period would provide the best Pearson correlation estimate, but in this study prevalence data were available only for HIV. Prevalence data capture both individuals with newly diagnosed disease and individuals who are currently living with the disease. Both groups should be considered because both are potential Internet users who may seek more information about their disease. Incidence data, in contrast, count only new cases of the disease over a period of time and therefore neglect the second group, who may still turn to the Web for health information about the disease. Consequently, individuals included in the incidence data are not fully representative of those who are likely to use the Internet to learn about their disease.
A second limitation of this study is the primary use of Google Trends to collect and assess search activity. Although Google is currently the only search engine to offer a data analytics tool that is accessible to the public, there are biases present in using Google Trends. Because Google makes up 78% of the global market share [
Finally, a general limitation of studying Internet search behavior is that uncontrollable factors can also affect search activity. Possible confounding variables include news events such as the disease-related death of a public figure, the social media influence of a public figure, and other health campaigns held during the same period. This is a classic limitation of observational studies, in which a cause-effect relationship cannot be established.
Further research is necessary to study associations between health campaigns and search activity as well as associations between disease rates and search activity on the Web. First, although the effect of health campaigns on Internet search activity was not established in this study, a recommendation for future studies is to examine the effects of recurrent health campaigns. Because positive findings have been reported for breast cancer awareness initiatives in the United States by Glynn et al [
Furthermore, campaigns that focus more on increasing awareness may be more appropriate to study because information-seeking behavior is the target outcome of such campaigns. For example, campaigns using advertisements to deliver their message compared with those handing out screening kits would be more relevant for engaging people on the Internet. Thus, selecting the right campaign will be an important factor to consider when studying search behaviors on Google Trends.
Google Trends showed promise in this study as an indicator of disease rates. However, future studies are warranted to strengthen this correlation. First, disease prevalence rates should be examined instead of disease incidence rates because prevalence figures capture all individuals who are likely to seek information about the disease on the Internet. In particular, diseases whose prevalence rates have shown both upward and downward trends over time would be best to study, because such fluctuations would test whether search activity follows disease rates in both directions. If disease rates and Internet search activity move together within the same time frame, this would strengthen the case for using infodemiology to monitor diseases in a population.
In this study, analysis of Internet search data showed significant relationships between health campaigns and information seeking behaviors in Canada for colorectal cancer and substance use but not for HIV and AIDS or stroke. The outcomes of the “ColonCancerCheck” and “Anti-Marijuana” campaigns were consistent with previous studies. Possible reasons for the discrepant findings from the “Make Health Last” and “End HIV Stigma” campaigns include differences in campaign type, frequency, and duration. However, the use of Web-based search data for digital disease monitoring remains promising. The study found significant associations between search activity and disease prevalence or incidence rates. Further studies are needed to validate the reliability of using Google Trends for health research purposes.
List of search terms reviewed for inclusion in the study.
Results from joinpoint analysis for colorectal cancer.
Results from joinpoint analysis for stroke.
Results from joinpoint analysis for substance abuse.
Results from joinpoint analysis for human immunodeficiency virus and AIDS.
AIDS: acquired immunodeficiency syndrome
HIV: human immunodeficiency virus
RSV: relative search volume
The authors thank the undergraduate thesis program at the School of Public Health and Health Systems at the University of Waterloo for the opportunity to make this work possible.
None declared.