Published on 06.04.17 in Vol 3, No 2 (2017): Apr-Jun
Using Google Trends Data to Study Public Interest in Breast Cancer Screening in Brazil: Why Not a Pink February?
Background: One of the major challenges of the Brazilian Ministry of Health is to foster interest in breast cancer screening (BCS), especially among women at high risk. Strategies have been developed to promote the early identification of breast cancer, mainly through Pink October campaigns. The massive number of queries conducted through Google creates traffic data that can be analyzed to reveal hidden interest cycles and their seasonality.
Objectives: Using Google Trends, we studied cycles of public interest in queries about mammography and breast cancer over the last 5 years. We hypothesized that these data may be correlated with collective interest cycles driven by national BCS campaigns such as Pink October.
Methods: Google Trends was employed to normalize traffic data on a scale from 0 (<1% of the peak volume) to 100 (peak of traffic), presented as weekly relative search volume (RSV) for mammography and breast cancer as search terms. The time series covered the last 261 weeks (November 2011 to October 2016), and the RSV of each term was compared with its respective annual means. Second-order polynomial trendlines were used to estimate overall trends.
Results: We found an upward trend for both terms over the 5 years, with almost parallel trendlines. Remarkable peaks were found in Pink October months: mammography and breast cancer searches rose to, respectively, 119.1% (2016) and 196.8% (2015) above annual means. Short declines in RSV during December and January were also noteworthy throughout the study period. These trends traced an N-shaped pattern, with higher peaks in Pink October months and sharp falls in the subsequent December and January.
Conclusions: Considering these findings, it would be reasonable to move Pink October to the beginning of each year, thereby extending the beneficial effect of the campaigns. It would be more appropriate to start screening campaigns at the beginning of the year, when new resolutions are made and new projects are added to everyday routines. Our work draws attention to the study of traffic data to encourage health campaign analysts to undertake better analyses based on marketing practices.
JMIR Public Health Surveill 2017;3(2):e17
The increasing number of searches for health-related issues generates “big data” that supports meaningful research in infodemiology, the study of patterns and determinants of information on the Web with the aim of informing public health and public policy [, ]. This concept is grounded in two approaches: supply-based infodemiology (which studies the dynamics, quantity, and quality of information available on websites, media reports, blogs, tweets, etc) and demand-based infodemiology (which studies health information seeking, eg, what people are searching for on the Internet) [ ]. Information-seeking behavior (measured, eg, by the frequency with which the public enters specified search terms) has been used successfully to show unexpected interest cycles and regional seasonalities. Numerous conditions and situations have been studied, from influenza epidemics [ ] to seasonality of headache [ ]. Previous studies suggest that infoveillance can measure the success of a campaign in driving information-seeking behaviors [ ], showing significant relationships between health campaigns and information-seeking behaviors [ ]. Web data surveillance holds strong potential to reveal overlooked phenomena [ , , ] and might increase knowledge of campaign strategies centered on the timing of interest cycles. This paper takes the demand-based approach, examining patterns of public information seeking on the early identification of breast cancer, like similar works that have become increasingly frequent in the recent literature [ , - ].
Breast cancer is the most common form of cancer among women all over the world, whether in high-income or in poor countries, accounting for 22% of the 4.7 million new cases occurring annually among females worldwide [ , ]. There is plenty of evidence that early diagnosis initiatives for breast cancer save far more lives and are much more cost-effective than treatment of late stages [ , ]. From the perspective of countries like Brazil, the efficacy of and adherence to breast cancer screening (BCS) is still a problematic public health policy issue [ ]. Brazilian mortality rates are increasing with striking variations between geographic regions, and several factors may account for the disparities, including delays in diagnosis due to low education level [ ], low adherence to screening programs, and gaps in their implementation [ , ]. Surveillance system databases have been used to assess self-reported cancer screening utilization. Although invaluable in identifying determinants of screening use and describing trends, these database systems are complex and costly, and remain a challenge for the largest country in Latin America. On the other hand, the massive number of queries conducted through Google creates data that can be analyzed with Google Trends, a publicly available tool used to compare the volume of Web search queries in different periods [ ].
The use of search volume for predicting real-world events may have less to do with its superiority over other data systems than with matters of low cost, transparency, simplicity, and reproducibility across a variety of domains. Among the free access tools available, Google Trends provides essential data to public health planners as weekly reports on the volume of queries related to pertinent issues. Google Trends shows oscillations in how often a particular search term is searched for, relative to the peak number of searches [ ]. We hypothesize that these query data may be correlated with collective interest cycles affected by campaigns and, thus, may be suitable for “predicting the present” in terms of “BCS attitudes.” If that is the case, Google Trends would be a low-cost support for planners of screening campaigns, providing feedback almost immediately after interventions. In this paper, we studied oscillations of public interest in queries about mammography and breast cancer over the last 5 years.
Google Trends is a free Web-based tracking system for Google search volumes. Google Trends algorithms normalize data for the overall number of searches on a scale from 0 (search volume <1% of the peak volume) to 100 (peak of popularity), presenting them as a weekly relative search volume (RSV). RSV values, as presented on the y-axis (), are by definition never greater than 100 and express a proportion of the highest search volume. This approach corrects results for population size and Internet access, both of which increased during the study period.
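The normalization described above can be illustrated with a minimal sketch. It is not Google's actual algorithm, which is not published in detail; it only assumes that each week's term count is first divided by the total number of searches and then rescaled so that the peak week equals 100, with weeks below 1% of the peak reported as 0. The function name and the sample counts are hypothetical.

```python
def relative_search_volume(term_counts, total_counts):
    """Rescale weekly search shares to a 0-100 scale relative to the peak week.

    Assumption: share = term searches / all searches in the same week.
    """
    shares = [t / n for t, n in zip(term_counts, total_counts)]
    peak = max(shares)
    if peak <= 0:
        raise ValueError("at least one week must have a positive search share")
    rsv = []
    for share in shares:
        scaled = 100.0 * share / peak
        # Weeks below 1% of the peak are reported as 0, as stated in the text.
        rsv.append(0 if scaled < 1 else round(scaled))
    return rsv


# Hypothetical weekly counts: searches for the term and all searches.
term = [10, 30, 1, 90]
total = [1000, 1500, 5000, 3000]
print(relative_search_volume(term, total))
```

Because each week is divided by the total number of searches before rescaling, growth in the number of Internet users alone does not raise the RSV, which is the correction the paragraph above refers to.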
Mammography and breast cancer (“mamografia” and “cancer de mama” in Portuguese) were used as search terms to produce separate time series (put together in) covering the last 261 weeks (November 2011 to October 2016), with the filter “Brazil” (country) in the category “Health.” We selected these search terms based on their face validity, excluding their plural forms and any other unusual forms, which resulted in low weekly RSV.
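As an illustration of how such a two-term weekly series might be loaded for analysis, the sketch below parses a small CSV sample. The column layout, the `Week` header, and the sample values are assumptions made for the example and may differ from an actual Google Trends export; cells such as `<1` are treated as 0, following the scale described above.

```python
import csv
import io
from datetime import date

# Hypothetical export layout; real files may carry extra header lines.
SAMPLE_EXPORT = """Week,mamografia,cancer de mama
2015-09-27,30,45
2015-10-04,62,100
2015-12-27,<1,21
"""


def parse_rsv_csv(text):
    """Return a list of (week_start, {term: rsv}) tuples from CSV text."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        week = date.fromisoformat(record["Week"])
        values = {}
        for term, cell in record.items():
            if term == "Week":
                continue
            # Values below 1% of the peak are assumed to appear as "<1".
            values[term] = 0 if cell.strip().startswith("<") else int(cell)
        rows.append((week, values))
    return rows


for week, values in parse_rsv_csv(SAMPLE_EXPORT):
    print(week.isoformat(), values)
```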
The results were analyzed using data exported as comma-separated values (CSV) files. The weekly and monthly RSV values were compared with annual means, and a graph was plotted with the annual means added to highlight differences between the weekly RSV series for both terms. Second-order polynomial trendlines were added to the weekly RSV to estimate trends over the 261 weeks.
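A second-order polynomial trendline is a least-squares fit of y = a·x² + b·x + c to the weekly values. The sketch below shows one way to compute it with the standard library only, by solving the normal equations; it is an illustration, not the tool used in the study, and the function names and sample values are hypothetical.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c)."""
    # Power sums needed by the normal equations: s[k] = sum of x**k.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x3 system for the unknowns (a, b, c).
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("need at least three distinct x values")
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def trend_values(xs, coefficients):
    """Evaluate the fitted trendline at each x."""
    a, b, c = coefficients
    return [a * x * x + b * x + c for x in xs]


# Hypothetical weekly RSV values indexed by week number.
weeks = list(range(8))
rsv = [12, 14, 13, 17, 18, 22, 21, 27]
coefficients = fit_quadratic(weeks, rsv)
print([round(v, 1) for v in trend_values(weeks, coefficients)])
```

A positive leading coefficient indicates a trend that bends upward over the period; comparing the fitted curves of two terms is one way to judge whether their trends run roughly in parallel.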
Results show an upward trend for both breast cancer and mammography searches over the 5 years, with almost parallel trend curves (). The annual means for breast cancer queries show a slight decline between 2011 and 2012 (of little relevance, since the 2011 data cover only the last two months, without Pink October) followed by a rise from then on, with a “jump” in the 2015 annual mean, as shown in . Annual means of mammography searches rose steadily over the 5 years ( ). Interest in breast cancer seems to increase markedly in Pink October months, with remarkably higher means (reaching up to 196.8% above the 2015 annual mean). There were several minor peaks throughout the years (without impacts comparable with Pink October months and with no obvious seasonality). Likewise, there were remarkable and growing peaks for mammography searches in Pink October months (reaching 119.1% above the 2015 annual means), though less “unstable” over the years than breast cancer searches. A short downward trend in December and January was also noteworthy throughout the 261 weeks: mammography reached 27.1% and breast cancer 36.6% below the annual means. These oscillations traced an N-shaped curve with higher peaks in Pink October months and sharp falls in the two subsequent months ( ).
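The percentages above express how far the mean RSV of a given period lies from the annual mean, that is, 100 × (period mean − annual mean) / annual mean. A minimal sketch of this arithmetic follows; the monthly values are invented for illustration and are not the study data, and grouping December and January within one calendar year is a simplification.

```python
from statistics import mean


def percent_vs_annual_mean(observations, months):
    """Percent difference between the mean RSV of `months` and the annual mean.

    observations: list of (month_number, rsv) pairs for one year.
    months: set of month numbers defining the period of interest.
    """
    annual = mean(value for _, value in observations)
    period = mean(value for month, value in observations if month in months)
    return 100.0 * (period - annual) / annual


# Hypothetical year: one RSV value per month, with an October peak.
year = [(1, 10), (2, 18), (3, 18), (4, 18), (5, 18), (6, 18),
        (7, 18), (8, 18), (9, 18), (10, 54), (11, 18), (12, 14)]

october = percent_vs_annual_mean(year, {10})
december_january = percent_vs_annual_mean(year, {12, 1})
print(f"October: {october:+.1f}% vs annual mean")
print(f"December-January: {december_january:+.1f}% vs annual mean")
```

In this invented example the annual mean is 20, so an October mean of 54 lies 170% above it and a December-January mean of 12 lies 40% below it.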
|Relative search volume ||2011||2012||2013||2014||2015||2016 (a)|
|Annual means |
|Pink October |
|December-January means |
|Annual means |
|Pink October |
|December-January means |
(a) From January to October.
In this study, we examined the utility of Google Trends in identifying cycles of public interest in breast cancer and BCS. Although Internet access is still concentrated in metropolitan areas in Brazil, limiting the use of Google Trends in areas with a low search volume, several studies seem to support the assumption that queries are sensitive enough to anticipate collective movements in real life. This is a well-known truism in marketing sciences, grounded in studies on the power of Google Trends in “predicting the present,” meaning that search volume correlates with contemporaneous events. The same rationale is being employed in several health research fields, and it has been useful in elucidating a wide range of questions, from vaccination compliance [ ] to protection against ultraviolet exposure during the summer season [ ] and interest in cancer issues after prevention campaigns [ ].
Two points in the results should be highlighted: (1) the growing interest in the early identification of a major public health problem and (2) the short collapse of this interest cycle at the end of each year (). In short, our results showed N-shaped RSV curves for both mammography and breast cancer ( ), with higher peaks in Pink October months and sharp declines in December and January. This “cancer screening vortex” has also been described by Schootman and colleagues, who found the highest RSV during breast, colorectal, cervical, prostate, and lung cancer screening campaigns and the lowest during December-January [ ]. In the Brazilian case, this gap may be due to cultural aspects concerning summer vacations, Christmas, and New Year’s celebrations. People tend to disregard issues related to illness and death, typically postponing some health decisions until the next year.
In Brazil, the Pink October strategy has been planned to promote collective interest in BCS in the context of cultural taboos and misconceptions. In fact, interest in breast cancer seems to increase markedly in October, although with several peaks throughout the year and no evident seasonality. Likewise, in recent years, RSV concerning the early diagnosis of breast cancer has been markedly higher in Pink October months. It seems to be growing almost exponentially and may overtake searches on breast cancer in the next few years. This is consistent with several works describing the use of the Internet (boosted by higher educational levels and the worldwide spread of mobile phones) as a resource for self-care [, ]. There is also a close correlation between the level of education, which has grown in Brazil in recent decades [ ], and Google searches on issues concerning science and health [ ]. Interest in breast cancer always exceeded (in absolute and relative terms) interest in mammography, but showed erratic patterns over the years and irregular growth in annual means. This may result from events without seasonality, linked to the high incidence of new cases and constant media coverage, especially of cases among celebrities, which seem to boost the number of hits [ , ].
Surveillance system databases have been very useful for assessing self-reported cancer screening utilization. These data have been invaluable in identifying determinants of screening practices and describing trends and regional inequalities over time [, ]. Unfortunately, because data collection requires massive survey interviews, these database systems are too costly for low-income countries [ ]. The complexity of the survey structure required to aggregate reliable data, which demands the participation of a large study population, is also a huge obstacle for the largest country in Latin America. Methodological problems are also involved: public health planners must consider accuracy problems caused by self-report questionnaires and selection bias [ ]. As a result, Brazilian population-based estimates of the prevalence of cancer screening methods are not precise, and the cultural impact of Pink October campaigns on BCS behavior is still unknown. Schootman and colleagues [ ] examined the utility of Google Trends relative to a surveillance system focused on cancer screening (the Behavioral Risk Factor Surveillance System). Social interest in learning about cancer screening exams was compared with surveillance data based on self-reported use of these tests. In the same manner, the present results clearly indicate that attention is increasingly being drawn to the means of early identification of cancer. It is not clear whether these findings can be taken as plain evidence of successful campaigns supported by huge Brazilian government investments in access to screening [ ], leveraged by the rise in educational levels [ ] and the widespread use of Google on mobile phones, tablets, notebooks, and desktops [ ]. It is not possible to be sure whether women moved from curiosity in Google queries to effective action. Nonetheless, the number of mammograms performed in the Brazilian Public Health System has jumped to just over 2.5 million (61.9% growth) in the period studied [ ].
Timing-Based Strategies and the “Cancer Vortex”
There are several reports in the literature concerning campaigns and health interventions based on “what” and “how” (selection of qualified information and proper vehicles to deliver messages) [, ], “where” (environments in which campaigns would be more effective) [ , ], and “who” (who the best counselors are) [ , , ]. Nonetheless, reports based on “when” (the ideal timing for intervention) are less frequent. Given the described findings, and considering that the effectiveness of campaigns may be influenced by their impact and persistence in everyday life (measured in terms of “intensity,” “duration,” and convergence with relevant facts), it would be reasonable to consider some changes in Pink October timing. It would be reasonable to assume that, in Brazil, moving “Pink October” to the beginning of each year could extend the beneficial effect of the campaigns. Considering that both RSV curves decline sharply in December and January of each year (consistent with findings by other authors on other continents [ ]), would it be reasonable to expand the beneficial effects of Pink October by adding some months between its interest peak and the “December-January cancer vortex”? If we go further in this perspective and change to “Pink February,” would more people be interested in BCS for a longer period of time? Following this reasoning, it would be more appropriate to hold screening campaigns at the beginning of the year, when new resolutions are made and new projects are added to everyday routines.
In Brazil, Web access is still concentrated in (but not limited to) metropolitan areas, which limits the use of Google Trends in rural areas or regions with a low search volume. In fact, specific subpopulations and their cultural disparities may not be reachable by RSV algorithms. In addition, Google Trends data only represent searches performed in Google. It is also important to consider that, although it represents a simple and low-cost alternative to nationwide screening databases, Google Trends is still insufficient to describe the peculiarities of screening behavior at a global level. Nonetheless, as mentioned before, several works have described information-seeking behavior as a proxy for self-care attitudes. The potential of Google Trends to generate hypotheses about public awareness and interest in multiple aspects of cancer is also well documented [, , ].
Future studies based on algorithms sensitive to interest cycles among small community groups should be useful for planning interventions tailored to local needs. Study designs and analytic tools better suited to estimating the effects of media coverage on screening behavior would also be of invaluable help.
The leading goal of this study is to draw attention to forecasting methods using massive data, to encourage health policy makers to undertake more sophisticated analyses based on classic marketing practices. Transparency of methods, simplicity, and reproducibility make these new approaches an important alternative for low-income and very large countries. Timing-based strategies and Google Trends evaluations after campaigns may inform policy makers about seasonal cycles of attention and interest, which would leverage further interventions. We believe that the patterns described here can be useful as baselines to help campaign analysts get started with specialized techniques that can subsequently be employed in more effective campaigns. The understanding and proper use of Google Trends oscillations, although common sense for marketing researchers, are challenging for disciplines like public health, where government agencies work with a different concept of timing and public health demands. However, RSV trends should be clear to public communication planners who take a broad perspective and are committed to responding to users’ demands in a timely fashion.
Conflicts of Interest
- Eysenbach G. Infodemiology and infoveillance: framework for an emerging set of public health informatics methods to analyze search, communication and publication behavior on the Internet. J Med Internet Res 2009 Mar 27;11(1):e11.
- Eysenbach G. Infodemiology and infoveillance tracking online health information and cyberbehavior for public health. Am J Prev Med 2011 May;40(5):S154-S158.
- Ginsberg J, Mohebbi MH, Patel RS, Brammer L, Smolinski MS, Brilliant L. Detecting influenza epidemics using search engine query data. Nature 2009 Feb 19;457(7232):1012-1014.
- Cjuno J, Taype-Rondan A. [Headache seasonality in the Northern and Southern hemispheres: an approach using Google Trends]. Rev Med Chil 2016 Jul;144(7):947.
- Bernardo T, Rajic A, Young I, Robiadek K, Pham M, Funk J. Scoping review on search queries and social media for disease surveillance: a chronology of innovation. J Med Internet Res 2013;15(7):e147.
- Ling R, Lee J. Disease monitoring and health campaign evaluation using Google search activities for HIV and AIDS, stroke, colorectal cancer, and marijuana use in Canada: a retrospective observational study. JMIR Public Health Surveill 2016;2(2):e156.
- Goel S, Hofman JM, Lahaie S, Pennock DM, Watts DJ. Predicting consumer behavior with Web search. Proc Natl Acad Sci USA 2010 Oct 12;107(41):17486-17490.
- Anker AE, Reinhart AM, Feeley TH. Health information seeking: a review of measures and methods. Patient Educ Couns 2011;82(3):346-354.
- Glynn RW, Kelly JC, Coffey N, Sweeney KJ, Kerin MJ. The effect of breast cancer awareness month on internet search activity--a comparison with awareness campaigns for lung and prostate cancer. BMC Cancer 2011 Oct 12;11:442.
- Schootman M, Toor A, Cavazos-Rehg P, Jeffe DB, McQueen A, Eberth J, et al. The utility of Google Trends data to examine interest in cancer screening. BMJ Open 2015 Jun 08;5(6):e006678.
- Benson JR, Jatoi I. The global breast cancer burden. Future Oncol 2012 Jun;8(6):697-702.
- Parkin DM, Bray F, Ferlay J, Pisani P. Global cancer statistics, 2002. CA Cancer J Clin 2005;55(2):74-108.
- Parkin DM, Fernández LM. Use of statistics to assess the global burden of breast cancer. Breast J 2006;12(Suppl 1):S70-S80.
- Tabár L, Vitak B, Chen TH, Yen AM, Cohen A, Tot T, et al. Swedish two-county trial: impact of mammographic screening on breast cancer mortality during 3 decades. Radiology 2011 Sep;260(3):658-663.
- Walters S, Maringe C, Butler J, Rachet B, Barrett-Lee P, Bergh J, ICBP Module 1 Working Group. Breast cancer survival and stage at diagnosis in Australia, Canada, Denmark, Norway, Sweden and the UK, 2000-2007: a population-based study. Br J Cancer 2013 Mar 19;108(5):1195-1208.
- Amorim V, Barros M, César C, Carandina L, Goldbaum M. Factors associated with lack of mammograms and clinical breast examination by women: a population-based study in Campinas, São Paulo State, Brazil. Cad Saude Publica 2008;24(11):2623-2632.
- Damiani G, Basso D, Acampora A, Bianchi C, Silvestrini G, Frisicale E, et al. The impact of level of education on adherence to breast and cervical cancer screening: evidence from a systematic review and meta-analysis. Prev Med 2015;81:28.
- Lee BL, Liedke PE, Barrios CH, Simon SD, Finkelstein DM, Goss PE. Breast cancer in Brazil: present status and future goals. Lancet Oncol 2012 Mar;13(3):e95-e102.
- Instituto Nacional de Câncer. INCA. 2016. Estimativa 2016 URL: http://www.inca.gov.br/estimativa/2016/sintese-de-resultados-comentarios.asp [accessed 2017-03-20]
- Nuti S, Wayda B, Ranasinghe I, Wang S, Dreyer R, Chen S, et al. The use of Google Trends in health care research: a systematic review. PLoS One 2014;9(10):e109583.
- Choi H, Varian H. Google. 2009. Predicting the present with Google Trends: Technical report URL: http://google.com/googleblogs/pdfs/google_predicting_the_present.pdf [accessed 2017-03-20]
- Barak-Corren Y, Reis BY. Internet activity as a proxy for vaccination compliance. Vaccine 2015 May 15;33(21):2395-2398.
- Vasconcellos-Silva PR, Castiel LD, Griep RH. Patterns of access to information on protection against UV during the Brazilian summer: is there such a thing as the “summer effect”? Cien Saude Colet 2015;20(8):2533-2538.
- Vasconcellos-Silva PR, Castiel LD, Griep RH, Zanchetta M. Cancer prevention campaigns and Internet access: promoting health or disease? J Epidemiol Community Health 2008 Oct;62(10):876-881.
- Jiang S, Street R. Pathway linking Internet health information seeking to better health: a moderated mediation study. Health Commun 2016:1-8 Epub ahead of print.
- Cao W, Zhang X, Xu K, Wang Y. Modeling online health information-seeking behavior in China: the roles of source characteristics, reward assessment, and Internet self-efficacy. Health Commun 2016 Sep;31(9):1105-1114.
- Portal B. Brasil. 2014 Nov 05. Women's education increases relative to men URL: http://www.brasil.gov.br/cidadania-e-justica/2014/11/escolaridade-das-mulheres-aumenta-em-relacao-a-dos-homens [accessed 2017-03-20]
- Segev E, Baram-Tsabari A. Seeking science information online: data mining Google to better understand the roles of the media and the education system. Public Underst Sci 2012 Oct;21(7):813-829.
- Cooper CP, Gelb CA, Lobb K. Celebrity appeal: reaching women to promote colorectal cancer screening. J Womens Health (Larchmt) 2015 Mar;24(3):169-173.
- Vasconcellos-Silva PR, Sormunen T, Craftman AG. Evolution of accesses to information on breast cancer and screening on the Brazilian national cancer institute website: an exploratory study. Cien Saude Colet 2017;21(8):25-28 (forthcoming).
- Joseph DA, King JB, Miller JW, Richardson LC, Centers for Disease Control and Prevention (CDC). Prevalence of colorectal cancer screening among adults--Behavioral Risk Factor Surveillance System, United States, 2010. MMWR Suppl 2012 Jun 15;61(2):51-56.
- Hiatt RA, Klabunde C, Breen N, Swan J, Ballard-Barbash R. Cancer screening practices from National Health Interview Surveys: past, present, and future. J Natl Cancer Inst 2002 Dec 18;94(24):1837-1846.
- Dey S. Preventing breast cancer in LMICs via screening and/or early detection: the real and the surreal. World J Clin Oncol 2014 Aug 10;5(3):509-519.
- Portal B. Brasil. 2017. Number of mammograms performed by SUS in four years grows by 62% URL: http://www.brasil.gov.br/saude/2015/10/mamografias-realizadas-no-sus-cresceram-62-em-quatro-anos [accessed 2017-03-20]
- Airhihenbuwa CO, Obregon R. A critical assessment of theories/models used in health communication for HIV/AIDS. J Health Commun 2000;5 Suppl:5-15.
- Cook PF, Carrington JM, Schmiege SJ, Starr W, Reeder B. A counselor in your pocket: feasibility of mobile health tailored messages to support HIV medication adherence. Patient Prefer Adherence 2015;9:1353-1366.
- Paskett ED, Tatum CM, D'Agostino Jr R, Rushing J, Velez R, Michielutte R, et al. Community-based interventions to improve breast and cervical cancer screening: results of the Forsyth County Cancer Screening (FoCaS) Project. Cancer Epidemiol Biomarkers Prev 1999 May;8(5):453-459.
- Paskett ED, McMahon K, Tatum C, Velez R, Shelton B, Case LD, et al. Clinic-based interventions to promote breast and cervical cancer screening. Prev Med 1998;27(1):120-128.
- Noar SM, Willoughby JF, Myrick JG, Brown J. Public figure announcements about cancer and opportunities for cancer communication: a review and research agenda. Health Commun 2014;29(5):445-461.
- Goodenberger ML, Thomas BC, Wain KE. The utilization of counseling skills by the laboratory genetic counselor. J Genet Couns 2015 Feb;24(1):6-17.
|BCS: breast cancer screening|
|CSV: comma separated values|
|RSV: relative search volume|
Edited by G Eysenbach; submitted 20.11.16; peer-reviewed by R Hawkins, E Davies; comments to author 04.01.17; revised version received 18.01.17; accepted 25.01.17; published 06.04.17
©Paulo Roberto Vasconcellos-Silva, Dárlinton Barbosa Feres Carvalho, Valéria Trajano, Lucia Rodriguez de La Rocque, Anunciata Cristina Marins Braz Sawada, Leidjaira Lopes Juvanhol. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 06.04.2017.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on http://publichealth.jmir.org, as well as this copyright and license information must be included.