Google Searches and Suicide Rates in Spain, 2004-2013: Correlation Study

Background: Several studies have suggested that web search data are useful for forecasting phenomena in fields ranging from economics to epidemiology and health.
Objective: This study aimed to (1) evaluate the correlation between suicide rates released by the Spanish National Statistics Institute (INE) and internet search trends in Spain reported by Google Trends (GT) for 57 suicide-related terms representing major known risks of suicide, and analyze these results using a linear regression model, and (2) study the differential association between male and female suicide rates published by the INE and internet searches of these 57 terms.
Methods: The study period was from January 2004 to December 2013. Suicide data were collected from Spain’s INE, and local internet search data were collected from GT. We investigated and validated 57 suicide-related terms already tested in scientific studies before 2015 that would be the best predictors of new suicide cases. We then evaluated the nowcasting effects of a GT search through a cross-correlation analysis and by linear regression of the suicide incidence data with the GT data.
Results: Suicide rates in Spain in the study period were positively associated (r>0.2) for the general population with the search volume for 7 terms and negatively associated for 1 of the 57 terms used in previous studies. Suicide rates for men were found to be significantly different from those of women. The search term “allergy” demonstrated a lead effect for new suicide cases (r=0.513; P=.001). The next most significant correlating terms among the 57 studied were “antidepressant,” “alcohol abstinence,” and “relationship breakup” (r=0.295, P=.001; r=0.295, P=.001; and r=0.268, P=.002, respectively). Significantly different results were obtained for men and women. Search terms that correlate with suicide rates of women are consistent with previous studies, showing that the incidence of depression is higher in women than in men, and showing different gender searching patterns.
Conclusions: A better understanding of internet search behavior of both men and women in relation to suicide and related topics may help design effective suicide prevention programs based on information provided by search robots and other big data sources. (JMIR Public Health Surveill 2020;6(2):e10919) doi: 10.2196/10919


Background
According to the World Health Organization (WHO) projections, by 2030, there will be 1,007,000 deaths by suicide, making suicide the 15th leading cause of death globally and accounting for 1.4% of all deaths [1]. Despite the common idea that suicide is more prevalent in high-income countries, about 75% of suicides worldwide occur in low- and middle-income countries. In general, suicide rates are lower among people aged <15 years and >70 years [2].
With a rate of 10 cases every day, suicide is the leading cause of unnatural death in Spain, producing more than twice as many deaths as traffic accidents, 7 times more deaths than workplace accidents, and 70 times more deaths than domestic violence. It is also the leading cause of death among men aged 20 to 24 years [3].
The incidence of suicide in a society depends on a range of factors, of which clinical depression is a particularly common cause [4]. Substance abuse, severe physical disease, and disability are also recognized causes of suicide. Countries in Eastern Europe and East Asia have the highest suicide rates in the world. The region with the lowest suicide rate is Latin America. Gender differences also play a significant role: Among all age groups in most parts of the world, females tend to show higher rates of reported nonfatal suicidal behavior, whereas males have a much higher rate of completed suicides.

Availability of Google and the Internet
As Howe [5] reports, the internet was the result of some visionary thinking by people in the early 1960s who saw great potential in allowing computers to share information on research and development in scientific and military fields. It is commonly believed that widespread media coverage of specific methods of suicide may induce copycat deaths and initiate changes in the popularity of certain methods, since at-risk individuals may use the internet to research particular methods of suicide that can be more lethal than the commonly used methods [6]. It is unclear whether the information obtained on the internet is reducing the risk of suicide or contributing to suicide promotion; there is evidence to suggest that the internet may facilitate suicide in various ways [7], but the influence of the internet on the incidence of suicide is not well known. In addition, efforts to carry out epidemiological monitoring of suicide are hampered by gaps in data availability. At present, the lag time for reporting data is 3 years for the Centers for Disease Control and Prevention (CDC) in the United States, ≥5 years for the WHO [8], and about 3 years for the Instituto Nacional de Estadística (INE, National Statistics Institute) in Spain.

Using Google Search Totals to Predict Social Trends
Increasingly, the volume of internet searches is being used as a social indicator (eg, in the field of epidemiology), and recently, this method has been applied to studies on suicide. We can establish a chronology of studies that began to use internet search volumes following the study of Choi and Varian [9], who reviewed the pioneering studies that suggested that web search data are useful for forecasting in various fields. In economics, the first such study was performed by Ettredge et al [10], who examined the association between search volumes and unemployment rates in the United States. In the same year, Cooper et al [11] described the use of internet search volumes for cancer-related topics. Since then, there have been several papers that have examined web search data in numerous fields.
In the field of epidemiology, Eysenbach [12], as the initiator, and Ginsberg et al [13] showed that search data could help predict the prevalence of influenza-like diseases by finding a positive relationship between the number of influenza-related search queries and pneumonia and influenza mortality. These papers were widely publicized and stimulated several further findings in epidemiology, including those by Brownstein et al [14], Hulth et al [15], Pelat et al [16], and Valdivia and Monge-Corella [17].
In the field of economics, Choi and Varian [9] showed how Google Search Insights data could be used to predict some economic metrics including initial claims for unemployment, vacation destinations, and automobile demand. Askitas and Zimmermann [18] and Suhoy [19] inspected unemployment data in the United States, Germany, and Israel. Guzman [20] examined Google data as a forecaster of inflation, pointing out that the Google Inflation Search Index (GISI) indicator is a good way of measuring inflation. Baker and Fradkin [21] have used Google search data to examine how job search activity was influenced by policies on unemployment payment extensions. Radinsky et al [22] and Preis et al [23] examined the use of search data for measuring consumer confidence, and Vosen and Schmidt [24] studied consumption and retail sales metrics.
Shimshoni et al [25] verified the predictability of Google Trends data, showing that a substantial share of search terms is highly predictable using simple seasonal statistical methods. Goel et al [26] offered a useful survey of work in this area, revealing some of the limitations of web search data. As they pointed out, obtaining search data is easy and often helpful in making predictions, but it may not provide significant increases in predictability.
Recent studies have shown the usefulness of new methodologies known as Infoveillance, Infodemiology, or Digital Disease Surveillance. For example, Adler et al [27], through projections of known correlations, identified various states in India with poor surveillance of the incidence of suicide or states with limited or no access to the internet.

Forecasting Suicide
Work on suicide has predominantly focused on traditional forms of media, particularly surrounding the issue of suicide contagion.
Daine et al [28] conducted a systematic review investigating the influence of the internet on self-harm and suicide in young people. They provided evidence of both positive influences, such as web-based media being used as a form of support, and negative influences, such as internet addiction, cyberbullying, and the internet being a source of information on suicide and self-harm. Mok et al [7] expanded on previous work by focusing explicitly on suicide-related internet use. They define suicide-related internet use as the "use of the Internet for reasons relating to an individual's own feelings of suicide" [7]. This paper summarized and assessed the existing work on not only the influence of suicide-related internet use but also its nature by presenting the main findings and discussing the types of studies that have been conducted, their strengths and limitations, and recommendations for future research. These findings are reported in Textbox 1.
In this study, we have focused on the topic, "Suicide-related internet search trends can provide an indicator of suicide risk in a population" in Textbox 1. According to Mok et al [7], most of the 9 articles give credence to a link between suicide-related search activity and suicide rates.
Some papers studied the correlation between search terms such as "suicide" and "depression" [8] in searches and news reports [29], or between searches and unemployment rates [30]; we have excluded this kind of semantic or mass media correlation, focusing only on the correlation between search terms and actual death rates reported by official institutions (ie, the INE for the 2004-2013 period). We did not find Chen's 2013 paper reported by Mok et al [7] and Gunn and Lester [31]; as such, we interpreted this as a citation error in the Mok et al [7] paper. Therefore, we finally used 6 articles (Table 1) that studied a total of 57 terms, of which 14 do not return results in Spanish in Google Trends for the period studied in Spain (Table 2).
Textbox 1. Main findings of the literature on suicide-related internet use [7].

• Use of the internet to search for suicide-related content:
    • Suicide-related internet search trends can provide an indicator of suicide risk in a population (number of articles, n=9).
    • Users conducting suicide-related searches typically access scientific information and community resource websites (n=1).
• Use of the internet to express suicide-related feelings (n=7).
• Suicide-related internet use and suicidal behavior:
    • The internet may facilitate suicide in various ways (n=17).
    • Internet-related suicides are rare when compared with overall suicides (n=1).
    • There is no evidence of increased suicidal behavior in response to a suicide on a web-based forum (n=1).
• Suicide-related internet use and suicidal ideation:
    • Individuals who engage in suicide-related internet use report higher levels of suicidal ideation (n=4).
    • There are mixed findings regarding the influence of suicide-related internet use on suicidal ideation over time (n=6).
• Role of the internet in suicide prevention:
    • Informal web-based suicide communities can function as support groups (n=1).
    • Web-based suicide forums staffed by trained volunteers can have positive effects (n=3).

Objectives
The study has two objectives: (1) It evaluates the correlation between suicide rates released by the INE and internet search trends in Spain reported by Google Trends for 57 suicide-related terms representing major known risks of suicide; these terms have already been tested in previous scientific studies systematized by Mok et al [7] (topic "Suicide-related internet search trends can provide an indicator of suicide risk in a population"). (2) It examines the differential association between male and female suicide rates published by the INE and internet searches related to the aforementioned 57 terms. The study included data from 2004 to 2013, as this was the maximum period for which relevant data were available from the INE and Google Trends.

Methods
In this section, we have addressed two issues: (1) how Google presents the results of search volume and how those results are normalized over time and in different geographical areas and (2) the variables we worked with, namely the expressions or terms whose search volumes are reported by Google Trends and the suicide rates (globally and segregated by gender) provided by the INE.

Google Trends
Google Trends provides a time-series index of the volume of queries users entered into Google in a given geographic area. Wikipedia explains it as follows [38]: Google Trends is a public web facility of Google Inc., based on Google Search, that shows how often a particular search-term is entered relative to the total search-volume across various regions of the world, and in various languages.
Although Google Trends does not show the absolute number of searches, it calculates a query share for a search term. This means that Google calculates the number of searches for a given term as a proportion of the total number of searches in each location at a given time. These calculations are then normalized to a Google Trends Relative Search Volume (RSV) index between 0 and 100, where an RSV index of 100 designates the date when there was the highest amount of search activity for that given term. Thus, a search index of 40 equates to 40% of the most intense search activity in the selected country at a given period.
Thus, the RSV index is a way to normalize (from 0 to 100) the query share that is the total volume of queries of the search term in question within a particular geographic region divided by the total number of searches in that region for the period under review. The maximum percentage of consultation in the specified time period is normalized to 100, and the other measures for that period of time are calculated relative to this value.
Google Trends also allows for the comparison of the relative volumes of blocks of searches for up to 5 terms or phrases. In this case, the RSV of the terms that did not reach the peak of 100 is normalized to the 100 value of the term with the highest search volume among the 5 terms or phrases in the block. However, in our work, terms were consulted one by one.
It is interesting to point out that although, according to Google Scholar, more than 10,000 scientific papers used or mentioned Google Trends service, we did not find any mathematical formulation of how the RSV value was calculated or operationalized by Google Trends. Therefore, we proposed a tentative mathematical formulation of how this value is calculated (Figure 1).
In short, Google Trends calculates the number of searches as percentages (formula 2 of Figure 1) based on the total searches in a month (formula 1 of Figure 1), normalizes the series allocated to the highest value (ie, the value of 100), and scales all other values accordingly (formula 3 of Figure 1).
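The normalization described above can be sketched in a few lines of Python. This is only an illustration of the tentative formulation, not Google's actual implementation: the function name `relative_search_volume`, the monthly counts, and the rounding to whole numbers are all assumptions made for the example.

```python
def relative_search_volume(term_counts, total_counts):
    """Turn raw monthly counts into a 0-100 RSV-style index.

    term_counts[i]  -- searches for the term in month i (made-up data)
    total_counts[i] -- all searches in the region in month i (made-up data)
    """
    if len(term_counts) != len(total_counts):
        raise ValueError("both series must cover the same months")
    # Query share: searches for the term divided by all searches, per month.
    shares = [term / total for term, total in zip(term_counts, total_counts)]
    peak = max(shares)
    if peak == 0:
        return [0 for _ in shares]
    # Scale so that the month with the highest share becomes 100.
    # Rounding to integers is an assumption of this sketch.
    return [round(100 * share / peak) for share in shares]


# Hypothetical four-month series.
rsv = relative_search_volume([20, 50, 30, 10], [1000, 1000, 1500, 1000])
print(rsv)  # prints [40, 100, 40, 20]
```

In this made-up series, the second month has the highest query share and therefore receives the value 100, while a value of 40 means 40% of that peak share.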

Search Term Variables Group
As variables, we used 57 query terms (Table 2) relating to suicidal ideation that were studied in the 6 articles mentioned in Table 1. These terms were translated into Spanish with the help of the website WordReference [39]; note that for cases in which the original language is different from the language of the articles (ie, English), this meant a second, nested translation, as some of the papers were originally in Japanese and traditional Mandarin Chinese, which can introduce a significant semantic shift.
Queries to Google Trends are not case sensitive but are sensitive to diacritical marks, so Google Trends returns different results for "enfermedad cronica" (chronic illness, written without the accent) than for "enfermedad cronica + enfermedad crónica" (which adds the spelling with the Spanish accent).
Google queries are "broad matched" in the sense that queries such as "great depression" are counted in the calculation of the query index for "depression," which is why, when searching for a term, we checked which related queries appeared and excluded unwanted ones by placing a dash before them, as required by the Google Trends query syntax. In addition, we performed a back translation procedure to confirm the accuracy of the translation.
In the related searches, we found terms that did not include the one we were searching for; this is because Google showed other terms that were searched for in the same searching session as the one we were interested in, so we added that term, preceded by a hyphen, after ours to exclude this spurious concept from our dataset. In this regard, we devised a set of terms (Table 2) to search in Google Trends; 14 returned no results for their Spanish translation (marked with "no" in the results column in Table 2), and the term "suicide methods" occurred 3 times in previous studies, so its 2 duplicates were removed. Therefore, only 41 terms (57 − 14 − 2) were included in our final analysis.
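As a rough illustration of the query notation described above, the following Python sketch joins spelling variants with a plus sign and appends excluded terms preceded by a dash. The helper `build_trends_query` is hypothetical, as is the excluded word "gran" in the second example; the exact rules for combining operators are Google's and are not verified here.

```python
def build_trends_query(variants, excluded=()):
    """Compose one query string in the plus/dash notation.

    variants -- spellings to add together with '+', for example with
                and without the Spanish accent
    excluded -- unwanted related terms, each prefixed with '-'
    """
    if not variants:
        raise ValueError("at least one spelling is required")
    query = " + ".join(variants)
    for term in excluded:
        query += " -" + term
    return query


# Accent variants taken from the example in the text.
q1 = build_trends_query(["enfermedad cronica", "enfermedad crónica"])
# Hypothetical exclusion, for illustration only.
q2 = build_trends_query(["depresion", "depresión"], excluded=["gran"])
print(q1)
print(q2)
```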
We performed an individual search for each of these terms in Spanish, and Google Trends returned 120 values, one for each month of the study period. In each series, there was one month with a value of 100, and the remaining months were presented as percentages in reference to this.
It is worth mentioning that Google Trends data are computed using a sampling method, and therefore, the results for the same query can vary within minutes.

Group Suicide Rate Variables Collected by the Spanish National Statistics Institute
The variables that we used for correlations are the absolute suicide figures for Spain (around 4000 deaths per year) reported by the INE, the official organization in Spain that collects statistics on demography, economy, and Spanish society. We obtained this information through the National Epidemiology Center, which is part of the Instituto de Salud Carlos III, a public research center of the Government of Spain. This information was broken down into totals for men and for women; Google Trends data were not segregated in this manner.
The period for which data were collected was January 2004 to December 2013, which is, as mentioned earlier, the maximum period covered by both data sources, Google Trends and the INE, at the time of study.

Statistical Analysis
Data were analyzed using the statistical package IBM SPSS Statistics, version 22 (IBM Corporation). The Pearson correlation coefficient was used to assess a possible monthly correlation between suicide rates and Google Trends RSV data for the search terms that we defined. Next, we performed a multiple linear regression analysis to propose an explanatory-predictive model of the variance of the suicide rates variable.
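For readers without SPSS, the correlation step can be approximated with a short Python sketch. It is not the analysis actually run for this study: the two five-month series are invented, the function names are hypothetical, and only the t statistic is computed, since obtaining the P value would additionally require the t distribution with n − 2 degrees of freedom.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two series of equal length, at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Invented monthly values: RSV for one term and suicide counts.
rsv = [10, 20, 30, 40, 50]
deaths = [300, 310, 335, 330, 350]
r = pearson_r(rsv, deaths)
print(round(r, 3), round(t_statistic(r, len(rsv)), 2))
```

In the study itself, each series has 120 monthly values, so the test would use 118 degrees of freedom.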

Results of Objective 1: Correlation Between Suicidal Rates and Internet Searches
With regard to the first objective of our study, the Pearson correlation coefficients between the Google Trends data for suicide-related terms and suicide rates for Spain are shown in Table 3. We focused on correlation values of moderate or greater strength according to Evans' study [37], as detailed in Table 4, with a significance value of P<.05 (in italics). Since the study terms are Spanish translations of ones already studied in English, Japanese, and Mandarin in the mentioned studies, we devised a Reference column indicating the previous study and a Correlation column to indicate the presence of a correlation according to the original study, with values yes, insufficient, or no.
A linear regression analysis (forward stepwise) was performed; predictors included all variables (search terms) that demonstrated a significant correlation with previous suicide rates collected by the INE and had an r>0.2. These are the terms in Table 3 that have at least one P value with a significant correlation in the men, women, or total columns. Table 4 presents the explanatory-predictive model. Overall, the model predicts a significant percentage of variance (adjusted R²=0.387) of the suicide variable. The term "unemployment" translated as "paro" in Spanish has a high beta value and a positive sign, whereas the term "unemployment" translated as "desempleo" has a lower and negative value. This may seem contradictory because both terms are, a priori, synonyms. However, searches of the term "desempleo" could be carried out, in greater proportion, by people seeking information related to the official term "unemployment benefits and aid" offered by the Spanish Government, while the term "paro," which is used more colloquially, may be associated with searches carried out by people who are suffering due to "unemployment." This may explain why searches for that term are positively associated with the incidence of suicide committed in Spain between 2004 and 2013.
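The adjusted R² reported above penalizes the ordinary R² for the number of predictors in the model. A minimal sketch of the standard formula follows; the function name is hypothetical, and the input values are invented for illustration and are not taken from Table 4.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared: 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n_obs - 1) / dof


# Invented example: 120 monthly observations and 5 retained search terms.
print(round(adjusted_r2(0.5, 120, 5), 3))
```

With many observations and few predictors, as in a 120-month series, the adjustment is small; it grows as more search terms enter the model.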
Regarding the beta value for "headache," which has a high and negative correlation with the variable "incidence of suicide," it could be argued that people who search for headache (a condition that can be associated with a wide variety of medical conditions) do so with an intent of self-care, which is contrary to the intention of committing suicide.

Results of Objective 2: Differences Between Men and Women
With regard to the second objective of our work, we found correlations between the terms of study and suicide rates between women and men (Table 5). To describe the strength of the correlation between our variables, we have used the interpretation by Evans [37]; as it can be seen in absolute terms, there is an important difference between the correlation of men and women.
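The verbal labels attached to correlation strength can be expressed as a small lookup. The cutoffs below are the ones commonly attributed to Evans' guide (0.20, 0.40, 0.60, and 0.80 on the absolute value of r); they are an assumption of this sketch, since the thresholds are not restated in this paper, and the function name is hypothetical.

```python
def evans_strength(r):
    """Label |r| using cutoffs commonly attributed to Evans (assumed here)."""
    size = abs(r)
    if size > 1:
        raise ValueError("a correlation coefficient lies between -1 and 1")
    if size < 0.20:
        return "very weak"
    if size < 0.40:
        return "weak"
    if size < 0.60:
        return "moderate"
    if size < 0.80:
        return "strong"
    return "very strong"


# Coefficients quoted elsewhere in this paper.
for r in (0.513, 0.295, -0.231, -0.113):
    print(r, evans_strength(r))
```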
The significant difference between male and female correlations can be explained by women's greater use of the internet to search for health and lifestyle information. In contrast, men tend to focus on information about investment, purchase, and personal interests [40]. Moreover, this would be consistent with the idea that women have higher emotional intelligence and more communication skills than men [41].

Correlation Between Suicidal Rates and Internet Searches
It is not clear whether the information found on the internet contributes to the promotion of suicide and inspires suicidal thoughts or reduces the risk of suicidal behavior. The causal relationship between suicide and the use of the internet to search for topics related to self-harm or suicide is difficult to prove; however, the results of our study suggest a significant correlation for a number of the search terms that we have studied. This is consistent with previous studies that have been outlined throughout this paper that mostly state that there is indeed some association between certain searches and social phenomena in the economics, health, and other sectors.
Suicide rates in Spain for the 2004-2013 period were examined for their association with search volume on Google for 41 suicide-related searches already tested in scientific studies in other countries and languages. For the general population, suicide rates in Spain were positively associated with the search volume (r>0.2) for 7 terms and negatively associated with the search volume for 1 of the 41 terms. Our interpretation of the results is that they corroborate the hypothesis that certain searches on Google may serve as an indicator of a country's suicide rate, perhaps even of its social well-being.
The negative correlations that we would call "protective" (for searches aimed at finding a solution to the problem) are interesting; the 41 searches in the original studies were supposed to be "risk related" in relation to suicide incidence, but one of them, Stock Market (r=-0.231; P=.006), correlated negatively along with some others that may have a "protective" significance (Social welfare, Religious belief, etc). As a preliminary explanation, we believe that this is due to the social and cultural translocation of the search terms; for example, in Spain, unlike Taiwan (where the previous study for the search Stock Market was performed), only wealthy people are concerned about the topic.
In the case of Drunkenness (r=0.211; P=.01) versus the negative correlation for Alcohol (r=-0.113; P=.11), the latter could be interpreted as a protective search to find a solution to the problem, while the former may be used in a leisurely way (ie, without a problematic consciousness). This led us to an important point: Google Trends data include both "first person" searches from subjects with suicidal ideation and "third person" searches from their relatives, social surroundings, and institutions concerned about the issue; therefore, it is crucial to try to segregate one from the other in future studies. Perhaps this can be done with the help of linguistics, by differentiating the denotative aspects of words from their connotative aspects.

Comparison With Prior Work
The term that correlates most strongly with the overall rate of suicide is Allergy (r>0.5 and P<.001), which is consistent with other studies linking depression and allergy [42,43].
However, the overlap between the terms that correlate in our study and those that correlate in other studies is only about half. This could be due to cultural differences between the regions of the study subjects. It could also be due to semantic changes lost (or gained) in translation: Although the studies that we used to build our research were written in English, the original language of the study was Japanese or Mandarin Chinese in several cases, which resulted in two nested translations in this study.
Another reason for these disparities might be deficiencies in the study methodology, since using Google Trends as a diagnostic indicator of a society's well-being is still fairly new.
Comparing our results for Spain with some of the results from a study by Yang et al [33] for Taiwan
Other interesting evidence our study demonstrates is that well-known risk factors (eg, depression) and explicit searches (eg, suicide) are not correlated with suicide rates; this could be interpreted as follows: the better the knowledge of the risk situation, the less likely it is that this risk of suicide will materialize.

Differences Between Men and Women
Although there is no gender segregation in Google Trends RSV data, as Spanish suicide rates from the INE are segregated by gender, we were able to find differing gender-based correlations: We have found 5 terms that correlate for suicide rates among men and 12 terms in the case of women. Search terms that correlate with suicide rates of women are consistent with previous studies, showing that the incidence of depression is higher in women than in men [44].
In short, we obtained more than twice as many correlations with suicide rates for women as for men. We understand that this is due to the fact that patterns of internet usage among women are more oriented toward searches on health or lifestyle [40], which is also very much in line with the idea that women have more emotional intelligence than men [41]. This would explain why their suicide rate is much lower than that of men in Spain.

Limitations
Owing to the limitations of evidence, we cannot actually predict increases of suicidal mortality using web search data. Rather, we undertook a preliminary investigation using the entire available dataset to establish a statistical association between search term usage and actual suicides in Spain. Further studies should compute time-lagged correlations between Google searches and suicides to help prevent suicide-related deaths.
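A time-lagged correlation of the kind suggested for future work could be sketched as follows. This is an illustration under stated assumptions, not part of the present analysis: the function names are hypothetical, and the data are synthetic, built so that deaths follow searches with a delay of exactly 2 months.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlations(searches, deaths, max_lag):
    """Correlate searches in month t with deaths in month t + lag."""
    results = {}
    for lag in range(max_lag + 1):
        x = searches[:len(searches) - lag]  # drop the last `lag` months
        y = deaths[lag:]                    # drop the first `lag` months
        results[lag] = pearson_r(x, y)
    return results


# Synthetic series: deaths echo the searches made 2 months earlier.
searches = [5, 9, 4, 7, 12, 6, 8, 10, 3, 11]
deaths = [120, 118] + [100 + 2 * s for s in searches[:-2]]
by_lag = lagged_correlations(searches, deaths, max_lag=3)
best_lag = max(by_lag, key=by_lag.get)
print(best_lag)  # prints 2
```

On real data, the lag with the highest coefficient would only be a candidate lead time and would still need a significance test on the shortened series.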

Social Applications
The practical implications of our results are as follows: It is desirable that competent authorities establish agreements with Google to facilitate suicide prevention by monitoring searches in Google for any of the terms that have been shown to correlate with suicide statistics and other terms that are proven to be significant in future studies.
We also hope that our research will help design and maintain websites that provide better education for suicide prevention, focusing on the treatment of depression and management of labor or emotional problems, as these fields show greater explanatory-predictive value in the incidence of suicide according to our regression model.

Future Developments
An interesting avenue for future research on suicide-related searches is obtaining data from large social networks such as Facebook or Twitter, rather than just search engines like Google.
In addition, the results of this study suggest the feasibility of using the Google search volume to predict other social risk behaviors such as traffic accidents, domestic violence, and bullying, and for the epidemiological monitoring of the evolution of emotional disorders in society. In general, we believe that tracking the search volumes of certain terms (eg, ones related to suicide) may serve as an indicator of the satisfaction and well-being of a society. Hence, there may even be an application to the field of politics.

Other Considerations
We want to point out some interesting facts that we have come across in our research and that we consider to be significant in correctly interpreting the world of big data and search engines.
First, as mentioned earlier, it is interesting to note that despite the fact that more than 10,000 scientific papers used or mentioned the Google Trends service, according to Google Scholar, we did not find any mathematical formulation of how Google Trends operationalizes the values that it returns, which is the reason why we have developed it ourselves (Figure 1).
Another fact that seems significant is the case of GISI [20] or the Google Price Index (GPI), which are Google initiatives from 2010 that disappeared despite evidence of good results for forecasting phenomena. According to comments on web-based forums, Google chief economist, Dr Hal Varian, said that GPI was never intended to be a project or public source of data; it was simply an internal Google project made visible by the press [45].
In any case, the opportunities and risks of using information from internet searches are yet to be determined; with this work, we hope to have contributed some clarity to this field of study.