Integrating Smart Health in the US Health Care System: Infodemiology Study of Asthma Monitoring in the Google Era

Background: With internet penetration and use constantly expanding, the vast amount of information generated online can be employed to better assess issues in the US health care system. Google Trends, a popular tool in big data analytics, has been widely used to examine interest in various medical and health-related topics and has shown great potential in forecasting, prediction, and nowcasting. As empirical relationships between online queries and human behavior have been shown to exist, a new opportunity arises to explore behavior toward asthma, a common respiratory disease.
Objective: This study aimed to forecast online behavior toward asthma and to examine the correlations between queries and reported cases in order to explore the possibility of nowcasting asthma prevalence in the United States using online search traffic data.
Methods: Holt-Winters exponential smoothing was applied to Google Trends time series from 2004 to 2015 for the term “asthma” to estimate forecasts of online queries at state and national levels from 2016 to 2020, which were validated against available Google query data from January 2016 to June 2017. Correlations among yearly Google queries and between Google queries and reported asthma cases were examined.
Results: Our analysis shows that search queries exhibit seasonality within each year and that the relationships between each 2 years’ queries are statistically significant (P<.05). The estimated forecasting models for Google queries over a 5-year period (2016 through 2020) are robust when validated against available data from January 2016 to June 2017. Significant correlations were found between (1) online queries and National Health Interview Survey lifetime asthma (r=–.82, P=.001) and current asthma (r=–.77, P=.004) rates from 2004 to 2015 and (2) online queries and Behavioral Risk Factor Surveillance System lifetime (r=–.78, P=.003) and current asthma (r=–.79, P=.002) rates from 2004 to 2014. The correlations are negative, but lag analysis to identify the period of response cannot be employed until short-interval data on asthma prevalence are made available.
Conclusions: Online behavior toward asthma can be accurately predicted, and significant correlations between online queries and reported cases exist. This method of forecasting Google queries can be used by health care officials to nowcast asthma prevalence by city, state, or nationally, subject to future availability of daily, weekly, or monthly data on reported cases. This method could therefore be used for improved monitoring and assessment of the needs surrounding the current population of patients with asthma.


Introduction
Health informatics is the field where information technology, computer science, social sciences, and health care meet [1]. With big data (ie, large data volumes characterized by high speed and wide dataset variety [2][3][4]) increasingly applied in research in general, health informatics provides fertile ground for big data applications.
According to Gu et al [5], big data health care research consists of 3 research stages: disease, life and health, and nursing. Focus is being given to various aspects of diseases, technology, and health care services in areas such as epidemics, data mining, machine learning, and customized service [5]. Big data is being increasingly integrated in health care informatics [5][6] and has been used in the past in smart city management.
Over the last few years during the integration of the health pillar in smart cities, where big data is being continuously gathered and analyzed [7], the concept of smart health has been rising [8][9][10]. Smart health as a concept is derived from the intersection of medical informatics, public health, and business, where large volumes of social media data, payer-provider big data, genomic-driven big data, and biomedical data are being used for the monitoring and evaluation of patients' conditions [10]. As life expectancy increases, so does the cost of health care, and thus innovative methods are required to achieve improved cost-effective quality services. The use of big data in smart health can assist in P4 medicine (preventive, participatory, predictive, and personalized) [8], in the detection, prediction, and prevention of diseases [5], and in the health industry in general [10] while also taking into account the cost, data sources and quality, and population [4].
The analysis of online search queries has become notably popular in big data analytics [11][12], mainly using Google Trends [13], a popular open tool that has been widely integrated in scientific research over the past decade, with a main focus on health-related topics [6]. Examples include analysis of online interest in multiple sclerosis [14], epilepsy [15][16], silicosis [17], dementia [18], urinary tract infection [19], Ebola [20], the flu [21][22][23], tobacco and lung cancer [24], epidemics [25][26], and even in illegal drugs such as dabbing [27], krokodil [28], and methamphetamine [29]. This use of big data has formed the cornerstone of a new concept, the science of infodemiology, which uses the vast variety of data available on the internet such as online queries, publications, or posts on blogs and websites for real-time data analysis with the aim of informing public health and public policy, thus providing a viable alternative to the time-consuming traditional methods of gathering health care data such as population surveys and registries. The use of infodemiology data for surveillance purposes is called infoveillance and could potentially allow for more timely and targeted health care interventions [30].
In this study, online queries for the term "asthma" in the United States were analyzed in order to explore the possibility of nowcasting (ie, forecasting the present) asthma prevalence using Google Trends. Asthma was selected because it is a common chronic respiratory disease characterized by exacerbations, also known as asthma attacks; therefore, the reported cases are bound to show seasonality as well as constant interest.
Asthma is a chronic condition characterized by airway inflammation and hyper-responsiveness that causes airways to constrict in response to exercise, infection, exposure to allergens, and occupational exposures [31]. In 2014, it was estimated that approximately 7.4% of the adult US population and 8.6% of US children lived with asthma [32]. During childhood, asthma is more prevalent in males, whereas in adulthood prevalence shifts toward females. Black and multirace people also have a higher prevalence than white people [33][34].
Asthma presents with coughing, wheezing, and chest tightness that seem to be worse during the night and early mornings. These symptoms, along with a family history of asthma or atopic dermatitis, can prompt investigations to confirm an asthma diagnosis. Exacerbation of normal asthma symptoms is more common in patients with uncontrolled asthma or in high-risk patients [35]. Certain types of asthma exacerbations are linked to particular seasons of the year with those caused by pollen and mold being truly seasonal [36]. It has been shown that pediatric patients experience a peak of asthma exacerbations during the fall and spring months [37], whereas adult patients experience a peak of asthma exacerbations at year end [38].
The management of asthma usually involves the use of several inhalers, leading to a rather complicated treatment regime that presents difficulties in terms of patient compliance because it interferes with their daily living activities. Poor compliance can lead to increased morbidity as well as increased cost of treatment [39]. Apart from treatment compliance, another important factor that weighs in the success of the treatment is inhaler technique, as improper inhaler use is linked to poor asthma control. Studies have shown that 33% to 94% of patients do not receive any training regarding proper inhaler technique, which leads to a great number of patients using inhalers incorrectly [40]. Asthma self-management education and personalized advice can improve a patient's asthma control and quality of life, along with reducing asthma exacerbations and hospital admissions [41].
Asthma has several social complications such as limiting patients' activity levels [42], which has an economic impact on the country's health care system. It was estimated that in 2007, medical expenses, missed work and school days, and early deaths due to asthma cost the United States $56 billion [43].
Google Trends data have previously been shown to be valid by many studies [44], and work on the subject has shown the tool's contribution to forecasting [45][46] and to the analysis of online behavior, provided that the examined terms are carefully selected [47]. The aim of this paper is to examine whether nowcasting asthma prevalence in the United States is possible using online search traffic data.

Methods
Monthly time series from Google Trends for the keyword "asthma" from 2004 to 2015 in the United States and by individual state were used. The data were normalized by Google and downloaded in .csv format on July 7, 2017, between 12:47 and 13:02 for the United States and on July 18 between 14:03 and 14:33 for each of the 50 states and the District of Columbia. The data adjustment procedure is reported by Google as follows [48]: "Search results are proportionate to the time and location of a query: Each data point is divided by the total searches of the geography and time range it represents, to compare relative popularity. Otherwise places with the most search volume would always be ranked highest. The resulting numbers are then scaled on a range of 0 to 100 based on a topic's proportion to all searches on all topics. Different regions that show the same number of searches for a term will not always have the same total search volumes." The seasonality of asthma queries was explored, followed by the estimation of forecasts of online interest in the term from 2016 through 2020 for the country as well as for each state. The additive Holt-Winters exponential smoothing method (using the statistical programming language R) was employed. The Holt-Winters equations [49] can be seen in Figure 1.
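The additive Holt-Winters procedure named above can be illustrated with a minimal Python sketch. This is not the authors' R code: the function name `holt_winters_additive`, the simple initialization from the first two seasons, the fixed smoothing parameters, and the synthetic series are all assumptions made for illustration. In the study the smoothing parameters were estimated rather than fixed, so the numbers produced here will not match the reported forecasts.

```python
from typing import List, Sequence


def holt_winters_additive(
    y: Sequence[float],
    m: int,
    alpha: float,
    beta: float,
    gamma: float,
    horizon: int,
) -> List[float]:
    """Forecast `horizon` steps ahead with additive Holt-Winters smoothing."""
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Initial states from the first two seasons (an illustrative choice;
    # statistical packages typically use a decomposition-based start).
    first = sum(y[:m]) / m
    second = sum(y[m:2 * m]) / m
    level = first
    trend = (second - first) / m
    season = [y[i] - first for i in range(m)]

    # Recursive updates of level, trend, and seasonal components.
    for t in range(m, len(y)):
        s_old = season[t % m]                      # seasonal estimate from one period ago
        prev_level, prev_trend = level, trend
        level = alpha * (y[t] - s_old) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        season[t % m] = gamma * (y[t] - prev_level - prev_trend) + (1 - gamma) * s_old

    # h-step-ahead forecast: last level, extrapolated trend, matching seasonal term.
    n = len(y)
    return [level + h * trend + season[(n + h - 1) % m] for h in range(1, horizon + 1)]


# Synthetic monthly index values (hypothetical, not Google Trends data).
pattern = [60, 62, 65, 63, 58, 50, 45, 47, 59, 66, 64, 57]
series = pattern * 3
forecast = holt_winters_additive(series, m=12, alpha=0.3, beta=0.1, gamma=0.2, horizon=12)
print([round(v, 1) for v in forecast])
```

For real data, the smoothing parameters would normally be chosen by minimizing the one-step-ahead squared error, which this sketch leaves out to stay short.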
In order to further elaborate on the seasonality, the Pearson correlations for Google Trends data for the term "asthma" between each 2 years from 2004 to 2015 in the United States were calculated. Finally, the Pearson correlations between Google queries and the National Health Interview Survey (NHIS) prevalence data [50] from 2004 to 2015 and Behavioral Risk Factor Surveillance System (BRFSS) prevalence data [51] from 2004 to 2014 were examined.
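The year-by-year comparison described above can be sketched as a Pearson correlation between the 12 monthly values of each pair of years. The helper names `pearson_r` and `yearly_correlations` and the monthly values below are hypothetical; they illustrate the calculation only and are not the study data.

```python
from math import sqrt
from typing import Dict, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def yearly_correlations(
    monthly_by_year: Dict[int, Sequence[float]]
) -> Dict[Tuple[int, int], float]:
    """Correlate the monthly values of every pair of years."""
    years = sorted(monthly_by_year)
    return {
        (a, b): pearson_r(monthly_by_year[a], monthly_by_year[b])
        for i, a in enumerate(years)
        for b in years[i + 1:]
    }


# Hypothetical monthly search index values for three years.
data = {
    2004: [70, 72, 75, 74, 68, 55, 48, 50, 66, 76, 73, 62],
    2005: [68, 71, 77, 72, 66, 57, 47, 52, 64, 74, 75, 60],
    2006: [66, 70, 74, 73, 65, 54, 46, 49, 65, 72, 71, 61],
}
for pair, r in yearly_correlations(data).items():
    print(pair, round(r, 2))
```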
Asthma is not included in the list of diseases with a Centers for Disease Control and Prevention (CDC) surveillance case definition, defined as "a set of uniform criteria used to define a disease for public health surveillance. Surveillance case definitions enable public health officials to classify and count cases consistently across reporting jurisdictions. They provide uniform criteria of national notifiable infectious and non-infectious conditions for reporting purposes" [52]. Thus, nationwide surveys are used to gather information regarding asthma prevalence, including additional information on asthma control, medications, and hospitalizations [53]. The BRFSS is a "state-based, random-digit-dialed telephone survey designed to monitor the prevalence of the major behavioral risks among adults associated with premature morbidity and mortality," and the NHIS is a "multistage probability sample survey designed to solicit health and demographic information about the population, conducted annually with face-to-face interviews in a nationally representative sample of households" [54].
In 2011, the BRFSS changed its weighting methodology and added mobile phone respondents. Therefore, any comparisons between years before and after 2011 should be carefully interpreted. In this study, no such comparisons are made, as each year's online queries are compared with the respective year's reported asthma cases, thus including no cross-year comparisons. For this study, we used the CDC definition of asthma prevalence, based on affirmative responses to the following NHIS questions: (adults) "Have you ever been told by a doctor or other health professional that you had asthma?" and "Do you still have asthma?" and (children) "Has a doctor or other professional ever told you that [sample child] had asthma?" and "Does [sample child] still have asthma?" [55].
In the Holt-Winters equations (Figure 1), $y_x$ and $\hat{y}_x$ denote the initial series and the forecasts, respectively. The $l_x$, $b_x$, and $s_x$ denote the level, trend, and seasonal estimates for month $x$, respectively, with $m$ denoting the period of the seasonality (ie, 12 in this case) and $h_m^+ = (h-1) \bmod m + 1$. The level, trend, and seasonal change smoothing factors are denoted by the constants $\alpha$, $\beta^*$, and $\gamma$, respectively. The estimated values of the coefficients for the level and trend are denoted by $a$ and $b$, respectively, while the seasonal coefficients are denoted by $s_1, \ldots, s_{12}$ for months 1 to 12, respectively (Table A1).
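Because Figure 1 is not reproduced in this text, the additive Holt-Winters equations are restated here in their standard textbook form, written with the symbols defined above. This is an assumption about the content of Figure 1 and should be checked against reference [49].

```latex
\begin{aligned}
\hat{y}_{x+h|x} &= l_x + h\,b_x + s_{x-m+h_m^+} \\
l_x &= \alpha\,(y_x - s_{x-m}) + (1-\alpha)\,(l_{x-1} + b_{x-1}) \\
b_x &= \beta^{*}\,(l_x - l_{x-1}) + (1-\beta^{*})\,b_{x-1} \\
s_x &= \gamma\,(y_x - l_{x-1} - b_{x-1}) + (1-\gamma)\,s_{x-m} \\
h_m^+ &= (h-1) \bmod m + 1
\end{aligned}
```

The first line is the forecast $h$ months ahead. The next three lines update the level, trend, and seasonal estimates each month, and the last line selects the seasonal estimate from the final observed year that matches the forecast month.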

Online Interest in the United States
Figures 3 and 4 depict the changes in online interest in the term "asthma" for the period 2004 to 2015 and the seasonal changes for each year from 2004 to 2015, respectively. As is evident, the data follow a seasonal trend. All years' data, as presented in Figure 4, follow a similar pattern during a full year, supporting our hypothesis that the seasonality of asthma prevalence in the United States is depicted in online searches.  Table A2.
As is evident, changes in online behavior toward the term "asthma" reflect behavior toward the disease. The next steps are to examine whether forecasting online interest in the United States is possible and to identify existing relationships between online search traffic data and reported asthma cases.

Forecasting Online Interest in the United States
In order to elaborate on the robustness of the forecasting model, the estimated values are validated against the available Google queries for the term "asthma" from January 2016 to June 2017, as shown in Figure 7. The forecasts follow the same curve and closely approximate the actual Google Trends data for that period.
It is therefore suggested that online behavior exhibits seasonality and can be predicted. The last step in exploring whether nowcasting of asthma prevalence in the United States is possible using Google Trends is to examine the correlations between Google Trends data and reported lifetime and current asthma.
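The validation of the forecasts against the January 2016 to June 2017 queries is presented graphically (Figure 7). As a complement, a numeric comparison could use simple error measures such as the mean absolute error (MAE) and the mean absolute percentage error (MAPE). These measures are not reported in the study; the function names and the values below are hypothetical and serve only as a sketch.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Average absolute difference between observed and forecast values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mean_absolute_percentage_error(
    actual: Sequence[float], forecast: Sequence[float]
) -> float:
    """Average absolute error as a percentage of the observed values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an observed value is 0")
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical holdout comparison over six months (not study data).
observed = [64, 66, 70, 68, 61, 52]
predicted = [62, 67, 71, 66, 60, 55]
print(round(mean_absolute_error(observed, predicted), 2))
print(round(mean_absolute_percentage_error(observed, predicted), 2))
```

Because Google Trends values are scaled from 0 to 100, MAE reads directly in index points, whereas MAPE cannot be used for months whose index value is 0.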

Google Trends Versus Reported Asthma
As shown in Figure 4, each examined year's online interest seems to follow a similar seasonal trend from January to December. To elaborate on the seasonal trend, the Pearson correlations between each 2 years' queries are calculated (Table 1). The monthly Google Trends data between each 2 years from 2004 to 2015 exhibit high correlations, and all comparisons are statistically significant, with P<.05.
[Table 1. Pearson correlations between each 2 years' monthly Google queries for the term "asthma" from 2004 to 2015. Only the first entries are recoverable here, and they appear to be: 2005 vs 2004, r=.89; 2006 vs 2005, r=.89; 2006 vs 2004, r=.86.]
To further explore the relationships between online searches and asthma prevalence in the United States, data on the yearly cases of lifetime and current asthma for all ages from the NHIS prevalence data from 2004 to 2015 [50] and the BRFSS prevalence data [51] from 2004 to 2014 were used (Table 2).
Although statistically significant, all Pearson correlations are negative, and lag analysis should be employed to identify the time interval of response between asthma online interest and case reporting or vice versa. Although Google Trends data for the term "asthma" in the United States over the examined period are monthly, the data on lifetime and current asthma are yearly; until weekly or monthly data are available, further analysis cannot be done.
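The lag analysis proposed above cannot be run on yearly prevalence data. Under the assumption that monthly case counts become available, it could be sketched as a Pearson correlation computed at several time shifts. The function `lagged_correlations`, its sign convention (a positive lag means queries precede cases), and the synthetic series are illustrative assumptions, not part of the study.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("series must have equal length of at least 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def lagged_correlations(
    queries: Sequence[float], cases: Sequence[float], max_lag: int
) -> Dict[int, float]:
    """Correlate queries at time t with cases at time t + lag, for each lag."""
    if len(queries) != len(cases):
        raise ValueError("series must have equal length")
    n = len(queries)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = queries[: n - lag], cases[lag:]      # queries lead cases
        else:
            x, y = queries[-lag:], cases[: n + lag]     # cases lead queries
        result[lag] = pearson_r(x, y)
    return result


# Synthetic example: cases repeat the query series two months later.
queries = [5, 9, 4, 7, 12, 6, 3, 8, 11, 10, 2, 14, 9, 5, 13, 7, 4, 10, 6, 12]
cases = queries[:2] + queries[:-2]
by_lag = lagged_correlations(queries, cases, max_lag=4)
best_lag = max(by_lag, key=lambda k: abs(by_lag[k]))
print(best_lag, round(by_lag[best_lag], 2))
```

Ranking lags by absolute correlation keeps strongly negative relationships, such as those reported here, in consideration.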

Forecasting Online Interest by State
In order to show that the method of nowcasting asthma prevalence in the United States using Google queries is possible, this methodology is applied in each of the 50 states and the District of Columbia and exhibits good forecasting results. Figures 8 to 11 depict the changes in online interest in the term "asthma" from 2004 to 2015 and forecasts from 2016 to 2020 for the 4 most populated states (ie, California, Texas, Florida, and New York), and the graphs for all states can be found in Multimedia Appendix 2, Figures B1-B51. The values of the smoothing parameters α, β*, and γ and the coefficients for each state's forecasts can be found in Multimedia Appendix 1, Tables A3 and A4, respectively. As online behavioral changes can be predicted and data on asthma cases are correlated with online queries, nowcasting of asthma could be possible provided short-interval data (eg, monthly, weekly, or even daily) are available.
According to the results, online interest in Alaska, Nebraska, New Hampshire, Oklahoma, and Tennessee exhibits increasing forecast trends from 2016 to 2020. On the contrary, online interest in Delaware, Kansas, Oregon, and Virginia exhibits decreasing forecast trends from 2016 to 2020. Overall, the states of Arizona, California, Connecticut, Florida, Georgia, Illinois, Indiana, Maryland, Michigan, Missouri, New Jersey, New York, North Carolina, Pennsylvania, Texas, and Washington show high interest in the term "asthma" throughout the examined period, while in Hawaii and Wyoming, interest is low. Virginia is the only state where online interest exhibits very significant variations from 2004 to 2016.
Our study indicates that analysis of online behavior toward asthma by state can assist with nowcasting asthma prevalence. Since search queries and reporting of asthma are shown to correlate in the United States, if short-interval data (eg, weekly or monthly) were made available, a robust nowcasting model could be developed.

Principal Findings
In addressing integration of smart health into smart city management, monitoring of search traffic data could be useful in predictions and nowcastings, as has also been suggested by previous work on the subject. This study shows that online interest can be predicted nationally and by state. Therefore, governments, policy makers, and health care officials have the ability to use these data to better address the responsiveness of the US health care system at national, regional, state, or even city level in order to nowcast asthma prevalence. Google Trends also provides detailed regional US data, and this method can be applied in other countries as well.
Empirical relationships between Google Trends and human behavior have been suggested; therefore, nowcasting asthma prevalence in the United States is possible using online search traffic data, subject to availability of daily, weekly, or monthly data. In this study, it was shown that online search traffic data are highly correlated between each 2 years during the examined period and that Google Trends data are correlated with reported cases of lifetime and current asthma in the United States from 2004 to 2015.
After analyzing changes in online interest in the United States over the examined period, the next step was to identify any seasonal similarities between each 2 years' (monthly) search queries. As the hits between each 2 years from 2004 to 2015 on the term "asthma" were highly correlated, the seasonal effect was evident; using Holt-Winters exponential smoothing, 5-year forecasts for online interest in the term from 2016 to 2020 nationally and in each state were estimated. Validated against available data from January 2016 to June 2017, the forecasts were well fitted and accurately approximated the actual Google Trends data for the same period, suggesting that seasonal behavioral changes over the course of a year can be accurately predicted using the proposed method. Google Trends data are correlated with reported cases of lifetime and current asthma, and thus nowcasting asthma prevalence in the United States is suggested to be possible using online search traffic data. As the calculated correlations are negative at this point and there is a lag between internet queries and asthma reporting or vice versa, short-interval data (eg, monthly, weekly, or daily, which are not available at this point) are required in order to identify said lag.

Limitations
This study has limitations. It cannot be assumed that each hit corresponds to an asthma case and vice versa because hits could also be attributed to academic or research reasons or general interest in the subject, and they could be influenced by news reports or social media. Queries related to asthma could also be influenced by factors such as changes of health insurance and weather or environmental conditions that trigger similar symptoms. This is a general limitation when examining online queries, despite the empirical relationships that have been shown to exist between Google Trends and health data.
The sample is not representative, although as internet penetration increases, so does the possibility of higher volumes of online queries being related to asthma cases. Additionally, nowcasting asthma prevalence using online search queries is not possible at this point because the available data on reported lifetime and current asthma are yearly. If monthly, weekly, or daily data on past asthma prevalence were available and the correlations between search traffic data and reported asthma were validated, the possibility of nowcasting asthma could be further explored.
This study has not accounted for state-by-state confounders that could influence search patterns, such as the socioeconomic status and demographics of different states that might be relevant to asthma prevalence, as this exceeds the scope of this paper. The latter, along with the impact of socioeconomic and cultural differences on asthma reporting and online search patterns, are of interest for further investigation. In addition, more search terms related to asthma symptoms such as "breathlessness" and "wheezing" could be included in future research on asthma monitoring in the United States.

Conclusion
The findings of this study support previous work on the subject and highlight the value of online data in health and medical informatics. Google Trends data have been shown to be useful and valuable in the monitoring, surveillance, or prediction of epidemics and outbreaks [20][25][26][56], as have been various other internet sources such as Twitter [57], medical portals [58], and Baidu [59]. Google queries provide us with the revealed and not the stated user interest contrary to traditional survey methods [60], and the use of Web data will benefit the exploration of behavior in medical issues [61]. Data from traditional sources and big data should be combined in order to take full advantage of all available information [62]. When daily, weekly, or monthly data on reported asthma cases are made available, data from online sources like Google Trends could be used centrally and then applied by state or used by each city or state individually, assisting with the integration of the smart health concept in smart city management.
Internet behavior can be measured by infodemiology metrics as information patterns and population health are related [30]. Surveillance of asthma is mainly assessed through nationwide surveys and interviews, and data on asthma prevalence are only available long after the cases of asthma are reported. Nowcasting Google queries on selected terms related to asthma could assist health officials at both national and state levels to detect any behavioral variations toward the disease, providing time-effective allocation of resources and a more cost-effective approach to asthma assessment. This study suggests a relationship between asthma prevalence and Google Trends data. In the future, analysis of online queries could be valuable in the monitoring and evaluation of the responsiveness of the US health care system to asthma patient admissions and prescription drug needs, as well as assisting with the implementation of targeted health interventions and campaigns during periods when increased asthma admissions are predicted.

Conflicts of Interest
None declared.

Multimedia Appendix 1
State data tables.