This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on http://publichealth.jmir.org, as well as this copyright and license information must be included.
Epidemiological studies on influenza have focused mostly on enhancing vaccination coverage or promoting personal hygiene behavior. Few studies have investigated potential effects of personal health behaviors and social contacts on the risk of getting influenza-like illness (ILI).
Taking advantage of an online participatory cohort, this study aimed to estimate the increased risk of getting ILI after contact with infected persons and examine how personal health behaviors, weather, and air pollution affect the probability of getting ILI.
A Web-based platform was designed for participants to record daily health behaviors and social contacts during the influenza season of October 1, 2015 to March 31, 2016, in Taiwan. Data on sleep, diet, physical activity, self-reported ILI, and contact with infected persons were retrieved from the diaries. Measurements of weather and air pollutants were used for calculating environmental exposure levels for the participants. We fitted a mixed-effects logistic regression model to the daily measurements of the diary keepers to estimate the effects of these variables on the risk of getting ILI.
During the influenza season, 160 participants provided 14,317 health diaries and recorded 124,222 face-to-face contacts. The model estimated odds ratio of getting ILI was 1.87 (95% CI 1.40-2.50) when a person had contact with others having ILI in the previous 3 days. Longer duration of physical exercise and eating more fruits, beans, and dairy products were associated with lower risk of getting ILI. However, staying up late was linked to an elevated risk of getting ILI. Higher variation of ambient temperature and worse air quality were associated with increased risk of developing ILI.
Developing a healthier lifestyle, avoiding contact with persons having ILI symptoms, and staying alert with respect to temperature changes and air quality can reduce the risk of getting ILI.
Seasonal influenza epidemics cause a high disease burden, including direct costs of health services and households and indirect costs due to productivity losses worldwide every year [
In this study, we retrieved data from an online diary platform called ClickDiary [
This study was approved by the Institutional Review Board on Humanities and Social Science Research, Academia Sinica (AS-IRB-HS 02-13022). The diary data for analysis were stripped of personal identification information, which was replaced with a serial number to protect participants’ privacy.
We used a Web-based platform named “ClickDiary” [
The contact diary was designed for collecting information on participants’ daily social contact. The definition of contact used was stricter than in other contact diary studies [
The participants provided demographic information at sign-up, including age, gender, place of residence, marital status, and type of current job. The program also collected participants’ Big Five personality traits (openness, conscientiousness, extraversion, agreeableness, and neuroticism) [
We created a cohort of 160 participants who each recorded the two diaries on at least 10 days a month for more than 2 months during the influenza season from October 1, 2015, to March 31, 2016. Some participants recorded their health diary most of the time but their contact diary only occasionally, because the latter was more time-consuming. Although the participants were required to make diary entries only twice a week, more than half of the participants in the cohort recorded entries at least 3 times a week, and the average was 4 times a week. We did not include data from the entire period for two reasons. First, the overall self-reported ILI rate was low in the nonflu season; to focus on identifying the risk factors for ILI transmission, we restricted the study period to the flu season. Second, during the first (2014-2015) flu season, we encountered a logistical problem in issuing the vouchers from mid-December 2014 to January 2015. This caused the participation rate to fall during that period, and the number of diaries was too small to conduct the analysis.
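The inclusion rule can be illustrated with a minimal sketch. It is not the authors' code: the function names and example dates are hypothetical, and "more than 2 months" is read here as at least 3 calendar months with 10 or more distinct diary days, which is an assumption about how the criterion was applied.

```python
from collections import defaultdict
from datetime import date

SEASON_START = date(2015, 10, 1)
SEASON_END = date(2016, 3, 31)

def qualifying_months(entry_dates, min_days=10):
    """Count calendar months in the season with at least `min_days` distinct diary days."""
    days_per_month = defaultdict(set)
    for d in entry_dates:
        if SEASON_START <= d <= SEASON_END:
            days_per_month[(d.year, d.month)].add(d)
    return sum(1 for days in days_per_month.values() if len(days) >= min_days)

def is_included(entry_dates, min_days=10, min_months=3):
    # Assumption: "more than 2 months" means at least 3 qualifying months.
    return qualifying_months(entry_dates, min_days) >= min_months

# Hypothetical participants: one with 12 diary days in each of 3 months,
# one with 12 days in October and only 5 in November.
active = [date(2015, m, d) for m in (10, 11, 12) for d in range(1, 13)]
sparse = ([date(2015, 10, d) for d in range(1, 13)]
          + [date(2015, 11, d) for d in range(1, 6)])

print(is_included(active))
print(is_included(sparse))
```

Under this reading, the first hypothetical participant would enter the cohort and the second would not.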
We introduce a notation to describe the variables for use in the statistical analysis in the following. When the
To understand how environmental factors affect influenza transmission after controlling for health behaviors and social contacts, we included weather and air pollution data from ambient air quality monitoring stations maintained by Taiwan’s Environmental Protection Administration, excluding traffic, industrial, and background stations [
In addition to the influence of contact with infected persons and self-reported ILI status in the past 3 days, as well as age and gender, we first applied logistic regression models to identify health behaviors associated with the probability of self-reported ILI according to the following equation:
Specifically, we fit the logistic regression model for i=1,2,…,160 and j=1,2,…, Di, where Gi=1 indicates the subject is male, Ai is age in years, and Di is the number of diary entries the
The explanatory variables were selected into the model in two stages. In the first stage, we considered personal risk factors: age, gender, contact with persons with ILI, average portions of different kinds of foods in the past 3 or 7 days, staying up late, average hours of sleep, sleep quality, average mood scores, and amount of exercise in the past 3 or 7 days. The likelihood ratio test was used to select important variables, and the influential health behavior variables identified were retained in the logistic regression models. In the second stage, we used a stepwise approach to identify influential weather variables (temperature and humidity) and air pollution variables (PM2.5, SO2, O3, and CO). Each variable was computed over two temporal windows: the past 3 days and the past 7 days. Once all covariates were fixed in the logistic regression equation, we added a random component to the intercept to model subject-to-subject variation. Because the influence of a reported portion of food intake on influenza risk may have differed among participants, we also added random components to the coefficients of the chosen food items. All added random components were assumed to be normally distributed with mean 0 and constant variance. Finally, because each participant provided self-reported ILI status repeatedly during the study period, we assumed that the repeated records of each participant were correlated, with each pair of responses from a subject having the correlation given by the following equation:
The mean, minimum, and maximum numbers of days on which the participants recorded entries in the health diary were 89, 29, and 183, respectively; for the face-to-face contact diary, the corresponding statistics were 91, 6, and 183 over the 183-day study period, and for days with an entry in either diary, they were 99, 29, and 183 (
A total of 160 participants were included in this study (
After the first stage of variable selection using the logistic regression models, 12 variables were retained in the final model. The descriptive statistics of the identified variables are listed in
Descriptive statistics of number of days when participants recorded health diary and face-to-face contact diary, from October 1, 2015 to March 31, 2016.
Type of diary (values are numbers of days) | Minimum | 25% | 50% | 75% | Maximum | Mean |
Health diary | 29 | 63 | 76 | 113 | 183 | 89 |
Contact diary | 6 | 64 | 71 | 116 | 183 | 91 |
Either diary | 29 | 66 | 91 | 128 | 183 | 99 |
Demographic summary of 160 participants.
Age group (years) | Male, n (%) | Female, n (%) | All, n (%) |
20-30 | 14 (36.8) | 45 (36.9) | 59 (36.9) |
31-40 | 10 (26.3) | 30 (24.6) | 40 (25.0) |
41-50 | 6 (15.8) | 20 (16.4) | 26 (16.3) |
51-60 | 2 (5.3) | 19 (15.6) | 21 (13.1) |
61-70 | 6 (15.8) | 8 (6.6) | 14 (8.8) |
All | 38 (100.0) | 122 (100.0) | 160 (100.0) |
Number and percentage of influenza-like illness (ILI) between participants and their contact persons.
Participant had ILI in past 3 days | Contact persons had no ILI in past 3 days, n (%) | Contact persons had ILI in past 3 days, n (%) |
No | 10,974 (89.58) | 824 (6.72) |
Yes | 322 (2.63) | 131 (1.07) |
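As a quick arithmetic check on the contingency table above, the cell percentages can be recomputed from the counts. This is only an illustrative sketch; the variable names are hypothetical, and the conditional shares at the end are derived here rather than reported in the article.

```python
# Cell counts from the table: (participant had ILI, contact persons had ILI)
counts = {
    ("no", "no"): 10974,
    ("no", "yes"): 824,
    ("yes", "no"): 322,
    ("yes", "yes"): 131,
}

total = sum(counts.values())

# Each cell as a percentage of all cells combined, as in the table.
percentages = {cell: round(100 * n / total, 2) for cell, n in counts.items()}

# Share with an infected contact, split by the participant's own recent ILI status.
share_contact_if_well = counts[("no", "yes")] / (counts[("no", "no")] + counts[("no", "yes")])
share_contact_if_ill = counts[("yes", "yes")] / (counts[("yes", "no")] + counts[("yes", "yes")])

print(total)
print(percentages)
print(round(100 * share_contact_if_well, 1), round(100 * share_contact_if_ill, 1))
```

The recomputed percentages agree with the published values to within 0.01 percentage points, and the conditional shares suggest that contact with infected persons was considerably more common when the participant had also recently reported ILI.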
Temporal trends of daily and weekly incidence rates of self-reported influenza-like illness (ILI) and weekly outpatient ILI admission rate.
Descriptive statistics of selected variables for fitting the models. Q1 and Q3 represent the first and third quartiles.
Variable | Minimum | Q1 | Median | Mean | Q3 | Maximum |
Vegetables | 0 | 1 | 2 | 1.75 | 2 | 4 |
Fruits | 0 | 0.5 | 1 | 1.28 | 2 | 4 |
Cereals | 0 | 1.75 | 2.5 | 2.55 | 3 | 8 |
Beans and pulses | 0 | 0 | 0.5 | 0.61 | 1 | 4 |
Meats and eggs | 0 | 1 | 2 | 2.34 | 3.17 | 14.5 |
Dairy products | 0 | 0 | 0.25 | 0.4 | 0.67 | 4 |
Sleep time (hours) | 0.5 | 6.5 | 7.33 | 7.38 | 8.25 | 14.25 |
Exercise (min) | 0 | 0 | 15.5 | 21.44 | 30.5 | 120 |
Temperature deviation (°C)^a | 0.12 | 1.38 | 1.92 | 2.16 | 2.76 | 5.94 |
Maximum PM2.5 (μg/m3) | 2.04 | 19.25 | 27.29 | 30.01 | 38.02 | 88.61 |
Maximum 8-hour moving average of O3 (ppb) | 5.69 | 28.17 | 33.76 | 34.98 | 40.78 | 76.22 |
^a Only temperature deviation represented values in the past 7 days; other variables represented values in the past 3 days.
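The windowed covariates described in the footnote (a 7-day window for temperature deviation, 3-day windows for the other variables) could be computed along the following lines. This is a minimal sketch, not the authors' implementation: the function names and example values are hypothetical, the window is assumed to cover the days strictly before the diary day, and the sample SD is assumed for the temperature deviation.

```python
from statistics import mean, stdev

def past_window(values, day_index, window):
    """Return the `window` daily values strictly before `day_index`.

    Assumption: the current day is excluded from the window.
    Returns None when there is not enough history.
    """
    if day_index < window:
        return None
    return values[day_index - window:day_index]

def temperature_deviation(daily_mean_temps, day_index, window=7):
    """Sample SD of daily mean temperature over the past `window` days."""
    w = past_window(daily_mean_temps, day_index, window)
    return None if w is None else stdev(w)

def past_mean(values, day_index, window=3):
    """Mean of a daily series (for example, daily maximum 8-hour O3) over the past `window` days."""
    w = past_window(values, day_index, window)
    return None if w is None else mean(w)

# Hypothetical daily series (not study data)
temps = [22.1, 23.4, 19.8, 18.5, 21.0, 24.2, 20.3, 17.9]
ozone = [30.0, 36.0, 42.0, 33.0, 28.0, 35.0, 40.0, 31.0]

print(temperature_deviation(temps, day_index=7))
print(past_mean(ozone, day_index=3))
print(temperature_deviation(temps, day_index=3))  # None: fewer than 7 prior days
```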
Estimates of the odds ratios in the mixed-effects logistic regression models. IQR: interquartile range; OR: odds ratio; ILI: influenza-like illness.
Variable | IQR | OR (95% CI) |
Free of ILI and contact with infected persons^a |  | 1.87 (1.40-2.50) |
Self-reporting ILI and no contact with infected persons^a |  | 55.79 (45.26-68.77) |
Self-reporting ILI and contact with infected persons^a |  | 59.97 (44.32-81.14) |
Age >60^b |  | 0.06 (0.0005-8.23) |
Male^c |  | 0.30 (0.05-1.76) |
Late bedtime^d |  | 1.43 (1.11-1.84) |
Vegetables | 1.0 | 0.92 (0.64-1.33) |
Fruits | 1.5 | 0.37 (0.19-0.75) |
Cereals | 1.25 | 0.99 (0.70-1.40) |
Beans and pulses | 1.0 | 0.42 (0.20-0.87) |
Meats and eggs | 2.17 | 1.09 (0.67-1.77) |
Dairy products | 0.67 | 0.31 (0.14-0.69) |
Sleep duration (h) | 1.67 | 0.97 (0.84-1.12) |
Exercise time | 30.5 | 0.73 (0.63-0.84) |
Temperature deviation | 1.37 | 1.25 (1.13-1.39) |
log (PM2.5)^e | 0.68 | 1.13 (0.99-1.30) |
O3 | 12.66 | 1.33 (1.20-1.49) |
log (PM2.5) and O3 | 0.68 and 12.66 | 1.51 (1.29-1.76) |
^a Reference group: free of ILI and no contact with infected persons.
^b Reference group: age ≤60.
^c Reference group: female.
^d Reference group: did not have late bedtime.
^e PM2.5: fine particulate matter.
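The ORs for continuous covariates in the table are reported per IQR. As a rough illustration (not the authors' code), the sketch below converts an OR per IQR into a log-odds coefficient per unit, rescales it to a different increment, and recovers an approximate standard error of log(OR) from the 95% CI. The function names are hypothetical; the Wald-type interval and the additivity of the log(PM2.5) and O3 effects on the log-odds scale (no interaction) are assumptions.

```python
import math

def beta_per_unit(or_per_iqr, iqr):
    """Log-odds coefficient per one unit of the covariate."""
    return math.log(or_per_iqr) / iqr

def or_for_change(or_per_iqr, iqr, change):
    """OR implied by the model for an arbitrary change in the covariate."""
    return math.exp(beta_per_unit(or_per_iqr, iqr) * change)

def se_from_ci(lower, upper, z=1.96):
    """Approximate SE of log(OR), assuming a symmetric Wald-type 95% CI on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# Exercise time: OR 0.73 per IQR of 30.5 min, rescaled to 60 min.
or_exercise_60 = or_for_change(0.73, 30.5, 60)

# Contact with infected persons: OR 1.87 (95% CI 1.40-2.50).
se_contact = se_from_ci(1.40, 2.50)

# Joint IQR increase in log(PM2.5) and O3 under additivity on the log-odds scale.
or_joint = math.exp(math.log(1.13) + math.log(1.33))

print(round(or_exercise_60, 2), round(se_contact, 3), round(or_joint, 2))
```

Under these assumptions the product of the two single-pollutant ORs is about 1.50, close to the reported joint OR of 1.51; the small gap is plausibly due to rounding of the published estimates.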
The fixed-effects logistic regression models identified several influential variables associated with the probability of participants’ reporting ILI symptoms. The variables include reporting having had any of the following during the past 3 days: contacts with infected persons; being free of ILI; staying up late; average exercise time; and average consumption of fruits, beans and pulses, and dairy products. The most influential environmental variables identified were SD of daily mean temperature in the past 7 days, mean daily maximum 8-hour moving average ozone, and mean daily PM2.5 concentrations in the past 3 days. The estimated coefficients of the model shown in
Greater variation of daily mean temperature in the past 7 days was also associated with an increased risk of infection. The estimated OR was 1.25 (95% CI 1.13-1.39) for an increase of one IQR (1.37°C) in the SD of the daily temperatures in the past 7 days. Participants exposed to higher ozone concentrations in the past 3 days had a higher chance of reporting ILI symptoms. The estimated OR was 1.33 (95% CI 1.20-1.49) for an increase of one IQR (12.7 ppb) in the daily maximum 8-hour moving average concentration of ozone. The concentration of PM2.5 on a log scale was only marginally associated with the response in the fitted model. The estimated OR was 1.13 (95% CI 0.99-1.30) for an increase of one IQR (18.8 μg/m3) in the daily average concentration of PM2.5. Because ozone and PM2.5 had a correlation of .42, we compared the third quartile concentrations of both ozone and PM2.5 against their first quartile levels. The estimated OR increased to 1.51 (95% CI 1.29-1.76). When the participants reported ILI symptoms in the past 3 days, we expected a large influence on reporting ILI on the current day even if they had no contact with infected persons. The model-estimated OR was 55.8 (95% CI 45.3-68.8).
We removed the nonsignificant variables (vegetables, cereals, meats and eggs, and sleep time) from the model to check whether collinearity among the explanatory variables influenced the estimated coefficients. In the smaller model, the estimates and their significance changed only slightly. Consumption of vegetables and fruits had a strong correlation of .51. However, we found no significant association of vegetables with the response when fruits were replaced with vegetables, even in the smaller model.
We believe that the random effects modeling participant-to-participant variation and differences in the effect of food intake among people reduced the bias in the estimates caused by unobserved participant characteristics. To further examine any selection bias resulting from the varying number of weekly entries per participant, we conducted a sensitivity analysis to check the robustness of our findings. We randomly selected at most three records per week from each diary keeper to form a subdataset for fitting the same mixed-effects model. This procedure was repeated 100 times to produce 100 sets of estimated coefficients. We then calculated the pooled estimate of each coefficient and the standard errors of these pooled estimates (
We also conducted another sensitivity analysis that included participants who filled in the diaries on fewer than 10 days per month. Relaxing the inclusion criterion added 42 participants, and the whole model was then rerun with 202 participants. The results (
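The subsampling sensitivity analysis (at most three records per week per diary keeper, repeated 100 times) can be sketched as follows. This is a hedged illustration rather than the authors' procedure: the record layout, function names, and example data are hypothetical; a simple ILI proportion stands in for refitting the mixed-effects model; and because the text does not state the pooling rule, the mean and SD across replicates are used here as an assumption.

```python
import random
from collections import defaultdict
from statistics import mean, stdev

def subsample_records(records, max_per_week=3, rng=random):
    """Keep at most `max_per_week` randomly chosen records per participant-week.

    Each record is a tuple (participant_id, week_index, ili_flag).
    """
    groups = defaultdict(list)
    for rec in records:
        groups[(rec[0], rec[1])].append(rec)
    sampled = []
    for recs in groups.values():
        k = min(max_per_week, len(recs))
        sampled.extend(rng.sample(recs, k))
    return sampled

def pooled_estimate(records, fit, n_repeats=100, seed=0):
    """Refit on `n_repeats` random subdatasets and summarize the estimates.

    Assumption: pooling is the mean across replicates, with the SD as its spread.
    """
    rng = random.Random(seed)
    estimates = [fit(subsample_records(records, rng=rng)) for _ in range(n_repeats)]
    return mean(estimates), stdev(estimates)

def ili_rate(recs):
    # Placeholder statistic standing in for a model coefficient.
    return mean(r[2] for r in recs)

# Hypothetical data: 5 participants with daily records over 4 weeks.
records = [(pid, day // 7, int((pid + day) % 9 == 0))
           for pid in range(5) for day in range(28)]

estimate, spread = pooled_estimate(records, ili_rate)
print(len(subsample_records(records, rng=random.Random(1))), estimate, spread)
```

In a real analysis, `fit` would refit the mixed-effects logistic regression on each subdataset and return its coefficients.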
The online diary-based approach in this study was used to collect not only ILI symptoms but also daily health behaviors and participants’ social networks. This innovative approach can reduce recall bias compared with weekly or monthly surveys. Although the presence of significant risk factors for ILI has been revealed in different studies, few have quantitatively estimated the risk levels associated with different factors. In this study, we had the opportunity to collect all those risk factors from online diaries during an influenza season. We have used these empirical observations to reveal several risk reduction–related health behaviors such as avoiding contact with persons with ILI, sleeping earlier, keeping a good diet, exercising more, and being aware of environmental temperature and air pollution. Our study demonstrated a wide spectrum of ILI risks at the personal, contact, and environmental levels.
In traditional approaches to studying interpersonal influenza transmission, the main focus was identifying people with ILI symptoms or confirmed ILI in a hospital setting and then following up with the potential transmission within the household or in schools [
The household cohort can provide insights to capture the dynamics of influenza transmission after identifying an index case in a household [
In this study, we were able to quantify the risk of developing ILI after contact with persons with ILI in the past 3 days. The estimated OR of 1.87 in our model is quite similar to the finding of a household transmission study conducted in France, which reported a hazard ratio (HR) of 1.85 for influenza transmission in preschool contacts compared with school-age and adult contacts [
We also found that having a later bedtime in the past 3 days is a significant risk factor for developing ILI. In children, staying up late has previously been found to be associated with poorer quality of life and overall health [
Diet and physical exercise are also very important health behaviors to enhance immunity and reduce the chance of influenza infection. Although we have considered many types of foods in the model, we finally found that fruits, beans and pulses, and dairy products are associated with lower ILI risk. In the literature, one study found that anthocyanins from fruit extracts inhibited influenza virus adsorption into cells and also virus release from infected cells [
In addition to personal risk factors, environmental factors such as temperature variation, PM2.5, and O3 exposure have also been associated in the literature with immunity, virus transmission, and replication. In one guinea pig study, cold and dry conditions were best for influenza transmission between hosts [
Positive correlations between PM2.5 and ILI have been found in many studies in Beijing, China [
One social network study [
There are several limitations in this study. The first one is a lack of laboratory samples to confirm whether the participants or the contact alters were in fact influenza infected or not. From a practical viewpoint, it is not possible to know who transmitted the influenza virus among the general public. Thus, we treat this uncertainty as a random effect for each participant. Furthermore, we used a relatively strict definition of ILI in our questionnaire. We treated participants as getting ILI only when they selected “definitely having ILI symptoms” and clicked specific symptoms in the symptom list. Using this definition, our data showed a good match with the epi-curve of ILI from the nationwide outpatient surveillance. The second limitation is the sample size and representativeness. Because of the nature of our study design, the diary-based follow-up needs patience and persistence to record the health and contact diary for 6 months. It is difficult to keep a large number of participants in a study of this kind for such a long time. Also, volunteer-based online surveys can never claim to be representative of the general population. Due to the limitation of small sample size, we cannot generalize our findings to the general population. In particular, we have found that the participants in this study tended to be young, and there was a large proportion of female participants (76.3%, 122/160). Therefore, we included age and sex in the model for adjustment. In addition, we also incorporated other variables such as geographic area into the model, with no significant difference.
The third limitation is the lack of data on protective behaviors, such as wearing a face mask, and on vaccination status during the study period. We collected vaccination status before and after the influenza season but did not know the exact date of vaccination. Therefore, the effect of vaccination could not be measured in this diary-based study design. The fourth limitation is the lack of data on exercise intensity; we only asked participants to record the duration of exercise. The current findings suggest that exercise time should be longer than 30 min to show a reduced risk of getting ILI. The fifth limitation relates to exposure estimation. We did not know exactly where the participants were located; they reported only the township where they lived. Therefore, we could only use a station in the corresponding township or the station closest to it. The concentrations of PM2.5 and ozone should therefore be treated as measures of ambient environmental exposure, not of personal exposure.
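The station assignment described above (a station in the participant's township, otherwise the closest station) could look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' code: the station names, township names, and coordinates are hypothetical, and great-circle distance from a township centroid is assumed as the measure of closeness.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def assign_station(township, township_centroid, stations):
    """Pick a station in the same township, else the nearest one.

    `stations` maps a station name to (township, latitude, longitude).
    """
    for name, (station_township, _, _) in stations.items():
        if station_township == township:
            return name
    lat, lon = township_centroid
    return min(
        stations,
        key=lambda n: haversine_km(lat, lon, stations[n][1], stations[n][2]),
    )

# Hypothetical stations and coordinates (not real monitoring sites)
stations = {
    "station_A": ("township_A", 25.03, 121.52),
    "station_B": ("township_B", 24.15, 120.67),
    "station_C": ("township_C", 22.63, 120.30),
}

print(assign_station("township_B", (24.15, 120.67), stations))
print(assign_station("township_X", (24.80, 120.97), stations))
```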
In conclusion, our study shows that keeping a healthier lifestyle, including having a nutritious diet, sleeping earlier, and doing longer physical exercise is associated with a lower risk of getting ILI. Self-protection and avoiding contact with infected persons, as well as keeping alert to temperature changes and air quality are also linked to lower risk of getting ILI.
Comparison of estimated coefficients using the whole dataset and pooled estimates of 100 sampled subdatasets.
Sensitivity analysis on model estimation by including those participants filling in the diaries less than 10 days per month.
Estimation of ILI risk by stratifying two types of contacts with infected family members and nonrelatives.
ILI: influenza-like illness
HR: hazard ratio
IQR: interquartile range
OR: odds ratio
PM: particulate matter
This research was supported by a grant from the Academia Sinica, Taiwan, R.O.C. (AS-103-TP-C03). The authors would like to thank Ms Jie-Yu Sung for the administrative support.
None declared.