JMIR Publications

JMIR Public Health and Surveillance

A multidisciplinary journal that focuses on public health and technology, public health informatics, mass media campaigns, surveillance, participatory epidemiology, and innovation in public health practice and research.


Journal Description

JMIR Public Health & Surveillance (JPH, Editor-in-chief: Patrick Sullivan, Emory University/Rollins School of Public Health) is a new sister journal of the Journal of Medical Internet Research (JMIR), the top cited journal in health informatics (Impact Factor 2015: 4.532). JPH is a multidisciplinary journal that focuses on innovation and technology in public health, and includes topics like health communication, public health informatics, surveillance, participatory epidemiology, infodemiology and infoveillance, digital disease detection, digital public health interventions, mass media/social media campaigns, and emerging population health analysis systems and tools.

We publish regular articles, reviews, protocols/system descriptions and viewpoint papers on all aspects of public health, with a focus on innovation and technology in public health.

Among other innovations, JPH is also dedicated to supporting rapid open data sharing and rapid open access to surveillance and outbreak data. As one of the novel features, we plan to publish rapid or even real-time surveillance reports and open data. The methods and description of the surveillance system may be peer-reviewed and published only once in detail, in a "baseline report" (in a JMIR Res Protoc or a JMIR Public Health & Surveill paper), and authors then have the possibility to publish data and reports at frequent intervals, rapidly and with only minimal additional peer-review (we call this article type "Rapid Surveillance Reports"). JMIR Publications may even work with authors/researchers and developers of selected surveillance systems on APIs for semi-automated reports (e.g. weekly reports to be automatically published in JPHS and indexed in PubMed, based on data-feeds from surveillance systems and minimal narratives and abstracts).

Furthermore, during epidemics and public health emergencies, submissions with critical data will be processed with expedited peer-review to enable publication within days or even in real-time.

We also publish descriptions of open data resources and open source software. Where possible, we can and want to publish or even host the actual software or dataset on the journal website.


Recent Articles:

  • WebCAAFE. Image sourced and copyright owned by authors.

    Qualitative Analysis of Cognitive Interviews With School Children: A Web-Based Food Intake Questionnaire


    Background: The use of computers to administer dietary assessment questionnaires has shown potential, particularly due to the variety of interactive features that can attract and sustain children’s attention. Cognitive interviews can help researchers to gain insights into how children understand and elaborate their response processes in this type of questionnaire. Objective: To present the cognitive interview results of children who answered the WebCAAFE, a Web-based questionnaire, to obtain an in-depth understanding of children’s response processes. Methods: Cognitive interviews were conducted with children (using a pretested interview script). Analyses were carried out using thematic analysis within a grounded theory framework of inductive coding. Results: A total of 40 children participated in the study, and 4 themes were identified: (1) the meaning of words, (2) understanding instructions, (3) ways to resolve possible problems, and (4) suggestions for improving the questionnaire. Most children understood questions that assessed nutritional intake over the past 24 hours, although the structure of the questionnaire designed to facilitate recall of dietary intake was not always fully understood. Younger children (7 and 8 years old) had more difficulty relating the food images to mixed dishes and foods eaten with bread (eg, jam, cheese). Children were able to provide suggestions for improving future versions of the questionnaire. Conclusions: More attention should be paid to children aged 8 years or below, as they had the greatest difficulty completing the WebCAAFE.

  • Anti-nuclear power plant rally on 19 September 2011 at the Meiji Shrine complex in Tokyo. By 保守 - Own work, Public Domain,

    Estimating the Duration of Public Concern After the Fukushima Dai-ichi Nuclear Power Station Accident From the Occurrence of Radiation Exposure-Related Terms...


    Background: After the Fukushima Dai-ichi Nuclear Power Station accident in Japan on March 11, 2011, a large number of comments, both positive and negative, were posted on social media. Objective: The objective of this study was to clarify the characteristics of the trend in the number of tweets posted on Twitter, and to estimate how long public concern regarding the accident continued. We surveyed the attenuation period of the first term occurrence related to radiation exposure as a surrogate endpoint for the duration of concern. Methods: We retrieved 18,891,284 tweets from Twitter data between March 11, 2011 and March 10, 2012, containing 143 variables in Japanese. We selected radiation, radioactive, Sievert (Sv), Becquerel (Bq), and gray (Gy) as keywords to estimate the attenuation period of public concern regarding radiation exposure. These data, formatted as comma-separated values, were transferred into a Statistical Analysis System (SAS) dataset for analysis, and survival analysis methodology was followed using the SAS LIFETEST procedure. This study was approved by the institutional review board of Hokkaido University and informed consent was waived. Results: A Kaplan-Meier curve was used to show the rate of Twitter users posting a message after the accident that included one or more of the keywords. The term Sv occurred in tweets up to one year after the first tweet. Among the Twitter users studied, 75.32% (880,108/1,168,542) tweeted the word radioactive and 9.20% (107,522/1,168,542) tweeted the term Sv. The first reduction was observed within the first 7 days after March 11, 2011. The means and standard errors (SEs) of the duration from the first tweet on March 11, 2011 were 31.9 days (SE 0.096) for radioactive and 300.6 days (SE 0.181) for Sv. These keywords were still being used at the end of the study period. The mean attenuation period for radioactive was one month, and approximately one year for radiation and radiation units. 
The difference in mean duration between the keywords was attributed to the effect of mass media. Regularly posted messages, such as daily radiation dose reports, were relatively easy to detect from their time and formatted contents. The survival estimation indicated that public concern about the nuclear power plant accident remained after one year. Conclusions: Although the simple plot of the number of tweets did not show clear results, we estimated the mean attenuation period as approximately one month for the keyword radioactive, and found that the keywords were still being used in posts at the end of the study period. Further research is required to quantify the effect of other phrases in social media data. The results of this exploratory study should advance progress in influencing and quantifying the communication of risk.
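The survival analysis described in this abstract was run with the SAS LIFETEST procedure. As a rough illustration of what a Kaplan-Meier estimate computes, here is a minimal pure-Python sketch. The function name, the input layout, and the toy data are assumptions made for illustration, and the definition of the "event" for each Twitter user is left generic because the abstract does not spell it out.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each time with >= 1 event.

    durations: follow-up time for each subject (e.g. days since first tweet).
    observed:  True if the event occurred at that time, False if the
               subject was censored (still event-free when follow-up ended).
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have the same length")

    pairs = sorted(zip(durations, observed))
    n_at_risk = len(pairs)
    survival = 1.0
    curve = []

    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        events = 0
        leaving = 0
        # Group all subjects that share the same time point.
        while i < len(pairs) and pairs[i][0] == t:
            if pairs[i][1]:
                events += 1
            leaving += 1
            i += 1
        if events > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - events / n_at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after t.
        n_at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    toy_days = [1, 2, 2, 3, 5]
    toy_event = [True, True, False, True, False]
    for day, s in kaplan_meier(toy_days, toy_event):
        print(f"day {day}: S(t) = {s:.3f}")
```

Censored subjects shrink the risk set without lowering the curve, which is why keywords still in use at the end of the study period do not count as "ended".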

  • Narrative graph of children and exemptions context. Image sourced and copyright owned by authors.

    “Mommy Blogs” and the Vaccination Exemption Narrative: Results From A Machine-Learning Approach for Story Aggregation on Parenting Social Media Sites


    Background: Social media offer an unprecedented opportunity to explore how people talk about health care at a very large scale. Numerous studies have shown the importance of websites with user forums for people seeking information related to health. Parents turn to some of these sites, colloquially referred to as “mommy blogs,” to share concerns about children’s health care, including vaccination. Although substantial work has considered the role of social media, particularly Twitter, in discussions of vaccination and other health care–related issues, there has been little work on describing the underlying structure of these discussions and the role of persuasive storytelling, particularly on sites with no limits on post length. Understanding the role of persuasive storytelling at Internet scale provides useful insight into how people discuss vaccinations, including exemption-seeking behavior, which has been tied to a recent diminution of herd immunity in some communities. Objective: To develop an automated and scalable machine-learning method for story aggregation on social media sites dedicated to discussions of parenting. We wanted to discover the aggregate narrative frameworks to which individuals, through their exchange of experiences and commentary, contribute over time in a particular topic domain. We also wanted to characterize temporal trends in these narrative frameworks on the sites over the study period. Methods: To ensure that our data capture long-term discussions and not short-term reactions to recent events, we developed a dataset of 1.99 million posts contributed by 40,056 users and viewed 20.12 million times indexed from 2 parenting sites over a period of 105 months. Using probabilistic methods, we determined the topics of discussion on these parenting sites. We developed a generative statistical-mechanical narrative model to automatically extract the underlying stories and story fragments from millions of posts. 
We aggregated the stories into an overarching narrative framework graph. In our model, stories were represented as network graphs with actants as nodes and their various relationships as edges. We estimated the latent stories circulating on these sites by modeling the posts as a sampling of the hidden narrative framework graph. Temporal trends were examined based on monthly user-post statistics. Results: We discovered that discussions of exemption from vaccination requirements are highly represented. We found a strong narrative framework related to exemption seeking and a culture of distrust of government and medical institutions. Various posts reinforced part of the narrative framework graph in which parents, medical professionals, and religious institutions emerged as key nodes, and exemption seeking emerged as an important edge. In the aggregate story, parents used religion or belief to acquire exemptions to protect their children from vaccines that are required by schools or government institutions, but (allegedly) cause adverse reactions such as autism, pain, compromised immunity, and even death. Although parents joined and left the discussion forums over time, discussions and stories about exemptions were persistent and robust to these membership changes. Conclusions: Analyzing parent forums about health care using an automated analytic approach, such as the one presented here, allows the detection of widespread narrative frameworks that structure and inform discussions. In most vaccination stories from the sites we analyzed, it is taken for granted that vaccines and not vaccine preventable diseases (VPDs) pose a threat to children. Because vaccines are seen as a threat, parents focus on sharing successful strategies for avoiding them, with exemption being the foremost among these strategies. 
When new parents join such sites, they may be exposed to this endemic narrative framework in the threads they read and to which they contribute, which may influence their health care decision making.

  • Smartphone LGBT theme. Image Source: Copyright: Lucato. Image purchased by authors under Standard license from

    Perceptions Toward a Smoking Cessation App Targeting LGBTQ+ Youth and Young Adults: A Qualitative Framework Analysis of Focus Groups


    Background: The prevalence of smoking among lesbian, gay, bisexual, trans, queer, and other sexual minority (LGBTQ+) youth and young adults (YYA) is significantly higher compared with that among non-LGBTQ+ persons. However, in the past, interventions were primarily group cessation classes that targeted LGBTQ+ persons of all ages. mHealth interventions offer an alternate and modern intervention platform for this subpopulation and may be of particular interest for young LGBTQ+ persons. Objective: This study explored LGBTQ+ YYA (the potential users’) perceptions of a culturally tailored mobile app for smoking cessation. Specifically, we sought to understand what LGBTQ+ YYA like and dislike about this potential cessation tool, along with how such interventions could be improved. Methods: We conducted 24 focus groups with 204 LGBTQ+ YYA (aged 16-29 years) in Toronto and Ottawa, Canada. Participants reflected on how an app might support LGBTQ+ persons with smoking cessation. Participants indicated their feelings, likes and dislikes, concerns, and additional ideas for culturally tailored smoking cessation apps. Framework analysis was used to code transcripts and identify the overarching themes. Results: Study findings suggested that LGBTQ+ YYA were eager about using culturally tailored mobile apps for smoking cessation. Accessibility, monitoring and tracking, connecting with community members, tailoring, connecting with social networks, and personalization were key reasons that were valued for a mobile app cessation program. However, concerns were raised about individual privacy and that not all individuals had access to a mobile phone, users might lose interest quickly, an app would need to be marketed effectively, and app users might cheat and lie about progress to themselves. Participants highlighted that the addition of distractions, rewards, notifications, and Web-based and print versions of the app would be extremely useful to mitigate some of their concerns. 
Conclusions: This study provided insight into the perspectives of LGBTQ+ YYA on a smoking cessation intervention delivered through a mobile app. The findings suggested a number of components of a mobile app that were valued and those that were concerning, as well as suggestions on how to make a mobile app cessation program successful. App development for this subpopulation should take into consideration the opinions of the intended users and involve them in the development and evaluation of mobile-based smoking cessation programs.

  • StoryPRIME. Image sourced and copyright owned by authors.

    Efficacy of Web-Based Collection of Strength-Based Testimonials for Text Message Extension of Youth Suicide Prevention Program: Randomized Controlled Experiment


    Background: Equipping members of a target population to deliver effective public health messaging to peers is an established approach in health promotion. The Sources of Strength program has demonstrated the promise of this approach for “upstream” youth suicide prevention. Text messaging is a well-established medium for promoting behavior change and is the dominant communication medium for youth. In order for peer ‘opinion leader’ programs like Sources of Strength to use scalable, wide-reaching media such as text messaging to spread peer-to-peer messages, they need techniques for assisting peer opinion leaders in creating effective testimonials to engage peers and match program goals. We developed a Web interface, called Stories of Personal Resilience in Managing Emotions (StoryPRIME), which helps peer opinion leaders write effective, short-form messages that can be delivered to the target population in a youth suicide prevention program like Sources of Strength. Objective: To determine the efficacy of StoryPRIME, a Web-based interface for remotely eliciting high school peer leaders, and helping them produce high-quality, personal testimonials for use in a text messaging extension of an evidence-based, peer-led suicide prevention program. Methods: In a double-blind randomized controlled experiment, 36 high school students wrote testimonials with or without eliciting from the StoryPRIME interface. The interface was created in the context of Sources of Strength–an evidence-based youth suicide prevention program–and 24 ninth graders rated these testimonials on relatability, usefulness/relevance, intrigue, and likability. Results: Testimonials written with the StoryPRIME interface were rated as more relatable, useful/relevant, intriguing, and likable than testimonials written without StoryPRIME, P=.054. Conclusions: StoryPRIME is a promising way to elicit high-quality, personal testimonials from youth for prevention programs that draw on members of a target population to spread public health messages.

  • Drug Trends. Image created and copyright owned by authors.

    “When ‘Bad’ is ‘Good’”: Identifying Personal Communication and Sentiment in Drug-Related Tweets


    Background: To harness the full potential of social media for epidemiological surveillance of drug abuse trends, the field needs a greater level of automation in processing and analyzing social media content. Objectives: The objective of the study is to describe the development of supervised machine-learning techniques for the eDrugTrends platform to automatically classify tweets by type/source of communication (personal, official/media, retail) and sentiment (positive, negative, neutral) expressed in cannabis- and synthetic cannabinoid–related tweets. Methods: Tweets were collected using Twitter streaming Application Programming Interface and filtered through the eDrugTrends platform using keywords related to cannabis, marijuana edibles, marijuana concentrates, and synthetic cannabinoids. After creating coding rules and assessing intercoder reliability, a manually labeled data set (N=4000) was developed by coding several batches of randomly selected subsets of tweets extracted from the pool of 15,623,869 collected by eDrugTrends (May-November 2015). Out of 4000 tweets, 25% (1000/4000) were used to build source classifiers and 75% (3000/4000) were used for sentiment classifiers. Logistic Regression (LR), Naive Bayes (NB), and Support Vector Machines (SVM) were used to train the classifiers. Source classification (n=1000) tested Approach 1 that used short URLs, and Approach 2 where URLs were expanded and included into the bag-of-words analysis. For sentiment classification, Approach 1 used all tweets, regardless of their source/type (n=3000), while Approach 2 applied sentiment classification to personal communication tweets only (2633/3000, 88%). Multiclass and binary classification tasks were examined, and machine-learning sentiment classifier performance was compared with Valence Aware Dictionary for sEntiment Reasoning (VADER), a lexicon and rule-based method. 
The performance of each classifier was assessed using 5-fold cross validation that calculated average F-scores. One-tailed t test was used to determine if differences in F-scores were statistically significant. Results: In multiclass source classification, the use of expanded URLs did not contribute to significant improvement in classifier performance (0.7972 vs 0.8102 for SVM, P=.19). In binary classification, the identification of all source categories improved significantly when unshortened URLs were used, with personal communication tweets benefiting the most (0.8736 vs 0.8200, P<.001). In multiclass sentiment classification Approach 1, SVM (0.6723) performed similarly to NB (0.6683) and LR (0.6703). In Approach 2, SVM (0.7062) did not differ from NB (0.6980, P=.13) or LR (F=0.6931, P=.05), but it was over 40% more accurate than VADER (F=0.5030, P<.001). In multiclass task, improvements in sentiment classification (Approach 2 vs Approach 1) did not reach statistical significance (eg, SVM: 0.7062 vs 0.6723, P=.052). In binary sentiment classification (positive vs negative), Approach 2 (focus on personal communication tweets only) improved classification results, compared with Approach 1, for LR (0.8752 vs 0.8516, P=.04) and SVM (0.8800 vs 0.8557, P=.045). Conclusions: The study provides an example of the use of supervised machine learning methods to categorize cannabis- and synthetic cannabinoid–related tweets with fairly high accuracy. Use of these content analysis tools along with geographic identification capabilities developed by the eDrugTrends platform will provide powerful methods for tracking regional changes in user opinions related to cannabis and synthetic cannabinoids use over time and across different regions.
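To make the evaluation metric in this abstract concrete, the sketch below shows how an F-score is computed from true-positive, false-positive, and false-negative counts, how it can be averaged across cross-validation folds, and how the reported "over 40%" gap between SVM (0.7062) and VADER (0.5030) follows from the two F-scores. The function names and the fold counts are hypothetical; the abstract does not say which averaging scheme (for example macro or weighted) the authors used for multiclass tasks.

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall for one class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_f1(fold_counts):
    """Average F1 over folds; each fold is a (tp, fp, fn) tuple."""
    if not fold_counts:
        raise ValueError("at least one fold is required")
    return sum(f1_score(*fold) for fold in fold_counts) / len(fold_counts)


def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline` (0.40 means 40%)."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical counts for 5 folds, for illustration only.
    folds = [(80, 20, 25), (78, 22, 20), (82, 18, 24), (79, 25, 21), (81, 19, 23)]
    print(f"mean F1 over 5 folds: {mean_f1(folds):.4f}")

    # F-scores quoted in the abstract (Approach 2, multiclass sentiment).
    print(f"SVM vs VADER relative gain: {relative_gain(0.7062, 0.5030):.1%}")
```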

  • Image Source: Author: KROMKRATHOG. License: Standard Licence with attribution.

    Evaluating Google, Twitter, and Wikipedia as Tools for Influenza Surveillance Using Bayesian Change Point Analysis: A Comparative Analysis


    Background: Traditional influenza surveillance relies on influenza-like illness (ILI) syndrome that is reported by health care providers. It primarily captures individuals who seek medical care and misses those who do not. Recently, Web-based data sources have been studied for application to public health surveillance, as there is a growing number of people who search, post, and tweet about their illnesses before seeking medical care. Existing research has shown some promise of using data from Google, Twitter, and Wikipedia to complement traditional surveillance for ILI. However, past studies have evaluated these Web-based sources individually or dually without comparing all 3 of them, and it would be beneficial to know which of the Web-based sources performs best in order to be considered to complement traditional methods. Objective: The objective of this study is to comparatively analyze Google, Twitter, and Wikipedia by examining which best corresponds with Centers for Disease Control and Prevention (CDC) ILI data. It was hypothesized that Wikipedia will best correspond with CDC ILI data as previous research found it to be least influenced by high media coverage in comparison with Google and Twitter. Methods: Publicly available, deidentified data were collected from the CDC, Google Flu Trends, HealthTweets, and Wikipedia for the 2012-2015 influenza seasons. Bayesian change point analysis was used to detect seasonal changes, or change points, in each of the data sources. Change points in Google, Twitter, and Wikipedia that occurred during the exact week, 1 preceding week, or 1 week after the CDC’s change points were compared with the CDC data as the gold standard. All analyses were conducted using the R package “bcp” version 4.0.0 in RStudio version 0.99.484 (RStudio Inc). In addition, sensitivity and positive predictive values (PPV) were calculated for Google, Twitter, and Wikipedia. 
Results: During the 2012-2015 influenza seasons, a high sensitivity of 92% was found for Google, whereas the PPV for Google was 85%. A low sensitivity of 50% was calculated for Twitter; a low PPV of 43% was found for Twitter also. Wikipedia had the lowest sensitivity of 33% and lowest PPV of 40%. Conclusions: Of the 3 Web-based sources, Google had the best combination of sensitivity and PPV in detecting Bayesian change points in influenza-related data streams. Findings demonstrated that change points in Google, Twitter, and Wikipedia data occasionally aligned well with change points captured in CDC ILI data, yet these sources did not detect all changes in CDC data and should be further studied and developed.
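The comparison in this abstract treats CDC change points as the gold standard and counts a Web-based change point as a match if it falls in the same week or within 1 week either side. A minimal sketch of that matching rule, and of the resulting sensitivity and positive predictive value, might look as follows. The function name, the week-index representation, and the example weeks are assumptions; the abstract does not describe the authors' exact counting procedure, and detecting the change points themselves (done in the study with the R package "bcp") is outside this sketch.

```python
def sensitivity_and_ppv(reference_weeks, candidate_weeks, window=1):
    """Compare candidate change points against gold-standard ones.

    reference_weeks: week indices of gold-standard (e.g. CDC) change points.
    candidate_weeks: week indices of change points from another source.
    window:          tolerated distance in weeks for a match.

    Returns (sensitivity, ppv):
      sensitivity = share of reference change points with a nearby candidate
      ppv         = share of candidate change points with a nearby reference
    """
    reference = set(reference_weeks)
    candidates = set(candidate_weeks)
    if not reference or not candidates:
        raise ValueError("both lists must contain at least one change point")

    detected = sum(
        1 for r in reference if any(abs(r - c) <= window for c in candidates)
    )
    confirmed = sum(
        1 for c in candidates if any(abs(c - r) <= window for r in reference)
    )
    return detected / len(reference), confirmed / len(candidates)


if __name__ == "__main__":
    # Hypothetical week indices, not data from the study.
    cdc = [5, 20, 40]
    web = [6, 30, 41, 50]
    sens, ppv = sensitivity_and_ppv(cdc, web)
    print(f"sensitivity = {sens:.2f}, PPV = {ppv:.2f}")
```

Reporting both values matters because a source that flags change points nearly every week would reach high sensitivity while its PPV stayed low.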

  • Flyers for recruitment in the NutriNet-Santé study. Image sourced and copyright owned by authors.

    Lessons Learned From Methodological Validation Research in E-Epidemiology


    Background: Traditional epidemiological research methods exhibit limitations leading to high logistics, human, and financial burden. The continued development of innovative digital tools has the potential to overcome many of the existing methodological issues. Nonetheless, Web-based studies remain relatively uncommon, partly due to persistent concerns about validity and generalizability. Objective: The objective of this viewpoint is to summarize findings from methodological studies carried out in the NutriNet-Santé study, a French Web-based cohort study. Methods: On the basis of the previous findings from the NutriNet-Santé e-cohort (>150,000 participants are currently included), we synthesized e-epidemiological knowledge on sample representativeness, advantageous recruitment strategies, and data quality. Results: Overall, the reported findings support the usefulness of Web-based studies in overcoming common methodological deficiencies in epidemiological research, in particular with regard to data quality (eg, the concordance for body mass index [BMI] classification was 93%), reduced social desirability bias, and access to a wide range of participant profiles, including the hard-to-reach subgroups such as young (12.30% [15,118/122,912], <25 years) and old people (6.60% [8112/122,912], ≥65 years), unemployed or homemaker (12.60% [15,487/122,912]), and low educated (38.50% [47,312/122,912]) people. However, some selection bias remained (78.00% (95,871/122,912) of the participants were women, and 61.50% (75,590/122,912) had postsecondary education), which is an inherent aspect of cohort study inclusion; other specific types of bias may also have occurred. Conclusions: Given the rapidly growing access to the Internet across social strata, the recruitment of participants with diverse socioeconomic profiles and health risk exposures was highly feasible. 
Continued efforts concerning the identification of specific biases in e-cohorts and the collection of comprehensive and valid data are still needed. This summary of methodological findings from the NutriNet-Santé cohort may help researchers in the development of the next generation of high-quality Web-based epidemiological studies.

  • Food and physical activity dictionaries. Image sourced and copyright owned by authors Quynh et al.

    Building a National Neighborhood Dataset From Geotagged Twitter Data for Indicators of Happiness, Diet, and Physical Activity


    Background: Studies suggest that where people live, play, and work can influence health and well-being. However, the dearth of neighborhood data, especially data that is timely and consistent across geographies, hinders understanding of the effects of neighborhoods on health. Social media data represents a possible new data resource for neighborhood research. Objective: The aim of this study was to build, from geotagged Twitter data, a national neighborhood database with area-level indicators of well-being and health behaviors. Methods: We utilized Twitter’s streaming application programming interface to continuously collect a random 1% subset of publicly available geolocated tweets for 1 year (April 2015 to March 2016). We collected 80 million geotagged tweets from 603,363 unique Twitter users across the contiguous United States. We validated our machine learning algorithms for constructing indicators of happiness, food, and physical activity by comparing predicted values to those generated by human labelers. Geotagged tweets were spatially mapped to the 2010 census tract and zip code areas they fall within, which enabled further assessment of the associations between Twitter-derived neighborhood variables and neighborhood demographic, economic, business, and health characteristics. Results: Machine labeled and manually labeled tweets had a high level of accuracy: 78% for happiness, 83% for food, and 85% for physical activity for dichotomized labels with the F scores 0.54, 0.86, and 0.90, respectively. About 20% of tweets were classified as happy. Relatively few terms (less than 25) were necessary to characterize the majority of tweets on food and physical activity. Data from over 70,000 census tracts from the United States suggest that census tract factors like percentage African American and economic disadvantage were associated with lower census tract happiness. Urbanicity was related to higher frequency of fast food tweets. 
Greater numbers of fast food restaurants predicted higher frequency of fast food mentions. Surprisingly, fitness centers and nature parks were only modestly associated with higher frequency of physical activity tweets. Greater state-level happiness, positivity toward physical activity, and positivity toward healthy foods, assessed via tweets, were associated with lower all-cause mortality and prevalence of chronic conditions such as obesity and diabetes and lower physical inactivity and smoking, controlling for state median income, median age, and percentage white non-Hispanic. Conclusions: Machine learning algorithms can be built with relatively high accuracy to characterize sentiment, food, and physical activity mentions on social media. Such data can be utilized to construct neighborhood indicators consistently and cost effectively. Access to neighborhood data, in turn, can be leveraged to better understand neighborhood effects and address social determinants of health. We found that neighborhoods with social and economic disadvantage, high urbanicity, and more fast food restaurants may exhibit lower happiness and fewer healthy behaviors.

  • A study participant in rural western India undergoing screening for Atrial Fibrillation using a mobile device. Image sourced and copyright owned by authors.

    High Burden of Unrecognized Atrial Fibrillation in Rural India: An Innovative Community-Based Cross-Sectional Screening Program


    Background: Atrial fibrillation, the world’s most common arrhythmia, is a leading risk factor for stroke, a disease striking nearly 1.6 million Indians annually. Early detection and management of atrial fibrillation is a promising opportunity to prevent stroke, but widespread screening programs in limited resource settings using conventional methods are difficult and costly. Objective: The objective of this study is to screen people for atrial fibrillation in rural western India using a US Food and Drug Administration-approved single-lead electrocardiography device, Alivecor. Methods: Residents from 6 villages in Anand District, Gujarat, India, comprised the base population. After obtaining informed consent, a team of trained research coordinators and community health workers enrolled a total of 354 participants aged 50 years and older and screened them at their residences using Alivecor for 2 minutes on 5 consecutive days over a period of 6 weeks beginning June 2015. Results: Almost two-thirds of study participants were 55 years or older, nearly half were female, one-third did not receive any formal education, and more than one-half were from households earning less than US $2 per day. Twelve participants screened positive for atrial fibrillation yielding a sample prevalence of 5.1% (95% CI 2.7-8.7). Only one participant had persistent atrial fibrillation throughout all of the screenings, and 9 screened positive only once. Conclusions: Our study suggests a prevalence of atrial fibrillation in this Indian region (5.1%) that is markedly higher than has been previously reported in India and similar to the prevalence estimates reported in studies of persons from North America and Europe. Historically low reported burden of atrial fibrillation among individuals from low and middle-income countries may be due to a lack of routine screening. Mobile technologies may help overcome resource limitations for atrial fibrillation screening in underserved and low-resource settings.

  • Online search. Image Source: Author: ChristianHoppe. License: CC0 Public Domain.

    Disease Monitoring and Health Campaign Evaluation Using Google Search Activities for HIV and AIDS, Stroke, Colorectal Cancer, and Marijuana Use in Canada: A...


    Background: Infodemiology can offer practical and feasible health research applications through the practice of studying information available on the Web. Google Trends provides publicly accessible information regarding search behaviors in a population, which may be studied and used for health campaign evaluation and disease monitoring. Additional studies examining the use and effectiveness of Google Trends for these purposes remain warranted. Objective: The objective of our study was to explore the use of infodemiology in the context of health campaign evaluation and chronic disease monitoring. It was hypothesized that following a launch of a campaign, there would be an increase in information seeking behavior on the Web. Second, increasing and decreasing disease patterns in a population would be associated with search activity patterns. This study examined 4 different diseases: human immunodeficiency virus (HIV) infection, stroke, colorectal cancer, and marijuana use. Methods: Using Google Trends, relative search volume data were collected throughout the period of February 2004 to January 2015. Campaign information and disease statistics were obtained from governmental publications. Search activity trends were graphed and assessed with disease trends and the campaign interval. Pearson product correlation statistics and joinpoint methodology analyses were used to determine significance. Results: Disease patterns and online activity across all 4 diseases were significantly correlated: HIV infection (r=.36, P<.001), stroke (r=.40, P<.001), colorectal cancer (r= −.41, P<.001), and substance use (r=.64, P<.001). Visual inspection and the joinpoint analysis showed significant correlations for the campaigns on colorectal cancer and marijuana use in stimulating search activity. No significant correlations were observed for the campaigns on stroke and HIV regarding search activity. 
Conclusions: The use of infoveillance shows promise as an alternative and inexpensive solution to disease surveillance and health campaign evaluation. Further research is needed to understand Google Trends as a valid and reliable tool for health research.

  • Automating data analytics. Image created and copyright owned by authors.

    IBM Watson Analytics: Automating Visualization, Descriptive, and Predictive Statistics


    Background: We live in an era of explosive data generation that will continue to grow and involve all industries. One of the results of this explosion is the need for newer and more efficient data analytics procedures. Traditionally, data analytics required a substantial background in statistics and computer science. In 2015, International Business Machines Corporation (IBM) released the IBM Watson Analytics (IBMWA) software that delivered advanced statistical procedures based on the Statistical Package for the Social Sciences (SPSS). The latest entry of Watson Analytics into the field of analytical software products provides users with enhanced functions that are not available in many existing programs. For example, Watson Analytics automatically analyzes datasets, examines data quality, and determines the optimal statistical approach. Users can request exploratory, predictive, and visual analytics. Using natural language processing (NLP), users are able to submit additional questions for analyses in a quick response format. This analytical package is available free to academic institutions (faculty and students) that plan to use the tools for noncommercial purposes. Objective: To report the features of IBMWA and discuss how this software subjectively and objectively compares to other data mining programs. Methods: The salient features of the IBMWA program were examined and compared with other common analytical platforms, using validated health datasets. Results: Using a validated dataset, IBMWA delivered similar predictions compared with several commercial and open source data mining software applications. The visual analytics generated by IBMWA were similar to results from programs such as Microsoft Excel and Tableau Software. In addition, assistance with data preprocessing and data exploration was an inherent component of the IBMWA application. 
Sensitivity and specificity were not included in the IBMWA predictive analytics results, nor were odds ratios, confidence intervals, or a confusion matrix. Conclusions: IBMWA is a new alternative for data analytics software that automates descriptive, predictive, and visual analytics. This program is very user-friendly but requires data preprocessing, statistical conceptual understanding, and domain expertise.

Latest Submissions Open for Peer-Review:

  • Effect of viewing smoking scenes in motion pictures on audiences' subsequent smoking desire in South Korea

    Date Submitted: Dec 2, 2016

    Open Peer Review Period: Dec 2, 2016 - Dec 16, 2016

    Background: Even though movies constitute a medium transmitted and distributed worldwide, smoking scenes in movies are relatively free from public health monitoring. The effect of smoking scenes in movies in promoting viewers’ smoking desire remains unknown. Objective: We explored whether exposure of adolescent smokers to images of smoking in films could stimulate smoking behavior. Methods: Data used for this study were derived from a survey of respondents using a nationally representative online sample of Korean high school students (N=748). Participants who were aged 16-18 years were randomly assigned to watch three short video clips with or without smoking scenes. After adjusting for covariates using propensity score matching, we conducted paired sample t-tests and logistic regression to compare the difference in smoking desire before and after exposure of participants to smoking scenes. Results: Among male adolescents, cigarette craving was significantly higher in the experimental group with exposure to smoking scenes than in the control group without exposure to smoking scenes (t = 2.066, p < 0.05). After adjusting for covariates, more impulsive adolescents had significantly higher cigarette cravings (aOR = 3.40, 95% CI: 1.40 - 8.23). However, the group who actively sought health information had considerably lower cigarette cravings than the group who did not engage in such information-seeking (aOR = 0.08, 95% CI: 0.01 - 0.88). Conclusions: Smoking scenes in motion pictures can initiate adolescent smoking behavior. Therefore, establishing a standard that restricts the frequency of smoking scenes in films and assigning a smoking-related screening grade to films are warranted. Clinical Trial: N/A

  • Using Facebook Advertisements to Collect Data from Cannabis Users

    Date Submitted: Nov 28, 2016

    Open Peer Review Period: Dec 1, 2016 - Dec 15, 2016

    Background: The legal landscape surrounding cannabis use is changing quickly. In turn, alternative administration methods like vaporizers and edibles are becoming increasingly popular. This state of flux requires quick and flexible research methods to perform repeated rapid surveillance that can inform regulatory and public health policies. Social media services like Facebook with associated targeted advertisement platforms can serve as the next generation of behavioral health and epidemiological research tools by providing researchers with access to convenient and inexpensive survey data from a diverse and representative global population. Objective: This study provides an illustration of how Facebook advertisements can be used to expeditiously collect survey data from cannabis users in a cost effective manner. Further, we describe the basic characteristics of cannabis users recruited with Facebook advertisements, discussing sampling strategy issues. Methods: Facebook advertisements were distributed to Americans 18 years of age and older who endorsed pro-cannabis or related interests on Facebook. The advertisements promoted a web link to an online survey on cannabis use. Two types of advertisement campaigns were conducted: one with no demographic filters other than a minimum age requirement and another that partially restricted delivery to racial minority participants to encourage sufficient representation. Results: Advertisements were shown to 168,894 people within a larger population of approximately 21,000,000, resulting in 3,892 clicks to the survey. The final sample size, N = 2,932, included only those that passed a data quality check, resulting in a cost of $0.27 per participant who completed the survey. Users reported initiating cannabis use at M = 15.7 years old, with a large proportion of participants (40%) currently using cannabis daily. Response rates were distributed across states at rates consistent with population levels from the 2014 US census. 
Restricting advertisements to minorities was an effective strategy for oversampling, increasing representation by 18.9%. Conclusions: Diverse and representative samples of cannabis users can be efficiently recruited with Facebook advertisements. Social media platforms ameliorate time and geographical constraints, allowing researchers to collect survey data from thousands of respondents in a short time frame on a modest budget, which supports rapid and repeated surveillance. Future work is needed to investigate the nuances of this sampling strategy to delineate best methods that ensure representativeness.

  • Body Weight Misperception and Dissatisfaction Among Overweight and Obese Adult Nigerians

    Date Submitted: Nov 25, 2016

    Open Peer Review Period: Nov 28, 2016 - Dec 12, 2016

    Background: The increase in the prevalence of overweight and obesity in low/medium income countries has a negative impact on the overall health of the populace and acts as a socioeconomic and health burden. Correct perception of one’s body weight is a step in seeking healthy help towards weight reduction in overweight/obese individuals. Objective: This study was carried out to assess the body weight misperception and dissatisfaction among overweight and obese adults in an urban African setting. Methods: This study was part of a larger cross-sectional study that was designed to plan an intervention for overweight and obese adults in an urban African setting. For this study, only overweight and obese adults who consented to participate in the study were randomly selected from 15 enumeration areas in Alimosho Local Government area of Lagos State, Nigeria. The WHO guidelines for conducting community survey protocols were employed in recruiting the overweight/obese participants. Body weight perception and dissatisfaction were assessed through two items: "How do you describe your weight?" and "I feel bad about my weight." Results: More than half (53.62%) of the participants misperceived their weight as either underweight or normal weight, of whom 61.2% were females. The strength of agreement between the actual BMI and weight perception was very poor (Kappa= 0.032, SE=0.015, p=0.037). The strongest predictor of weight perception was gender (male) with odds ratio of 1.63 (CI=1.13-2.35). 
About 15.7% of the participants were dissatisfied with their weight, of whom 83.1% were males. Age (young adult) was a predictor of weight dissatisfaction with odds ratio of 2.37 (CI=1.62-3.46). Conclusions: More than half of the participants misperceived their body weight as either underweight or normal weight, and the majority of them were females. More males were not happy with their body weight, and participants within the young adult age group were more dissatisfied with their body weight. Clinical Trial: LREC/10/06/261

  • Evaluating the factors associated with patients’ discussion with their physician about the risks of prescription opioid use in Maryland

    Date Submitted: Nov 13, 2016

    Open Peer Review Period: Nov 25, 2016 - Dec 9, 2016

    Background: Opioid abuse and misuse is a major public health concern. Although opioid use is appropriate at the beginning, the quantity and duration of prescription lead to misuse among diverse patient subgroups. Primary care remains at the forefront of chronic pain management and is the largest group of prescribers. Therefore, in the face of rising prescriptions in the last few years, the communication between the healthcare provider and patient about risk of opioids is critical for reducing misuse. Objective: This study describes whether patients in Maryland learn about the risks associated with prescription opioid (PO) misuse from their physicians and discusses a potential rescue plan. Methods: Data were collected from the Maryland Public Opinion Survey (MPOS), a web-based survey administered to patients across 24 jurisdictions in Maryland. We utilized Facebook to recruit our study population. Our question of interest was, “Have you ever had a talk with your doctor about the risks of taking prescription opioids?” We studied the association between the demographic characteristics of the respondents to the above question and their response using chi-square and multivariable logistic regression model. Results: Of 6623 responders to the MPOS, n=3259 responded to the question about discussing PO risks with their providers. The responder’s gender, race, and neighborhood in Maryland were not associated with their propensity to discuss PO risks with providers. Patients who were significantly more likely to discuss PO risks with their provider were those who had ever used PO without a doctor’s permission (OR=1.49, CI (1.24, 1.79)), those who had ever used heroin (OR=2.21; CI (1.68, 2.91)), and those who had not finished a college education (OR=1.2; CI (1.01, 2.78)). Conclusions: A major gap exists in patient-provider communication, as only patients with a prior history of drug misuse or abuse were more likely to discuss PO risks with their provider. 
Therefore, effective provider communication and educational approaches with concurrent evaluation would be essential to the intervention framework designed to reduce PO misuse and abuse.

  • Effect of mobile phone text messages reminders on uptake of routine immunization in Pakistan- A Randomized Controlled Clinical Trial

    Date Submitted: Nov 21, 2016

    Open Peer Review Period: Nov 24, 2016 - Dec 8, 2016

    Background: Improved routine immunization (RI) coverage is recommended as the priority public health strategy to decrease vaccine-preventable diseases and eradicate polio in Pakistan and worldwide. Objective: We aimed to ascertain whether customized automated one-way short message service (SMS) reminders to the caregivers delivered via mobile phones could improve routine immunization coverage in Pakistan. Methods: This was a randomized controlled trial, conducted in an urban squatter settlement area of Karachi, Pakistan. Three hundred infants less than two weeks of age were enrolled and participants were randomized to the intervention (standard care + one-way SMS reminder) or control (standard care) groups. The primary outcome was the proportion of children immunized up to date at 18 weeks of age. Results: The participation rate was 84% (300/356); 94% of the participants had a working mobile phone, and of these, 99% showed willingness to receive text reminders for immunization. Only 6% of the participants in the intervention arm reported not receiving SMS. Children in the intervention arm who received SMS reminders had a higher, but not statistically significant, percentage of vaccine visit completion at all three scheduled visits: visit 1 at 6 weeks (76% versus 71%, p=0.36); visit 2 at 10 weeks (59% versus 53%, p=0.30); and visit 3 at 14 weeks (31% versus 26%, p=0.31). Conclusions: Automated simple one-way SMS reminders in local languages might be feasible for improving routine vaccination coverage. We did not find a statistically significant difference for higher immunization coverage in the intervention arm. Whether SMS reminders alone can alter parental attitudes and behavior needs to be evaluated by better-powered studies that compare different types and content of text messages in LMIC settings. Clinical Trial: Trial Registration Number: ClinicalTrials.gov, registration number NCT01859546. Registered 14th May 2013

  • The effectiveness of Facebook’s advertising channel: A case for communicating the dangers of alcohol during pregnancy in New Zealand

    Date Submitted: Nov 21, 2016

    Open Peer Review Period: Nov 24, 2016 - Dec 8, 2016

    Background: Social media is gaining recognition for communicating public health messages. One area attracting attention is Facebook’s advertising channel. This channel has a wide reach, and user engagement with disseminated campaign materials is impressive. However, to date, no study has examined the effectiveness of the communication. Objective: The aim of this study was to investigate how effective Facebook’s advertising channel is as a mode for communicating public health messages. Methods: This study investigated a New Zealand public health campaign called Don’t Know? Don’t Drink, which warned against drinking alcohol during pregnancy. The campaign conveyed the warning through a video and three banner ads delivered as newsfeeds to women aged 18–30 years. This current study examined user engagement for the video and banner ads based on metadata provided by Facebook. The comments generated by the campaign materials were analysed using text mining. The relationship between the themes identified and the message was investigated using predictive modelling and sentiment analysis. Results: The user engagement was impressive, with the video receiving 203,754 views and the combined Likes and Shares for the promotional materials amounting to 6125 and 300, respectively. Thematic analysis performed on the comments (n=819) using text mining identified four themes. Logistic regression showed that two of the themes (Risk of Pregnancy and Alcohol and Culture) exhibited predictability (probability of 0.69). The sentiment analysis carried out on the two themes revealed that 72% of the comments (negative and neutral comments) did not evoke a favourable response. Conclusions: The user engagement observed in this study was consistent with previous research. The comment-based evaluation revealed the message was not accepted by a vast majority of the women who commented. Negative comments could provide further opportunities to engage with these women. 
However, the one-way communication format used by Facebook’s advertising channel prevents this from happening. Further investigation is warranted to confirm whether reciprocating to clarify or provide additional information could produce a different outcome. Until such an investigation occurs, this study cautions against using a one-way communication format to convey public health messages via Facebook’s advertising channel. Clinical Trial: N/A