Published on 01.04.20 in Vol 6, No 2 (2020): Apr-Jun

Preprints (earlier versions) of this paper are available at, first published Jun 05, 2019.

This paper is in the following e-collection/theme issue:

    Original Paper

    Classification of Health-Related Social Media Posts: Evaluation of Post Content–Classifier Models and Analysis of User Demographics

    Department of Computer Science and Engineering, University of California, Riverside, Riverside, CA, United States

    Corresponding Author:

    Ryan Rivas, BS

    Department of Computer Science and Engineering

    University of California, Riverside

    363 Winston Chung Hall

    900 University Ave

    Riverside, CA, 92521

    United States

    Phone: 1 951 827 2838



    Background: The increasing volume of health-related social media activity, where users connect, collaborate, and engage, has increased the significance of analyzing how people use health-related social media.

    Objective: The aim of this study was to classify the content (eg, posts that share experiences and seek support) of users who write health-related social media posts and study the effect of user demographics on post content.

    Methods: We analyzed two different types of health-related social media: (1) health-related online forums—WebMD and DailyStrength—and (2) general online social networks—Twitter and Google+. We identified several categories of post content and built classifiers to automatically detect these categories. These classifiers were used to study the distribution of categories for various demographic groups.

    Results: We achieved an accuracy of at least 84% and a balanced accuracy of at least 0.81 for half of the post content categories in our experiments. In addition, 70.04% (4741/6769) of posts by male WebMD users asked for advice, and male users’ WebMD posts were more likely to ask for medical advice than female users’ posts. The majority of posts on DailyStrength shared experiences, regardless of the gender, age group, or location of their authors. Furthermore, health-related posts on Twitter and Google+ were used to share experiences less frequently than posts on WebMD and DailyStrength.

    Conclusions: We studied and analyzed the content of health-related social media posts. Our results can guide health advocates and researchers to better target patient populations based on the application type. Given a research question or an outreach goal, our results can be used to choose the best online forums to answer the question or disseminate a message.

    JMIR Public Health Surveill 2020;6(2):e14952

    Introduction

    There is a huge amount of knowledge waiting to be extracted from health-related online social networks and forums, which we collectively refer to as social media. Health-related social media store the interactions of users who are interested in health-related topics [1]. These users share their experiences, share information about friends and family, or seek help for a wide range of health issues [1]. In the United States, more than 60 million Americans have read or collaborated in health 2.0 resources [2]. In addition, 40% of Americans have doubted a professional opinion when it conflicted with the opinions expressed in health-related social media [2]. Health-related social media widen access to health information for the public, regardless of individuals’ race, age, locality, or education [1].

    In this study, we evaluated the content of posts in various health-related social media. We analyzed two types of health-related social media: (1) health-related online forums: WebMD and DailyStrength and (2) general social networks: Google+ and Twitter. This was a 4-step process comprising data collection, identification of post content categories, classification experiments, and demographic analysis. We first collected large datasets of posts from each source. Afterward, we identified meaningful categories from randomly selected posts from each source. In our classification experiments, we labeled data from each source and trained classifiers to identify post content categories. Finally, we used classifiers trained on our labeled data to identify categories in the remaining data and analyzed how often posts in these categories are made by various demographic groups.

    The goal of this study was to provide researchers with information and tools to support further research. For example, researchers looking for clinical trial participants can use DailyStrength, where users often share experiences about a particular condition, and health advocates seeking to spread awareness about a condition that affects men can use WebMD, where men often ask for advice. To this end, we also made comparisons between platforms to suggest where such a researcher might begin looking. The classifier models built in this study can assist with this task as well as other analyses involving health-related online postings.

    Related Work

    Analysis of Health-Related Social Media

    Many studies have been performed to characterize health-related social media communities. Hackworth and Kunz [3] reported that 80% of Americans have searched the internet for health-related information, more than 60 million Americans are consumers of social networks in the Web 2.0 environment (health 2.0), and consumers, especially those with chronic conditions, are leading the health 2.0 movement by seeking clinical knowledge and emotional support. Wiley et al [4] studied the impact of different characteristics of various social media forums on drug-related content and demonstrated that the characteristics of a social media platform affect several aspects of discussion. Eichstaedt et al [5] predicted county-level heart disease mortality by capturing the psychological characteristics of local communities through the language expressed on Twitter. However, these studies do not describe or compare specific demographics in terms of their post content.

    Further work has focused on categorizing health-related posts based on their content. Yu et al [6] performed a preliminary content analysis of AllDeaf, a discussion forum for D/deaf and hard of hearing people, to observe different types of social support behaviors and identify social support features for a future text classification task. Reavley and Pilkington [7] analyzed the content of tweets related to depression and schizophrenia, finding that tweets about depression mostly discussed consumer resources and advertisements, whereas tweets about schizophrenia mostly raised awareness and reported research findings. Lee et al [8] analyzed the content of tweets from health-related Twitter users, finding that they tweet about testable claims and personal experiences. Lopes and Da Silva [9] collected posts from a health-related online forum, MedHelp, and used them to propose and refine a scheme for manually classifying health-related forum posts into 4 categories and a total of 23 subcategories. Our work builds upon these studies by defining our own categories of post content, some of which have analogues in these studies.

    Health-Related Demographic Analysis

    Other work has compared health issues between demographics or examined the demographics within a population participating in health-related research. Krueger et al [10] studied the mortality attributable to a low education level in the United States across several demographics, where they found people with an education level below a high school degree to have a higher mortality rate. Anderson-Bill et al [11] examined the demographics and behavioral and psychosocial characteristics of Web-health users (adults who use the Web to find information on health behavior and behavior change) recruited for a Web-based nutrition, physical activity, and weight gain prevention intervention. Their results suggest that users participating in online health interventions are likely “middle-aged, well-educated, upper middle-class women whose detrimental health behaviors put them at risk of obesity, heart disease, some cancers, and diabetes” [11]. These studies describe the demographics of the populations in their studies but do not describe the demographics of health-related social media users.

    Previous work has focused on characterizing demographics on health-related social media. Sadah et al [12] analyzed the demographics of health-related social media and found that users of drug review websites and health-related online forums are predominantly women, health-related social media users are generally older than general social media users, black users are underrepresented in health-related social media, users in areas with better access to health care participate more in health-related social media, and the writing level of health-related social media users is lower than the reading level of the general population. Sadah et al [13] also performed a demographic-based content analysis of health-related social media posts to extract top distinctive terms, top drugs and disorders, sentiment, and emotion, finding that the most popular topic varied by demographic, for example, pregnancy was popular with female users, whereas cardiac problems, HIV, and back pain were the most discussed topics by male users. They also found that users with a higher writing level were less likely to express anger in their posts. We expanded upon this work by characterizing and comparing the demographics of health-related social media websites in terms of the frequency of post content categories.

    Text Classification in Social Media

    Text classification is frequently employed by researchers to gain insights into social media users and trends, both in and out of health-related settings. Sadilek et al [14] studied the spread of infectious diseases by analyzing Twitter data using a support vector machine (SVM) model. Huh et al [15] developed a naïve Bayes model to help WebMD moderators find posts they would likely respond to. Nikfarjam et al [16] proposed a machine learning–based tagger to extract adverse drug reactions from health-related social media. Mislove et al [17] estimated the gender and ethnicity of Twitter users using the reported first name and last name. Sadah et al [12] expanded upon the work of Mislove et al [17] by considering screen names in estimating gender. In this study, we used text classification techniques to identify categories of post content in health-related social media and used the techniques proposed in the studies by Sadah et al [12] and Mislove et al [17] to study the frequency of these categories within several demographics.

    Methods

    Data Collection

    For health-related online forums, we selected 2 different websites, WebMD and DailyStrength, because each represents a different type of health-related online forum. WebMD consists of multiple health communities where people ask questions and get responses from the community members [18], whereas DailyStrength enables patients to exchange experiences and treatments, discuss daily struggles and successes, and receive emotional support [19]. For each post collected from these websites, we extracted the URL, title, author’s username, post time, the body of the post, and the name of the message board. For each author of a collected post, we also collected the author’s age, friends, gender, and location, where applicable. Because these sites were crawled at different times, some of the data we collected do not reflect the current availability of certain attributes after website format changes; for example, age and gender are currently available from WebMD user profiles but were not available before. In this study, the demographic attributes we used for a source were selected based on their availability in the majority of posts collected from that source; for example, most of the WebMD posts in our data were collected before age and gender were available, so we did not use these attributes to analyze WebMD user demographics. We restricted the posts used from these sources to the first post in each thread. In our analysis, we used the post body, post title, message board name, and username from WebMD and the post body, post title, message board name, and user’s gender, age, and location from DailyStrength.

    For general social networks, we chose Twitter and Google+ because, in contrast to Facebook, they offer interfaces to easily collect their data. For each Twitter post, we collected the post content, post time, location, and the author’s username and location. For each Google+ post, we collected the title, post time, update time, post content, location, and the author’s username, first and last names, age, gender, and location. As Twitter and Google+ are general social networks, we used 274 representative health-related keywords to filter them as follows: (1) Drugs: from RxList’s list of the most dispensed prescriptions [20], we selected the 200 most popular drugs; after removing variants of the same drug (eg, different milligram dosages), the final list contained 124 unique drug names. (2) Hashtags: 11 popular health-related Twitter hashtags, such as #BCSM (Breast Cancer and Social Media). (3) Disorders: 81 frequently discussed disorders, such as AIDS and asthma. (4) Pharmaceuticals: the names of the 12 largest pharmaceutical companies, such as Novartis. (5) Insurance: the names of the 44 biggest insurance companies, such as Aetna and Shield. (6) General: the health-related keywords “healthcare” and “health insurance.” To reach the final keyword counts for hashtags, disorders, pharmaceuticals, and insurance, we sampled posts for each keyword from a larger candidate list for each of these groups and kept the keywords with a high ratio of health-related posts. In our analysis, we used the tweet body and the user’s first and last name and location from Twitter, and the post body, post title, and user’s gender, age, first and last name, and location from Google+.
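    The keyword-based filtering above can be sketched as a single case-insensitive matcher. This is a minimal illustration, not the study's pipeline: the study filtered through the Twitter streaming API, whereas this sketch matches text locally with Python's `re` module, and the short keyword lists are hypothetical placeholders (the drug names are illustrative only) for the full 274-keyword list, which is not reproduced in the text. Only the per-group counts come from the description above.

```python
import re

# Per-group keyword counts as reported above; they sum to the 274 keywords used.
REPORTED_COUNTS = {"drugs": 124, "hashtags": 11, "disorders": 81,
                   "pharmaceuticals": 12, "insurance": 44, "general": 2}
assert sum(REPORTED_COUNTS.values()) == 274

# Hypothetical, truncated keyword lists; the study's full lists are not given in the text.
KEYWORDS = {
    "drugs": ["metformin", "lisinopril"],
    "hashtags": ["#BCSM"],
    "disorders": ["AIDS", "asthma"],
    "pharmaceuticals": ["Novartis"],
    "insurance": ["Aetna"],
    "general": ["healthcare", "health insurance"],
}


def build_matcher(keyword_lists):
    """Compile one case-insensitive regex matching any keyword as a whole token."""
    terms = sorted({kw.lower() for kws in keyword_lists.values() for kw in kws},
                   key=len, reverse=True)  # longer terms first inside the alternation
    # (?<!\w) and (?!\w) act as word boundaries that also work for terms starting with '#'.
    pattern = r"(?<!\w)(?:" + "|".join(re.escape(t) for t in terms) + r")(?!\w)"
    return re.compile(pattern, re.IGNORECASE)


def is_health_related(text, matcher):
    """True if the post mentions at least one keyword."""
    return matcher.search(text) is not None


MATCHER = build_matcher(KEYWORDS)
```

For example, "Comparing health insurance plans" matches, while "my asthmatic cat" does not, because "asthma" must appear as a whole token rather than as a prefix of a longer word.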

    To retrieve relevant tweets for TwitterHealth, we filtered the Twitter streaming application programming interface (API) [21] with the health-related keyword list. Similarly, we used the Google+ API [22] to extract the relevant posts for Google+Health. For the health-related online forums WebMD and DailyStrength, we built a crawler for each website in Java using jsoup [23], a library to extract and parse HTML content. Table 1 lists for each source the number of posts collected, the date range of collected posts, and whether the demographic attributes used in this study are present, and Table 2 lists the distribution of demographics for each source across each demographic attribute. For all 4 sources, we did not restrict our search to English-language posts, aside from using English drug names; however, the majority of posts collected from these sources were in English.

    Table 1. List of all sources used with their number of posts, date range of posts, and the available demographic attributes.
    Table 2. Demographics of users from each source.

    Identifying Post Contents

    From each source, we randomly selected 500 posts. We then manually identified the different categories of shared content for each type of health-related social media. As shown in Table 3, we identified 9 different categories. The first 4 categories were identified for both types of health-related social media (hence, all 4 sources). Of these first 4 categories, 3 were also identified by Lopes and Da Silva [9]. For example, we defined share experiences as posts in which a user shared a personal experience related to a health-related topic. This is similar to their sharing personal experiences category, except that we did not restrict our definition to experiences shared in response to another post. The about family category has no equivalent in their scheme, but it can be covered by other categories they defined, for example, by asking a specific question about, or expressing sadness over, a family member’s illness. Our share experiences category is also similar to categories in other work, for example, the personal experience of mental illness category in the study by Reavley and Pilkington [7], the personal category from Lee et al [8], the personal event category from Robillard et al [28], and the first-hand experience category from Alvaro et al [29]. As Twitter and Google+ are more news-based social media, we identified 5 additional categories from these sources. Educational material can be considered equivalent to the teaching category defined by Lopes and Da Silva [9]. Despite the differences between our categories and those proposed by Lopes and Da Silva [9], we believe that our categories are sufficient for a proof of concept of automatic post content classification in the two types of health-related social media that we investigated. Note that identifying specific experiences is outside the scope of this study; the share experiences category is a catch-all for any experience shared in a health-related post from any source.

    We asked 3 graduate students to label the selected data from WebMD, Twitter, and Google+ and used a majority vote as the final label for each of these sources. Table 4 lists the intercoder agreement, measured by Krippendorff alpha, for our labeled datasets from WebMD, Twitter, and Google+. The selected DailyStrength data were labeled by a single labeler: the one whose labels had the highest agreement with the majority vote, averaged over each category from the other 3 sources (average alpha=.680). As shown in Table 5, the distribution of categories differs between sources; for example, the share experiences category is more common in health-related online forums (WebMD and DailyStrength).
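    The labeling procedure above (a per-post majority vote over 3 coders, with agreement summarized by Krippendorff alpha) can be sketched in a few lines. The text does not say which implementation produced the alpha values in Table 4, so this is a from-scratch sketch of the standard nominal-data alpha, assuming binary labels (1 if a post is in the category, else 0) and no missing labels; the example labels are made up.

```python
from collections import Counter
from itertools import permutations


def majority_vote(labels):
    """Final label for one post: the value chosen by most coders.

    With 3 coders and binary labels there is always a strict majority.
    """
    return Counter(labels).most_common(1)[0][0]


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    units: one list per post holding every coder's label (no missing values assumed).
    """
    # Coincidence matrix: each ordered pair of labels from different coders on the
    # same post adds 1 / (m - 1), where m is the number of labels for that post.
    coincidences = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue  # a post labeled only once has no pairable values
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)
    marginals = Counter()
    for (a, _), weight in coincidences.items():
        marginals[a] += weight
    n = sum(marginals.values())
    observed = sum(w for (a, b), w in coincidences.items() if a != b) / n
    expected = sum(marginals[a] * marginals[b]
                   for a in marginals for b in marginals if a != b) / (n * (n - 1))
    if expected == 0:
        raise ValueError("alpha is undefined when all labels share one value")
    return 1.0 - observed / expected


# Made-up labels from 3 coders for 4 posts in one category.
labels = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 0]]
final = [majority_vote(post) for post in labels]   # [1, 0, 1, 0]
alpha = krippendorff_alpha_nominal(labels)         # 24/35, about 0.686
```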

    Table 3. List of all identified categories for health-related online forums and general social networks.
    Table 4. Intercoder agreement for our labeled datasets (Krippendorff alpha).
    Table 5. Percentages of categories in each source from the labeled data (N=500).

    Bot Filtering

    We examined the impact of automated accounts (ie, bots) on our study using OSoMe’s Botometer (formerly BotOrNot, Indiana University) [30], a tool that estimates how likely a Twitter account is to be a bot. We used the Botometer API to score each account that had a tweet in our initial sample of 500. The API assigned each of the 345 accounts that were still active a score in the range 0 to 1, with higher scores corresponding to a higher likelihood of an automated account. We manually evaluated each account with a score above 0.5, a threshold we chose because it is a natural default that avoids the possible bias of a more arbitrary threshold value. With this threshold, we found a total of 33 likely bot accounts. Tweets from these accounts made up a substantial portion of the categories share news (11 tweets), advertisement (12 tweets), and educational material (10 tweets). Because Botometer’s API rate limit makes it infeasible to screen every account in our Twitter corpus of over 11 million tweets, we instead randomly selected 1000 posts from each day in the date range of our Twitter data. For each author of these selected posts, we again used Botometer to evaluate the likelihood of an automated account and removed tweets from accounts with a score above 0.5, leaving a total of 142,411 tweets for our analysis.
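    The per-day sampling and bot filtering described above can be sketched as follows. This is an illustrative sketch only: the record fields `day` and `user` are hypothetical, and `bot_score` is a hypothetical stand-in for a Botometer lookup (no real Botometer client calls are shown). As in the text, tweets whose author scores above 0.5 are dropped, and each author is scored only once, which matters under an API rate limit.

```python
import random
from collections import defaultdict

BOT_THRESHOLD = 0.5  # authors scoring above this are treated as likely bots


def sample_per_day(tweets, per_day=1000, seed=0):
    """Randomly sample up to `per_day` tweets from each day.

    tweets: iterable of dicts with hypothetical keys "day" and "user".
    """
    by_day = defaultdict(list)
    for tweet in tweets:
        by_day[tweet["day"]].append(tweet)
    rng = random.Random(seed)
    sample = []
    for day in sorted(by_day):
        day_tweets = by_day[day]
        sample.extend(rng.sample(day_tweets, min(per_day, len(day_tweets))))
    return sample


def drop_likely_bots(tweets, bot_score):
    """Keep tweets whose author's score is at most BOT_THRESHOLD.

    bot_score: hypothetical callable mapping a user to a score in [0, 1].
    """
    scores = {}
    kept = []
    for tweet in tweets:
        user = tweet["user"]
        if user not in scores:
            scores[user] = bot_score(user)  # one lookup per distinct author
        if scores[user] <= BOT_THRESHOLD:
            kept.append(tweet)
    return kept
```

Chaining the two, as in `drop_likely_bots(sample_per_day(corpus), bot_score)`, mirrors the order used in the study: sampling first bounds the number of distinct authors that must be scored.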

    We also manually examined 100 posts each from WebMD and DailyStrength to determine the prevalence of bots on these websites: one of the authors read each of these posts and judged whether it appeared to have been posted by a spambot. In the context of online forums, a spambot is an automated agent that posts promotional content [31]. By this criterion, none of the examined posts appeared to have been posted by a bot. Although this does not guarantee that our data from these websites contain no posts from bots, it does suggest that posts from bots may be much less prevalent in these sources, likely because of their smaller volume of posts and more active moderation compared with Twitter and Google+.

    Building Post Content Classifiers

    For each category, we performed binary classification experiments with three classifier algorithms: random forest [32], linear SVM [33], and convolutional neural network (CNN) [34]. We first extracted and concatenated the features shown in Table 6. These features include the title of a post, the main text of a post (body), and the name of the message board that contains the post (board name). For the random forest and SVM classifiers, we converted the features to a term frequency-inverse document frequency (TF-IDF) vector, with stop words removed and the remaining words lemmatized. For the CNN classifier, we converted the features to sets of fastText [35] vectors pretrained on Wikipedia. For all classifiers, we balanced the training data so that the positive class (the post is in the category) carried the same total weight as the negative class (the post is not in the category). For random forest and SVM, this was done with class weights as implemented by Pedregosa et al [36], whereas for the CNN, we oversampled the less frequent class, as recommended by Buda et al [37].
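    The feature extraction and class balancing above can be sketched with scikit-learn, assuming that Pedregosa et al [36] refers to the scikit-learn library. This is a simplified sketch, not the study's configuration: the post field names are hypothetical, lemmatization is omitted, `LinearSVC` stands in for the linear SVM, and random duplication stands in for the CNN's oversampling step (the CNN itself is not shown).

```python
import random

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def post_to_text(post):
    """Concatenate title, body, and board name (hypothetical keys) into one string."""
    return " ".join(post.get(key, "") for key in ("title", "body", "board"))


def make_svm():
    # TF-IDF with English stop words removed; class_weight="balanced" weights each
    # class inversely to its frequency so both classes carry equal total weight.
    return make_pipeline(TfidfVectorizer(stop_words="english"),
                         LinearSVC(class_weight="balanced"))


def make_random_forest(seed=0):
    return make_pipeline(TfidfVectorizer(stop_words="english"),
                         RandomForestClassifier(class_weight="balanced",
                                                random_state=seed))


def oversample_minority(texts, labels, seed=0):
    """Duplicate random minority-class examples until both classes are equally frequent.

    Assumes binary labels (1 = in category, 0 = not) and a non-empty minority class.
    """
    rng = random.Random(seed)
    pos = [(t, y) for t, y in zip(texts, labels) if y == 1]
    neg = [(t, y) for t, y in zip(texts, labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return [t for t, _ in balanced], [y for _, y in balanced]
```

Class weighting changes how much each training example counts in the loss, whereas oversampling changes the training set itself; the latter is the option available when a model (such as the CNN here) is trained without a per-class weight argument.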

    To build the classifiers, we excluded the categories that made up less than 10.0% (50/500) of the labeled data. For the remaining categories, we first split the labeled data into two datasets: (1) a training dataset (450 posts) and (2) a test dataset (50 posts), held out for a final test after training was complete. Afterward, for each classifier algorithm, we trained classifiers over every combination of the hyperparameter values shown in Table 7. For each combination, we performed 5-fold cross-validation on the training dataset, and we selected the combination of hyperparameter values with the highest balanced accuracy [38]. Finally, we used these hyperparameter values to train a model on the full training dataset and tested this model on the test dataset that was held out before the cross-validation experiments. Note that we did not use nested cross-validation, as our goal in these experiments was to find a single combination of hyperparameter values with which to apply a sufficiently accurate classifier model to the rest of our data.
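    The split-then-tune procedure above maps naturally onto scikit-learn's `train_test_split` and `GridSearchCV`. The sketch below is illustrative only: the parameter grid is a hypothetical placeholder (the values actually evaluated are in Table 7), stratifying the 50-post test split is our assumption, and only the SVM pipeline is shown. `GridSearchCV` scores every grid combination by 5-fold cross-validated balanced accuracy on the training posts and, with `refit=True`, retrains the best combination on all of them before the single held-out test.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical grid; the study's actual hyperparameter values are listed in Table 7.
PARAM_GRID = {"clf__C": [0.1, 1.0, 10.0], "tfidf__ngram_range": [(1, 1), (1, 2)]}


def tune_and_test(texts, labels, test_size=50, seed=0):
    # Hold out the test posts before any tuning (stratified split is an assumption).
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, stratify=labels, random_state=seed)
    pipeline = Pipeline([("tfidf", TfidfVectorizer(stop_words="english")),
                         ("clf", LinearSVC(class_weight="balanced"))])
    # 5-fold CV over every grid combination, selected by balanced accuracy;
    # refit=True retrains the winning combination on the whole training set.
    search = GridSearchCV(pipeline, PARAM_GRID, scoring="balanced_accuracy",
                          cv=5, refit=True)
    search.fit(X_train, y_train)
    # score() applies the same balanced-accuracy scorer to the held-out posts.
    return search.best_params_, search.score(X_test, y_test)
```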

    Table 8 shows the classifiers’ accuracy for WebMD, DailyStrength, Twitter, and Google+. We show only the classifiers for categories that make up at least 10% of the labeled data.

    For the remainder of our analysis, we only considered source-category combinations with a classifier that achieved a balanced accuracy higher than 0.75.

    For the source-category combinations without a classifier that achieved a balanced accuracy higher than 0.75, we performed another round of experiments in which we attempted to classify posts using the best-performing classifier trained on the corresponding category from another source, for example, random forest for share experiences from WebMD. In these experiments, we used 500 posts from one source for training and 500 posts from another source for testing, and we again found the best combination of hyperparameters via 5-fold cross-validation on the training data. Table 9 shows the results of these experiments. Classifiers trained on the DailyStrength and Twitter data achieved a balanced accuracy of over 0.75 on the share experiences category from Google+, so we added this category to the set of categories considered for further analysis. For each category in this set, we used the model with the highest balanced accuracy for that category to label the rest of the data. We report our findings on the frequency of these categories across several demographics, according to their respective classifiers, in the Results section.
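    Balanced accuracy, the selection criterion used throughout, is the mean of the recall on posts in the category (sensitivity) and the recall on posts outside it (specificity). This keeps rare categories from looking easy: a classifier that never predicts a category with 10% prevalence is 90% accurate but has a balanced accuracy of only 0.5. The sketch below computes it and applies the selection rule described above, treating the cutoff as strictly higher than 0.75; the candidate tuple format is a hypothetical choice for illustration.

```python
THRESHOLD = 0.75


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on label 1) and specificity (recall on label 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def select_model(candidates):
    """Return the candidate with the highest balanced accuracy above THRESHOLD, else None.

    candidates: (model_name, training_source, balanced_accuracy) tuples for one
    target source-category pair, from within-source and cross-source experiments.
    """
    eligible = [c for c in candidates if c[2] > THRESHOLD]
    return max(eligible, key=lambda c: c[2]) if eligible else None


# A classifier that never predicts the category (5 of 50 posts are positive):
y_true = [1] * 5 + [0] * 45
y_pred = [0] * 50
print(balanced_accuracy(y_true, y_pred))  # 0.5, although plain accuracy is 0.9
```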

    Table 6. All classifiers’ training features.
    Table 7. Classifier hyperparameter values evaluated in our experiments.
    Table 8. Classifier results for each category (N=50).
    Table 9. Results of classifiers trained on a corresponding category from another source (N=500).

    Demographic Analysis

    We chose four demographic attributes as shown in Table 1: gender, age, ethnicity, and location. Where possible, we extracted these attributes from user profiles. These attributes are not available for every source, so we used existing classifier models where available to estimate their values. Specifically, we used the classifiers from Mislove et al [17] to estimate gender for Twitter users and ethnicity for both Twitter and Google+ users. To estimate gender for WebMD users, we used the classifier from Sadah et al [12], an extension of the classifier by Mislove et al that considers a user’s screen name when the user’s first name is not present. These classifiers use the 1000 most popular male and female birth names reported by the US Social Security Administration for each year from 1935 to 1995 as ground truth for gender and the distribution of ethnicities for each last name as reported by the 2000 US Census as ground truth for ethnicity. For each of these attributes, we used the data labeled by our post content category classifiers to determine how frequently users of each demographic write a post with one of these categories, for example, the percentage of posts made by male users in which a user shared his experiences. When comparing these percentages, we calculated statistical significance via a Pearson chi-square test. Note that a post can be in more than one category, for example, a post can both share experiences and ask for medical advice.
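    For a comparison of two groups on one category, such as male versus female WebMD users asking for advice, the Pearson chi-square test reduces to a 2x2 contingency table. The text does not name the software used, so this is a dependency-free sketch without continuity correction; it should agree with `scipy.stats.chi2_contingency(table, correction=False)`. For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2/2)). The counts in the usage lines are the WebMD figures reported in the Results.

```python
import math


def pearson_chi_square_2x2(a_yes, a_total, b_yes, b_total):
    """Pearson chi-square test (no continuity correction) for two proportions.

    a_yes of a_total posts from group A and b_yes of b_total posts from group B
    are in the category. Returns (chi2, p_value) with 1 degree of freedom.
    """
    observed = [[a_yes, a_total - a_yes], [b_yes, b_total - b_yes]]
    row_totals = [a_total, b_total]
    col_totals = [a_yes + b_yes, (a_total - a_yes) + (b_total - b_yes)]
    grand_total = a_total + b_total
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square variable with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Male vs female WebMD posts that ask for advice (counts from the Results).
chi2, p = pearson_chi_square_2x2(4741, 6769, 6372, 14117)
print(f"{100 * 4741 / 6769:.2f}% vs {100 * 6372 / 14117:.2f}%, P<.001: {p < 0.001}")
```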

    Top Distinctive Message Boards

    For each combination of demographic and category (eg, male and share experiences) analyzed in WebMD and DailyStrength, we found the most distinctive message boards for that combination. For WebMD, we considered only boards that have at least 0.01% of the posts for a given combination, or at least 30 posts if 0.01% is less than 30. Owing to the large number of message boards on DailyStrength (1608 analyzed in this study), we relaxed this restriction to consider only boards with at least 30 posts for a given combination. We then determined distinctiveness by calculating the relative difference of each board. Following the calculation of top distinctive terms by Sadah et al [13], we calculated the relative difference of board b within the combination of category c and demographic d of demographic attribute a as shown in equation (1):

    $$\mathrm{RelDif}_{c,d}(b) = \frac{\mathrm{Freq}_{c,d}(b) - \mathrm{AvgFreq}_{c,a}(b)}{\mathrm{AvgFreq}_{c,a}(b)} \qquad (1)$$

    where $\mathrm{Freq}_{c,d}(b)$ is the normalized frequency of posts on board b in category c by a user in demographic d, for example, the number of posts on the WebMD Breast Cancer message board that share experiences and were written by a female user divided by the number of posts on WebMD that share experiences and were written by a female user. $\mathrm{AvgFreq}_{c,a}(b)$ is the average of $\mathrm{Freq}_{c,d}(b)$ across all demographics d within the demographic attribute a, for example, male and female for the demographic attribute gender.
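    Equation (1) and the minimum-post filters can be sketched as follows for one category c and one demographic attribute a. The nested-dictionary input format is a hypothetical choice, the counts in the usage example are made up, and `min_posts_webmd` encodes our reading of the WebMD rule (at least 0.01% of the combination's posts, and never fewer than 30).

```python
def min_posts_webmd(combination_size):
    """WebMD filter: max(30, ceil(0.01% of the combination's posts)), integer-only."""
    return max(30, (combination_size + 9999) // 10000)


def relative_differences(counts):
    """counts[d][b]: posts in category c on board b by users in demographic d, for
    every demographic d of one attribute a (eg, male and female). Each counts[d]
    must cover all boards so that its sum is the total for that combination.

    Returns rel[d][b] = (Freq_cd(b) - AvgFreq_ca(b)) / AvgFreq_ca(b).
    """
    freq = {d: {b: n / sum(boards.values()) for b, n in boards.items()}
            for d, boards in counts.items()}
    demographics = list(counts)
    all_boards = {b for boards in counts.values() for b in boards}
    rel = {d: {} for d in demographics}
    for b in all_boards:
        avg = sum(freq[d].get(b, 0.0) for d in demographics) / len(demographics)
        if avg == 0:
            continue  # board has no posts in this category for any demographic
        for d in demographics:
            rel[d][b] = (freq[d].get(b, 0.0) - avg) / avg
    return rel


def top_distinctive(counts, rel, demographic, min_posts, k=10):
    """Top-k boards by relative difference among boards with >= min_posts posts."""
    eligible = [b for b, n in counts[demographic].items() if n >= min_posts]
    return sorted(eligible, key=lambda b: rel[demographic][b], reverse=True)[:k]


# Made-up counts for one category and the attribute gender.
counts = {"female": {"pregnancy": 60, "heart": 40},
          "male": {"pregnancy": 10, "heart": 90}}
rel = relative_differences(counts)
print(top_distinctive(counts, rel, "female", min_posts=30))  # ['pregnancy', 'heart']
print(top_distinctive(counts, rel, "male", min_posts=30))    # ['heart']
```

Because each demographic's frequencies are normalized by its own total, a board can be distinctive for a smaller group even when a larger group writes more posts there in absolute terms.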

    Results

    In this section, we present the category results for each demographic where possible. For age demographics, we organized users into five groups: 0 to 17 years, 18 to 34 years, 35 to 44 years, 45 to 64 years, and 65 years and older. For ethnicity, we considered four possibilities: Asian, black, Hispanic, and white. For location, we considered the four regions designated by the US Census Bureau: Midwest, Northeast, South, and West. As explained in the Methods section, we considered the following categories for each source: (1) WebMD: share experiences, ask for advice, psychological support, and about family; (2) DailyStrength: share experiences and ask for advice; (3) TwitterHealth: share experiences and share news; and (4) Google+Health: share experiences and educational material.
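    The age grouping can be expressed as a small lookup. This sketch assumes integer ages in whole years and contiguous groups, so a 45-year-old falls in the 45 to 64 group and the last group covers ages 65 and older; the label strings are a hypothetical choice.

```python
# (lower bound, upper bound, label), inclusive bounds in whole years.
AGE_GROUPS = [(0, 17, "0-17"), (18, 34, "18-34"), (35, 44, "35-44"), (45, 64, "45-64")]


def age_group(age):
    """Map an integer age in years to its reporting group."""
    if age < 0:
        raise ValueError("age must be non-negative")
    for low, high, label in AGE_GROUPS:
        if low <= age <= high:
            return label
    return "65+"
```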

    WebMD

    As shown in Table 1, our WebMD dataset includes gender predicted by the gender classifier from Sadah et al [12]. Therefore, we report the distribution of gender among its categories. Table 10 shows the frequency of posts made by male and female users for each category. We found that 70.04% (4741/6769) of posts written by male WebMD users asked for advice, compared with 45.14% (6372/14,117) of posts by female users (P<.001). Table 11 shows the top 10 most distinctive WebMD message boards by the number of posts for each combination of gender and category. Unsurprisingly, these results show that female users were more likely than male users to post on boards about pregnancy and parenting in all categories, whereas male users were more likely to discuss men’s health issues. Men also gave psychological support and discussed family members on the message board for the infertility drug Clomid more frequently than women.

    Table 10. WebMD category frequency by gender.
    Table 11. Top 10 most distinctive WebMD message boards for male and female users in each category.

    DailyStrength

    For our DailyStrength demographic attributes (gender, age, and location), we report the results for the categories share experiences and ask for advice. Table 12 shows the category frequencies for each demographic. The majority of posts (over 80%) from every demographic share experiences, but across age groups, we saw a clear decline in this frequency as age increases, from 92.77% (6175/6656) for users younger than 18 years to 81.82% (24,420/29,847) for users 65 years and older (P<.001). The frequency of posts that ask for advice is similar for almost every demographic (30%-40%), with the exception of posts from users younger than 18 years (25.45%, 1694/6656; P<.001 for all comparisons between users younger than 18 years and other age groups).

    Tables 13-15 show the top 10 most distinctive DailyStrength message boards by the number of posts for each combination of gender and category, age group and category, and location and category, respectively. These lists show a wider variety of topics than those for WebMD, likely because of the large number of message boards on DailyStrength. However, we still saw some trends when considering broader topics. Male users tended to share experiences on message boards related to personal and social issues. Both male and female users asked for advice most frequently on boards related to physical conditions.

    We also observed a general tendency for younger users (aged younger than 45 years) to share experiences on message boards about personal and social issues, whereas older users favored message boards for general support and discussion. Users in all age groups frequently asked for advice about physical conditions. We found no clear trend in sharing experiences across census regions, but users from the Northeast region shared experiences about physical and psychological conditions, whereas users from the West region often shared experiences on message boards for general support and discussion. Users from all regions except the West frequently asked for advice about physical conditions; users from the West tended to ask for advice on message boards for general support and discussion. Note that Table 14 lists fewer than 10 message boards for users aged 0 to 17 years who asked for advice, because too few message boards met our restriction of having at least 30 such posts.

    Table 12. DailyStrength category frequency by gender, age, and location.
    Table 13. Top 10 most distinctive DailyStrength message boards for male and female users in each category.

    Table 14. Top 10 most distinctive DailyStrength message boards for each age group in each category.
    Table 15. Top 10 most distinctive DailyStrength message boards for each region in each category.

    Twitter

    For our Twitter demographic attributes (gender, ethnicity, and location, with gender and ethnicity predicted by the classifier from Mislove et al [17]), Table 16 reports the results for the share experiences and share news categories using our sample of 142,411 tweets. As described in the Methods section, this dataset was created from our full corpus by first sampling 1000 posts for each day represented in the dataset and then pruning tweets from likely bot accounts. All demographics analyzed shared experiences more often than they shared news. Hispanic users had the largest difference: 29.16% (826/2833) of their tweets shared experiences versus 5.47% (155/2833) that shared news (P<.001). Users from the Northeast census region had the smallest difference: 20.38% (1093/5362) of their tweets shared experiences versus 10.16% (545/5362) that shared news (P<.001). Where comparison is possible between these demographics and their counterparts in WebMD and DailyStrength, Twitter users shared experiences less frequently (P<.001 for all such comparisons).
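
    The two-step construction of the Twitter sample first takes up to 1000 tweets per day and then prunes likely bot accounts. The following sketch shows roughly how this could work. The tweet schema (`day` and `user` keys), the per-account bot score dictionary, and the 0.5 cutoff are all hypothetical. The actual bot-detection procedure is described in the Methods section; a scoring service such as BotOrNot [30] is one possible source of scores, not necessarily the one used here.

```python
import random
from collections import Counter, defaultdict

def stratified_daily_sample(tweets, per_day=1000, seed=0):
    """Sample up to per_day tweets from each day; tweets are dicts with a 'day' key."""
    by_day = defaultdict(list)
    for tweet in tweets:
        by_day[tweet["day"]].append(tweet)
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    sample = []
    for day in sorted(by_day):
        day_tweets = by_day[day]
        sample.extend(rng.sample(day_tweets, min(per_day, len(day_tweets))))
    return sample

def prune_bots(tweets, bot_score, threshold=0.5):
    """Drop tweets whose author's bot score is at or above threshold.

    Authors without a score are kept (an assumption of this sketch)."""
    return [t for t in tweets if bot_score.get(t["user"], 0.0) < threshold]

# Hypothetical corpus: 3 days x 1500 tweets from 50 accounts, one a likely bot
tweets = [{"day": d, "user": f"u{i % 50}"} for d in range(3) for i in range(1500)]
sample = stratified_daily_sample(tweets)
pruned = prune_bots(sample, bot_score={"u0": 0.9, "u1": 0.2})
print(Counter(t["day"] for t in sample), len(pruned))
```

    Sampling per day before pruning keeps busy days from dominating the sample. It also means that the final per-day counts fall below 1000 wherever bot tweets were removed.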

    Table 16. Twitter category frequency by gender, ethnicity, and location.

    We also performed this analysis on our full Twitter dataset of 11,637,888 tweets. We compared these results with the results shown in Table 16 and found that the differences were generally not statistically significant (with statistical significance defined as P<.05) for the share experiences category but were significant for all but one demographic in the share news category. These findings agree with our evaluation of bot likelihood using our initial sample of 500 tweets, where we found that the share news category had a substantial number of tweets from likely bot accounts, but the share experiences category did not. The P values of these comparisons are shown in Table 17.

    Table 17. P values of comparisons between Twitter results using pruned data and results using all data.

    Google+

    Our Google+ demographic attributes include gender, age, ethnicity, and location, with ethnicity predicted by the classifier from Mislove et al [17]. Table 18 reports the results for these attributes in the share experiences and educational material categories. Classifiers trained on our labeled Google+ dataset did not achieve a sufficiently high balanced accuracy for the share experiences category. We therefore considered classifiers trained on the labeled DailyStrength and Twitter data, as described in the Methods section. The DailyStrength-trained classifier labeled 34.13% (63,709/186,666) of all Google+ posts as share experiences, whereas the Twitter-trained classifier labeled 18.83% (35,149/186,666) as such. Because the latter proportion is closer to the 13.0% (65/500) reported for the Google+ sample in Table 5, we used the Twitter-trained classifier for the remainder of our analysis in the share experiences category.
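
    The choice between the two transferred classifiers amounts to selecting the one whose predicted share experiences proportion on the full Google+ corpus is closest to the proportion in the labeled Google+ sample. The following minimal sketch applies that rule to the counts reported above. The function name is hypothetical.

```python
def closest_classifier(predicted_counts, target_proportion):
    """Return the classifier whose predicted positive rate is nearest the target.

    predicted_counts maps classifier name -> (predicted positives, total posts).
    """
    def gap(name):
        positives, total = predicted_counts[name]
        return abs(positives / total - target_proportion)
    return min(predicted_counts, key=gap)

predicted = {
    "DailyStrength-trained": (63_709, 186_666),  # 34.13% share experiences
    "Twitter-trained": (35_149, 186_666),        # 18.83% share experiences
}
labeled_rate = 65 / 500  # 13.0% in the labeled Google+ sample (Table 5)
chosen = closest_classifier(predicted, labeled_rate)
print(chosen)  # → Twitter-trained
```

    Matching the overall rate is a coarse check: it says nothing about whether individual posts are labeled correctly. It therefore complements, rather than replaces, the balanced accuracy threshold.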

    From these results, we saw that most demographics appeared to share experiences more frequently than the set of all Google+ users. This is likely the effect of a bias toward users who chose to report these attributes (or a real name, in the case of ethnicity). When comparing how often a demographic shares experiences with how often posts from users with no data on that demographic’s corresponding attribute share experiences (eg, posts from men vs posts from users who did not report gender), we found that P<.001 for all such comparisons except for users aged ≥65 years (P=.83). Where comparison is possible between these demographics and their counterparts in WebMD and DailyStrength, we saw that Google+ users shared experiences less frequently (P<.001 for all such comparisons).

    Users aged 35 to 44 years shared educational material less frequently, 14.9% (46/308), than users of any other age group. In particular, they shared educational material much less frequently than both the preceding age group, 18 to 34 years, 25.5% (141/552), P<.001, and the following age group, 45 to 64 years, 34.3% (171/499), P<.001. Asian Google+ users shared substantially more educational material, 35.75% (1010/2825), than users of any other ethnicity (P=.002 vs black users, P<.001 vs Hispanic users, and P<.001 vs white users).

    Table 18. Google+ category frequency by gender, age, ethnicity, and location.

    Discussion

    Principal Findings

    Our analysis shows several interesting results. From our initial samples, we found that health-related posts from general social networks often shared news and educational material, whereas posts on health-related online forums frequently shared experiences, asked for medical advice, and requested or gave psychological support (Table 5). We evaluated 3 classification algorithms on the post content categories described in our study. In terms of balanced accuracy, SVM tended to perform well on WebMD data, whereas CNN performed better on DailyStrength data. Of the 2 Twitter categories used in our experiments, share experiences and share news, SVM performed best in share experiences and CNN performed best in share news. None of the classifiers we evaluated performed particularly well when trained with the Google+ data; only the CNN classifier met our performance threshold, in the Google+ educational material category. However, classifiers trained on the DailyStrength and Twitter data met our performance threshold in the Google+ share experiences category, suggesting that at least some transferability is possible with classifiers trained on other datasets.
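
    Balanced accuracy, the metric used to compare the classifiers here, is the mean of the per-class recalls [38]. A classifier therefore cannot score well on an imbalanced category simply by predicting the majority class. The following short sketch illustrates this. The example labels are invented, and the performance threshold itself is defined in the Methods section rather than hard-coded here. scikit-learn's `sklearn.metrics.balanced_accuracy_score` performs the same computation.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Invented binary labels for one category (1 = post belongs to the category)
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
always_no = [0] * 10                        # majority-class baseline
some_model = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

print(accuracy(y_true, always_no), balanced_accuracy(y_true, always_no))  # 0.7 0.5
print(accuracy(y_true, some_model), balanced_accuracy(y_true, some_model))
```

    In this example, the baseline reaches 70% accuracy while its balanced accuracy stays at 0.5, the chance level for 2 classes.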

    A further analysis of our health-related online forum data showed distinct differences between users of WebMD and DailyStrength. On WebMD, we found that the majority of posts made by male users and almost half of all posts made by female users asked for advice. This would seem to contradict an earlier study that found that women were the predominant users of the internet for health advice [39]. However, our study included far more posts from female than from male WebMD users (93,293 vs 41,422), so posts asking for advice were still more likely to be written by a woman than by a man. DailyStrength users shared experiences frequently in all demographics analyzed in our study, even more so than WebMD users; however, asking for advice was less common than on WebMD. These differences may reflect the different purposes of the 2 forums: DailyStrength offers support groups for a variety of topics, whereas WebMD communities are often frequented by experts who can provide advice to users.
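
    The base-rate argument above can be made explicit. Suppose a fraction of the 41,422 male posts and a fraction of the 93,293 female posts ask for advice. Women then write more advice posts whenever the female fraction exceeds the male fraction multiplied by 41,422/93,293. The following sketch plugs in the 70.04% (4741/6769) male rate from our WebMD results. It assumes that this rate also holds across all 41,422 male posts. The value of 0.45 used for "almost half" is illustrative, not a reported figure.

```python
male_posts, female_posts = 41_422, 93_293
male_rate = 4741 / 6769  # 70.04% of male WebMD posts asked for advice

# Female advice rate above which women write more advice posts than men
break_even = male_rate * male_posts / female_posts
print(f"break-even female rate: {break_even:.1%}")  # → break-even female rate: 31.1%

female_rate = 0.45  # illustrative stand-in for "almost half"; not a reported figure
male_advice = male_rate * male_posts
female_advice = female_rate * female_posts
print(round(male_advice), round(female_advice))
```

    Any female rate above about 31% is enough, so "almost half" comfortably clears the bar.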

    An analysis of health-related posts on general social networks (Twitter and Google+) suggested how they differ from health-related online forums. Compared with WebMD and DailyStrength, sharing experiences, which identifies posts in which a user shared a personal experience related to a health-related topic, is far less frequent in posts from Twitter and Google+ that contain one or more of the health-related keywords used in this study. The relatively low frequency of sharing experiences in our sample of several health-related topics on general social networks, compared with the frequency on health-related online forums, may be due to a variety of factors. These include Twitter’s structure, which lacks dedicated health-related communities, and the focus of WebMD and DailyStrength on answering medical questions and providing support, respectively. Some subsets of health-related tweets studied in other work have low proportions of sharing experiences similar to our observations, such as tweets about depression [7], schizophrenia [7], and dementia [28], as well as tweets from health-related Twitter users [8]. However, other work has shown that the proportion can be much higher, such as in tweets about dental pain [40] and prescription drug use [29]. Many health-related topics had high proportions of posts that shared experiences in our Google+ data, for example, headache, 93.22% (6572/7050); migraine, 78.77% (2029/2576); insomnia, 71.41% (2430/3403); cold sore, 58.0% (370/638); and diazepam, 51.1% (95/186). This suggests that the proportion of sharing experiences in health-related posts may be highly dependent on the topic or topics studied; thus, our findings on the share experiences category may not generalize to other studies on health-related social media posts.

    Our comparison between the stratified Twitter sample (with tweets from suspected bots removed) and our full Twitter dataset showed that automated accounts had a significant impact on the share news category. Other work has also shown that bots can affect health-related Twitter conversations, particularly on the subject of vaccination. Bots post both pro- and antivaccine tweets [41] and retweet vaccine-related tweets at higher frequencies than human users [42]. Using bots in this manner amplifies the debate and further polarizes the communities involved. Bot activity must therefore be considered when analyzing health-related conversations on Twitter.

    The differences in how often educational material is shared on Google+ across the demographics we studied highlight potential targets for informational health care campaigns. A health care campaign is a broad, health care–related activity that is driven, led, or coordinated at the national or subnational level [43]. Users aged 35 to 44 years, who share educational material less often than other age groups, may benefit from being given medical information they are not aware of. Demographics that share educational material more frequently than others, such as Asian Google+ users, may also be of interest to medical experts. If a further analysis of the educational material shared by these groups shows that it is inaccurate or misleading, providing correct information may benefit them.

    Our results provide useful information that can help health care providers reach the right demographic group. For example, researchers looking for clinical trial participants can use health-related online forums, where many posts share experiences. Moreover, demographic-specific results can help guide targeted educational campaigns. For example, male WebMD users ask for specific medical advice more often than female users, so male WebMD users may be more receptive to a campaign offering advice from medical experts.

    The classifier models used in this study can also be useful for researchers who want to study posts in the categories we studied. For example, a researcher who wants to study experiences with a particular drug can use these classifiers to find posts that share experiences within a larger dataset of posts that mention that drug. As another example, a researcher who wants to find out which disorders are frequently mentioned among users who share news can use a classifier to gather a dataset of news-sharing posts. In general, we provide researchers with tools that enable them to test hypotheses and do research on health-related social media posts. These tools consist of the description of our methodology, which explains how to build these classifier models, and the trained classifier models, which are available on request. Similar tools may also be applicable to the categories in the scheme proposed by Lopes and da Silva [9]. We leave this as future work.
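
    The drug example above amounts to a two-stage filter: select posts that mention the drug, then keep those that a trained share experiences classifier labels as positive. The following sketch assumes the classifier is any callable that maps post text to True or False. The `toy_classifier` in the example is a hypothetical first-person keyword heuristic. It stands in for a trained model such as the SVM or CNN classifiers evaluated here and is not one of them. The posts are invented.

```python
import re
from typing import Callable, Iterable, List

def experience_posts(posts: Iterable[str], drug: str,
                     classify: Callable[[str], bool]) -> List[str]:
    """Return posts that mention `drug` (whole word, case-insensitive)
    and that `classify` labels as sharing an experience."""
    mention = re.compile(rf"\b{re.escape(drug)}\b", re.IGNORECASE)
    return [p for p in posts if mention.search(p) and classify(p)]

def toy_classifier(text: str) -> bool:
    # Hypothetical stand-in: first-person wording as a crude experience signal
    return bool(re.search(r"\b(I|my|me)\b", text))

posts = [
    "I started diazepam last week and my sleep is better",
    "New study compares diazepam with placebo",
    "My migraine is back again",
]
result = experience_posts(posts, "diazepam", toy_classifier)
print(result)
```

    Running the cheap keyword filter first means the classifier only scores posts that already mention the drug.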

    Limitations

    Because users of health-related social media write informally, the 274 keywords we selected to filter Twitter and Google+ posts, as described in the Methods section, may not capture all health-related posts or the full variety of their topics. For example, the abbreviation IUI (intrauterine insemination) is widely used in health-related posts but is not included in the health-related keyword list. Another limitation is that some filter terms also have non-health meanings. For example, the word “cancer” yields many tweets about zodiac signs.
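
    Both limitations follow directly from how a keyword filter behaves. The following sketch assumes case-insensitive whole-word matching, although the exact matching rules are described in the Methods section. It uses a hypothetical four-term subset of the 274 keywords, and the example posts are invented.

```python
import re

# Hypothetical subset of the health-related keyword list
HEALTH_KEYWORDS = ["cancer", "migraine", "insomnia", "cold sore"]

def build_matcher(keywords):
    """Compile one case-insensitive, whole-word pattern for all keywords."""
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
    return re.compile(pattern, re.IGNORECASE)

matcher = build_matcher(HEALTH_KEYWORDS)
posts = [
    "Second IUI attempt tomorrow, fingers crossed",    # health-related, but no keyword
    "Cancer season: crabs are the most loyal sign",    # keyword, but not health-related
    "This migraine has lasted three days",             # intended match
]
matched = [p for p in posts if matcher.search(p)]
print(matched)
```

    The first post is missed (a recall gap) and the second is kept (a precision gap). Both errors would have to be handled outside the keyword list itself, for example by expanding the list or classifying posts after filtering.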

    We found that some Twitter categories have a high proportion of tweets from automated accounts. Although we attempted to filter out tweets from such accounts, some may remain in the data used in our analysis, and tweets from legitimate accounts may have been filtered out. Our initial evaluation of bot prevalence also found that the educational material category had a high proportion of tweets from bots. This may also be true of that category in the Google+ data, which was not filtered for bots; thus, those results may not accurately represent the demographics studied.

    Our demographic populations may not be fully representative of all users from the sources in our study. As shown in Table 1, some of our demographics were estimated using classifiers, and these estimates are not always correct. Other demographics in our study are optionally reported by users. This introduces a bias toward users who choose to report their age, gender, and/or location, as noted in our results from Google+. We also assumed these reported demographics are correct for each such user.

    Conclusions

    In this study, we analyzed the content shared in two different types of health-related social media: health-related online forums and general social networks. For both types, we manually identified 4 post categories (share experiences, ask for specific medical advice, request or give psychological support, and about family). For general social networks, we additionally identified 5 categories (share news, jokes, advertisements, personal opinion, and educational material). After labeling randomly selected data from each source, we built classifiers for each category. Finally, we performed demographic-based content analyses where possible.

    Acknowledgments

    This project was partially supported by the National Science Foundation grant numbers IIS-1619463, IIS-1746031, IIS-1838222, and IIS-1901379. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation.

    Authors' Contributions

    RR conducted the experiments and analysis and wrote the manuscript. SS conducted earlier versions of the experiments and analysis and assisted in the writing of the manuscript. YG coordinated the labeling of the training datasets and conducted preliminary research. VH conceived the study and provided coordination and guidance in the experiments and writing of the manuscript.

    Conflicts of Interest

    None declared.

    References

    1. Moorhead SA, Hazlett DE, Harrison L, Carroll JK, Irwin A, Hoving C. A new dimension of health care: systematic review of the uses, benefits, and limitations of social media for health communication. J Med Internet Res 2013 Apr 23;15(4):e85.
    2. Kane GC, Fichman RG, Gallaugher J, Glaser J. Community relations 2.0. Harv Bus Rev 2009 Nov;87(11):45-50, 132.
    3. Hackworth BA, Kunz MB. Health care and social media: building relationships via social networks. Acad Health Care Manag J 2011;7(2):1-14.
    4. Wiley MT, Jin C, Hristidis V, Esterling KM. Pharmaceutical drugs chatter on Online Social Networks. J Biomed Inform 2014 Jun;49:245-254.
    5. Eichstaedt JC, Schwartz HA, Kern ML, Park G, Labarthe DR, Merchant RM, et al. Psychological language on Twitter predicts county-level heart disease mortality. Psychol Sci 2015 Feb;26(2):159-169.
    6. Yu B, Gerido L, He Z. Exploring text classification of social support in online health communities for people who are D/deaf and hard of hearing. Proc Assoc Info Sci Tech 2017;54(1):840-841.
    7. Reavley NJ, Pilkington PD. Use of Twitter to monitor attitudes toward depression and schizophrenia: an exploratory study. PeerJ 2014;2:e647.
    8. Lee JL, DeCamp M, Dredze M, Chisolm MS, Berger ZD. What are health-related users tweeting? A qualitative content analysis of health-related users and their messages on Twitter. J Med Internet Res 2014 Oct 15;16(10):e237.
    9. Lopes CT, da Silva BG. A classification scheme for analyses of messages exchanged in online health forums. Inf Res 2019;24(1).
    10. Krueger PM, Tran MK, Hummer RA, Chang VW. Mortality Attributable to Low Levels of Education in the United States. PLoS One 2015;10(7):e0131809.
    11. Anderson-Bill ES, Winett RA, Wojcik JR. Social cognitive determinants of nutrition and physical activity among web-health users enrolling in an online intervention: the influence of social support, self-efficacy, outcome expectations, and self-regulation. J Med Internet Res 2011 Mar 17;13(1):e28.
    12. Sadah SA, Shahbazi M, Wiley MT, Hristidis V. A study of the demographics of Web-based health-related social media users. J Med Internet Res 2015 Aug 6;17(8):e194.
    13. Sadah SA, Shahbazi M, Wiley MT, Hristidis V. Demographic-based content analysis of web-based health-related social media. J Med Internet Res 2016 Jun 13;18(6):e148.
    14. Sadilek A, Kautz H, Silenzio V. Modeling Spread of Disease From Social Interactions. In: Proceedings of the Sixth International AAAI Conference on Weblogs and Social Media. 2012 Presented at: ICWSM'12; June 4-7, 2012; Dublin, Ireland URL:
    15. Huh J, Yetisgen-Yildiz M, Pratt W. Text classification for assisting moderators in online health communities. J Biomed Inform 2013 Dec;46(6):998-1005.
    16. Nikfarjam A, Sarker A, O'Connor K, Ginn R, Gonzalez G. Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features. J Am Med Inform Assoc 2015 May;22(3):671-681.
    17. Mislove A, Lehmann S, Ahn YY, Onnela J, Rosenquist JN. Understanding the Demographics of Twitter Users. In: Proceedings of the 5th international AAAI conference on weblogs and social media. 2011 Presented at: ICWSM'11; July 17-21, 2011; Barcelona, Spain URL:
    18. Kanthawala S, Vermeesch A, Given B, Huh J. Answers to health questions: internet search results versus online health community responses. J Med Internet Res 2016 Apr 28;18(4):e95.
    19. Bissoyi S, Mishra BK, Patra MR. Recommender Systems in a Patient Centric Social Network - A Survey. In: Proceedings of the 2016 International Conference on Signal Processing, Communication, Power and Embedded System. 2016 Presented at: SCOPES'16; October 3-5, 2016; Paralakhemundi, India p. 386-389.
    20. RxList - The Internet Drug Index for Prescription Drug Information, Interactions, and Side Effects. URL: [accessed 2015-02-02]
    21. Twitter Developer. URL: [accessed 2015-06-14]
    22. Google+ API | Google+ Platform for Web. | Google Developers. URL: [accessed 2015-02-02]
    23. Hedley J. jsoup Java HTML Parser, with best of DOM, CSS, and jquery. URL: [accessed 2015-02-02]
    24. Twitter. URL: [accessed 2016-05-19]
    25. Google Plus. URL: [accessed 2016-05-19]
    26. DailyStrength: Support Groups. URL: [accessed 2015-02-02]
    27. WebMD - Better Information. Better Health. URL: [accessed 2015-02-02]
    28. Robillard JM, Johnson TW, Hennessey C, Beattie BL, Illes J. Aging 2.0: health information about dementia on Twitter. PLoS One 2013;8(7):e69861.
    29. Alvaro N, Conway M, Doan S, Lofi C, Overington J, Collier N. Crowdsourcing Twitter annotations to identify first-hand experiences of prescription drug use. J Biomed Inform 2015 Dec;58:280-287.
    30. Davis CA, Varol O, Ferrara E, Flammini A, Menczer F. BotOrNot: A System to Evaluate Social Bots. In: Proceedings of the 25th International Conference Companion on World Wide Web. 2016 Presented at: WWW'16 Companion; April 11-15, 2016; Montreal, Canada p. 273-274.
    31. Hayati P, Chai K, Potdar V, Talevski A. Behaviour-Based Web Spambot Detection by Utilising Action Time and Action Frequency. In: Proceedings of the International Conference on Computational Science and Its Applications. 2010 Presented at: ICCSA'10; March 23-26, 2010; Fukuoka, Japan p. 351-360.
    32. Breiman L. Random forests. Mach Learn 2001;45(1):5-32.
    33. Cortes C, Vapnik V. Support-vector networks. Mach Learn 1995;20(3):273-297.
    34. Kim Y. Convolutional Neural Networks for Sentence Classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 2014 Presented at: EMNLP'14; October 25-29, 2014; Doha, Qatar p. 1746-1751.
    35. Bojanowski P, Grave E, Joulin A, Mikolov T. Enriching word vectors with subword information. Trans Assoc Comput Linguist 2017;5:135-146.
    36. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in Python. J Mach Learn Res 2011;12:2825-2830.
    37. Buda M, Maki A, Mazurowski MA. A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw 2018 Oct;106:249-259.
    38. Brodersen KH, Ong CS, Stephan KE, Buhmann JM. The Balanced Accuracy and Its Posterior Distribution. In: Proceedings of the 20th International Conference on Pattern Recognition. 2010 Presented at: ICPR'10; August 23-26, 2010; Istanbul, Turkey p. 3121-3124.
    39. Sillence E, Briggs P, Harris P, Fishwick L. Going online for health advice: changes in usage and trust practices over the last five years. Interact Comput 2007;19(3):397-406.
    40. Heaivilin N, Gerbert B, Page JE, Gibbs JL. Public health surveillance of dental pain via Twitter. J Dent Res 2011 Sep;90(9):1047-1051.
    41. Broniatowski DA, Jamison AM, Qi S, AlKulaib L, Chen T, Benton A, et al. Weaponized health communication: Twitter bots and Russian trolls amplify the vaccine debate. Am J Public Health 2018 Oct;108(10):1378-1384.
    42. Yuan X, Schuchard RJ, Crooks AT. Examining emergent communities and social bots within the polarized online vaccination debate in Twitter. Soc Media Soc 2019;5(3):205630511986546.
    43. Mathai E, Allegranzi B, Kilpatrick C, Bagheri Nejad S, Graafmans W, Pittet D. Promoting hand hygiene in healthcare through national/subnational campaigns. J Hosp Infect 2011 Apr;77(4):294-298.

    Abbreviations

    API: application programming interface
    CNN: convolutional neural network
    SVM: support vector machine

    Edited by G Eysenbach; submitted 05.06.19; peer-reviewed by A Davoudi, JP Allem; comments to author 29.06.19; revised version received 06.08.19; accepted 27.01.20; published 01.04.20

    ©Ryan Rivas, Shouq A Sadah, Yuhang Guo, Vagelis Hristidis. Originally published in JMIR Public Health and Surveillance, 01.04.2020.

    This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication, as well as this copyright and license information must be included.