This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on http://publichealth.jmir.org, as well as this copyright and license information must be included.
The growing volume of health-related social media activity, where users connect, collaborate, and engage, has made it increasingly important to analyze how people use health-related social media.
The aim of this study was to classify the content (eg, posts that share experiences and seek support) of users who write health-related social media posts and study the effect of user demographics on post content.
We analyzed two different types of health-related social media: (1) health-related online forums—WebMD and DailyStrength—and (2) general online social networks—Twitter and Google+. We identified several categories of post content and built classifiers to automatically detect these categories. These classifiers were used to study the distribution of categories for various demographic groups.
We achieved an accuracy of at least 84% and a balanced accuracy of at least 0.81 for half of the post content categories in our experiments. In addition, 70.04% (4741/6769) of posts by male WebMD users asked for advice, and male users’ WebMD posts were more likely to ask for medical advice than female users’ posts. The majority of posts on DailyStrength shared experiences, regardless of the gender, age group, or location of their authors. Furthermore, health-related posts on Twitter and Google+ were used to share experiences less frequently than posts on WebMD and DailyStrength.
We studied and analyzed the content of health-related social media posts. Our results can guide health advocates and researchers to better target patient populations based on the application type. Given a research question or an outreach goal, our results can be used to choose the best online forums to answer the question or disseminate a message.
There is a huge amount of knowledge waiting to be extracted in health-related online social networks and forums, which we collectively refer to as social media. Health-related social media store the interactions of users who are interested in health-related topics [
In this study, we evaluated the content of posts in various health-related social media. We analyzed two types of health-related social media: (1) health-related online forums (WebMD and DailyStrength) and (2) general social networks (Google+ and Twitter). This was a 4-step process comprising data collection, identification of post content categories, classification experiments, and a demographic analysis. We first collected large datasets of posts from each source. We then identified meaningful categories from randomly selected posts from each source. In our classification experiments, we labeled data from each source and trained classifiers to identify post content categories. Finally, we used classifiers trained on our labeled data to identify categories in the remaining data and analyzed how often posts in these categories were made by various demographic groups.
The goal of this study was to provide researchers with information and tools to support further research. For example, researchers looking for clinical trial participants can use DailyStrength, where users often share experiences about a particular condition, and health advocates seeking to spread awareness about a condition that affects men can use WebMD, where men often ask for advice. To this end, we also made comparisons between platforms to suggest where such a researcher might begin looking. The classifier models built in this study can assist with this task as well as other analyses involving health-related online postings.
Many studies have been performed to characterize health-related social media communities. Hackworth and Kunz [
Further work has focused on categorizing health-related posts based on their content. Yu et al [
Other work has compared health issues between demographics or examined the demographics within a population participating in health-related research. Krueger et al [
Previous work has focused on characterizing demographics on health-related social media. Sadah et al [
Text classification is frequently employed by researchers to gain insights into social media users and trends, both in and out of health-related settings. Sadilek et al [
For health-related online forums, we selected 2 different websites, WebMD and DailyStrength. We selected 2 forums to cover the different types of health-related online forums that each represents. Although WebMD consists of multiple health communities where people ask questions and get responses from the community members [
For general social networks, we chose Twitter and Google+ as they offer interfaces to easily collect their data (in contrast to Facebook). For each Twitter post, we collected the post content, post time, location, and the author’s username and location. For each Google+ post we collected the title, post time, update time, the post content, the location, and the author’s username, first and last names, age, gender, and location. As Twitter and Google+ are general social networks, we used 274 representative health-related keywords to filter them as follows: (1) Drugs: from the most prescriptions dispensed from RxList [
To filter Twitter with the health-related keyword list to retrieve relevant tweets for TwitterHealth, we used the Twitter streaming application programming interface (API) [
List of all sources used with their number of posts, date range of posts, and the available demographic attributes.
Source | Number of posts | Date range | Gender | Age | Ethnicity | Location |
TwitterHealth [ | 11,637,888 | May 2, 2013 to November 11, 2013 | Gender classifier [ | Noa | Ethnicity classifier [ | Yesb |
Google+Health [ | 186,666 | August 24, 2009 to January 5, 2014 | Yes | Yes | Ethnicity classifier [ | Yes |
DailyStrength [ | 1,319,622 | June 21, 2006 to December 3, 2017 | Yes | Yes | No | Yes |
WebMD [ | 318,297 | December 24, 2006 to May 11, 2019 | Gender classifier [ | No | No | No |
aThe demographic attribute is not provided by the source and no classifier is used because of low accuracy.
bThe demographic attribute is provided by the source.
Demographics of users from each source.
Attribute and demographic | TwitterHealth, % | Google+Health, % | DailyStrength, n (%) | WebMD, n (%) |
Gender | | | | |
Male | 48.19a | 64.64a | 95,269 (17.26)b | 6769 (32.41)b |
Female | 51.81a | 35.36a | 456,600 (82.74)b | 14,117 (67.59)b |
Age (years) | | | | |
0-17 | N/Ac | 3.42a | 6656 (1.33)b | N/A |
18-34 | N/A | 53.21a | 187,966 (37.55)b | N/A |
35-44 | N/A | 21.89a | 126,646 (25.30)b | N/A |
45-64 | N/A | 19.02a | 149,487 (29.86)b | N/A |
≥65 | N/A | 2.46a | 29,847 (5.96)b | N/A |
Ethnicity | | | | |
Asian | 3.24a | 5.60a | N/A | N/A |
Black | 0.30a | 0.30a | N/A | N/A |
Hispanic | 23.50a | 17.40a | N/A | N/A |
White | 73.00a | 76.60a | N/A | N/A |
Location | | | | |
Northeast | 165,531 (19.83)d | 2598 (17.86)d | 73,221 (19.58)b | N/A |
Midwest | 174,620 (20.92)d | 2393 (16.45)d | 84,302 (22.55)b | N/A |
South | 313,350 (37.53)d | 4863 (33.44)d | 123,556 (33.05)b | N/A |
West | 181,400 (21.73)d | 4690 (32.25)d | 92,809 (24.82)b | N/A |
aBased on Sadah et al [
bCalculated with user data collected or estimated from this study.
cN/A: not applicable.
dCalculated from user counts reported in the study by Sadah et al [
From each source, we randomly selected 500 posts. We then manually identified the different categories of shared content for each type of health-related social media. As shown in
We asked 3 graduate students to label the selected data from WebMD, Twitter, and Google+; we used a majority vote as the final result for each of these sources.
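The majority-vote step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the labeling pipeline used in the study; the function names and the per-annotator dictionary layout are assumptions made for the example.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label chosen by more than half of the annotators.

    Raises ValueError when no label has a strict majority (this cannot
    happen with 3 annotators and a binary label, but it can in general).
    """
    label, votes = Counter(labels).most_common(1)[0]
    if votes * 2 <= len(labels):
        raise ValueError("no strict majority among annotators")
    return label


def aggregate_post(annotations):
    """Combine per-annotator labels for one post.

    `annotations` is a list with one dict per annotator, mapping a
    category name to True/False (hypothetical layout for illustration).
    """
    categories = annotations[0].keys()
    return {c: majority_vote([a[c] for a in annotations]) for c in categories}


# Hypothetical labels from 3 annotators for a single post.
post_labels = [
    {"share_experiences": True, "ask_for_advice": False},
    {"share_experiences": True, "ask_for_advice": True},
    {"share_experiences": False, "ask_for_advice": False},
]
final_labels = aggregate_post(post_labels)
```

Because each category is a binary label and there are 3 annotators, a strict majority always exists, so the error branch only matters if the sketch is reused with an even number of annotators.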
List of all identified categories for health-related online forums and general social networks.
Category | Health-related online forums | General social networks | Example |
Share experiences | Yes | Yes |
“I could not work after Tylenol.” “I have taken Lipitor every day.” |
Ask for specific medical advice or information | Yes | Yes |
“Is honey allowed for diabetics?” |
Request or give psychological support | Yes | Yes |
“I hope your diabetes is under control.” “We’re thinking of you.” |
About family (not about self) | Yes | Yes |
“My son is now nine months old and teething like crazy.” |
Share news | No | Yes |
“Kaiser Permanente Invites Software Developers To Build Apps—Forbes. http://feedly.com/k/Zojwq” |
Jokes | No | Yes |
“Got any jokes about Sodium Hypobromite? NaBro.” |
Advertisements | No | Yes |
“Check out these two vitamins for one recipe! http://bit.ly/1471dbn” |
Personal opinion | No | Yes |
“Main frustration of lupus is losing the ability to do things that used to be normal” |
Educational material | No | Yes |
“Side Effects of Alzheimer’s and Dementia Drugs http://bit.ly/cK7L1f” |
Intercoder agreement for our labeled datasets (Krippendorff alpha).
Category | WebMD | TwitterHealth | Google+Health |
Share experiences | 0.349 | 0.446 | 0.109 |
Ask for specific medical advice or information | 0.768 | 0.225 | 0.108 |
Request or give psychological support | 0.219 | 0.090 | −0.007 |
About family (not about self) | 0.736 | 0.322 | −0.010 |
Share news | N/Aa | 0.083 | 0.083 |
Jokes | N/A | 0.177 | 0.029 |
Advertisement | N/A | 0.220 | 0.107 |
Personal opinion | N/A | 0.103 | 0.038 |
Educational material | N/A | 0.164 | 0.091 |
aN/A: not applicable.
Percentages of categories in each source from the labeled data (N=500).
Category | WebMD, n (%) | DailyStrength, n (%) | TwitterHealth, n (%) | Google+Health, n (%) |
Share experiences | 236 (47.2) | 400 (80.0) | 74 (14.8) | 65 (13.0) |
Ask for specific medical advice or information | 270 (54.0) | 173 (34.6) | 3 (0.6) | 10 (2.0) |
Request or give psychological support | 126 (25.2) | 247 (49.4) | 9 (1.8) | 7 (1.4) |
About family (not about self) | 68 (13.6) | 37 (7.4) | 5 (1.0) | 34 (6.8) |
Share news | N/Aa | N/A | 56 (11.2) | 145 (28.9) |
Jokes | N/A | N/A | 38 (7.6) | 33 (6.6) |
Advertisement | N/A | N/A | 26 (5.2) | 70 (14.0) |
Personal opinion | N/A | N/A | 35 (7.0) | 84 (16.8) |
Educational material | N/A | N/A | 36 (7.2) | 137 (25.7) |
aN/A: not applicable.
We examined the impact of automated accounts (ie,
We also manually examined 100 posts each from WebMD and DailyStrength to determine the prevalence of bots on these websites; one of the authors read each post and judged whether it appeared to have been posted by a spambot. In the context of online forums, a spambot is an automated agent that posts promotional content [
For each category, we performed binary classification experiments with three classifier algorithms: random forest [
To build the classifiers, we excluded the categories that accounted for less than 10.0% (50/500) of the labeled posts, and for the rest, we first split the labeled data into two datasets as follows: (1) a training dataset (450 posts) and (2) a test dataset (50 posts), held out for a final test after training is complete. Afterward, for each classifier algorithm, we trained each classifier by varying the hyperparameters shown in
For the remainder of our analysis, we only considered source-category combinations with a classifier that achieved a balanced accuracy of at least 0.75.
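To make the 0.75 threshold concrete, the sketch below computes balanced accuracy for a binary classifier, assuming the standard definition (the mean of the recall on the positive class and the recall on the negative class). The function names and the example labels are hypothetical and are not taken from the study's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of posts whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (assumed definition)."""
    pos = sum(1 for t in y_true if t)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present in y_true")
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    true_neg = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    return 0.5 * (true_pos / pos + true_neg / neg)


# Hypothetical 50-post test set in which only 5 posts belong to the category.
y_true = [True] * 5 + [False] * 45
# A degenerate classifier that never predicts the category.
y_pred = [False] * 50

acc = accuracy(y_true, y_pred)            # 0.9: looks strong
bal = balanced_accuracy(y_true, y_pred)   # 0.5: no better than chance
```

This is why a rare category can show a high accuracy alongside a balanced accuracy near 0.5, and why a balanced accuracy cutoff is a more conservative filter than raw accuracy for imbalanced categories.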
For the source-category combinations that did not have a classifier that achieved a balanced accuracy of at least 0.75, we performed another round of experiments in which we attempted to classify posts using the best-performing classifier trained on a corresponding category from another source, for example, random forest for share experiences from WebMD. In these experiments, we used 500 posts from one source for training and 500 posts from another source for testing, and we again found the best combination of hyperparameters via 5-fold cross-validation on the training data.
All classifiers’ training features.
Source | Extracted features |
WebMD | Title, body, and board name |
DailyStrength | Title, body, and board name |
Google+ | Title and body |
Twitter | Body |
Classifier hyperparameter values evaluated in our experiments.
Classifier and hyperparameter | Values |
Random forest | |
Maximum tree depth | 2, 4, 8, 16, 32, 64 |
Number of trees, n | 10, 100, 1000 |
Support vector machine | |
C | 0.001, 0.01, 0.1, 1, 10 |
Loss function | Hinge, squared hinge |
Convolutional neural network | |
Filter window sizes | (2, 3, 4), (3, 4, 5), (4, 5, 6) |
Feature maps per filter window size, n | 100, 200, 300, 400, 500, 600 |
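The hyperparameter values in the table above define a small grid for each classifier. The sketch below enumerates every combination with the Python standard library and picks the best one according to a scoring function. The dictionary keys, `iter_grid`, `select_best`, and the `cv_score` callable are hypothetical names introduced for illustration; in the study the score would correspond to the cross-validation result on the training data, which is not reproduced here.

```python
from itertools import product

# Values copied from the hyperparameter table; key names are illustrative.
GRIDS = {
    "random_forest": {
        "max_depth": [2, 4, 8, 16, 32, 64],
        "n_trees": [10, 100, 1000],
    },
    "svm": {
        "C": [0.001, 0.01, 0.1, 1, 10],
        "loss": ["hinge", "squared_hinge"],
    },
    "cnn": {
        "filter_windows": [(2, 3, 4), (3, 4, 5), (4, 5, 6)],
        "feature_maps": [100, 200, 300, 400, 500, 600],
    },
}


def iter_grid(grid):
    """Yield one dict per combination of hyperparameter values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def select_best(grid, cv_score):
    """Return the combination with the highest score.

    `cv_score` is a caller-supplied function (an assumption here) that
    maps a hyperparameter dict to a validation score.
    """
    return max(iter_grid(grid), key=cv_score)


# Number of candidate settings per classifier.
grid_sizes = {name: sum(1 for _ in iter_grid(grid)) for name, grid in GRIDS.items()}
```

With these values, the random forest and convolutional neural network grids each contain 18 combinations and the support vector machine grid contains 10, so an exhaustive search over all of them is inexpensive to enumerate.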
Classifier results for each category (N=50).
Source and category | Random forest: accuracy, n (%) | Random forest: balanced accuracy | Support vector machine: accuracy, n (%) | Support vector machine: balanced accuracy | Convolutional neural network: accuracy, n (%) | Convolutional neural network: balanced accuracy |
WebMD | | | | | | |
Share experiencesa | 41 (82) | 0.83b | 41 (82) | 0.81 | 41 (82) | 0.82 |
Ask for specific medical advice or informationa | 40 (80) | 0.82 | 41 (82) | 0.83b | 37 (74) | 0.76 |
Request or give psychological supporta | 39 (78) | 0.71 | 43 (86) | 0.8 b | 38 (76) | 0.68 |
About family (not about self)a | 38 (76) | 0.56 | 40 (80) | 0.89b | 47 (94) | 0.81 |
DailyStrength | | | | | | |
Share experiencesa | 41 (82) | 0.80 | 40 (80) | 0.70 | 41 (82) | 0.82b |
Ask for specific medical advice or informationa | 39 (78) | 0.71 | 38 (76) | 0.70 | 37 (74) | 0.7 b |
Request or give psychological support | 34 (68) | 0.68 | 33 (66) | 0.65 | 38 (76) | 0.68b |
TwitterHealth | | | | | | |
Share experiencesa | 39 (78) | 0.77 | 41 (82) | 0.82b | 43 (86) | 0.74 |
Share newsa | 41 (82) | 0.64 | 40 (80) | 0.73 | 47 (94) | 0.81 |
Google+Health | | | | | | |
Share experiences | 44 (88) | 0.48 | 35 (70) | 0.72b | 45 (90) | 0.60 |
Share news | 26 (52) | 0.48 | 28 (56) | 0.52 | 33 (66) | 0.59b |
Advertisement | 38 (76) | 0.59 | 24 (48) | 0.53 | 42 (84) | 0.6 b |
Personal opinion | 39 (78) | 0.48 | 37 (74) | 0.71b | 42 (84) | 0.60 |
Educational materiala | 40 (80) | 0.66 | 34 (68) | 0.76 | 41 (82) | 0.79b |
aThe category of each source-category combination with at least one classifier that achieved a balanced accuracy of at least 0.75.
bThe highest balanced accuracy for each source-category combination.
Results of classifiers trained on a corresponding category from another source (N=500).
Training source | Test source | Category | Classifier | Accuracy, n (%) | Balanced accuracy |
WebMD | DailyStrength | Psychological support | SVMa | 328 (65.6) | 0.656 |
WebMD | Google+Health | Share experiences | Random forest | 428 (85.6) | 0.584 |
DailyStrength |
|
|
CNNc | 383 (76.6) |
|
|
|
SVM | 408 (81.6) |
|
|
Google+Health | Share news | CNN | 360 (72.0) | 0.562 |
aSVM: support vector machine.
bThe test source, category, and balanced accuracy of each classifier that achieved a balanced accuracy of at least 0.75 are italicized for emphasis.
cCNN: convolutional neural network.
We chose four demographic attributes as shown in
For each combination of demographic and category (eg, male and share experiences) analyzed in WebMD and DailyStrength, we found the most distinctive message boards for that combination. For WebMD, we considered only boards that have at least 0.01% of posts for a given combination, or 30 if 0.01% is less than 30. Owing to the large number of message boards on DailyStrength (1608 analyzed in this study), we reduced this restriction to only consider boards with at least 30 posts for a given combination. We then determined distinctiveness by calculating the relative difference of each board. On the basis of the calculation for top distinctive terms by Sadah et al [
$$\mathrm{RelDif}_{c,d}(b) = \frac{\mathrm{Freq}_{c,d}(b) - \mathrm{AvgFreq}_{c,a}(b)}{\mathrm{AvgFreq}_{c,a}(b)}, \tag{1}$$
where
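Equation 1 can be illustrated with a short Python sketch. Because the definitions of the terms are not reproduced here, the code rests on stated assumptions: Freq is taken to be the share of a demographic's posts (within one category) that appear on board b, and AvgFreq is taken to be the unweighted mean of that share across all demographics of the same attribute. The function names, the data layout, and the example counts are hypothetical.

```python
def board_threshold_webmd(n_posts):
    """Minimum posts per board for WebMD: 0.01% of the posts, but never below 30."""
    return max(30, 0.0001 * n_posts)


def board_frequencies(counts):
    """Convert per-board post counts into fractions of the total."""
    total = sum(counts.values())
    return {board: n / total for board, n in counts.items()}


def relative_difference(counts_by_demo, demo, min_posts=30):
    """Relative difference of each board for one demographic (Equation 1).

    `counts_by_demo` maps each demographic of one attribute (eg, "male",
    "female") to a dict of board -> number of posts in the category.
    Boards with fewer than `min_posts` posts for `demo` are skipped.
    """
    freqs = {d: board_frequencies(c) for d, c in counts_by_demo.items()}
    result = {}
    for board, n in counts_by_demo[demo].items():
        if n < min_posts:
            continue
        # Assumed AvgFreq: unweighted mean over demographics of the attribute.
        avg = sum(f.get(board, 0.0) for f in freqs.values()) / len(freqs)
        result[board] = (freqs[demo][board] - avg) / avg
    return result


# Hypothetical counts for one category, split by gender.
example_counts = {
    "male": {"Board A": 60, "Board B": 40},
    "female": {"Board A": 20, "Board B": 80},
}
male_scores = relative_difference(example_counts, "male")
most_distinctive = sorted(male_scores, key=male_scores.get, reverse=True)
```

Sorting boards by this score in descending order and keeping the first 10 would give a ranking analogous to the "most distinctive" board lists, under the assumptions stated above.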
In this section, we present the results for each category by demographic where possible. For age, we organized users into five groups: 0 to 17 years, 18 to 34 years, 35 to 44 years, 45 to 64 years, and 65 years or older. For ethnicity, we considered four possibilities: Asian, black, Hispanic, and white. For location, we considered the four regions designated by the US Census Bureau: Midwest, Northeast, South, and West. As explained in the Methods section, we considered the following categories for each source: (1) WebMD: share experiences, ask for advice, psychological support, and about family; (2) DailyStrength: share experiences and ask for advice; (3) TwitterHealth: share experiences and share news; and (4) Google+Health: share experiences and educational material.
As shown in
WebMD category frequency by gender.
Category | Male (n=6769), n (%) | Female (n=14,117), n (%) |
Share experiences | 3290 (48.60) | 4835 (34.25) |
Ask for advice | 4741 (70.04) | 6372 (45.14) |
Psychological support | 1914 (28.28) | 5515 (39.07) |
About family | 1986 (29.34) | 3623 (25.66) |
Top 10 most distinctive WebMD message boards for male and female users in each category.
Gender | Share experiences | Ask for advice | Psychological support | About family |
Male |
Men’s Health Erectile Dysfunction Relationships and Coping Cholesterol Management Epilepsy Depression Allergies Oral Health Knee & Hip Replacement Ear, Nose & Throat |
Erectile Dysfunction Cholesterol Management Men’s Health HIV/AIDS Depression Epilepsy Prostate Cancer Sports Medicine Pain Management Ear, Nose & Throat |
Relationships and Coping Epilepsy Depression Back Pain Heart Disease Pain Management Anxiety & Panic Clomid Diabetes Parenting: 4 & 5-Year-Olds |
Relationships and Coping Depression Erectile Dysfunction Back Pain Clomid Epilepsy Anxiety & Panic Pain Management Sleep Disorders Digestive Disorders |
Female |
Sexual Abuse Survivors Support Trying to Conceive: 12 Months, Still Trying Endometriosis Breast Cancer Infertility Treatment Pregnancy: After Infertility Pregnancy: After 35 Parenting: Elementary Ages Self-Harm Menopause |
Trying to Conceive: 12 Months, Still Trying Infertility Treatment Dieting Club: 25-50 Lbs Parenting: Preteens & Teenagers Skin & Beauty Breast Cancer Food & Cooking Lupus Parenting: 3-Year-Olds Parenting: 9-12 Months |
Chronic Fatigue Syndrome Lupus Sexual Abuse Survivors Support Breast Cancer Endometriosis Dieting Club: 10-25 Lbs Trying to Conceive: 12 Months, Still Trying Pregnancy: After 35 Dieting Club: 100+ Lbs Pregnancy: After Infertility |
Sexual Abuse Survivors Support Pregnancy: After 35 Trying to Conceive: 12 Months, Still Trying Trying to Conceive: After Loss Breast Cancer Self-Harm Parenting: Preteens & Teenagers Parenting: 9-12 Months Dieting Club: 50-100 Lbs Parenting: 6-9 Months |
For our DailyStrength demographic attributes, gender, age, and location, we reported the results for the categories share experiences and ask for advice.
We also observed a general tendency for younger users (aged younger than 45 years) to share experiences on message boards about personal and social issues, whereas older users favored message boards for general support and discussion. Users in all age groups frequently asked for advice about physical conditions. We found no clear trend in sharing experiences when evaluating census regions, but we saw that users from the Northeast region shared experiences about physical and psychological conditions, whereas users from the West region often shared experiences on message boards for general support and discussion. Users from all regions frequently asked for advice about physical conditions except those from the West, who tended to ask for advice on message boards for general support and discussion. Note that there are fewer than 10 message boards listed for users of age 0 to 17 years who asked for advice in
DailyStrength category frequency by gender, age, and location.
Attribute and demographic | Total number of participants | Share experiences, n (%) | Ask for advice, n (%) |
Gender | | | |
Male | 95,269 | 78,760 (82.67) | 31,706 (33.28) |
Female | 456,600 | 409,640 (89.72) | 167,867 (36.76) |
Age (years) | | | |
0-17 | 6656 | 6175 (92.77) | 1694 (25.45) |
18-34 | 187,966 | 173,226 (92.16) | 65,191 (34.68) |
35-44 | 126,646 | 113,796 (89.85) | 48,335 (38.17) |
45-64 | 149,487 | 127,089 (85.02) | 54,008 (36.13) |
≥65 | 29,847 | 24,420 (81.82) | 10,581 (35.45) |
Location | | | |
Northeast | 73,221 | 65,761 (89.81) | 28,196 (38.51) |
Midwest | 84,302 | 76,630 (90.90) | 31,600 (37.48) |
South | 123,556 | 110,597 (89.51) | 46,933 (37.99) |
West | 92,809 | 76,797 (82.75) | 31,481 (33.92) |
Top 10 most distinctive DailyStrength message boards for male and female users in each category.
Gender | Share experiences | Ask for advice |
Male |
Vow To Live LGBT Against Suicide Christian Church 24.7 Ministry Gay Men’s Challenges Single Dads GOYA Dealing with Diabetes2 and remembering Goldi A Child Abuse Survivors Group CALM and EASY GAMES Financial Challenges Liars Anonymous |
A Laughter Club Dealing with Diabetes2 and remembering Goldi Impotence & Erectile Dysfunction Sex/Pornography Addiction High Cholesterol Tinnitus, Deafness and Ear Problems Urinary Incontinence Atrial Fibrillation (AFib) MRSA LDN .. Low Dose Naltrexone |
Female |
helping with the housework Lesbian Relationship Challenges prompts AlAnon One Day At A Time Daughters of Abusive Mothers Breastfeeding Parenting Toddlers (1-3) Post-Partum Depression Infertility Vulvar Cancer |
Pregnancy Menopause Trying To Conceive Miscarriage Polycystic Ovarian Syndrome (PCOS) Family & Friends of Bipolar WHY WEIGHT? LET’S LOSE WEIGHT AND FEEL GREAT! Infertility Vulvar Cancer Breastfeeding |
Top 10 most distinctive DailyStrength message boards for each age group in each category.
Age group (years) | Share experiences | Ask for advice |
0-17 |
Weight Loss For Teens Gay & Lesbian Teens Depression–Teen Bipolar Disorder–Teen Self-Injury Transgender Depression Coming Out Bisexuality Eating Disorders |
Weight Loss For Teens Depression–Teen Self-Injury Eating Disorders Anxiety |
18-34 |
Sunny and Peaceful Skies Parenting Toddlers (1-3) Daily Positive Thoughts Trying To Conceive Parenting Newborns & Infants (0-1) College Stress Arnold-Chiari Malformation ALL MOODY BLUES Career Changes Cerebral Palsy |
Trying To Conceive Neuropathy Pregnancy Miscarriage Polycystic Ovarian Syndrome (PCOS) Cerebral Palsy Endometriosis Pseudotumor Cerebri Sexually Transmitted Diseases–Female Schizophrenia |
35-44 |
Vow To Live LGBT Against Suicide Parenting 'Tweens (9-12) Twins, Triplets & More Self-Hate Syndrome Parents Whose children have been sexually abused HOPEFUL HEARTS...LIVING AGAIN AFTER THE LOSS Neurofibromatosis Breastfeeding Hyperparathyroidism Stillbirth |
kindredspirits Hyperparathyroidism Multiple Sclerosis (MS) Pseudotumor Cerebri Allergies Hemochromatosis Hypothyroidism Addison’s Disease MCTD Graves’ Disease |
45-64 |
acoa sanctuary prompts Christians with MS InHisCare Bible Study The Serenity Room Ticked off about Lyme Biblical Studies and Archaeology Alanon support group Just support WHY WEIGHT? LET’S LOSE WEIGHT AND FEEL GREAT! |
WHY WEIGHT? LETS LOSE WEIGHT AND FEEL GREAT! MS People Dealing with MS Pain Dealing with Diabetes2 and remembering Goldi Multiple Myeloma Menopause High Cholesterol LDN .. Low Dose Naltrexone Myofascial Pain Syndrome Neurocardiogenic Syncope Amputees |
≥65 |
Banana A Little Bit Of Kindness Goes A long Way! AlAnon One Day At A Time VOICES OF RECOVERY The Walking Group The Front Porch Over The Fence Muscular Dystrophies CALM and EASY GAMES movie lovers |
AlAnon One Day At A Time VOICES OF RECOVERY I can’t HEAR you! COPD & Emphysema Meniere’s Disease Parkinson’s Disease Sleep Apnea Interstitial Cystitis (IC) Atrial Fibrillation (AFib) Acromegaly |
Top 10 most distinctive DailyStrength message boards for each region in each category.
Region | Share experiences | Ask for advice |
Northeast |
WHY WEIGHT? LET’S LOSE WEIGHT AND FEEL GREAT! Self-Hate Syndrome Smoking Addiction & Recovery Urinary Incontinence Families of Prisoners Agoraphobia & Social Anxiety Cocaine Addiction & Recovery Obesity CHRISTIAN PARENTS of ESTRANGED ADULT CHILDREN Brain Injury |
WHY WEIGHT? LET’S LOSE WEIGHT AND FEEL GREAT! Obesity Hidradenitis Suppurativa Endometriosis Deep Vein Thrombosis (DVT) Atrial Fibrillation (AFib) Diets & Weight Maintenance Gastritis Polycystic Kidney Disease (PKD) Hypothyroidism |
Midwest |
Just support acoa sanctuary helping with the housework kindredspirits The Coffee Shop aa Spoken Here Highly Sensitive People HSP Financial Challenges I can’t HEAR you! Pseudotumor Cerebri |
kindredspirits Neurocardiogenic Syncope Pseudotumor Cerebri Gastritis Irritable Bowel Syndrome (IBS) COPD & Emphysema Parkinson’s Disease Polycystic Kidney Disease (PKD) Pancreatitis Graves’ Disease |
South |
prompts Beyond Medication InHisCare Bible Study Ticked off about Lyme Muscular Dystrophies aa friends Anxiety and POSITIVE CHOICES Games for Fun and Relaxation MS People Dealing with MS Pain Parents Whose children have been sexually abused |
MS People Dealing with MS Pain High Cholesterol Cirrhosis Polymyositis & Dermatomyositis Addison’s Disease Meniere’s Disease MCTD Trying To Conceive Endometriosis Polycystic Ovarian Syndrome (PCOS) |
West |
A Little Bit Of Kindness Goes A long Way! The Walking Group Alanon support group VOICES OF RECOVERY AlAnon One Day At A Time BIBLICAL STUDIES The Sunflower group My Favorite Things. FrIeNdShIpRoOm three prayerpraise |
AlAnon One Day At A Time Banana The Sunflower group WINGS VOICES OF RECOVERY A Laughter Club FrIeNdShIpRoOm Myofascial Pain Syndrome Hemochromatosis Colon Cancer |
For our Twitter demographic attributes, gender, ethnicity, and location, with gender and ethnicity predicted by the classifier from Mislove et al [
Twitter category frequency by gender, ethnicity, and location.
Attribute and demographic | Total number of participants | Share experiences, n (%) | Share news, n (%) |
Gender | | | |
Male | 16,092 | 3188 (19.81) | 1277 (7.94) |
Female | 17,850 | 4835 (27.09) | 1091 (6.11) |
Ethnicity | | | |
Asian | 626 | 166 (26.52) | 34 (5.43) |
Black | 56 | 12 (21) | 3 (5) |
Hispanic | 2833 | 826 (29.16) | 155 (5.47) |
White | 9992 | 2259 (22.61) | 728 (7.29) |
Location | | | |
Northeast | 5362 | 1093 (20.38) | 545 (10.16) |
Midwest | 4686 | 1084 (23.13) | 380 (8.11) |
South | 9855 | 2162 (21.94) | 850 (8.63) |
West | 5448 | 1164 (21.37) | 515 (9.45) |
We also performed this analysis on our full Twitter dataset of 11,637,888 tweets. We compared these results with the results shown in
Category | Male | Female | Asian | Black | Hispanic | White | Northeast | Midwest | South | West |
Share Experiences | <.001 | .47 | .24 | .80 | .68 | .15 | .13 | .048 | .002 | <.001 |
Share News | <.001 | <.001 | <.001 | .23 | <.001 | <.001 | <.001 | <.001 | <.001 | <.001 |
Our Google+ demographic attributes include gender, age, ethnicity, and location, with ethnicity predicted by the classifier from Mislove et al [
From these results, we saw that most demographics appeared to share experiences more frequently than the set of all Google+ users. This is likely the effect of a bias toward users who chose to report these attributes (or a real name, in the case of ethnicity). When comparing how often a demographic shares experiences with how often posts from users with no data on that demographic’s corresponding attribute share experiences (eg, posts from men vs posts from users who did not report gender), we found that
Educational material was shared less frequently by users aged between 35 and 44 years, 14.9% (46/308) than by users of any other age group. In particular, they shared educational material much less frequently than both the previous age group, 18 to 34 years, 25.5% (141/552),
Google+ category frequency by gender, age, ethnicity, and location.
Attribute and demographic | Total number of participants | Share experiences, n (%) | Educational material, n (%) |
Gender | | | |
Male | 61,479 | 15,234 (24.78) | 16,200 (26.35) |
Female | 32,082 | 9803 (30.56) | 8029 (25.03) |
Age (years) | | | |
0-17 | 42 | 19 (45.24) | 8 (19.05) |
18-34 | 552 | 189 (34.24) | 141 (25.54) |
35-44 | 308 | 101 (32.79) | 46 (14.94) |
45-64 | 499 | 62 (12.42) | 171 (34.27) |
≥65 | 45 | 9 (20.00) | 13 (28.89) |
Ethnicity | | | |
Asian | 2825 | 730 (25.84) | 1010 (35.75) |
Black | 72 | 28 (38.89) | 13 (18.06) |
Hispanic | 3389 | 1137 (33.55) | 707 (20.86) |
White | 17,230 | 5076 (29.46) | 3340 (19.38) |
Location | | | |
Northeast | 4510 | 1097 (24.32) | 957 (21.22) |
Midwest | 4210 | 1310 (31.12) | 716 (17.01) |
South | 9532 | 2636 (27.65) | 1913 (20.07) |
West | 7959 | 2279 (28.63) | 1708 (21.46) |
Our analysis shows several interesting results. From our initial samples, we found that health-related posts from general social networks often shared news and educational material, and posts on health-related online forums frequently shared experiences, asked for medical advice, and requested or gave psychological support (
A further analysis of our health-related online forum data showed distinct differences between users of WebMD and DailyStrength. On WebMD, we found that the majority of posts made by male users and almost half of all posts made by female users asked for advice. This would seem to contradict an earlier study that found that women were the predominant users of the internet for health advice [
An analysis of health-related posts on the general social networks, Twitter and Google+, suggested that they differ from health-related online forums. Compared with WebMD and DailyStrength, the share experiences category, which identifies posts in which a user shared a personal experience related to a health-related topic, was far less frequent in posts from Twitter and Google+ that contained one or more of the health-related keywords used in this study. The relatively low frequency of sharing experiences in our sample of several health-related topics on general social networks compared with the frequency of sharing experiences on health-related online forums may be due to a variety of factors, such as Twitter’s lack of health-related communities because of its structure as well as WebMD’s and DailyStrength’s focus on answering medical questions and providing support, respectively. Some subsets of health-related tweets studied in other work have low proportions of sharing experiences similar to our observations, such as tweets about depression [
Our comparison of results between our stratified sample of Twitter data with tweets from suspected bots removed and our full Twitter dataset showed that automated accounts had a significant impact on the share news category. Other work has also shown that bots can have an effect on health-related Twitter conversations, particularly on the subject of vaccination. Bots post both pro- and antivaccine tweets [
The differences in how often educational material is shared on Google+ between the demographics we studied highlight potential targets for informational health care campaigns. A health care campaign is a health care–related broad nationally or subnationally driven, led, or coordinated activity [
Our results provide useful information that can help health care providers reach the right demographic group. For example, researchers looking for clinical trial participants can use health-related online forums, where many posts share experiences. Moreover, demographic-specific results can help guide targeted educational campaigns. As an example, male WebMD users asked for specific medical advice more often than female users, so male WebMD users may be more receptive to a campaign offering advice from medical experts.
The classifier models used in this study can also be useful for researchers who want to study posts that belong to the categories we studied. For example, a researcher who wants to study experiences with a particular drug can use these classifiers to find posts that share experiences within a larger dataset of posts that mention that drug. As another example, a researcher who wants to find out which disorders are frequently mentioned among users who share news can use a classifier to gather a dataset of news-sharing posts. In general, we provided researchers with tools that enable them to test hypotheses and conduct research on health-related social media posts. These tools are provided by the description of our methodology, which describes how one might build these classifier models, and by trained classifier models that are available on request. Similar tools may also be applicable to the categories in the scheme proposed by Lopes and Da Silva [
As users of health-related social media use an informal writing style, our selected 274 words to filter Twitter and Google+ as described in the Methods section may not cover all health-related posts or their variability in topics. For example, the abbreviation
We found that some Twitter categories have a high proportion of tweets from automated accounts. Although we have attempted to filter out tweets from such accounts, some such tweets may still exist in the data used in our analysis, and tweets from legitimate accounts may have been filtered out. Our initial evaluation of bot prevalence also found that the educational material category had a high proportion of tweets from bots. This may be also true of that category in the Google+ data, which was not filtered for bots; thus, those results may not accurately represent the demographics studied.
Our demographic populations may not be fully representative of all users from the sources in our study. As shown in
In this study, we analyzed the content shared in two different types of health-related social media: health-related online forums and general social networks. For the two types of health-related social media, we manually identified 4 post categories: share experiences, ask for specific medical advice, request or give psychological support, and about family; and we additionally identified 5 categories for general social networks: share news, jokes, advertisements, personal opinion, and educational material. After labeling randomly selected data for each source, we built classifiers for each category. Finally, we made demographic-based content analyses where possible.
API: application programming interface
CNN: convolutional neural network
SVM: support vector machine
This project was partially supported by the National Science Foundation grant numbers IIS-1619463, IIS-1746031, IIS-1838222, and IIS-1901379. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation.
RR conducted the experiments and analysis and wrote the manuscript. SS conducted earlier versions of the experiments and analysis and assisted in the writing of the manuscript. YG coordinated the labeling of the training datasets and conducted preliminary research. VH conceived the study and provided coordination and guidance in the experiments and writing of the manuscript.
None declared.