This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on http://publichealth.jmir.org, as well as this copyright and license information must be included.
Social media offer an unprecedented opportunity to explore how people talk about health care at a very large scale. Numerous studies have shown the importance of websites with user forums for people seeking information related to health. Parents turn to some of these sites, colloquially referred to as “mommy blogs,” to share concerns about children’s health care, including vaccination. Although substantial work has considered the role of social media, particularly Twitter, in discussions of vaccination and other health care–related issues, there has been little work on describing the underlying structure of these discussions and the role of persuasive storytelling, particularly on sites with no limits on post length. Understanding the role of persuasive storytelling at Internet scale provides useful insight into how people discuss vaccinations, including exemption-seeking behavior, which has been tied to a recent diminution of herd immunity in some communities.
Our objective was to develop an automated and scalable machine-learning method for story aggregation on social media sites dedicated to discussions of parenting. We wanted to discover the aggregate narrative frameworks to which individuals, through their exchange of experiences and commentary, contribute over time in a particular topic domain. We also wanted to characterize temporal trends in these narrative frameworks on the sites over the study period.
To ensure that our data capture long-term discussions and not short-term reactions to recent events, we developed a dataset of 1.99 million posts contributed by 40,056 users and viewed 20.12 million times indexed from 2 parenting sites over a period of 105 months. Using probabilistic methods, we determined the topics of discussion on these parenting sites. We developed a generative statistical-mechanical narrative model to automatically extract the underlying stories and story fragments from millions of posts. We aggregated the stories into an overarching narrative framework graph. In our model, stories were represented as network graphs with actants as nodes and their various relationships as edges. We estimated the latent stories circulating on these sites by modeling the posts as a sampling of the hidden narrative framework graph. Temporal trends were examined based on monthly user-post statistics.
We discovered that discussions of exemption from vaccination requirements are highly represented. We found a strong narrative framework related to exemption seeking and a culture of distrust of government and medical institutions. Various posts reinforced part of the narrative framework graph in which parents, medical professionals, and religious institutions emerged as key nodes, and exemption seeking emerged as an important edge. In the aggregate story, parents used religion or belief to acquire exemptions to protect their children from vaccines that are required by schools or government institutions, but (allegedly) cause adverse reactions such as autism, pain, compromised immunity, and even death. Although parents joined and left the discussion forums over time, discussions and stories about exemptions were persistent and robust to these membership changes.
Analyzing parent forums about health care using an automated analytic approach, such as the one presented here, allows the detection of widespread narrative frameworks that structure and inform discussions. In most vaccination stories from the sites we analyzed, it is taken for granted that vaccines and not vaccine preventable diseases (VPDs) pose a threat to children. Because vaccines are seen as a threat, parents focus on sharing successful strategies for avoiding them, with exemption being the foremost among these strategies. When new parents join such sites, they may be exposed to this endemic narrative framework in the threads they read and to which they contribute, which may influence their health care decision making.
Over the past decade and a half, the explosion in social media and the concomitant rise in informational websites have changed the manner in which people access health care information [
Among the many topics discussed on these parenting sites, few topics garner as much attention and vigorous discussion as childhood vaccination. Despite the fact that safe and effective vaccines exist, sporadic outbreaks of vaccine preventable diseases (VPDs) point to the continuing tension between public programs intended to make these vaccinations easily accessible and broadly adopted and parents who resist vaccination based largely on ideological principles [
Although simple inspection of parenting sites and standard text mining approaches can confirm that vaccination is a topic of frequent discussion on these sites, such methods cannot determine the structure of those discussions. This is the objective of our research.
In this research, we analyzed 1.99 million posts contributed by 40,056 users and viewed 20.12 million times indexed from 2 popular parenting sites over a period of 105 months ending in 2012. Beyond simply identifying the main topics of discussion on the sites, we discovered the underlying narrative frameworks that explain the stories circulating in these various discussions, an approach that extends recent work on personal experience and health knowledge exchange in Internet forums [
Data for this study were obtained from 2 popular social media sites dedicated to parenting. We chose these 2 sites because of their popularity among new parents, with a membership comprised primarily of people who self-identify as mothers [
Workflow for aggregate narrative framework discovery.
For the 2 parenting sites, we determined the topics of discussion and the stories circulating in those discussions through an automated content analysis process. We started by computing dominant topics in the forums using 2 different probabilistic approaches, Latent Dirichlet Allocation (LDA) and Contextual Random Walk Traps (CRWT) [
To understand how people talked about the discovered topics, we developed a story model, the actant-relationship context model, and used it to extract the underlying stories from posts across the entire set of 1.99 million discussion posts, recognizing that forum posts frequently include only parts of stories or comments on story parts as opposed to complete stories. We conceptualize story parts as relationships among actants [
In our model, to generate a social media post, a user picks a set of actants and draws from the distribution of relationships among those actants. The user then composes the post according to the outcomes in the first step. In a social media corpus, the underlying probabilistic model including both the primary actants and their contextual relationships is hidden. Consequently, our task was to estimate this hidden model from the posts. We accomplished this through a computationally scalable estimation algorithm that requires minimal supervision. Because the data were large scale and the story signals were persistent, we found that a computationally scalable inference algorithm using minimal information (such as nouns and verbs) from Natural Language Processing (NLP) tools gave us accurate results for our dataset.
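The generative view described above can be illustrated with a minimal sketch. It assumes a toy hidden framework in which each actant pair has a distribution over relationship categories; the pairs, weights, and function names (`generate_post`, `estimate_framework`) are hypothetical and are not estimates or code from the study. Posts are sampled as sets of story fragments, and the hidden distributions are then recovered by simple counting, which stands in for the estimation algorithm rather than reproducing it.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy framework: actant pair -> distribution over relationships.
# These pairs and weights are invented for illustration only.
FRAMEWORK = {
    ("Parents", "Exemptions"): {"seek": 0.7, "employ": 0.3},
    ("Schools", "Vaccines"): {"require": 0.9, "accept": 0.1},
    ("Vaccines", "Children"): {"threaten": 0.6, "protect": 0.4},
}


def generate_post(framework, rng, n_fragments=2):
    """Pick actant pairs, then draw one relationship for each pair."""
    pairs = rng.sample(sorted(framework), k=n_fragments)
    fragments = []
    for source, target in pairs:
        dist = framework[(source, target)]
        relationships = list(dist)
        weights = [dist[r] for r in relationships]
        relationship = rng.choices(relationships, weights=weights, k=1)[0]
        fragments.append((source, relationship, target))
    return fragments


def estimate_framework(posts):
    """Recover relationship distributions from observed fragments by counting."""
    counts = defaultdict(Counter)
    for post in posts:
        for source, relationship, target in post:
            counts[(source, target)][relationship] += 1
    return {
        pair: {rel: n / sum(c.values()) for rel, n in c.items()}
        for pair, c in counts.items()
    }


rng = random.Random(0)  # fixed seed so the sketch is repeatable
posts = [generate_post(FRAMEWORK, rng) for _ in range(1000)]
estimated = estimate_framework(posts)
```

Each post carries only a fragment of the framework, yet with enough posts the counted proportions approach the hidden weights, which is the intuition behind treating posts as samples of a hidden narrative framework graph.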
We used the automatically discovered topics to determine the important actants in the topic space, recognizing that topics could cut across the siloes of forum classifications. To do so, we extracted a pool of actant terms based on a ranked list of high-frequency nouns. These nouns were, in turn, aggregated to derive actant categories. In topics associated with vaccination, we discovered 3 main categories of actants: individual actants, comprising parents, children, and medical professionals; institutional actants, comprising government institutions, religious institutions, pharmaceutical companies, and schools; and objects, comprising vaccines, exemptions, VPDs, and adverse effects. The words associated with an actant consist of both synonyms for the actant and entities that have the actant as a super-category. For example, the actant “government” includes the colloquial synonym “the Feds” as well as the government institution, the “CDC,” where “government” is the super-category for CDC.
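As a rough illustration of how words are grouped into actant categories, the sketch below inverts an abridged copy of the actant word sets reported in this article into a lookup table and tags a sentence with the actants it mentions. The function name `actants_in` and the simple regular-expression tokenizer are assumptions made for the example; multiword entries such as "big pharma" would need phrase matching, which is omitted here.

```python
import re

# Word sets abridged from the actant model table in this article.
ACTANT_WORDS = {
    "Parents": {"parents", "parent", "i", "we", "us", "you"},
    "Government": {"government", "cdc", "federal", "feds", "officials", "law"},
    "Religious institutions": {"faith", "religion", "pastor", "church", "clergy"},
    "Exemptions": {"exemption", "exempt"},
    "Vaccines": {"vaccine", "vaccines", "vax", "shot", "shots"},
}

# Invert the table so that each word points to its actant category.
WORD_TO_ACTANT = {
    word: actant for actant, words in ACTANT_WORDS.items() for word in words
}


def actants_in(sentence):
    """Return the set of actant categories mentioned in a sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return {WORD_TO_ACTANT[t] for t in tokens if t in WORD_TO_ACTANT}


example = actants_in("The Feds and the CDC require shots")
# example == {"Government", "Vaccines"}
```

Both "Feds" (a colloquial synonym) and "CDC" (an entity with government as its super-category) resolve to the same Government actant, mirroring the two kinds of word membership described above.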
We characterized the context between a pair of actants by a set of verbs that are significant when the 2 actants are discussed simultaneously. Verbs are known to capture binary relationships in large-scale corpora [
In order to establish the significance of a verb for a particular pair of actants (ie, a context), we compared the conditional probability of the verb appearing with both actants to its marginal probability: A verb is
As there are many verbs involved in any context, we ranked the relative significance of the different verbs via a scoring or weighting function
To implement the above idea computationally, we tagged the entire corpus with parts of speech (POS) tags, using the Natural Language Toolkit (NLTK) library in Python [
Calculation of the Kullback-Leibler divergence metric as a weighting function to rank the significance of different verbs.
Calculation of the marginal probability of the verb appearing in any sentence in the corpus.
Calculation of the conditional probability of a verb appearing with both actants in a particular context.
Calculation of ranking to determine the set of top verbs characterizing a given context.
Once we had determined the ranked verb list for a context, we returned to the sentences for that context and determined the actant pairs that these significant verbs related using the POS tagger output. Recognizing that different verbs may capture the same type of relationship between actants, we grouped verbs into “relationship” categories, just as we grouped nouns into actant categories. Taking a cue from narrative theory, we classified these relationships according to a series of binary oppositions between verbs, with highly ranked synonyms grouped together with their highly ranked antonyms, allowing us to align those relationships to the structure of the personal experience narrative as well [
We identified 2 main categories of binary opposite relationships. The first set of these relationships were those between individuals and institutional actants, with the binary oppositions
To illustrate this process, consider the verb “use” which was determined to be a significant verb in the Exemption
Here is some New York info: (sample exemption letters here) Here is info about how you do not have to prove membership in a church in order to use a religious exemption
The verb “use” relates “you” (which is a Parent actant) with “church” (which is a Religious Institutions actant). The category of the verbs that connects the 2 actants becomes the significant relationship between those actants in that context. For example, in the above case, as the verb “use” falls into the
We visualized each context as a network story graph, with the actants as the nodes and the significant relationships as the edges connecting the actants, thereby capturing the rich structures of relationships among actants for any context. We then created a summary graph by aggregating the story graphs for each context into a single graph. We labeled this summary graph a narrative framework.
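A minimal sketch of this aggregation step, assuming each context has already been reduced to (actant, relationship, actant) triples: each story graph is stored as a mapping from a directed actant pair to the set of relationship categories on that edge, and the narrative framework is the union of those mappings. The triples and the function names `build_story_graph` and `aggregate` are illustrative only and are not taken from the study's data or code.

```python
from collections import defaultdict


def build_story_graph(triples):
    """Story graph for one context: (source, target) -> set of relationships."""
    graph = defaultdict(set)
    for source, relationship, target in triples:
        graph[(source, target)].add(relationship)
    return graph


def aggregate(story_graphs):
    """Merge per-context story graphs into one narrative framework graph."""
    framework = defaultdict(set)
    for graph in story_graphs:
        for edge, relationships in graph.items():
            framework[edge] |= relationships
    return framework


# Invented triples for two contexts, using relationship categories
# from the relationship model table.
exemption_context = build_story_graph([
    ("Parents", "seek", "Exemptions"),
    ("Parents", "employ", "Religious institutions"),
])
vaccine_context = build_story_graph([
    ("Schools", "require", "Vaccines"),
    ("Parents", "employ", "Exemptions"),
])
framework = aggregate([exemption_context, vaccine_context])
nodes = {actant for edge in framework for actant in edge}
```

Here the edge from Parents to Exemptions ends up carrying both "seek" and "employ", showing how fragments from different contexts reinforce the same part of the aggregate graph.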
To characterize temporal trends in new posting activity that concerned vaccination exemptions and new user activity concerning exemptions, for each site, we calculated (1) the monthly proportion of new posts that included the word “exemption” and (2) the proportion of new users each month who committed a post with the word “exemption” in it. As users have access to old as well as new posts, in order to characterize the fraction of post content pertaining to exemptions that would be visible to users of the forums, we also calculated over the study period the monthly cumulative proportion of posts that included the word “exemption.” We produced a log-linear plot of the distribution of user-activity duration (in days) for the mothering.com site using a bin width of 3 months.
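The three monthly statistics can be computed along the following lines. This is a sketch under stated assumptions: posts are given as (user, date, text) tuples, a user counts as "new" in the month of their first post, a new user is counted as exemption-posting only if the matching post falls in that same month, and matching is a case-insensitive substring test (so "exemptions" also matches). The function name `monthly_exemption_stats` and the sample posts are hypothetical.

```python
from collections import defaultdict
from datetime import date


def monthly_exemption_stats(posts, keyword="exemption"):
    """Return {(year, month): (post_prop, new_user_prop, cumulative_prop)}."""
    posts = sorted(posts, key=lambda p: p[1])
    first_month = {}
    for user, day, _ in posts:
        first_month.setdefault(user, (day.year, day.month))

    totals, hits = defaultdict(int), defaultdict(int)
    new_users, new_hit_users = defaultdict(set), defaultdict(set)
    for user, day, text in posts:
        month = (day.year, day.month)
        has_keyword = keyword in text.lower()
        totals[month] += 1
        hits[month] += has_keyword
        if first_month[user] == month:
            new_users[month].add(user)
            if has_keyword:
                new_hit_users[month].add(user)

    stats = {}
    cum_total = cum_hits = 0
    for month in sorted(totals):
        cum_total += totals[month]
        cum_hits += hits[month]
        n_new = len(new_users[month])
        stats[month] = (
            hits[month] / totals[month],                         # new posts
            len(new_hit_users[month]) / n_new if n_new else 0.0,  # new users
            cum_hits / cum_total,                                # cumulative
        )
    return stats


sample_posts = [
    ("u1", date(2010, 1, 5), "How do I file a religious exemption?"),
    ("u2", date(2010, 1, 9), "Teething tips please"),
    ("u1", date(2010, 2, 1), "Thanks everyone"),
    ("u3", date(2010, 2, 3), "Exemption forms for daycare"),
]
stats = monthly_exemption_stats(sample_posts)
```

The cumulative proportion is included because readers can see old posts as well as new ones, so it approximates the share of visible content that mentions exemptions at any point in the study period.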
On our 2 target sites, topic modeling revealed that vaccinations and, interestingly, exemptions constitute significant topics of discussion (for a full listing of topics, see the
We ran LDA topic modeling in R at multiple levels of granularity, from k=20 to k=200 in intervals of 20 (samples of the LDA topic models are included in the
The CRWT method similarly yielded a more varied set of topics for the second site than for mothering.com, but the exemption-related topic was still distinct on both sites, constituted by some of the following words: “religious exemption beliefs exemptions belief belong supreme required.” As part of its output, CRWT yields a hierarchy of topics. For example, the “exemption” topic on mothering.com reveals a hierarchy where exemption is a super-category of “refusal,” “belief,” and “requirements,” as illustrated in
A hierarchical structure of topics related to exemption computed by the Contextual Random Walk Traps (CRWT) method from mothering.com posts. PR represents the page rank of the word-nodes in the co-occurrence network.
The story model allows us to determine how people talk about the topics discovered through topic modeling. Recognizing that these topics can be discussed across the entire corpus, we do not assign documents to topics. Rather, we focus on discovering the underlying narrative framework, the activation of which in various posts, contributes to the structure of those discussions.
First, we determine the actants in the topic space (
We illustrate these findings with story graphs for different contexts, aggregating these into a single narrative framework graph (
Actant model.
Entities (nodes) | Associated word set | |
Parents | parents, parent, i, we, us, you | |
Children | child, kid, kids, children, daughter, daughters, son, sons, toddler, toddlers, kiddo, boy, d(ear)d(aughter), d(ear)s(on) | |
Medical professionals | doctor, doctors, pediatrician, pediatricians, nurse, nurses, ped, md, dr | |
Government | government, cdc, federal, feds, center for disease control, officials, politician, official, law | |
Religious institutions | faith, religion, pastor, pastors, parish, parishes, church, churches, congregation, congregations, clergy | |
Schools | teacher, teachers, preschools, preschool, school, schools, class, daycare, daycares, classes | |
Pharmaceutical companies | pharma, big pharma, company, companies | |
Vaccines | vaccines, vax, vaccine, vaccination, vaccinations, shots, shot, vaxed, unvax, unvaxed, nonvaxed, vaccinate, vaccinated, vaxes, vaxing, vaccinating, substances, ingredients | |
Exemptions | exemption, exempt | |
VPDsᵃ | varicella, chickenpox, flu, whooping cough, tetanus, pertussis, hepatitis, polio, mumps, measles, diphtheria | |
Adverse effects | autism, autistic, fever, fevers, reaction, reactions, infection, infections, inflammation, inflammations, pain, pains, bleeding, bruising, diarrhea, diarrhea |
ᵃVPDs: vaccine preventable diseases.
Relationship model.
Relationships (edges) | Associated word set (stemmed) | |
Require or resist | force, require, need, follow, mandate | |
Advise or question | recommend, tell, said, object, ask, learn, teach | |
Protect or threaten | protect, injure, damage | |
Employ or ignore | use, submit, ignore | |
Accept or reject | vaccinate, unvaccinate, vax, unvax, receive, have, had, get, inject, exclude, allow, exempt, believe, receive, request, deny, accept | |
Attend or avoid | enter, enroll, attend, go, send, homeschool | |
Seek or aver | seek, file, sign, claim, submit, need, exercise, lie, claim | |
Grant or withhold | accept, approve, get, abuse, grant, oppose, deny | |
Protect or threaten | protect, injure, damage | |
Cause or not cause | expose, get, contract, cause, develop, suffer, die, vomit, diagnose |
Top 10 high-relevancy verbs (stemmed) that characterize the contexts comprising “Exemption” and each of the other major actant categories on the second site. The verbs are ordered according to the KL divergence scores, but we show the frequency of each verb in parentheses for comparison. In the Exemption–Parent context, the verb “have,” with a frequency count of 1561, is ranked fourth, well below “exercise,” which has a frequency of only 228.
Actants in exemption context | Significant verbs in relationship to exemption (actant) |
Parents | exempt(207), exercis(228), sign(275), have(1561), concern(196), claim(185), vaccin(241), belong(132), us(472), requir(199) |
Children | exempt(220), exercis(191), concern(175), vaccin(202), vax(133), requir(152), sign(116), attend(99), enrol(56), allow(106) |
Medical professionals | sign(101), exempt(21), give(72), write(34), requir(35), get(131), have(229), submit(16), obtain(17), file(17) |
Government | bind(51), determin(51), requir(41), us(67), exempt(17), belong(22), accept(29), seek(19), furnish(8), obtain(12) |
Religious institutions | belong(294), rule(114), offer(152), do(456), claim(105), us(191), have(445), find(149), bind(51), determin(50) |
Schools | belong(146), rule(116), offer(160), requir(130), sign(120), attend(103), exempt(59), find(184), have(682), accept(104) |
Vaccines | vaccin(377), exempt(173), requir(323), vax(252), claim(208), receiv(191), sign(150), request(106), allow(200), oppos(91) |
VPDsᵃ | requir(14), exempt(7), vaccin(15), sign(12), get(34), refus(9), have(55), prove(6), document(4), decid(8) |
Adverse effects | had(64), exempt(12), obtain(12), requir(18), get(60), link(12), choose(9), follow(15), increas(11), qualifi(9) |
ᵃVPDs: vaccine preventable diseases.
Top 10 high-relevancy verbs (stemmed) that characterize the contexts comprising “Children” and each of the other major actant categories on the second site. The verbs are ordered according to the KL divergence scores, but we show the frequency of each verb in parentheses for comparison.
Actants in children context | Significant verbs in relationship to children (actant) |
Parents | have(190359), give(27803), learn(18254), am(46173), choos(11446), want(46512), think(60272), rais(9756), know(61261), teach(8982) |
Medical professionals | nurs(1728), vaccin(1450), told(1986), had(4091), said(2582), diagnos(893), take(2467), recommend(642), give(1782), took(926) |
Government | vaccin(615), recommend(447), receiv(380), accord(291), mandat(144), ha(1211), caus(328), injur(139), report(214), includ(299) |
Religious institutions | teach(2010), are(2873), is(3706), church(127), attend(221), go(937), rais(174), believ(335), allow(164), pray(73) |
Schools | attend(2150), go(11284), ha(10885), send(2041), start(4659), work(4910), get(12171), daycar(621), need(5944), teach(1645), enrol(689) |
Vaccines | vaccin(16262), vax(6176), receiv(3859), ha(8439), injur(1247), given(2321), unvaccin(930), caus(2315), recommend(1507), unvax(657), protect(1269) |
VPDsᵃ | vaccin(1409), receiv(1071), had(2622), get(2840), recommend(510), vax(450), given(590), develop(441), got(1005), expos(333) |
Adverse effects | ha(17530), diagnos(4961), have(29879), had(9852), autism(1023), caus(2617), develop(1461), vaccin(1756), is(33778), affect(1064) |
ᵃVPDs: vaccine preventable diseases.
The summarized story graph (
The summarized story graph also reveals several notable substories. In one, religious institutions rather than schools play the role of “teacher.” In this substory, schools are relegated to the role of parental adversary, requiring vaccinations and wielding the power to accept or reject exemptions. In another substory, medical professionals play the role of the adversary. Parents question them over the necessity of vaccines, and resent them as the enforcers of vaccine requirements (threat). Yet, parents also need the medical professionals’ help, as they act as the grantors of exemptions (strategy). Two glaring omissions in this and all the substories on which the summary narrative framework graph is based are the near total absence of VPDs and pharmaceutical companies as actants. The only role that the VPDs play is a passive one: children contract them (see the penultimate row in
Story graphs and narratives: subfigures a-e illustrate the story graphs corresponding to different contexts in mothering.com, while subfigure f is an aggregate master narrative graph.
A log-linear plot of the distribution of user-activity duration (in days) for the mothering.com site is presented in
Monthly proportion of new posts that included the word “exemption” for the two sites.
Monthly proportion of new users who committed a post that included the word “exemption” for the two sites.
Cumulative proportion of posts that contained the word “exemption” for each site.
Log-linear plot of the distribution of user-activity duration (in days) for the mothering.com site.
The methods we have developed for this study allow us to discover stories circulating informally on social media sites. Our system can detect the presence, persistence, and pervasiveness of story signals on otherwise very noisy sites, aggregate these story signals into a narrative framework, and provide a clear mechanism for tracing the emergence of specific strategies endorsed in these stories that parents might adopt to counteract perceived health-related threats. The sites play an important role in exposing parents to the ideas of vaccinations as threat, and the use of exemptions as a strategy to combat that threat [
On the sites we studied, the narrative framework is one where vaccines pose a threat to children, and parents in their role as protectors devise strategies, most often the use of exemptions, to thwart that threat. The narrative framework is so widely dispersed that it has traversed many segments of the parenting sites. A strong, persistent signal in these discussions reveals that parents actively pursued information about exemptions on an ongoing basis. New parents who joined these sites were likely to be quickly exposed to the beliefs encoded in these stories and the underlying narrative framework.
Given the well-established 90-9-1 rule of social media, where 90% of visitors simply read without commenting (9%) or contributing (1%) [
Our work has certain limitations. As with all social media research, it is not clear that the sites on which we focused are representative of parents and health care decision makers as a whole. Although the 2 sites we used are very popular among parents, we recognize that they do not capture the broad range of discussions that take place in informal settings not easily observed, such as the playground, school gatherings, and other places where parents interact. In those settings, ethnographic fieldwork could provide important qualitative perspectives on vaccine-related discussions and storytelling [
We recognize that parenting sites have certain biases. Mothering.com, for example, given its long-standing relationship with the now defunct
Our work does not currently include sentiment detection. Although we advance the inclusion of social media in health care beyond topic discovery to an analysis of the underlying narrative structure of those discussions, we have not studied the manner in which those discussions are framed, which sentiment detection may be able to provide.
Data privacy continues to be a significant concern for social media research. In our study, we anonymized all of our data as part of the indexing process, and thus were unable to exploit certain features of individual user and user community data. The trade-offs between user privacy and research benefits are part of the constantly shifting terrain of social media research, and we chose to err on the side of privacy. Data access is becoming an equally significant problem, as social media corporations are greatly reducing access to data that people post and share on their sites. These limitations make it increasingly difficult to track large-scale conversations over long periods of time.
Vaccination decision-making has been broadly studied [
Narrative is recognized as a key means for shaping belief. Radzikowski et al [
Random sampling methods are another approach for understanding developing attitudes toward health care. However, the narratives we discover would be difficult to identify using random sampling approaches. Whereas the use of focus groups has shown itself to be particularly helpful in devising messaging campaigns for specific communities [
Injecting an idea, such as the efficacy of exemption as a strategy to avoid vaccination, into online communities has the potential to influence many people—the idea can, in a phrase, “go viral.” Given the persuasive nature of personal experience narrative, storytelling plays a central role in exposing people to ideas and converting people to particular beliefs. Importantly, people are inclined to believe first-hand accounts from members of their community, as opposed to official pronouncements [
The sheer volume of discussions on social media sites dedicated to parenting, along with the knowledge that many people use Internet resources as their first line of health care information, means that these forums deserve ongoing attention [
Supplemental materials.
This work was supported in part by the NIH Grant Number R01 GM105033-01.
None declared.