This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on http://publichealth.jmir.org, as well as this copyright and license information must be included.
In this age of social media, any news—good or bad—has the potential to spread in unpredictable ways. Changes in public sentiment have the potential to either drive or limit investment in publicly funded activities, such as scientific research. As a result, understanding the ways in which reported cases of scientific misconduct shape public sentiment is becoming increasingly essential—for researchers and institutions, as well as for policy makers and funders. In this study, we thus set out to assess and define the patterns according to which public sentiment may change in response to reported cases of scientific misconduct. This study focuses on the public response to the events involved in a recent case of major scientific misconduct that occurred in 2014 in Japan: the stimulus-triggered acquisition of pluripotency (STAP) cell case.
The aims of this study were to determine (1) the patterns according to which public sentiment changes in response to scientific misconduct; (2) whether such measures vary significantly, coincident with major timeline events; and (3) whether the changes observed mirror the response patterns reported in the literature with respect to other classes of events, such as entertainment news and disaster reports.
The recent STAP cell scandal is used as a test case. Changes in the volume and polarity of discussion were assessed using a sampling of case-related Twitter data published between January 28, 2014 and March 15, 2015. RapidMiner was used for text processing, and the popular bag-of-words method SentiWordNet was used within RapidMiner to calculate sentiment for each sampled Tweet. Relative volume and sentiment were then assessed overall, month-to-month, and with respect to individual entities.
Despite the ostensibly negative subject, average sentiment over the observed period tended to be neutral (−0.04); however, a notable downward trend (
These results suggest that public opinion toward scientific research may be subject to the same sensationalist dynamics driving public opinion in other, consumer-oriented topics. The patterns in public response observed here, with respect to the STAP cell case, were found to be consistent with those observed in the literature with respect to other classes of news-worthy events on Twitter. Discussion was found to become strongly polarized only during times of increased public attention, and such increases tended to be driven primarily by negative reporting and reactionary commentary.
With the rise of social network services (SNS), all news events, no matter how large or small, have become subject to intense public scrutiny and debate [
Recent investigations into communication on Twitter have uncovered common, generalizable patterns in the way sentiment changes in response to the emergence of notable events—namely, that increases in public attention are coincident with increases in negative sentiment [
One area of particular interest is scientific misconduct, particularly in the areas of academic and medical science. Scientific misconduct concerns more than just a given researcher or institution; damage to public perception of, and goodwill toward, scientific research itself is a driving concern [
Here, we assess and define the patterns according to which public sentiment may change in response to reports of academic scientific misconduct on Twitter. This study focuses on public response to a recent and widely covered case of scientific misconduct—the stimulus-triggered acquisition of pluripotency (STAP) cell case that occurred in Japan in 2014 [
Here, we have demonstrated that Twitter response to the STAP case tended to generally stay neutral, but specifically skew negative as discussion polarity and volume increased. Our results are consistent with those observed in studies covering other topics of interest [
For the purpose of this analysis, an event timeline was constructed based on primary reports and press-releases [
Twitter data (“Tweets”) were obtained directly from Apple’s now defunct subsidiary, Topsy [
An automated platform for downloading and processing Tweets was developed using RStudio [
Initial processing of data was conducted programmatically within RStudio. This included formatting and transformation of the downloaded data into tables. Downloaded tables were saved and processed using Excel 2013. This included the programmatic identification and removal of all Tweets containing non-English characters or text. In addition, all Tweets were manually checked to remove irrelevant or spam posts, resulting in a final dataset of n=9467 Tweets total.
For the purpose of sentiment classification, RapidMiner version 6.3, Enterprise Edition (Rapid-I GmbH, Dortmund, Germany) was used. The following preprocessing steps were followed (
Text processing steps.
Steps | Description |
Tokenize | Parses every tweet into separate, single-element tokens (ie, words or word-parts) |
Transform cases | Converts all text to lowercase to facilitate data processing |
Filter tokens by length | Removes tokens consisting of fewer than 2 characters |
Filter English stop words | Removes common, low-information particles (eg, “the”) and punctuation marks |
Filter tokens by content | Removes hashtags and other message-irrelevant tokens such as “http” |
Stemming (WordNet) | Identifies and groups tokens as lemmas to facilitate processing |
Generate n-grams | Generates a list of all two-, three-, or four-word token combinations (ie, phrases) |
Word vector creation | Generates a metric measuring the importance of each word in a tweet |
Pruning | Removes tokens that appear in less than 1% or more than 80% of documents |
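The preprocessing steps above can be sketched as follows. This is a simplified, illustrative Python stand-in for the RapidMiner operators used in the study; the stop-word list, regex tokenizer, and crude suffix-stripping "stemmer" are stand-in assumptions, not the actual WordNet components.

```python
import re

# Stand-in stop-word list; the study used RapidMiner's English stop-word filter.
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "rt"}

def preprocess(tweet, min_len=2, ngram_max=2):
    """Apply a simplified version of the paper's preprocessing steps to one tweet."""
    # Tokenize: split into word-like tokens (keeps # and @ so they can be filtered)
    tokens = re.findall(r"[a-zA-Z#@']+", tweet)
    # Transform cases: lowercase everything
    tokens = [t.lower() for t in tokens]
    # Filter tokens by content: drop hashtags, mentions, and URL fragments
    tokens = [t for t in tokens
              if not t.startswith(("#", "@")) and t not in ("http", "https")]
    # Filter tokens by length: drop tokens shorter than min_len characters
    tokens = [t for t in tokens if len(t) >= min_len]
    # Filter English stop words
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Stemming stand-in: naive plural stripping (the paper used WordNet lemmas)
    tokens = [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]
    # Generate n-grams (here up to bigrams), joined with "_"
    ngrams = ["_".join(tokens[i:i + n])
              for n in range(2, ngram_max + 1)
              for i in range(len(tokens) - n + 1)]
    return tokens + ngrams
```

For example, `preprocess("The STAP cells are real! http://t.co #stap")` drops the stop words, the URL fragments, and the hashtag, stems "cells" to "cell", and appends bigrams such as `stap_cell`.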
This processing generated weighted word vectors, representing the weighted distribution of each processed token or n-gram within a given Tweet. Word vector statistics were calculated using the term frequency-inverse document frequency (TF-IDF) weighting scheme. TF-IDF emphasizes terms that are frequent within a given Tweet but uncommon across the corpus [
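A minimal TF-IDF computation consistent with the weighting scheme described is shown below. This is a sketch only: it uses raw relative term frequency and a natural-log IDF, whereas the exact normalization applied by RapidMiner's word vector operator may differ.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents (tweets).

    TF is the within-document relative frequency; IDF down-weights tokens
    that appear in many documents, so terms frequent in one tweet but rare
    across the corpus receive the highest weights.
    """
    n_docs = len(docs)
    df = Counter()                      # document frequency per token
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weights.append({tok: (count / total) * math.log(n_docs / df[tok])
                        for tok, count in tf.items()})
    return weights
```

A token present in every document (eg, "stap" in a corpus of STAP-related tweets) receives an IDF of log(1) = 0 and therefore zero weight, which is exactly the pruning effect described above for overly common terms.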
To evaluate sentiment for each Tweet, the SentiWordNet 3.0 extension was used within RapidMiner. SentiWordNet is a well-established sentiment analysis resource and has been cited by almost 1000 (988) journal publications as of this writing, according to a Google Scholar search. SentiWordNet assigns three sentiment scores (“positive,” “negative,” and “objective”) to each word, based on a generalized classification system developed by the authors using a combination of manual and automated sentiment scoring algorithms [
For this analysis, nouns were omitted from sentiment calculation. Recent studies have demonstrated that, for automated sentiment analyses, nouns are not likely to provide additional, reliable information [
Scores were thus assigned for each Tweet, ranging from −1 to +1, based on the estimated degree of negative or positive sentiment. These scores are reported in unstandardized form. For the purpose of statistical analysis and visualization, scores were then standardized, to produce a distribution with mean of zero (
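The scoring and standardization steps can be illustrated as follows. The lexicon entries here are invented placeholders, not actual SentiWordNet values, and the −1 to +1 tweet score is computed as the mean positive-minus-negative polarity over scored (non-noun) tokens, which is one plausible reading of the procedure described above.

```python
# Toy lexicon of (positive, negative) scores per word, SentiWordNet-style.
# These example values are illustrative, not taken from SentiWordNet itself.
LEXICON = {
    "amazing":   (0.75, 0.0),
    "hopeful":   (0.625, 0.0),
    "falsified": (0.0, 0.625),
    "retracted": (0.0, 0.5),
}

def tweet_sentiment(tokens):
    """Score a tweet in [-1, +1] as the mean (positive - negative) over scored tokens."""
    scored = [LEXICON[t] for t in tokens if t in LEXICON]
    if not scored:
        return 0.0  # no polarized words: treat the tweet as neutral/objective
    return sum(p - n for p, n in scored) / len(scored)

def standardize(scores):
    """Z-standardize scores to mean zero and unit variance for cross-month comparison."""
    mean = sum(scores) / len(scores)
    var = sum((s - mean) ** 2 for s in scores) / len(scores)
    sd = var ** 0.5
    if sd == 0.0:
        sd = 1.0  # guard against all-identical scores
    return [(s - mean) / sd for s in scores]
```

For instance, a tweet containing only "falsified" and "retracted" scores (−0.625 − 0.5) / 2 = −0.5625, while a tweet with no polarized words scores 0.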
A support vector machine (SVM) analysis was then used to identify the terms and phrases that were most commonly associated with each respective sentiment label. SVM is a computational method that derives a classification scheme based on the degree to which the various input cases (ie, word vectors) predict a given binary class (eg, positive or negative sentiment or “mentions Sasai” or null) [
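As a sketch of the classification technique named above, the following trains a linear SVM by subgradient descent on the hinge loss (a Pegasos-style solver). This is an illustrative stand-in under stated assumptions: the study used RapidMiner's SVM, and this minimal version omits the bias term and kernel options.

```python
def train_linear_svm(X, y, lam=0.1, epochs=100):
    """Linear SVM via Pegasos-style subgradient descent on the hinge loss.

    X: feature vectors (eg, TF-IDF word vectors); y: labels in {-1, +1}
    (eg, negative vs positive sentiment, or "mentions Sasai" vs not).
    The bias term is omitted for brevity; append a constant feature if needed.
    """
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1:
                # Point inside the margin: regularize and step toward it
                w = [(1 - eta * lam) * wj + eta * yi * xj
                     for wj, xj in zip(w, xi)]
            else:
                # Correctly classified with margin: regularization shrink only
                w = [(1 - eta * lam) * wj for wj in w]
    return w

def predict(w, x):
    """Classify a feature vector by the sign of the decision function."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1
```

After training, inspecting the largest-magnitude components of `w` identifies the tokens most strongly associated with each class, which is the use made of the SVM in this study.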
Once the Twitter data were processed as described, the data were exported to Microsoft Excel for further processing using the Pivot Table function. Sentiment and sampled Tweet volume were aggregated, and indices were calculated for all relevant sub- and cross-tables. These tables were then used to generate visualizations either directly in Microsoft Excel or using ggplot2 and ggtern in RStudio. In cases where a given table or visualization suggested a time-trend or association with respect to aggregate sentiment or Tweet volume, statistical significance was assessed using chi-square and Tukey’s post hoc (1-way analysis of variance, ANOVA) tests. A general linear model (GLM; with Bonferroni correction) was used to test each month’s mean difference versus previous months; 2-way ANOVA was used to compare metrics for individual entities. SPSS Statistics version 23 (IBM Corp) was used for all statistical tests.
Over the 15-month period covered, overall sentiment was found to be −0.037 on average, with a notable downward trend (
Tukey’s post hoc test for significance (1-way analysis of variance, ANOVA). Italicized values indicate significance.
[Table: pairwise month-to-month differences in mean sentiment, January 2014 through March 2015. Cell values were lost during text extraction; only the month labels and the zero-valued diagonal are recoverable.]
Tukey’s post hoc test for homogenous subsets (1-way analysis of variance, ANOVA).
Year | Month | N | Mean sentiment |
2015 | February | 1034 | −.2648 |
2015 | March | 187 | −.1774 |
2015 | January | 209 | −.0768 |
2014 | May | 558 | −.0521 |
2014 | July | 680 | −.0510 |
2014 | December | 1092 | −.0422 |
2014 | November | 75 | −.0230 |
2014 | April | 2349 | −.0224 |
2014 | August | 395 | .0052 |
2014 | June | 887 | .0230 |
2014 | March | 691 | .0391 |
2014 | February | 424 | .0443 |
2014 | October | 136 | .0467 |
2014 | January | 630 | .0829 |
2014 | September | 120 | .0978 |
[In the original table, these means were grouped into nine homogeneous subsets for alpha=.05; means sharing a subset do not differ significantly. The subset boundaries and per-subset significance values were lost during extraction.]
Both sets of trends were assessed against the timeline of events to determine whether sentiment and volume varied according to actual, real-world events. Findings are reported in terms of average sentiment, volume index (ratio of monthly or average volume), and positivity index (ratio of positive to negative volume).
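The two indices defined above reduce to simple ratios, sketched below. The helper names are illustrative; the figures in the usage note are taken from values reported in this article.

```python
def volume_index(monthly_volume, mean_volume):
    """Volume index: monthly Tweet volume as a percentage of the period average."""
    return 100.0 * monthly_volume / mean_volume

def positivity_index(positive_volume, negative_volume):
    """Positivity index: ratio of positive to negative Tweet volume, as a percentage."""
    return 100.0 * positive_volume / negative_volume
```

For example, April 2014's 2349 Tweets against the period mean of 631.1 yields a volume index of approximately 372.2%, and June 2014's 137 positive versus 56 negative Tweets yields a positivity index of approximately 245%, matching the figures reported below.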
Volume and average sentiment over time. Sentiment score calculated using unweighted aggregate sentiment scores found in the SentiWordNet database, for each valid token in each Tweet. For this analysis, verbs, adjectives, and adverbs were considered valid for the purpose of sentiment scoring. Volume is based on the number of Tweets retrieved per sampling interval. Sentiment grew increasingly negative over time; one key exception corresponds with the tragedy surrounding Dr Sasai (August to October 2014). Volume is driven by major events.
Month-to-month ternary sentiment and volume density chart. Density plot calculated based on the proportion of negative (N: top left), positive (P: top right), and objective or nonpolarized (O: bottom center) discussion volume (represented by the unlabeled data points). Volume density is calculated via isometric log ratio transformation.
On January 29, 2014, a letter [
For the 3 days comprising the month of January 2014 (January 29-31), average sentiment was found to be second highest among all months covered (0.083;
Average sentiment decreased significantly, but remained overall positive (0.044;
Initial concerns about possible figure manipulation first voiced on PubPeer (February 4) [
STAP coauthor, Dr Charles Vacanti, posts images claimed to be human STAP cells (February 5) [
Riken subsequently launches an investigation into “alleged irregularities” for both papers [
Average sentiment decreased slightly, but not significantly (0.039;
“Essential technical tips for STAP...” published by Ms Obokata, Dr Sasai, and Dr Niwa on March 3 [
STAP coauthor, Dr Wakayama, breaks from others and proposes retraction of both papers [
In an interim report, the Riken investigation team finds inappropriate handling of data [
Discussion volume reached its peak (372.2% index; 2349/631.1); however, discussion took on a more negative tone. Average sentiment decreased significantly (−0.022;
Riken finds Ms Obokata guilty of “two instances of research misconduct” in the STAP work [
Ms Obokata holds a press conference in order to rebut Riken’s conclusions [
Nature publishes a strongly worded editorial on science policy in Japan, citing STAP case [
Average sentiment continues the negative trend (−0.052;
Ms Obokata, under pressure to retract both papers, agrees to retract only the letter (May 28).
Other senior authors continue to negotiate with Ms Obokata regarding the remaining article [
Driven by a sharp increase in positivity (245%; 137/56) and overall discussion (140.5% index; 887/631.1), average sentiment became slightly positive (0.023;
STAP coauthor, Dr Wakayama, presents genetic evidence refuting the existence of STAP cells [
Ms Obokata and coauthors finally agree to retract both papers published in Nature [
Riken reform committee recommends restructuring of Center for Developmental Biology [
Average sentiment was once again negative (−0.051;
On July 3, the two Nature papers reporting the STAP cells are retracted [
Ms Obokata sustains injuries while being pursued by television reporters [
Concurrent with Riken’s investigation of the STAP case, Waseda University begins investigation of alleged plagiarism in Ms Obokata’s doctoral dissertation [
Discussion volume was below average (62.6% index; 395/631.1); despite a positivity index of 115% (45/39), average sentiment was nevertheless mixed (0.005;
On August 5, Dr Sasai, a STAP coauthor, was found dead at the Riken center in an apparent suicide.
Dr Sasai leaves behind a note addressed to Ms Obokata, urging her to verify existence of STAP [
A STAP coauthor, Dr Niwa, announces his lab’s failure to replicate STAP results (August 27) [
Driven by a large increase in positivity (1575%; 63/4), average sentiment improved considerably (0.098;
Vacanti et al release new STAP protocol; addition of adenosine triphosphate (ATP) now asserted to be key (September 3) [
Dr Endo publishes a report suggesting that STAP cells may have been embryonic stem cells (September 21) [
In October 2014, average sentiment decreased significantly, but remained overall positive (0.047;
In November 2014, discussion volume fell to an all-time low (11.9% index; 75/631.1). Despite above average positivity (150%; 3/2), average sentiment continued to trend negative (−0.023;
In December 2014, average sentiment decreased slightly, but not to a significant degree (−0.042;
Riken halts STAP verification experiments, announcing them to have failed [
Ms Obokata resigns from her position at Riken [
In January 2015, the downward trend in average sentiment continued—albeit not to a significant degree (−0.08;
In February 2015, a large uptick in discussion (163.8% index; 1034/631.1) drove a precipitous decline in average sentiment (−0.26;
Riken’s announcement of penalties related to the STAP research and publication.
Riken’s public announcement of plans to pursue criminal charges against Ms Obokata [
The Guardian piece, “What pushes scientists to lie,” [
In March 2015, average sentiment improved slightly but remained extremely negative (−0.18;
Riken’s decision not to sue Ms Obokata [
However, reported demands that she return publication-related expenses [
Significant differences were found with respect to the sentiment surrounding various parties. Tweet data were aggregated according to whether Ms “Obokata,” Dr “Sasai,” or (inclusive) the “Riken” institute were mentioned. Overall, sentiment surrounding Ms Obokata and the Riken institute was found to be consistent with broader trends (−0.04 and −0.03, respectively; no significant difference). However, sentiment toward Dr Sasai was found to be significantly more positive overall (0.03;
Dr Sasai initially received minimal attention. However, once allegations of misconduct began to emerge, Dr Sasai’s continued, public support of Ms Obokata became increasingly associated with discussion that was significantly more favorable (0.14, 0.06, and 0.14, respectively, in March 2014, April 2014, and May 2014;
These trends, however, reversed in August and September of that year. Coincident with and following the tragedy surrounding Dr Sasai, the sentiment associated with mentions of Dr Sasai remained positive (0.07 and 0.07, respectively, in August 2014 and September 2014;
Sentiment comparison for key actors. Month-to-month sentiment for key figures and entities corresponds with associated timeline events. Month-to-month sentiment scores were independently aggregated for Tweets mentioning Ms Obokata, Dr Sasai, or Riken. Data labels shown where mean differences are significant versus total.
A simplified “grounded theory” approach [
Only a few studies have examined the public communication of science via SNS [
This study found that STAP-related discussion volume varied significantly month-to-month, coincident with new events. Furthermore, we found that month-to-month sentiment was generally neutral or of mixed composition, tending to skew negative when polarized. This is consistent with previous findings concerning the characteristics of public sentiment as expressed on Twitter [
In addition, this analysis found that sentiment surrounding various stakeholders differed significantly with respect to specific events. Of particular note is the sentiment surrounding Dr Sasai, the researcher whose tragic fate was found to correspond with an increase in positive sentiment. The relationship between the death of a key stakeholder in a public crisis and subsequent improvement of the public mood—from criticism to sympathy—has been covered in the Japanese literature in the 1980s [
The results presented here provide an important case study for understanding the impact of scientific misconduct on public sentiment. The coverage received by the STAP cell case can be attributed to many factors, but the instrumentality of social media cannot be ignored. While this manuscript was undergoing review, a related study was published that provided a rudimentary analysis of the print and social media coverage of the STAP cell case in Japan [
The question still remains, however, whether sentiment expressed on Twitter regarding future cases of misconduct will accurately reflect overall public sentiment. Prior research has demonstrated that the Twitter medium most effectively influences or reflects public response with respect to high-volume events or crises [
The text- and sentiment-analysis procedures employed in this study are robust and well-validated; nevertheless, the reported metrics are limited by the analytical processes used to derive them from the text. In this case, SentiWordNet was used to obtain sentiment scores. Reported sentiment scores are thus limited by the accuracy and precision of the SentiWordNet database with respect to the material covered. In addition, volume estimations were based on, and limited by, the distribution characteristics of the sample obtained from the data provider. Furthermore, the reported metrics are estimations, as is the case with all sampling-based analytical approaches. That having been said, the analytical and data retrieval methods used are well established and have been verified to be sufficiently robust for such analyses [
This study represents the first objective analysis of public response to a major case of scientific misconduct. It observes and tracks changes in public sentiment over a 15-month sequence of events associated with the STAP cell case, one of the most publicized cases of major scientific misconduct in recent memory. We demonstrated that public response to this particular case tended to be generally neutral or of mixed composition, particularly during times of lower public attention; this was observed in the large majority of months covered in this study. We also observed that sentiment tended to skew negative as discussion polarity and volume increased. These findings are generally consistent with those observed in the literature with respect to major events across a wide range of topics, including entertainment, sports, business, politics, and even natural disasters. They support the notion that changes in public sentiment toward any major event, cases of scientific misconduct included, may be as much a function of the attention received as of the theme or merits of any specific case. As the saying goes, "no news is good news," and this study demonstrates the point clearly. Once the STAP story became tainted by allegations of misconduct, increases in public attention, driven mostly by the public relations (PR) efforts of the respective actors, consistently corresponded with increases in overall negativity. The only event that broke this trend was one of the only events not staged for publicity: the apparent suicide of a key stakeholder. Here, we observed a clear and significant positive shift in overall sentiment; however, this was also accompanied by a notable subsequent decrease in volume. Overall, these results strongly suggest that, in cases of research misconduct, public opinion, and by extension public policy, is likely to be more influenced by negative-leaning news and reporting.
Academic researchers, policy makers, and those with associated interests are advised to carefully consider the implications.
A summary timeline.
ANOVA: analysis of variance
ATP: adenosine triphosphate
BBC: British Broadcasting Corporation
CNN: Cable News Network
HSD: honest significant difference
PR: public relations
SNS: social network services
STAP: stimulus-triggered acquisition of pluripotency
SVM: support vector machine
TF-IDF: term frequency-inverse document frequency
None declared.