Published in Vol 6, No 3 (2020): Jul-Sep

Investigating the Attitudes of Adolescents and Young Adults Towards JUUL: Computational Study Using Twitter Data

Original Paper

1Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, United States

2Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States

3University Information Technology Infrastructure and Operations, University of Utah, Salt Lake City, UT, United States

4Department of Family Medicine & Public Health, University of California, San Diego, La Jolla, CA, United States

Corresponding Author:

Ryzen Benson

Department of Biomedical Informatics

University of Utah

421 Wakara Way


Salt Lake City, UT, 84108

United States

Phone: 1 (801) 581 4080


Background: Increases in electronic nicotine delivery system (ENDS) use among high school students from 2017 to 2019 appear to be associated with the increasing popularity of the ENDS device JUUL.

Objective: We employed a content analysis approach in conjunction with natural language processing methods using Twitter data to understand salient themes regarding JUUL use on Twitter, sentiment towards JUUL, and underage JUUL use.

Methods: Between July 2018 and August 2019, 11,556 unique tweets containing a JUUL-related keyword were collected. We manually annotated 4000 tweets for JUUL-related themes of use and sentiment. We used 3 machine learning algorithms to classify positive and negative JUUL sentiments as well as underage JUUL mentions.

Results: Of the annotated tweets, 78.80% (3152/4000) contained a specific mention of JUUL. Only 1.43% (45/3152) of tweets mentioned using JUUL as a method of smoking cessation, and only 6.85% (216/3152) of tweets mentioned the potential health effects of JUUL use. Of the machine learning methods used, the random forest classifier was the best performing algorithm among all 3 classification tasks (ie, positive sentiment, negative sentiment, and underage JUUL mentions).

Conclusions: Our findings suggest that a vast majority of Twitter users are not using JUUL to aid in smoking cessation nor do they mention the potential health benefits or detriments of JUUL use. Using machine learning algorithms to identify tweets containing underage JUUL mentions can support the timely surveillance of JUUL habits and opinions, further assisting youth-targeted public health intervention strategies.

JMIR Public Health Surveill 2020;6(3):e19975

Introduction




Although the overall use of any tobacco product among high school students decreased from 24.2% in 2011 to 19.6% in 2017 [1], overall use increased to 27.1% in 2018 [2] and further to 31.2% in 2019. This increase was primarily influenced by the use of electronic nicotine delivery systems (ENDS). Current use of ENDS among high school students increased from approximately 1.5% in 2011 [1] to approximately 27.5% in 2019 [3]. This rise in ENDS usage appears to be associated with the increasing popularity of the brand JUUL, a compact pod mod device with a disposable or refillable pod that typically contains artificial flavors, nicotine salts, and either vegetable glycerin or propylene glycol. JUUL sales represented 76% of the ENDS market at the end of 2018 [4].

JUUL's popularity stems from 3 main features of the product: appearance, flavors, and nicotine delivery [5,6]. JUUL's sleek “USB-like” design has assisted in the normalization of public ENDS usage and serves to facilitate inconspicuous use in smoking-prohibited areas such as schools and other public places [7]. JUUL was previously available in a variety of youth-appealing flavors, including but not limited to mango, mint, crème brûlée, and menthol [8]. As of October 2019, JUUL Labs had removed all flavors except for the classic tobacco, Virginia tobacco, and menthol flavors in an attempt to address concerns regarding the appeal of the product to underage users [9].

Whereas the nicotine concentrations of combustible tobacco products range from 1.5% to 2.5% by weight [10,11], nicotine concentrations in JUUL pods range from 3% (35 mg/mL) to 5% (59 mg/mL) by weight. Although JUUL pods contain a fraction of the total nicotine that a pack of cigarettes does, JUUL users absorb roughly the same amount of nicotine from a single pod as from a pack of cigarettes [12]. This suggests that nicotine is absorbed more efficiently from JUUL pods than from combustible cigarettes, likely because some cigarette nicotine is lost to sidestream smoke during combustion and because of the JUUL pods’ nicotine formulation [13]. JUUL pods contain a protonated form of nicotine known as nicotine salts [14], whose absorption resembles that of the freebase nicotine found in cigarettes [15,16] but which has a smoother feel when inhaled and does not taste as bitter [13,17].

A recent study on youth awareness of JUUL’s nicotine strength demonstrated that 37.4% of adolescents believed JUUL to contain low or medium nicotine strength and 31.4% were unaware of the nicotine strength [18]. These findings suggest that adolescents are unaware of the relatively high nicotine content in a single JUUL pod. Additional research has documented the emergence of JUUL-compatible pods, some containing nicotine concentrations as high as 6.5% [13]. With approximately 90% of adult daily ever smokers beginning before 18 years of age [19] and a lack of public understanding regarding JUUL’s highly concentrated nicotine levels [20], it has been hypothesized that JUUL poses a risk to younger populations for developing nicotine dependency [21,22]. Consequently, nicotine dependency developed in adolescence may result in addiction and potentially a later transition to traditional combustible cigarettes [23]. With the ENDS market rapidly changing in terms of products and patterns of use (ie, pod mods, box mods, vape pens), there are crucial knowledge gaps in understanding underage ENDS use and its consequences [24].

Studies of JUUL Use Using Social Media

Free and publicly available data obtained from Twitter can provide insight into public perceptions and knowledge of health behaviors. As reported in 2018 and 2019 Pew Research Center surveys, 32% of teenagers between the ages of 13 and 17 years [25] and 44% of adults between the ages of 18 and 24 years [26] use Twitter. Given this age distribution, the platform serves as a promising source of data for understanding adolescent and young adult JUUL use. Previous studies that have utilized Twitter data on JUUL have identified a number of experiences and insights into the product and its users such as the use of JUUL in prohibited environments (eg, schools) [27], the acquisition of JUUL devices and JUUL pods [28], and the correlation between JUUL mentions on Twitter and JUUL sales [29]. In addition to these studies, there is a growing body of work assessing how JUUL is promoted and used by underage individuals on various social media platforms. Not only does the literature suggest a heavy presence of youth JUUL-related content [30], but younger users are also sharing their opinions and experiences with other users and are talking about the various aspects associated with JUUL use [31-33]. However, to the best of our knowledge, no large-scale analysis of JUUL-related tweets has used computational methods to understand underage patterns of use and perceptions towards JUUL. Using machine learning algorithms to classify tweets allows for their automatic categorization and, once a classifier has been trained, reduces the time and resources required by labor-intensive manual annotation. While the application of machine learning to tweets has shown promise in several public health subdisciplines [34,35], these methods are greatly underutilized in ENDS research.

Objectives


Our primary objective was to further understand salient themes and topics related to JUUL use on Twitter with particular foci on underage JUUL use and health perceptions. Our secondary objective was to use natural language processing (NLP) methods to develop machine learning–based classifiers capable of automatically identifying and evaluating underage-related JUUL mentions as well as positive and negative sentiments towards JUUL. In doing so, we hoped to provide optimally performing classifiers to be further validated and applied to additional work relating to underage JUUL use and its representation on Twitter.

Methods

Data Collection

Using the free Twitter application programming interface (API) [36], we collected a sample of 28,590 tweets from July 2018 to August 2019. To query the Twitter API, appropriate JUUL-related keywords were determined with the aid of a tobacco control researcher (SZ). We used the case-insensitive keywords JUUL, Phix, Sourin, myblu, Aspire Breeze, vaping pod, pod mod, and vape pod, as these terms are all common to pod mod ENDS devices. As we were primarily interested in the organic perspective of individuals regarding JUUL use, we removed all retweets from the dataset. After retweet removal, our dataset comprised 11,556 unique English-language tweets.
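
The keyword matching and retweet removal described above can be illustrated with a minimal sketch in plain Python. This is not the authors' collection code: the "RT @" prefix test for retweets, the substring-style matching, the case-insensitive deduplication, and the synthetic example tweets are all assumptions made for illustration.

```python
import re

# Keywords as listed in the text; matching is case-insensitive.
KEYWORDS = ["JUUL", "Phix", "Sourin", "myblu", "Aspire Breeze",
            "vaping pod", "pod mod", "vape pod"]
KEYWORD_RE = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def is_retweet(text):
    """Assumed heuristic: a retweet starts with 'RT @'."""
    return text.startswith("RT @")


def filter_tweets(tweets):
    """Keep unique, non-retweet tweets containing at least one keyword."""
    seen = set()
    kept = []
    for text in tweets:
        if is_retweet(text) or not KEYWORD_RE.search(text):
            continue
        key = text.strip().lower()  # assumed notion of "unique"
        if key in seen:
            continue
        seen.add(key)
        kept.append(text)
    return kept


# Synthetic examples, not real tweets.
sample = [
    "RT @someone: I love my JUUL",
    "My juul pods ran out again",
    "my juul pods ran out again",
    "Coffee first, everything else later",
    "Trying a new pod mod this weekend",
]
filtered = filter_tweets(sample)
print(filtered)
```

In this sketch, the retweet, the duplicate, and the tweet without a keyword are dropped, leaving two tweets.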

Ethical Considerations

This study was determined to be exempt from review by the University of Utah Institutional Review Board (IRB#00076188). To protect user privacy, we refrained from including usernames in this paper. Further, all quotations used are synthesized from multiple examples.

Manual Twitter Content Analysis

To analyze the various themes of our collected tweets, we carried out a manual annotation process in which we categorized each tweet according to its content. We used the classification scheme developed by Myslin et al [34] for emerging tobacco product Twitter surveillance as a starting point, modifying the classification categories to more appropriately reflect our scope of interest in JUUL. We initially included 39 categories to code for tweet relevancy (ie, whether the tweet was JUUL-related), type, content, and sentiment. At this point, an initial annotation coding round was carried out on 200 tweets to determine the interrater agreement between 2 annotators (RB and MC) and refine the annotation scheme. With consensus among annotators, categories deemed extraneous and irrelevant to our analysis of JUUL (eg, hookah) were excluded from the annotation scheme. Additionally, categories deemed too specific were consolidated with closely related categories. For instance, the separate categories “Industry” and “Policy” were combined to form a single “Industry and Regulation” category. The final annotation scheme comprised 22 categories related to themes of JUUL use, its perceptions among users, and an “Unrelated” category. Our final annotation scheme is available in Multimedia Appendix 1, and synthetic examples of these annotation categories are presented in Figure 1. In an attempt to limit our analysis to JUUL use exclusively, tweets that contained keywords other than JUUL were annotated as “Unrelated” unless the tweet also contained the keyword “JUUL.” Further, we restricted the underage label to those tweets that contained explicit contextual evidence regarding underage elements (eg, “My parents still don’t know I JUUL at school,” “FDA warns of JUUL use in high school,” “For my 16th birthday, I want mango JUUL pods”).

Figure 1. Final categories and synthetic tweet examples, as seen in the manual annotation.

Once the interrater agreement exceeded an acceptable Cohen kappa level [37] (ie, >0.7 [38]), the remaining manual annotation process was carried out by one annotator (RB). Excluding the tweets used for interrater agreement, a total of 4000 tweets were annotated during the manual annotation to ensure there was a sufficient number of tweets for training the machine learning classifiers.
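
For readers unfamiliar with the statistic, Cohen kappa compares the observed agreement between 2 annotators with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch in plain Python; the function name and the label lists are hypothetical and do not come from the study's annotations.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each annotator's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


# Hypothetical labels for 10 tweets (1 = category present, 0 = absent).
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
annotator_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 2))  # 0.8
```

Here the annotators agree on 9 of 10 items (p_o = 0.9) and chance agreement is 0.5, giving a kappa of 0.8, which would exceed the 0.7 threshold mentioned above.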

Data Preprocessing

We tokenized our manually annotated tweets with the TweetTokenizer tool from the Natural Language Toolkit (NLTK) [39], a widely used Python toolkit for analyzing text data. This tool splits text into individual tokens while also removing punctuation, @ characters, and other extraneous characters. TweetTokenizer is also capable of handling and tokenizing emojis and emoticons. Since these characters are often used in modern text when conveying emotion and sentiment, they are imperative in understanding tweet content. Consequently, we retained emojis and emoticons in the tweets, and they were tokenized as if they were words themselves.

All tokens were then converted into n-gram text sequences. An n-gram (ie, unigram, bigram, trigram) is a contiguous sequence of n features used in NLP to transform raw text into features that can be readily processed by a machine learning algorithm (Figure 2).
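
As a minimal illustration of the n-gram construction described above, the following plain Python sketch slides a window of size n over a token list. The helper name and the synthetic token list are assumptions; the study itself relied on NLTK and sklearn rather than this code.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Synthetic, already tokenized example.
tokens = ["i", "love", "my", "juul"]

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(bigrams)   # [('i', 'love'), ('love', 'my'), ('my', 'juul')]
print(trigrams)  # [('i', 'love', 'my'), ('love', 'my', 'juul')]
```

A sequence shorter than n yields an empty list, so very short tweets contribute no higher-order n-grams.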

Figure 2. Visualization of n-grams. n-grams can be described as a sequence of n-items, can encode additional semantic content beyond individual words, and once vectorized, can be used as features in machine learning algorithms.

Machine Learning Classification

In an attempt to automatically classify JUUL-related tweets, we applied supervised machine learning algorithms to identify tweets related to underage JUUL use, positive sentiment, and negative sentiment. The goal of this machine learning–based approach was to identify a predictive function of the data with which unseen data can be accurately classified as containing either underage JUUL use, positive sentiment, or negative sentiment. The efficient and automatic classification of JUUL-related tweets provides a snapshot into the perceptions and use patterns of JUUL and the potential to scale up the analysis beyond what can be realistically performed by manual annotation alone. The algorithms we used for classification were logistic regression, Bernoulli naïve Bayes, and random forest classifiers. Descriptions of the 3 classification algorithms are available in Figure 3.

These models were selected because of their computational simplicity and efficiency in Twitter-based classification tasks [34,40-42]. The input of each classifier consisted of the most salient features determined by feature selection (ie, a process in which the essential terms for model performance are identified automatically, with the rest being discarded).

This feature selection was carried out using scikit-learn (sklearn) [43], another Python toolkit that is frequently used for text analysis. The SelectKBest tool was used to score each feature with the chi-square statistic and retain the highest-scoring, most discriminating features of the dataset. In addition to reducing the chance of overfitting the models, feature selection can improve model performance by removing features deemed irrelevant. Once a range of suitable features had been selected, the hyperparameters for each algorithm were optimized. This hyperparameter optimization was carried out with sklearn’s GridSearchCV tool, which iterates through specified model parameters and determines the optimally performing model using 10-fold cross-validation. Finally, we applied the optimally performing model to the remaining unannotated tweets.
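
The idea behind chi-square feature selection can be sketched in plain Python. The sketch below uses the textbook 2x2 contingency statistic on binary (present or absent) features and keeps the k highest-scoring terms. It is an illustration under stated assumptions, not sklearn's implementation: the scores sklearn computes may differ, and the function names, token sets, and labels here are hypothetical.

```python
def chi_square(present, labels):
    """2x2 chi-square statistic for a binary feature against a binary label."""
    a = sum(1 for p, y in zip(present, labels) if p and y)          # present, positive
    b = sum(1 for p, y in zip(present, labels) if p and not y)      # present, negative
    c = sum(1 for p, y in zip(present, labels) if not p and y)      # absent, positive
    d = sum(1 for p, y in zip(present, labels) if not p and not y)  # absent, negative
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # feature or label is constant, so it carries no signal
    return n * (a * d - b * c) ** 2 / denom


def select_k_best(docs, labels, k):
    """Return the k tokens with the highest chi-square score."""
    vocab = sorted({tok for doc in docs for tok in doc})
    scores = {tok: chi_square([tok in doc for doc in docs], labels)
              for tok in vocab}
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


# Synthetic token sets; label 1 = underage-related, 0 = not.
docs = [
    {"juul", "school"},
    {"juul", "school", "mango"},
    {"juul", "mango"},
    {"juul", "work"},
]
labels = [1, 1, 0, 0]
top_features = select_k_best(docs, labels, 2)
print(top_features)  # ['school', 'work']
```

In this toy example, "juul" appears in every document and "mango" is evenly split between classes, so both score 0 and are discarded, whereas "school" separates the classes perfectly.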

The following 4 metrics were used to evaluate the performance of the various models: accuracy, precision (positive predictive value), recall (sensitivity), and F1 score (the harmonic mean of precision and recall). These metrics are standard in NLP and reflect a classifier’s ability to classify the task at hand effectively [44,45]. Our goal was to develop classifiers capable of performing well across all 4 metrics, and all 4 metrics were considered when evaluating overall performance.
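
These 4 metrics can be computed directly from the counts of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). The following plain Python sketch shows the arithmetic for a binary task; the function name and the label vectors are hypothetical, and the zero-division handling is an assumption.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical gold labels and predictions for 8 tweets.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With TP=3, FP=1, FN=1, and TN=3, all 4 metrics equal 0.75 in this example. On imbalanced tasks the metrics can diverge considerably, which is why accuracy alone is not sufficient.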

Figure 3. Brief descriptions of the 3 machine learning algorithms used to classify our annotated tweets.

Results

Manual Twitter Content Analysis

Of the 4000 tweets analyzed during the annotation process, 3152 (78.80%) were relevant to JUUL and explicitly mentioned JUUL or JUUL-related accessories such as JUUL pods and chargers. Of the relevant tweets, the most prevalent category was first person usage or experience (1792/3152, 56.85%). The least prevalent categories were using JUUL as a cessation method (45/3152, 1.43%) and using JUUL for the first time (38/3152, 1.21%). Overall sentiment towards JUUL was more positive (1052/3152, 33.38%) than negative (683/3152, 21.67%), and 1416 tweets (1416/3152, 44.92%) demonstrated neutral sentiment. When excluding news, media, and marketing tweets, positive sentiment towards JUUL slightly increased to 33.91% (941/2775) compared to 19.14% (531/2775) for negative sentiment. Lastly, 216 tweets (216/3152, 6.85%) mentioned potential health benefits or detriments of JUUL usage, and 586 tweets (586/3152, 18.59%) mentioned JUUL pods or flavors. See Table 1 for the proportions and frequencies obtained in the manual annotation.

Table 1. Category proportions and frequencies from the manual annotation of tweets (n=3152).
Category^a                 Proportion, %   Frequency
First-person experience    56.85           1792
Neutral sentiment          44.92           1416
Positive sentiment         33.38           1052
Negative sentiment         21.67           683
Flavor/JUUL pods           18.59           586
Other substances           9.55            301
Experience: other          7.99            252
Health effects             6.85            216

^a Categories are not mutually exclusive.

Machine Learning Classification of Underage JUUL Mentions and Sentiment

Using supervised machine learning algorithms, we created models to classify underage JUUL mentions and sentiment towards JUUL among Twitter users. To evaluate the different models, we compared the test metrics for all 3 algorithms using the 500 most relevant features for each model (Table 2). In all 3 classification tasks, the random forest model outperformed the logistic regression and Bernoulli naïve Bayes models. When classifying tweets related to underage usage of JUUL, the random forest model yielded higher accuracy (99%) than the logistic regression model (94%) and substantially higher accuracy than the Bernoulli naïve Bayes model (78%; Figure 4). When classifying positive and negative tweet sentiment, the random forest model performed considerably better (82% and 91% accuracy, respectively) than the logistic regression model (72% and 78% accuracy, respectively) and the Bernoulli naïve Bayes model (69% and 62% accuracy, respectively). When we applied our random forest classifier to additional unseen data (7356 unannotated tweets), the model classified 109 of 7356 tweets as underage-related (1.48%). This proportion is lower than that of the tweets classified as underage-related during the manual annotation process (190/3152, 6.03%), perhaps due to the presence of previously unseen terms related to underage JUUL use.

Table 2. Test metrics of the 3 algorithms for all 3 classification tasks as well as average model performance at 500 features for each classification task.
Test metrics and performance   Logistic regression            Bernoulli naïve Bayes          Random forest
                               Acc^a  F1    Prec^b  Rec^c     Acc^a  F1    Prec^b  Rec^c     Acc^a  F1    Prec^b  Rec^c
Underage JUUL use              0.94   0.94  0.95    0.92      0.78   0.71  0.99    0.57      0.99   0.99  0.99    0.99
Positive sentiment             0.72   0.69  0.82    0.69      0.69   0.63  0.83    0.53      0.82   0.82  0.80    0.75
Negative sentiment             0.78   0.77  0.85    0.73      0.72   0.66  0.98    0.50      0.91   0.91  0.90    0.94
Average model performance      0.81   0.80  0.87    0.78      0.73   0.67  0.93    0.53      0.91   0.91  0.90    0.89

^a Acc: accuracy.
^b Prec: precision.
^c Rec: recall.
Figure 4. Line plot of model performance at 500 features in classifying underage tweets and the top 10 most discerning features of the underage tweets.

Discussion

Principal Findings

In addition to supporting previous JUUL research using Twitter [27-29], our findings identified critical factors in the understanding and usage of JUUL among Twitter users. In our study, only 1.43% (45/3152) of annotated tweets mentioned using JUUL as a method of smoking cessation. This finding seems incongruent with JUUL’s stated mission of improving the lives of smokers by eliminating combustible cigarette use and replacing it with the purportedly less harmful JUUL product [46]. This observation is also inconsistent with the results of a 2019 survey reporting that around 20% of individuals aged 18-24 years initiated JUUL use in an attempt to quit combustible tobacco [47]. Additional research has suggested that youth not only appear to be experimenting with JUUL but are also habitually using the device [48]. Such results, in addition to our findings, suggest that Twitter may serve as a source of information that facilitates JUUL use and procurement among youth.

Additionally, only 6.85% (216/3152) of our annotated tweets mentioned the potential health benefits or detriments of using JUUL. This result is consistent with that found by Morean et al [18] and raises the question of whether JUUL users recognize the known effects of high-level nicotine exposure and the potential for developing nicotine dependency and subsequent nicotine addiction. While the long-term effects of JUUL use are yet to be ascertained, there is evidence to support the view that adolescent nicotine exposure may play a significant role in the detrimental alteration of neurochemical, structural, cognitive, and behavioral processes [49].

After removing underage tweets that contained news and media related content, 47% (56/118) of the remaining underage tweets mentioned first-person experiences with JUUL, with 21% (12/56) of those tweets mentioning JUUL pods and flavors — findings consistent with previous literature [28]. Moreover, of those underage first-person mentions, 32% (18/56) contained positive sentiment (eg, “I love my JUUL so much”), compared to 23% (13/56) containing negative sentiment (eg, “Juul is so disgusting”) — a finding that we expected due to the popularity of the pod mod device among youth as compared to other ENDS devices [50].

Although the largest share of the tweets that we annotated contained a neutral sentiment towards JUUL (1416/3152, 44.92%), tweets that expressed a sentiment were more often positive (1052/3152, 33.38%) than negative (683/3152, 21.67%). With nearly 20% (586/3152, 18.59%) of the JUUL-related tweets mentioning JUUL pods or flavors, Twitter appears to be regularly used for sharing opinions on various JUUL accessories such as pods or flavors as well as a means to gather information regarding the procurement of such accessories. At face value, it appears that Twitter may be used by individuals to share information about JUUL, thus facilitating its use; additional qualitative research would be necessary to understand the level of exposure of individuals to this content. This finding also suggests the potential for educational campaigns employing Twitter to inform the public about JUUL use, as noted in prior work [16].

Of all the machine learning models we developed, our random forest model performed best in all 3 classification tasks. The performance of the random forest can be primarily attributed to the nature of the algorithm itself. Because a random forest is an ensemble of decision trees containing random subsets of the input features, this algorithm is resilient to outlier data, and the final classification is based on the “majority vote” of the constituent decision trees [51]. Additionally, the random forest’s relatively easy implementation and computational simplicity make it a viable candidate for tobacco control researchers to use in Twitter-based ENDS surveillance.
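
The “majority vote” mechanism can be illustrated with a toy sketch in plain Python. The 3 hand-written decision stumps below are hypothetical stand-ins for trained trees, each looking at a single assumed feature; a real random forest learns its trees from bootstrapped samples and random feature subsets, which this sketch does not attempt.

```python
from collections import Counter


def make_stump(token):
    """Hypothetical one-feature 'tree': votes 1 if the token is present."""
    return lambda tokens: 1 if token in tokens else 0


def forest_predict(forest, tokens):
    """Final class is the label receiving the most votes across trees."""
    votes = [tree(tokens) for tree in forest]
    return Counter(votes).most_common(1)[0][0]


# An odd number of trees avoids ties in a binary task.
forest = [make_stump("school"), make_stump("parents"), make_stump("16th")]

underage_like = {"my", "parents", "juul", "school"}   # synthetic token set
other = {"mango", "juul", "pods"}                      # synthetic token set

print(forest_predict(forest, underage_like))  # 1 (votes: 1, 1, 0)
print(forest_predict(forest, other))          # 0 (votes: 0, 0, 0)
```

Because a single dissenting tree is outvoted, an individual tree that has latched onto an unusual pattern has limited influence on the final label.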


Limitations

Our work has some limitations to be considered. First, our data were obtained via the free 1% Twitter API using keyword search rather than the entire Twitter “firehose” dataset; therefore, there is the possibility that not all JUUL-related tweets in the study period were collected. Additionally, our list of keywords (JUUL, Phix, Sourin, myblu, Aspire Breeze, vaping pod, pod mod, and vape pod) is not exhaustive and does not include all pod mod devices available in the United States. We also cannot assume that Twitter users or their tweets are entirely representative of the general population regarding personal health behaviors.

Second, the frequency of some annotation categories is relatively low, and our models may risk overfitting. In machine learning, overfitting occurs when a model recognizes patterns accurately and performs well on the training data, but its performance decreases when it is applied to previously unseen data [52]. For instance, our algorithms may fit the data that they were trained on, but if presented with data they have never seen before, they may not be able to maintain this accuracy because the patterns learned from the training data may not generalize to the new data.

Additionally, the interpretation of tweet content during the manual annotation process is often subjective due to the brevity of tweet content, lack of grammatical structure, and usage of hyperbole, idioms, and so on. With manual annotation being an inherently interpretive task, we attempted to retain the consistency among our annotations by calculating interrater agreement between annotators, while also focusing on explicit contextual language when assigning labels to tweets.

Finally, the results of this study are preliminary, and in order to derive policy implications from our work, these classification algorithms should be further studied and validated using additional unseen data. Future work should look to apply these classifiers on unlabeled data, conduct error analysis, and refine the algorithms as needed. Pending further validation, these classifiers can be used to automatically categorize large quantities of tweets, allowing researchers to further understand how JUUL is disseminated among youth populations and propose policy change to combat underage ENDS use.

Conclusions


Our analysis provides a snapshot of the representation of JUUL on Twitter and brings forth several interesting observations for future research endeavors. Our work suggests that the majority of JUUL users on Twitter do not use JUUL as a method of smoking cessation. Additionally, there is a paucity of tweets in which users talk about the potential health effects of using JUUL. Using this manually annotated corpus as training data, we developed 3 supervised machine learning models to accurately classify tweets related to underage JUUL use as well as sentiment towards JUUL. Of the 3 models, our random forest classifier most accurately predicted underage JUUL-related tweets and their sentiment. The application of this algorithm is a novel analytic approach to understanding underage JUUL use on Twitter and, with further research and validation, can promote future research on underage JUUL use patterns as manifested on Twitter.

Acknowledgments


The research reported in this publication was partially supported by the National Institute on Drug Abuse of the National Institutes of Health under award number R21DA043775. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Conflicts of Interest

None declared.

Multimedia Appendix 1

JUUL-related tweet annotation scheme.

PDF File (Adobe PDF File), 168 KB

  1. Wang TW, Gentzke A, Sharapova S, Cullen KA, Ambrose BK, Jamal A. Tobacco Product Use Among Middle and High School Students - United States, 2011-2017. MMWR Morb Mortal Wkly Rep 2018 Jun 08;67(22):629-633.
  2. Cullen KA, Ambrose BK, Gentzke AS, Apelberg BJ, Jamal A, King BA. Notes from the Field: Use of Electronic Cigarettes and Any Tobacco Product Among Middle and High School Students - United States, 2011-2018. MMWR Morb Mortal Wkly Rep 2018 Nov 16;67(45):1276-1277.
  3. Cullen KA, Gentzke AS, Sawdey MD, Chang JT, Anic GM, Wang TW, et al. e-Cigarette Use Among Youth in the United States, 2019. JAMA 2019 Nov 05;322(21):2095.
  4. Craver R. Juul ends 2018 with 76 percent market share Internet. Winston-Salem Journal. 2019 Jan 08.   URL: https:/​/www.​​business/​juul-ends-with-percent-market-share/​article_6f50f427-19ec-50be-8b0c-d3df18d08759.​html [accessed 2019-10-23]
  5. McKelvey K, Baiocchi M, Halpern-Felsher B. Adolescents' and Young Adults' Use and Perceptions of Pod-Based Electronic Cigarettes. JAMA Netw Open 2018 Oct 05;1(6):e183535.
  6. Strongin RM. E-Cigarette Chemistry and Analytical Detection. Annu Rev Anal Chem (Palo Alto Calif) 2019 Jun 12;12(1):23-39.
  7. Walley SC, Wilson KM, Winickoff JP, Groner J. A Public Health Crisis: Electronic Cigarettes, Vape, and JUUL. Pediatrics 2019 Jun 23;143(6):e20182741.
  8. Leventhal AM, Miech R, Barrington-Trimis J, Johnston LD, O'Malley PM, Patrick ME. Flavors of e-Cigarettes Used by Youths in the United States. JAMA 2019 Nov 05.
  9. Juul Suspends Sales of Flavored Vapes And Signs Settlement To Stop Marketing To Youth Internet. NPR. 2019 Oct 17.   URL: https:/​/www.​​sections/​health-shots/​2019/​10/​17/​771098368/​juul-suspends-sales-of-flavored-vapes-and-signs-settlement-to-stop-marketing-to- [accessed 2020-08-21]
  10. Benowitz NL, Henningfield JE. Establishing a nicotine threshold for addiction. The implications for tobacco regulation. N Engl J Med 1994 Jul 14;331(2):123-125.
  11. Taghavi S, Khashyarmanesh Z, Moalemzadeh-Haghighi H, Nassirli H, Eshraghi P, Jalali N, et al. Nicotine content of domestic cigarettes, imported cigarettes and pipe tobacco in Iran. Addict Health 2012;4(1-2):28-35.
  12. 6 important facts about JUUL. Truth Initiative. 2018 Aug 20. URL: https://truthinitiative.org/research-resources/emerging-tobacco-products/6-important-facts-about-juul [accessed 2020-08-21]
  13. Jackler RK, Ramamurthi D. Nicotine arms race: JUUL and the high-nicotine product market. Tob Control 2019 Nov;28(6):623-628.
  14. Shao XM, Friedman TC. Pod-mod vs. conventional e-cigarettes: nicotine chemistry, pH, and health effects. J Appl Physiol (1985) 2020 Apr 01;128(4):1056-1058.
  15. Henningfield J, Pankow J, Garrett B. Ammonia and other chemical base tobacco additives and cigarette nicotine delivery: issues and research needs. Nicotine Tob Res 2004 Apr;6(2):199-205.
  16. O'Connell G, Pritchard JD, Prue C, Thompson J, Verron T, Graff D, et al. A randomised, open-label, cross-over clinical study to evaluate the pharmacokinetic profiles of cigarettes and e-cigarettes with nicotine salt formulations in US adult smokers. Intern Emerg Med 2019 Sep 2;14(6):853-861.
  17. Kimbrough D. Vaping: What You Need to Know. American Chemical Society. 2019 Dec.   URL: https:/​/www.​​content/​acs/​en/​education/​resources/​highschool/​chemmatters/​past-issues/​2019-2020/​dec-2019/​vaping.​html [accessed 2020-08-21]
  18. Morean ME, Bold KW, Kong G, Gueorguieva R, Camenga DR, Simon P, et al. Adolescents' awareness of the nicotine strength and e-cigarette status of JUUL e-cigarettes. Drug Alcohol Depend 2019 Nov 01;204:107512.
  19. U.S. Department of Health and Human Services. Preventing Tobacco Use Among Youth and Young Adults: A Report of the Surgeon General. Atlanta, GA: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Office on Smoking and Health; 2012.
  20. Willett JG, Bennett M, Hair EC, Xiao H, Greenberg MS, Harvey E, et al. Recognition, use and perceptions of JUUL among youth and young adults. Tob Control 2019 Jan 18;28(1):115-116.
  21. Dobbs PD, Hodges EJ, Dunlap CM, Cheney MK. Addiction vs. dependence: A mixed methods analysis of young adult JUUL users. Addict Behav 2020 Aug;107:106402.
  22. Case KR, Hinds JT, Creamer MR, Loukas A, Perry CL. Who is JUULing and Why? An Examination of Young Adult Electronic Nicotine Delivery Systems Users. J Adolesc Health 2020 Jan;66(1):48-55.
  23. Soneji S, Barrington-Trimis JL, Wills TA, Leventhal AM, Unger JB, Gibson LA, et al. Association Between Initial Use of e-Cigarettes and Subsequent Cigarette Smoking Among Adolescents and Young Adults: A Systematic Review and Meta-analysis. JAMA Pediatr 2017 Aug 01;171(8):788-797.
  24. Murthy VH. E-Cigarette Use Among Youth and Young Adults: A Major Public Health Concern. JAMA Pediatr 2017 Mar 01;171(3):209-210.
  25. Anderson M, Jiang J. Teens, Social Media & Technology 2018. Pew Research Center Internet & Technology. 2018 May 31.   URL: [accessed 2020-08-21]
  26. Perrin A, Anderson M. Share of U.S. adults using social media, including Facebook, is mostly unchanged since 2018. Pew Research Center. 2019 Apr 10. URL: https://www.fact-tank/2019/04/10/share-of-u-s-adults-using-social-media-including-facebook-is-mostly-unchanged-since-2018/ [accessed 2019-10-23]
  27. Kavuluru R, Han S, Hahn EJ. On the popularity of the USB flash drive-shaped electronic cigarette Juul. Tob Control 2019 Jan 13;28(1):110-112.
  28. Allem J, Dharmapuri L, Unger JB, Cruz TB. Characterizing JUUL-related posts on Twitter. Drug Alcohol Depend 2018 Sep 01;190:1-5.
  29. Huang J, Duan Z, Kwok J, Binns S, Vera LE, Kim Y, et al. Vaping versus JUULing: how the extraordinary growth and marketing of JUUL transformed the US retail e-cigarette market. Tob Control 2019 Mar 31;28(2):146-151.
  30. Czaplicki L, Kostygina G, Kim Y, Perks SN, Szczypka G, Emery SL, et al. Characterising JUUL-related posts on Instagram. Tob Control 2019 Jul 02.
  31. Brett EI, Stevens EM, Wagener TL, Leavens EL, Morgan TL, Cotton WD, et al. A content analysis of JUUL discussions on social media: Using Reddit to understand patterns and perceptions of JUUL use. Drug Alcohol Depend 2019 Jan 01;194:358-362.
  32. Chu K, Colditz JB, Primack BA, Shensa A, Allem J, Miller E, et al. JUUL: Spreading Online and Offline. J Adolesc Health 2018 Nov;63(5):582-586.
  33. Malik A, Li Y, Karbasian H, Hamari J, Johri A. Live, Love, Juul: User and Content Analysis of Twitter Posts about Juul. Am J Health Behav 2019 Mar 01;43(2):326-336.
  34. Myslín M, Zhu S, Chapman W, Conway M. Using twitter to examine smoking behavior and perceptions of emerging tobacco products. J Med Internet Res 2013 Aug 29;15(8):e174.
  35. Alvaro N, Conway M, Doan S, Lofi C, Overington J, Collier N. Crowdsourcing Twitter annotations to identify first-hand experiences of prescription drug use. J Biomed Inform 2015 Dec;58:280-287.
  36. Developers. Twitter Developer.   URL: [accessed 2020-08-21]
  37. Cohen J. A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement 2016 Jul 02;20(1):37-46.
  38. McHugh ML. Interrater reliability: the kappa statistic. Biochem Med (Zagreb) 2012;22(3):276-282.
  39. Bird S, Klein E, Loper E. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. Sebastopol, CA: O'Reilly Media, Inc; Jun 12, 2009.
  40. Aphinyanaphongs Y, Ray B, Statnikov A, Krebs P. Text classification for automatic detection of alcohol use-related tweets: A feasibility study. 2014 Presented at: 2014 IEEE 15th International Conference on Information Reuse Integration; August 13-15, 2014; Redwood City, CA.
  41. Xu B, Guo X, Ye Y, Cheng J. An Improved Random Forest Classifier for Text Categorization. JCP 2012 Dec 01;7(12).
  42. McCallum A, Nigam K. A comparison of event models for Naive Bayes text classification. 1998 Presented at: AAAI-98 Workshop on Learning for Text Categorization; July 26-27, 1998; Madison, WI.
  43. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 2011;12:2825-2830.
  44. Davis J, Goadrich M. The relationship between Precision-Recall and ROC curves. New York, NY: Association for Computing Machinery; 2006 Presented at: ICML '06: 23rd International Conference on Machine Learning; June 25-29, 2006; Pittsburgh, PA.
  45. Goutte C, Gaussier E. A Probabilistic Interpretation of Precision, Recall and F-Score, with Implication for Evaluation. In: Losada DE, Fernández-Luna JM, editors. Berlin, Heidelberg: Springer; 2005:345-359.
  46. About JUUL.   URL: [accessed 2019-11-05]
  47. Patel M, Cuccia A, Willett J, Zhou Y, Kierstead EC, Czaplicki L, et al. JUUL use and reasons for initiation among adult tobacco users. Tob Control 2019 Nov 19;28(6):681-684.
  48. Vallone DM, Bennett M, Xiao H, Pitzer L, Hair EC. Prevalence and correlates of JUUL use among a national sample of youth and young adults. Tob Control 2019 Nov 29;28(6):603-609.
  49. Yuan M, Cross SJ, Loughlin SE, Leslie FM. Nicotine and the adolescent brain. J Physiol 2015 Jun 23;593(16):3397-3412.
  50. Krishnan-Sarin S, Jackson A, Morean M, Kong G, Bold KW, Camenga DR, et al. E-cigarette devices used by high-school youth. Drug Alcohol Depend 2019 Jan 01;194:395-400.
  51. Ali J, Khan R, Ahmad N, Maqsood I. Random Forests and Decision Trees. International Journal of Computer Science Issues 2012;9(5).
  52. Dietterich T. Overfitting and undercomputing in machine learning. ACM Comput. Surv 1995 Sep;27(3):326-327.

Acc: accuracy
API: application programming interface
ENDS: electronic nicotine delivery systems
NLP: natural language processing
Prec: precision
Rec: recall

Edited by G Eysenbach; submitted 07.05.20; peer-reviewed by G Nicol, K McCausland; comments to author 22.06.20; revised version received 17.07.20; accepted 10.08.20; published 02.09.20


©Ryzen Benson, Mengke Hu, Annie T Chen, Subhadeep Nag, Shu-Hong Zhu, Mike Conway. Originally published in JMIR Public Health and Surveillance, 02.09.2020.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication, as well as this copyright and license information must be included.