How Do You #relax When You’re #stressed? A Content Analysis and Infodemiology Study of Stress-Related Tweets

Background: Stress is a contributing factor to many major health problems in the United States, such as heart disease, depression, and autoimmune diseases. Relaxation is often recommended in mental health treatment as a frontline strategy to reduce stress, thereby improving health conditions. Twitter is a microblog platform that allows users to post their own personal messages (tweets), including their expressions about feelings and actions related to stress and stress management (eg, relaxing). While Twitter is increasingly used as a source of data for understanding mental health from a population perspective, the specific issue of stress, as manifested on Twitter, has not yet been the focus of any systematic study.
Objective: The objective of our study was to understand how people express their feelings of stress and relaxation through Twitter messages. In addition, we aimed at investigating automated natural language processing methods to (1) classify stress versus nonstress and relaxation versus nonrelaxation tweets, and (2) identify first-hand experience (that is, who is the experiencer) in stress and relaxation tweets.
Methods: We first performed a qualitative content analysis of 1326 and 781 tweets containing the keywords “stress” and “relax,” respectively. We then investigated the use of machine learning algorithms, in particular naive Bayes and support vector machines, to automatically classify tweets as stress versus nonstress and relaxation versus nonrelaxation. Finally, we applied these classifiers to sample datasets drawn from 4 cities in the United States (Los Angeles, New York, San Diego, and San Francisco) obtained from Twitter’s streaming application programming interface, with the goal of evaluating the extent of any correlation between our automatic classification of tweets and results from public stress surveys.
Results: Content analysis showed that the most frequent topic of stress tweets was education, followed by work and social relationships. The most frequent topic of relaxation tweets was rest & vacation, followed by nature and water. When we applied the classifiers to the cities dataset, the proportion of stress tweets in New York and San Diego was substantially higher than that in Los Angeles and San Francisco. In addition, we found that characteristic expressions of stress and relaxation varied for each city based on its geolocation.
Conclusions: This content analysis and infodemiology study revealed that Twitter, when used in conjunction with natural language processing techniques, is a useful data source for understanding stress and stress management strategies, and can potentially supplement infrequently collected survey-based stress data.


Introduction
Psychological stress has been linked to multiple health conditions, including depression [1], heart disease [2], autoimmune disease [3], and general all-cause mortality [4]. Stress has also been associated with worse health outcomes among those living with chronic illness [5], suggesting that stress may exacerbate pre-existing health conditions, as well as contribute to the development of new health problems. Stress not only contributes to physical and mental health problems, such as heart disease, depression, and autoimmune diseases [6], but also has negative impacts on family life and work, significantly impairing quality of life [7,8]. Accordingly, stress is an important concern for public health prevention initiatives [7,8].
Health surveys have demonstrated that stress negatively impacts a large proportion of the US population [9]. Underscoring the magnitude of the problem, a study conducted by Harvard School of Public Health found that 49% of the American public reported being stressed within the last year, and also found that 60% of those who reported being in poor health also reported experiencing a substantial amount of stress within the last month [7]. Further, levels of stress appear to be unequally distributed throughout the population [10]. National surveys have documented that higher levels of stress are reported among those who were of lower income, less educated, and younger [11]. Theorists have suggested that geographic clustering of psychological characteristics may be driven by selective migration (i.e., in this case, people more vulnerable to stress seek out others like themselves), social influence (i.e., attitudes and beliefs that lead to greater stress cluster together geographically), or environmental influence (i.e., features of the physical environment, such as neighborhoods, increase stress among those who live close to one another) [12]. In short, large-scale studies have documented both the high prevalence of stress within the US and the geographic clustering of psychological distress, suggesting that tracking symptoms of stress should ideally occur at both the national and local levels.
Relaxation is considered a key component of frontline stress management techniques, such as cognitive-behavioral stress management [13]. General stress management can include adaptive coping (e.g., distraction), physical relaxation strategies (e.g., diaphragmatic breathing), cognitive reappraisal (e.g., reconsidering the stressor from a different perspective), and mindfulness (i.e., increasing awareness of the present moment). These stress management strategies are intended to reduce psychological and physiological arousal related to stress, promote healthier coping alternatives, and, in turn, reduce some of the negative health impacts of stress. Indeed, these strategies have been found to be effective for improving health outcomes among those living with chronic illness [14][15][16], as well as for improving general mental health and quality of life [17,18].
Understanding what the major causes of stress are and how people negatively or positively manage their stress (e.g., through stress management techniques such as cultivating relaxation) is important [7,19]. Population health surveys often use telephone interviews or questionnaires from samples of the population, e.g., CDC's Behavioral Risk Factor Surveillance System (BRFSS) [20]. These methods, although reliable, are conducted relatively infrequently due to cost, and may be less effective at reaching certain populations, such as those without a dedicated landline telephone. With the rapid growth of online social networks today, social media data can serve as a useful additional resource to understand aspects of stress that are difficult to assess in general surveys or clinical care. For example, social media provide a means to rapidly and dynamically address new and evolving research questions with a degree of flexibility not possible with surveys. Social media may also provide insights into populations that may be underrepresented in surveys (depending on the demographics of the particular social media platform used). Thus, social media can potentially serve as a beneficial supplement to detailed surveys when understanding public health concerns.
Twitter, one of the most popular social media platforms, is a micro-blog service that allows users to post their own personal messages (a 'tweet' with a 140-character limit). As of May 2016, it had 310 million active users with 1 billion unique visits monthly to sites with embedded tweets [21]. The utility of Twitter as a data source has been investigated in numerous applications such as election prediction [22], stock market prediction [23], oil price changes [22], and earthquakes and disasters [24].
Twitter has also been used in public health for influenza tracking [25][26][27], studying breast cancer prevention [28], childhood obesity [29], issues related to general health [30], tobacco and e-cigarette use [31], dental pain [32,33], general pain [34], sexually transmitted diseases [35], and weight loss [36]. There has also been research regarding the general well-being of people in different geographical locations using Twitter messages [37], a correlation study of Twitter messages with depression [38], as well as with heart disease mortality [39]. However, no studies specifically focused on stress and stress management have been conducted until now.
In this paper, we investigate how and in what ways people express their own stress and relaxation through an in-depth content analysis of Twitter messages. In addition, we investigate automated methods to classify stress and relaxation tweets using machine learning techniques. Furthermore, we rank stress and relaxation levels based on the relative proportions of stress and relaxation related tweets (as identified by our NLP classifiers) in four U.S. cities: New York, Los Angeles, San Diego, and San Francisco. We then compare these results to public surveys reported by Forbes and CNN [40,41]. This study will provide another perspective on how people think about and cope with stress using easily-acquired, naturalistic Twitter data, complementing existing survey-based epidemiological methods.

Dataset 1
To begin our investigation of stress and relaxation (stress management) tweets, we first collected tweets with user-defined stress and relaxation topics using the Twitter REST Application Programming Interface (API) [42]. The user-defined topics included the hashtagged topics #stress and #relax, as well as variations of these words. The full search list used can be found in Table 1. Tweets were collected between July 9 and July 14, 2014. We supplemented this seed dataset with tweets from the random sample stream of the Twitter Streaming API [43] (1% sample rate) in order to better represent "everyday" tweets that did not necessarily contain stress and relaxation related hashtags, but that still contained the keywords "stress" or "relax." This dataset consists of 1326 stress-related and 781 relaxation-related tweets. We refer to this dataset as Dataset 1.

Dataset 2
We further investigated the characteristics of stress and stress management by geographical location (four cities) and compared the locations against each other using Dataset 2. This dataset, much larger than Dataset 1, consisted of geo-tagged tweets obtained from the Twitter Streaming API [43] in one of four possible cities: Los Angeles,

Gold Standard and Manual Analysis of Tweets
Since our primary goal in this study is to understand how people express stress and relaxation through Twitter, we developed annotation guidelines for both stress and relaxation tweets based on reports from the American Psychological Association (APA) [7], Centers for Disease Control and Prevention (CDC) [8,44], and medical websites [6,45,46]. Following these guidelines, tweets were classified by both genre and theme. Genre reflects the format of the tweet (for example, personal experience), and theme reflects the domain of the actual content conveyed (including such categories as stress symptoms and stress topics).
Details for each genre and theme for stress and relaxation tweets are given below:
-Genre: We categorized tweets as being first-hand experience vs. other genre. First-hand experience was defined as a direct personal experience, or an experience directly related to the user writing the tweet. Other genres included second-hand experience, advertisements, news articles, etc. This genre classification was based on previous work on classifying health-related tweets [31]. After classifying a tweet as first-hand experience, we assigned its content into two themes: stress or relaxation.
-Stress themes: Content analysis focused on three main questions: (1) What kind of stress was being experienced? (2) What was the cause of the stress? and (3) What kind of actions, if any, were being taken regarding the stress? Based on these questions, we categorized the theme into three categories: stress symptoms, topics, and action(s) taken.
• Symptoms: There were three classes of symptoms: (1) psychological and emotional, social relationships, (5) travel, (6) temporal, and (7) other. These topics were identified based on an analysis of data from Dataset 1.
• Action taken: This theme indicated the action that people reported taking when they were stressed. The action could be either negative or positive. An example of a negative action is: I need a drink tonight. #sostressed. An example of a positive action is: I need a nap, and a hug. #stressingout #tired.
• Non-specific: This theme was used for users that simply tweeted without any symptom, topic, or action. Examples include #stressed!!!, Bad Night :,( #SoStressed, etc.
-Relaxation themes: We categorized first-hand experience relaxation tweets by the topics (themes) given below.
• Topics: The action reported being taken by the user in order to relax, such as exercising or listening to music. A total of 11 topics were created based on data from Dataset 1:
The schemas for stress and relaxation tweets are depicted in Figure 1 and Figure 2.
Definitions and examples of each category of first-hand experience tweets and its themes for stress and relaxation tweets are listed in Appendix 1 and Appendix 2.
One author (AR) annotated stress and relaxation tweets from Dataset 1 and another (SD) annotated and verified the dataset to ensure that all tweets were annotated correctly. Any disagreements were resolved by meetings or exchanging emails. Dataset 1 contained a total of 664 stress and 662 non-stress tweets among the 1326 stress-related tweets, and a total of 391 relaxation and 390 non-relaxation tweets among the 781 relaxation-related tweets. For each stress or relaxation tweet, two authors (AR, SD) discussed and manually annotated tweets based on the guidelines as described above. After annotation, there were a total of 479 stress tweets and 335 relaxation tweets related to first-hand experience in Dataset 1. The details of Dataset 1 are depicted in Figure 3. Since the prevalence of some of the stress themes (e.g., finances, work) and relaxation themes (e.g., food & drink, social) in Dataset 1 was very low (i.e. too infrequent to train a machine learning classifier), we developed an automatic keyword-based theme classifier using a manually crafted lexicon of stress and relaxation keywords associated with each category. We first generated unigrams and bigrams from Dataset 1, and one author (AR) manually reviewed and selected the highest frequency unigram and bigram keywords. We then manually added corresponding synonyms into each theme in order to increase the coverage of the classifier. For example, the topic "education" in the stress schema contained unigrams "school", "college", "classes", and the bigram "high school" in Dataset 1. We manually added synonyms of those terms such as "exams" and "studying" as unigram keywords and "college life", "my tuition" and "on finals" into bigram keywords. The list was iteratively reviewed and confirmed by another author (SD). There was an average of 20 unigram and 20 bigram terms for each theme. Only unigram and bigram keywords were created since tweet messages are short in nature. 
Bigram keywords were necessary to include idiomatic expressions like "vicious cycle" and "hate feeling", and they also added more specificity such as "my heart" and "my sanity", which helped to increase the accuracy of the classifiers.
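The keyword-based theme classifier described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`tokenize`, `match_themes`) and the tokenization rule are assumptions made for illustration, and the lexicon contains only the "education" terms quoted in the text rather than the full list of roughly 20 unigrams and 20 bigrams per theme.

```python
import re

# Illustrative lexicon: only the "education" terms quoted in the text.
# The full per-theme lexicon from the study is not reproduced here.
THEME_LEXICON = {
    "education": {
        "unigrams": {"school", "college", "classes", "exams", "studying"},
        "bigrams": {"high school", "college life", "my tuition", "on finals"},
    },
}


def tokenize(text):
    """Lowercase the tweet and split it into word tokens (assumed rule).

    Hashtag marks and punctuation are dropped, so "#stressing" -> "stressing".
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def match_themes(text, lexicon=THEME_LEXICON):
    """Return every theme whose unigram or bigram keywords occur in the tweet."""
    tokens = tokenize(text)
    unigrams = set(tokens)
    # Adjacent token pairs, joined with a space to match the bigram keywords.
    bigrams = {" ".join(pair) for pair in zip(tokens, tokens[1:])}
    themes = []
    for theme, terms in lexicon.items():
        if unigrams & terms["unigrams"] or bigrams & terms["bigrams"]:
            themes.append(theme)
    return themes
```

In this sketch a tweet may match several themes at once, and a tweet that matches none returns an empty list; whether the original classifier allowed multiple themes per tweet is not stated in the text.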

Machine Learning Algorithms
Leveraging the annotated data derived from our content analysis of Dataset 1, we applied and evaluated machine learning algorithms for classification of stress vs. non-stress tweets and relaxation vs. non-relaxation tweets (on Dataset 1). In order to apply the classifier trained on Dataset 1 to the unseen, much larger Dataset 2 (cities dataset), we first filtered tweets by only keeping tweets that contained stress/relaxation-related hashtags in Table 1 or keywords "stress"/"relax" for each city in Dataset 2. After this step, Dataset 2 contained only tweets with stress/relaxation-related keywords or hashtags. To calculate the proportion of stress/relaxation tweets at the city level, we utilized the stress/relaxation classifier trained on Dataset 1 to filter stress/relaxation tweets and then applied the classifier for first-hand experiencer to tweets from each city in Dataset 2. Figure 4 shows a flow chart describing our machine learning design.
The work described in this paper focused on two machine learning-based classification tasks. First, tweets were classified into the appropriate stress and relaxation category (i.e., is it stress or relaxation related?). Second, first-hand experience tweets vs. non first-hand experience tweets were classified. We used two machine learning algorithms, naïve Bayes and Support Vector Machines (SVMs), which were trained and evaluated on Dataset 1 using 10-fold cross-validation. Both algorithms have been used extensively for text classification tasks [50][51][52]. We used the rainbow package [51] to implement both naïve Bayes and SVMs (linear kernel), with "bag-of-words" feature sets for both algorithms. We chose the bag-of-words representation because it is considered a baseline and is the most common text representation in text classification in general [50][51][52]. To the best of our knowledge, this is the first study on classifying stress and relaxation tweets.
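To make the bag-of-words and naïve Bayes setup concrete, the following is a small self-contained sketch. It is a stand-in written in plain Python, not the rainbow package used in the study: the class name, the whitespace tokenizer, the Laplace smoothing, the round-robin fold assignment, and the toy training tweets are all assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

# Invented toy examples, for illustration only (not tweets from Dataset 1).
TOY_TEXTS = [
    "so stressed about exams",
    "work is stressing me out",
    "reduce stress with this new app",
    "stress test results for banks",
]
TOY_LABELS = ["stress", "stress", "nonstress", "nonstress"]


def bag_of_words(text):
    """Lowercased, whitespace-tokenized unigram counts."""
    return Counter(text.lower().split())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(bag_of_words(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word, freq in bag_of_words(text).items():
                if word in self.vocab:  # ignore words never seen in training
                    score += freq * math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cross_validate(texts, labels, k):
    """k-fold cross-validated accuracy; item i goes to fold i % k (assumed split)."""
    pairs = list(zip(texts, labels))
    correct = 0
    for fold in range(k):
        train = [p for i, p in enumerate(pairs) if i % k != fold]
        test = [p for i, p in enumerate(pairs) if i % k == fold]
        train_texts, train_labels = zip(*train)
        model = NaiveBayes().fit(train_texts, train_labels)
        correct += sum(model.predict(t) == l for t, l in test)
    return correct / len(pairs)
```

With the annotated data, `cross_validate(texts, labels, 10)` would correspond to the 10-fold setup mentioned above; a linear-kernel SVM could be substituted for `NaiveBayes` behind the same `fit`/`predict` interface.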

Calculating Proportion of Stress and Relaxation Tweets at the City Level
We applied the two-step classification to each city in Dataset 2 to automatically identify stress and relaxation tweets. We calculated the proportion of stress/relaxation tweets to the total number of tweets in each city.
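The city-level calculation can be sketched as a keyword filter followed by the two classifiers, with the final count divided by the total number of tweets collected for the city. In the sketch below the classifiers are passed in as plain callables (`is_on_topic`, `is_first_hand`); these names and the function signature are hypothetical placeholders for the trained models, not part of the original pipeline.

```python
def city_proportion(tweets, is_on_topic, is_first_hand, keyword="stress"):
    """Proportion of first-hand stress (or relaxation) tweets among all city tweets.

    tweets        -- list of tweet texts collected for one city
    is_on_topic   -- callable standing in for the stress/relaxation classifier
    is_first_hand -- callable standing in for the first-hand experience classifier
    keyword       -- "stress" or "relax", used for the initial keyword filter
    """
    if not tweets:
        return 0.0
    # Keyword filter: keep tweets that mention the keyword at all.
    filtered = [t for t in tweets if keyword in t.lower()]
    # First classifier: genuine stress/relaxation content vs. everything else.
    on_topic = [t for t in filtered if is_on_topic(t)]
    # Second classifier: first-hand experience vs. other genres.
    first_hand = [t for t in on_topic if is_first_hand(t)]
    return len(first_hand) / len(tweets)
```

Dividing by all tweets in the city, rather than by the keyword-filtered subset, keeps the proportions comparable across cities with different tweet volumes.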

Measurements and Statistical Analysis
For both stress/relaxation and first-hand experience classifications, we used accuracy, sensitivity, specificity, and positive predictive value (PPV) as metrics [53][54][55]:

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}, \quad \text{Sensitivity} = \frac{TP}{TP + FN}, \quad \text{Specificity} = \frac{TN}{TN + FP}, \quad \text{PPV} = \frac{TP}{TP + FP}$$

where TP is the number of tweets that are correctly classified as true, FP is the number of tweets that are incorrectly classified as true, FN is the number of tweets that are true but incorrectly classified as false, and TN is the number of tweets that are correctly classified as false.
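As a worked illustration, the four metrics follow directly from the confusion-matrix counts TP, FP, FN, and TN. The function name and the example counts below are invented for illustration and are not results from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity, specificity, and PPV from confusion-matrix counts.

    No guard against empty denominators is included; each of (tp + fn),
    (tn + fp), and (tp + fp) is assumed to be non-zero.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),   # share of true tweets that were found
        "specificity": tn / (tn + fp),   # share of false tweets that were rejected
        "ppv": tp / (tp + fp),           # share of predicted-true tweets that are true
    }


# Invented counts for a sample of 100 tweets.
example = classification_metrics(tp=40, fp=10, fn=20, tn=30)
```

For these invented counts, accuracy is 70/100, sensitivity is 40/60, specificity is 30/40, and PPV is 40/50.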
In order to compare data among cities, we used Pearson's chi-squared test and reported significance if the P-value was less than 0.05 [56]. Statistical analyses were performed using the R software, publicly available at https://www.r-project.org. Note that in order to preserve the anonymity of Twitter users, all example tweets reported in this paper are paraphrases of original tweets.

Content Analysis in Stress and Relaxation Tweets (Dataset 1)
Figure 5a shows the distribution of themes in first-hand experience stress tweets. This figure indicates that the highest frequency theme in stress tweets is topic, followed by symptoms (e.g., Not sure what to do... #stressed #worried #lost), non-specific (e.g., #stressed!!!), and action taken (e.g., I need a drink #sostressed). This suggests that Twitter users who post about stress usually post more about the cause or topic of their stress and less about actions and symptoms associated with stress.
Among the total number of stress-related tweets, we found that the most frequent topic was education (15%), followed by work (9%) and social relationships (8%). This is interesting because many of Twitter's users are young people who attend school [57,58]. It seems that education and issues related to education, e.g., exams and finals, are of the utmost concern for Twitter users. Examples of the education topic include: Never doing a session B math course ever again #sostressful or my exam in less than a month?! #stressing.
The topic distributions of first-hand experience stress tweets are depicted in Figure 5b.
Relaxation-related tweets encompass a wider range of topics than stress-related tweets.
The most frequent topic of relaxation tweets was rest and vacation (36%), followed by nature (22%) and water (20%). Topic distributions of first-hand experience of relaxation tweets are depicted in Figure 6.

Automatic Classification of Stress and Relaxation Tweets (Dataset 1)
Cross-validated classification results are shown in
Table 3 shows the terms that have the highest information gain for stress/relaxation classification. Interestingly, we found that most terms characteristic of the stress class are related to the term "stress", such as "stressed" or "stressin". In contrast, most terms characteristic of the relaxation class are terms such as "vacation", "water", or "beach," which are related to the topics categorized in our relaxation schema.
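Information gain for a term can be read as the reduction in class entropy obtained by knowing whether the term occurs in a tweet. The sketch below uses this standard definition with binary term presence; the function names and the four toy documents are assumptions for illustration, and the exact variant computed by the authors' toolkit is not specified in the text.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(term, docs, labels):
    """H(class) minus the expected entropy after splitting on term presence."""
    with_term = [l for d, l in zip(docs, labels) if term in d.lower().split()]
    without_term = [l for d, l in zip(docs, labels) if term not in d.lower().split()]
    n = len(labels)
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


# Invented toy corpus, for illustration only.
DOCS = ["stressed about exams", "so stressed today", "vacation at the beach", "beach day"]
LABELS = ["stress", "stress", "relax", "relax"]

# Rank the vocabulary by information gain, highest first.
VOCAB = sorted({w for d in DOCS for w in d.lower().split()})
RANKED = sorted(VOCAB, key=lambda w: information_gain(w, DOCS, LABELS), reverse=True)
```

In this toy corpus, "stressed" and "beach" each separate the two classes perfectly and therefore reach the maximum gain of 1 bit, while a term such as "day" that appears in only one document scores lower.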

Automatic Classification of Stress and Relaxation Tweets at the City Level (Dataset 2)
Using an SVM algorithm trained on our annotated data (Dataset 1), we automatically classified the much larger Dataset 2 (cities dataset). We used a three-step classification process. First, we filtered by the keywords "stress"/"relax". Second, we applied the stress/relaxation classifier to this filtered data. Third, we used the first-hand classifier to identify first-hand stress/relaxation tweets. In the second and third steps, we used an SVM (linear kernel) trained on Dataset 1 as the classifier. We used SVM because it had advantages over naïve Bayes in stress/relaxation classification on Dataset 1. The number of tweets after each step is shown in Table 4.
To evaluate the performance of stress/relaxation classification in Dataset 2, we randomly sampled two sets of 100 tweets from a city in Dataset 2: Set 1 consisted of tweets containing the keyword "stress" and Set 2 of tweets containing the keyword "relax". We chose New York for evaluation since New York had the greatest number of tweets. The 100 tweets from Set 1 were then manually annotated (by author SD) as stress/non-stress and first-hand stress/non first-hand experience stress class. Similarly, the 100 tweets from Set 2 were manually annotated as relaxation/non-relaxation and first-hand relaxation/non first-hand experience relaxation class. Table 5 shows the results of classification on Set 1 and Set 2 using the SVM algorithm. The results indicate fair accuracy (66%-92%) and high PPV (84.62%-100%), but lower sensitivity in first-hand stress classification (44%) and lower specificity in relaxation classification (57.14%). The results of the SVM algorithm on Dataset 2 differ from those on Dataset 1, perhaps due to a different data distribution. The descriptions of manual annotation on 100 random tweets of Set 1 and Set 2 are shown in Figure 7. Figure 8 shows the proportion of stress/relaxation tweets out of all tweets by city in Dataset 2. The number of stress tweets is twice the number of relaxation tweets, indicating that Twitter users are more likely to tweet about stress than relaxation.
To evaluate theme classification by keyword matching, we randomly sampled 50 classified tweets for each theme from New York. Manual review showed that keyword classification achieved a PPV from 60% to 90% for relaxation tweets and 40% to 80% for stress tweets. First-hand classification results from Dataset 2 showed that cities manifest a uniform pattern of stress and relaxation tweets. We found that the singular first person pronoun "I" was consistently used the most across all cities when expressing stress, found in ~4% of all stress tweets, while in relaxation tweets "I" was used less often (ranked seventh), at around 2.4%. Details of the 30 highest frequency keywords in first-hand experience stress and relaxation tweets for Los Angeles, New York, San Diego, and San Francisco are shown in Appendix 4.
We also found that linguistic expressions of negation such as "not," "but," "don't," or quantifying words such as "much" are among the thirty unigrams most characteristic of stress-related tweets. In addition, users often use emotionally-laden swear words when expressing stress. It is important to note, however, that the affective polarity of certain swear words can be highly context dependent ("it's shit" vs. "it's the shit") [59]. Relaxation tweets, on the other hand, tend to contain words indicating relaxation and time such as "relax," "home," "time," "day," "now." We found that "home" is among the highest frequency terms in relaxation tweets, as is "weekend". Tag clouds of stress and relaxation tweets for each city are depicted in Appendix 5.

Figure 7. Description of manual annotation on 100 random tweets containing keywords "stress" and "relax" from Dataset 2.

Table 4. Stress ranking is based on 2011 Forbes [40] and 2014 CNN studies [41]. Statistical tests between cities showed there are differences between cities (P<0.0001), except San Diego and New York (Stress: P=0.18, Relaxation: P=0.02). P-values of relaxation and stress tweets between San Diego and Los Angeles are 0.41 and 0.000154, respectively. Ranks based on stress tweets are: New York=San Diego, Los Angeles, and San Francisco.

Table 5. Classification evaluation using a random sample of 200 tweets (100 containing the keyword "stress"; 100 containing the keyword "relax") from New York in Dataset 2. We reported accuracy (Acc), Sensitivity (Sen), Specificity (Spec), and Positive predictive values (PPV) measures.

Figure 9a shows the theme distributions of stress tweets among cities. Education is the highest frequency topic (12-14%), followed by work (4-5%) and travel (4%). Interestingly, we found that tweets describing action taken and psychological & emotional symptoms also have relatively high frequencies (8-10%). This indicates that, besides the topic of their stress, people often post about their emotional state and reaction to stress.
Though we do not find statistically significant differences in theme distributions among cities for stress tweets, there were significant differences between New York and other cities in the topics of nature and water in relaxation tweets. This may indicate the different activities taken for relaxation between the East Coast (New York) and West Coast (Los Angeles, San Diego, San Francisco). We found that high frequency terms for relaxation tweets in New York included "watching," while in San Diego "beach" was more common. This intuitively suggests that San Diegans more often relax by going to the beach, while New Yorkers relax by enjoying indoor (or spectator) entertainment ("watching", "listening").

Correlations between Tweets Data Analysis and Public Surveys
Our proportions of stress tweets differ from the rankings in two public surveys of the most stressful cities in the U.S., by Forbes [40] in 2011 and CNN [41] in 2014. Both surveys ranked New York and Los Angeles among the most stressful cities in the country, while San Diego and San Francisco were categorized as less stressful. Our city ranking based on the proportion of first-hand experience stress tweets is New York followed by San Diego, Los Angeles, and San Francisco (Table 6 and Figure 8). While we found no significant difference between New York and San Diego, we did find significant differences (P<0.0001) in pairwise comparisons between San Diego, Los Angeles, and San Francisco (Table 6).
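A pairwise comparison of this kind can be sketched as Pearson's chi-squared test on a 2x2 table (stress vs. other tweets, for two cities). The version below is a plain-Python illustration without continuity correction; whether the original R analysis applied a correction is not stated, and the function name and tweet counts used here are invented for illustration.

```python
import math


def chi_squared_2x2(a_hits, a_total, b_hits, b_total):
    """Pearson chi-squared statistic and P-value for two proportions.

    a_hits / a_total -- stress tweets and all tweets in city A
    b_hits / b_total -- stress tweets and all tweets in city B
    No continuity correction is applied; all expected counts are assumed
    to be non-zero.
    """
    observed = [
        [a_hits, a_total - a_hits],
        [b_hits, b_total - b_hits],
    ]
    row_totals = [a_total, b_total]
    col_totals = [a_hits + b_hits, (a_total - a_hits) + (b_total - b_hits)]
    n = a_total + b_total
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Invented counts: 30 of 1000 tweets in one city vs. 60 of 1000 in another.
stat, p_value = chi_squared_2x2(30, 1000, 60, 1000)
significant = p_value < 0.05
```

The closed-form P-value applies only to the 2x2 case, which has one degree of freedom; comparing all four cities in a single table would need the general chi-squared distribution.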
Differences between results found in public stress surveys and our automatic classification of Twitter messages could be due to differences in methodology and population when collecting data. Public surveys collect data using telephones and paper-based reports, while Twitter messages are user-generated, naturalistic, and reflect personal thoughts. We suggest that Twitter could be used as a real-time, low-cost, and flexible supplement to public surveys when understanding and investigating stress and stress management techniques.

Stress Relief by Relaxation in Tweets
The distribution of stress topics across cities shows an interesting finding: people's reactions to stress are more positive than negative. Figure 9a shows that for all cities, 8-10% of tweets report positive action taken in response to stress, while only 1-2% report negative action. This suggests that people may react to stress positively, or that people are more likely to publicly report positive rather than negative actions. Examples of positive reaction in stress tweets include rest (Rest is best when you are stressed) or exercising (I'm so stressed, thank god I'm heading to yoga now).

Table 6. P-values of pairwise comparisons of proportion of stress/relaxation tweets between the four studied cities.
Relaxation can be considered a stress management activity. Figure 8 shows that the numbers of relaxation tweets are consistently proportional across all cities to those of stress tweets, indicating that Twitter users are consistently more inclined to post about stressful life events or experiences than relaxing experiences. Examples of stress relief from relaxation tweets include personal contact (I don't need anything but a hug...), exercising (Went for a run, feel awesome, now time to relax), shopping (Last day in #SanDiego Just relaxing, shopping and say bye to friends), or entertainment (Relaxing watching a movie:-) :-)). Figure 6 and 9a also indicate that rest & vacation is the highest frequency topic within relaxation tweets, followed by entertainment & hobbies, nature, and water. These topics can be considered common activities for stress relief.

Principal Results
Our research addresses several aspects of the use of Twitter as a medium of expression of stress and relaxation by users. First, we created a schema for categorizing stress and relaxation-related tweets based on previously published psychological guidelines. By categorizing first-hand experience tweets into the primary themes of content topics, symptoms and actions taken, we gained further insight into the common patterns of expressions of stress.
Second, we analyzed in detail the contents of tweets based on our annotation scheme and found both similarities and differences in the prevalence and characteristics of stress and relaxation tweets across cities on the East and West Coasts of the United States. The most frequent topic of stress tweets in our datasets was education, which likely reflects the fact that many Twitter users are young people who attend school [57,58], but work and travel were also common topics. It is notable that despite poverty rates, unemployment rates, and cost of living being significant factors in the methodology of CNN's and Forbes' rankings of the most stressful cities, finances were not a major content topic of the stress tweets in any city. Although this result could be partially attributable to the need for either computer or mobile phone access in order to use Twitter, which may cause under-representation of lower income groups, it may also indicate that certain topics, such as personal finances, still remain relatively taboo in social media settings. Regarding positive and negative actions in response to stress, positive actions far outnumbered more destructive behavior. The use of Twitter in itself to discuss feelings of stress and stress management can be seen as a constructive manner of dealing with stress by expressing these feelings and using the support of "followers" and friends. Social media platforms are increasingly being used as support networks in the management of chronic health conditions as varied as cancer, depression, and obesity. A recent systematic review by Patel et al. found that the impact of social media use on those experiencing chronic disease was positive in 48% of studies reviewed, neutral in 45%, and harmful in only 7% [60].
Third, our study indicates that words most associated with relaxation strategies (see Table 3) fall into three main groups: (1) bathing and personal care (e.g. "bath", "shower"), (2) vacationing ("vacation", "pool", "beach"), and (3) watching sports or TV ("videos", "sitting", "watching"), indicating that relaxation strategies involve purposefully taking time away from work-based activities and daily responsibilities. A further key theme that emerges from a qualitative analysis of the data is the idea of nature, in this case particularly water (e.g. "pool", "beach", "rain"), as of key importance for relaxation. This result is consistent with recent research demonstrating the link between stress reduction and exposure to the natural environment (e.g. [61]).
Finally, we showed that machine learning algorithms could be employed to achieve good accuracy for the automatic classification of stress and relaxation tweets.

Limitations
This study has several limitations. First, Dataset 2 was obtained from the Twitter API's 1% sample. Second, the annotation scheme we developed, although well suited for our purpose, could benefit from further refinement. For example, we found that many tweets were categorized as topic "other". Third, it is likely that classification results could be improved given the availability of additional training data, in particular for first-hand experience classification of stress and relaxation tweets. Furthermore, utilizing additional feature sets (e.g. n-grams, emotions, negations) could help improve accuracy. Fourth, Twitter reports of stress and relaxation may be influenced by self-presentation issues (e.g. stress related to excessive workload can be used as a status indicator in some contexts). Finally, as with all social media-based research, the population studied is unlikely to be a representative sample of the general population.

Conclusions
In summary, this research shows that Twitter can be a useful tool for the analysis of stress and relaxation levels in the community, and has the potential to provide a valuable supplement to social and psychological studies of stress and stress management.