This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on https://publichealth.jmir.org, as well as this copyright and license information must be included.
Widespread fear surrounding COVID-19, coupled with physical and social distancing orders, has caused severe adverse mental health outcomes. Little is known, however, about how the COVID-19 crisis has impacted LGBTQ+ youth, who disproportionately experienced a high rate of adverse mental health outcomes before the COVID-19 pandemic.
We aimed to address this knowledge gap by harnessing natural language processing methodologies to investigate the evolution of conversation topics in the most popular subreddit for LGBTQ+ youth.
We generated a data set of all r/LGBTeens subreddit posts (n=39,389) between January 1, 2020 and February 1, 2021 and analyzed meaningful trends in anxiety, anger, and sadness in the posts. Because the distribution of anxiety before widespread social distancing orders was meaningfully different from the distribution after (
We did not find any differences in LGBTQ+ youth anger and sadness before and after government-mandated social distancing; however, anxiety increased significantly (
During the COVID-19 pandemic, LGBTQ+ teens increased their reliance on anonymous discussion forums when discussing anxiety-provoking topics. LGBTQ+ teens likely perceived anonymous forums as safe spaces for discussing lifestyle stressors during COVID-19 disruptions (eg, school closures). The list of prevalent anxiety-provoking topics in LGBTQ+ teens’ anonymous discussions can inform future mental health interventions in LGBTQ+ youth.
The COVID-19 pandemic has dramatically affected both physical and mental health worldwide. As of February 1, 2021, the novel coronavirus had infected over 26 million people in the United States and killed over 2.5 million people globally [
This sharp mental health decline may be different for LGBTQ+ youth, who disproportionately experienced a high rate of adverse mental health outcomes before the COVID-19 pandemic due to prejudice, victimization, and unaccepting communities [
LGBTQ+ youth report that cost and parental consent are barriers to accessing mental health resources, and the inability to access confidential school counseling during COVID-19 school closures magnifies these obstacles [
Questioning one’s sexuality is a normal developmental aspect of adolescence [
As individuals begin to realize their sexual orientation, they may choose to self-disclose their identity. Scholars conceptualize self-disclosure of sexual and gender identity as a dimension of the coming-out process that is closely linked to self-esteem, emotional distress, and well-being [
Research shows that LGBTQ+ youth resort to computer-mediated communication to explore their identities and find community [
Additionally, because gender and sexual minorities are highly stigmatized, LGBTQ+ youth may not be comfortable disclosing their gender or sexual orientation to researchers as part of formal surveys and experiments [
In summary, the COVID-19 crisis has caused a concurrent mental health crisis. LGBTQ+ youth are especially vulnerable to adverse mental health outcomes, and online anonymous support forums are a uniquely accessible resource for LGBTQ+ youth to disclose their identities during the pandemic. LGBTQ+ self-disclosure is helpful for LGBTQ+ youth’s mental health [
However, at the time of this study, no longitudinal studies had investigated how the themes and sentiment of discussions in LGBTQ+ youth support forums unfold over time. We aimed to address this knowledge gap. We raised the following question: What patterns of emotions emerge from longitudinal analyses of LGBTQ+ youth conversation during the COVID-19 crisis?
Given that LGBTQ+ youth were disproportionately vulnerable to adverse mental health outcomes before the pandemic [
In addition to being suitable for naturalistic investigations of emotion over time, online communities can illuminate which topics contribute to meaningful emotional trends. Knowing which topics are emotionally distressing to LGBTQ+ individuals is a requisite precursor to informing LGBTQ+ youth mental health interventions, yet at the time of this study, no studies had investigated themes related to LGBTQ+ youth online forums during the COVID-19 pandemic. To address this gap in the literature, we raised the following question: What conversation topics manifest from meaningful emotional trends?
The pushshift (version 4.1) Python (version 3.9.0) package was used to extract all public posts made between January 1, 2020 and January 31, 2021 from the r/LGBTeens subreddit (n=39,389 posts). We chose this online community because of its popularity as a community for LGBTQ+ youth and its specific focus on teens. Because we aimed to assess how users’ textual expressions manifested amid global events, not in response to others’ posts, comments were excluded from the data set. Although the anonymity of Reddit prevented us from accessing demographic information about the r/LGBTeens community, Reddit users live predominantly in the United States (49.3%) [
To understand whether r/LGBTeens emotional patterns were specific to LGBTQ+ teenagers, we compared the trajectory of emotional tone in r/LGBTeens posts with that in 2 other subreddit microcommunities. After a review of relevant subreddits, we determined that r/Teenagers was the largest subreddit community, with n=1,364,980 posts, tailored toward a wide population of teens. To investigate r/LGBTeens post sentiment relative to widespread interpersonal relationship turmoil during the COVID-19 crisis [
This study only used information that could be accessed freely by the public. This study did not include any personally identifiable information. The institutional review board recognized that analysis of publicly available data does not constitute research on human participants. Thus, ethical review approval was not required for this study.
To track negative emotions over time, we analyzed aggregate post sentiment using the Linguistic Inquiry and Word Count program (LIWC) [
We focused on levels of anger, sadness, and anxiety present in posts because these psychological processes are symptoms of COVID-19–induced mental health challenges. For example, COVID-19 health threats and uncertainty may trigger feelings of anxiety [
Likewise, the loss of loved ones, feelings of isolation, and routine disruptions associated with the rapidly changing COVID-19 pandemic may trigger feelings of sadness. For example, a recent study [
We explored differences in the trajectory of emotions displayed in posts by visualizing trends. In addition, we recorded salient events during the crisis to examine how they may have affected the changing patterns of COVID-related user responses and associated emotions.
We marked 10 major events in the course of the COVID-19 crisis. Events were selected if they considerably disrupted LGBTQ+ youth lifestyles (eg, widespread school closures) [
We extrapolated meaningful conversation topics related to trends in anxiety, anger, and sadness, as well as overall emotional tone. We conducted 2-tailed independent sample
The results of the 2-tailed independent t tests suggested differences in the mean of anger (
LDA topic modeling is a bag-of-words machine learning algorithm that extrapolates meaningful topics from a large body of texts, in this case, subreddit posts [
To generate a bag of words for the LDA model, we preprocessed the texts by tokenizing the text, removing stop words, lemmatizing the text, and generating a document term matrix. Tokenization separates sentences into bags of unordered words by removing all punctuation and making words lowercase. Given our relatively small sample size (n=7882 texts), lemmatization was necessary to reduce model noise. Lemmatizing texts removes prefixes and suffixes by transforming all words to their base lemma (eg, we converted “vaccinated” and “vaccinating” to “vaccine”). Only nouns were retained through the process of lemmatization because other parts of speech were not meaningful to our topics.
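The preprocessing steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the stop word list and the noun lemma table below are hypothetical, deliberately tiny stand-ins for the full stop word list and the part-of-speech-aware lemmatizer a real analysis would need, and the two example posts are invented.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny resources; a real pipeline would use a full
# stop word list and a trained lemmatizer with part-of-speech tagging.
STOP_WORDS = {"i", "my", "to", "the", "a", "am", "about", "and", "is", "out"}
NOUN_LEMMAS = {
    "parent": "parent", "parents": "parent",
    "friend": "friend", "friends": "friend",
    "school": "school", "schools": "school",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens, dropping punctuation."""
    return re.findall(r"[a-z]+", text.lower())


def preprocess(text):
    """Tokenize, drop stop words, and keep only tokens that map to a noun lemma."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    return [NOUN_LEMMAS[t] for t in tokens if t in NOUN_LEMMAS]


def document_term_matrix(texts):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    bags = [Counter(preprocess(t)) for t in texts]
    vocabulary = sorted(set().union(*bags))
    matrix = [[bag[term] for term in vocabulary] for bag in bags]
    return vocabulary, matrix


# Invented example posts, not taken from the data set.
posts = [
    "I am nervous about coming out to my parents.",
    "My friends and my parents closed the school!",
]
vocabulary, matrix = document_term_matrix(posts)
```

Each row of `matrix` is one post and each column one retained noun lemma, which is the bag-of-words input that a topic model consumes.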
We set β to learn the asymmetric prior from the data [
Topics defined by the model require human labeling. LDA generates a list of the most relevant 30 terms, along with each term’s β value (ie, their relative contribution to that topic) (
A third human coder validated the labels by reviewing the top 10 most relevant posts for each topic and confirming that the human coder–assigned labels were present in those posts. The third human coder confirmed that the incoherent topic was indeed incoherent (
We quantified the proportion of angry sentiment observed in r/LGBTeens posts relative to each post’s total number of words. Of the 39,389 posts in the data set, 17.67% (6961) were classified as containing anger. The mean percentage of words denoting anger relative to the total number of words in an anger-flagged post was 4.72% (SD 8.81%). There was an upsurge of anger following widespread school closures in the United States, and a second upsurge when many schools announced closures would last through the end of 2020 (
Histogram of the percentage of angry sentiment observed in r/LGBTeens posts over time, juxtaposed with a solid blue polynomial regression line showing average anger levels observed in angry r/LGBTeens posts over time. LGBTQ+: Lesbian, Gay, Bisexual, Transgender, Queer/Questioning, and Others.
We also quantified the proportion of sad sentiment observed in r/LGBTeens posts relative to the total number of words in each post (
Histogram of the percentage of sad sentiment observed in r/LGBTeens posts over time, juxtaposed with a solid red polynomial regression line showing average sadness levels observed in sad r/LGBTeens posts over time. BLM: Black Lives Matter; LGBTQ+: Lesbian, Gay, Bisexual, Transgender, Queer/Questioning, and Others.
Of the 39,389 posts in the data set, 20.01% (7881) were classified as containing anxiety (anxiety level: mean 4.97, SD 10.43). Although all 3 negative emotions analyzed in r/LGBTeens posts increased over time, anxiety trended upward the most sharply. The histograms reveal a sharp spike in anxiety, sadness, and anger in the first week of May 2020, which may reflect emotional distress resulting from US schools closing for the remainder of the school year (
Histogram of the percentage of anxiety sentiment observed in r/LGBTeens posts over time, juxtaposed with a solid grey polynomial regression line showing average anxiety levels observed in anxious r/LGBTeens posts. BLM: Black Lives Matter; LGBTQ+: Lesbian, Gay, Bisexual, Transgender, Queer/Questioning, and Others.
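The two quantities reported for each emotion (the share of posts containing the emotion, and the mean percentage of emotion words within those posts) can be sketched as a dictionary-based word count. This is only an illustration of the general approach: the lexicon below is a hypothetical mini-list, whereas the LIWC dictionaries are proprietary, far larger, and use their own matching rules, and the example posts are invented.

```python
import re
from statistics import mean

# Hypothetical mini-lexicon for illustration only; it is not the LIWC anxiety dictionary.
ANXIETY_WORDS = {"afraid", "nervous", "scared", "worried", "anxious"}


def category_percentage(text, lexicon):
    """Percentage of the words in `text` that belong to `lexicon`."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in lexicon)
    return 100.0 * hits / len(words)


def summarize(posts, lexicon):
    """Share of posts with any lexicon word, and mean percentage among those posts."""
    scores = [category_percentage(p, lexicon) for p in posts]
    flagged = [s for s in scores if s > 0]
    share_flagged = 100.0 * len(flagged) / len(posts) if posts else 0.0
    mean_flagged = mean(flagged) if flagged else 0.0
    return share_flagged, mean_flagged


# Invented example posts, not taken from the data set.
posts = [
    "I am so nervous and scared",
    "School is closed again",
    "worried about my parents",
    "Movie night with friends",
]
share_flagged, mean_flagged = summarize(posts, ANXIETY_WORDS)
```

Here `share_flagged` corresponds to the percentage of posts classified as containing the emotion, and `mean_flagged` to the mean emotion level among flagged posts.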
To supplement the conclusion that the mental health of r/LGBTeens community members had been adversely impacted by the COVID-19 crisis and concurrent social and political unrest, we measured the overall emotional tone of each post from the expanded timeline of January 1, 2015 through January 31, 2021. Emotional tone is measured by LIWC as the valence of texts (ie, whether a text is positively valenced or negatively valenced) [
Of all posts encompassed in the 6-year period (n=123,440), the average post valence was negative (mean 44.33, SD 35.11, SEM 0.10, minimum 1.00, maximum 99.00, skewness 0.53, kurtosis −1.25). We found that posts became more negatively valenced throughout 2020 (
Green polynomial regression line representing mean emotional tone of r/LGBTeens posts from January 1, 2015 through January 31, 2021. We note a sharp decrease in the emotional tone of posts during 2020.
We compared the emotional tone of r/LGBTeens posts from January 1, 2020 to January 31, 2021 (n=39,389 posts) to the emotional tone of r/Teenagers (n=1,364,980) and r/Relationships (n=193,282) posts from the same time period (
Polynomial regression line representing the average post emotional tone (values of 100 represent maximally positive emotional tone; values below 50 represent more negatively valenced tone) of 3 subreddit communities (r/LGBTeens, r/Teenagers, and r/Relationships). BLM: Black Lives Matter; LGBTQ+: Lesbian, Gay, Bisexual, Transgender, Queer/Questioning, and Others.
A 2-tailed independent samples
Additionally, a 2-tailed independent samples
Findings revealed that the emotional tone of r/LGBTeens posts (mean 42.34, SD 34.68) was significantly greater than that of r/Teenagers posts (mean 37.4, SD 32.95) and r/Relationships posts (mean 28.38, SD 15.57).
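A comparison of this kind can be reproduced approximately from summary statistics alone. The sketch below uses Welch's unequal-variance form of the independent samples t test, which is an assumption because the text does not state which variant was used. It also assumes that the group sizes equal the post counts reported for each subreddit, and it uses a normal approximation for the P value, which is reasonable only because the degrees of freedom are very large.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and degrees of freedom from summary statistics."""
    var1, var2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(var1 + var2)
    df = (var1 + var2) ** 2 / (var1 ** 2 / (n1 - 1) + var2 ** 2 / (n2 - 1))
    return t, df


def two_tailed_p_normal(t):
    """Two-tailed P value under the normal approximation (for very large df)."""
    return math.erfc(abs(t) / math.sqrt(2))


# Means and SDs as reported in the text; the sample sizes are assumed to be
# the post counts reported for r/LGBTeens and r/Teenagers.
t, df = welch_t_from_summary(42.34, 34.68, 39389, 37.4, 32.95, 1364980)
p = two_tailed_p_normal(t)
```

A positive `t` indicates a higher mean emotional tone in the first group. With samples this large, even small mean differences yield very small P values, so the size of the difference matters more than the P value itself.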
We found a decrease in emotional tone in late March in both r/Teenagers and r/LGBTeens posts when many schools closed temporarily (
Furthermore, we ran a point biserial correlation analysis to assess whether negative emotion changed as US social distancing orders were relaxed and vaccine distribution increased from January 1, 2021 [
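A point biserial correlation relates a binary variable (here, whether a post falls before or after a cutoff date) to a continuous score. The following is a minimal sketch of the standard formula, not the authors' analysis code. The toy data are hypothetical, and coding posts as 0 (before the cutoff) and 1 (after) is an assumption about how such a variable might be set up.

```python
import math


def point_biserial(binary, values):
    """Point biserial correlation between a 0/1 variable and a continuous variable."""
    group1 = [v for b, v in zip(binary, values) if b == 1]
    group0 = [v for b, v in zip(binary, values) if b == 0]
    n1, n0, n = len(group1), len(group0), len(values)
    if n1 == 0 or n0 == 0:
        raise ValueError("both groups need at least one observation")
    mean1, mean0 = sum(group1) / n1, sum(group0) / n0
    grand_mean = sum(values) / n
    # Population SD (divide by n) pairs with the sqrt(n1 * n0 / n**2) factor.
    sd = math.sqrt(sum((v - grand_mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("the continuous variable has zero variance")
    return (mean1 - mean0) / sd * math.sqrt(n1 * n0 / n ** 2)


# Hypothetical toy data: 0 = post made before the cutoff date, 1 = after.
period = [0, 0, 0, 1, 1, 1]
negative_emotion = [5.0, 6.0, 7.0, 3.0, 4.0, 2.0]
r_pb = point_biserial(period, negative_emotion)
```

A negative coefficient would indicate lower negative emotion scores in the later period. The value is numerically identical to a Pearson correlation computed on the same 0/1 coding.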
Because the distribution of anxiety before widespread social distancing orders was meaningfully different from the distribution after lockdown (
Streamgraph of topic percent contribution to the corpus of anxious r/LGBTeens posts over time. Each color represents a discussion topic across the corpus of anxious r/LGBTeens posts. This streamgraph shows the composite of topics overall (graph envelope) and the relative importance of the topic to posts over time (topic color stream width).
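The quantity plotted in such a streamgraph, each topic's percent contribution per time period, can be derived from per-post topic weights. The sketch below assumes each post comes with a period label and a topic-to-weight mapping, as a fitted topic model might provide. The data layout, period labels, and weights are hypothetical.

```python
from collections import defaultdict


def topic_share_by_period(documents):
    """Sum per-document topic weights within each period and normalize to percentages."""
    totals = defaultdict(lambda: defaultdict(float))
    for period, topic_weights in documents:
        for topic, weight in topic_weights.items():
            totals[period][topic] += weight
    shares = {}
    for period, weights in totals.items():
        period_total = sum(weights.values())
        shares[period] = {
            topic: 100.0 * weight / period_total if period_total else 0.0
            for topic, weight in weights.items()
        }
    return shares


# Hypothetical per-post topic weights, such as a fitted topic model might assign.
documents = [
    ("2020-03", {"coming out": 0.75, "education": 0.25}),
    ("2020-03", {"coming out": 0.25, "education": 0.75}),
    ("2020-04", {"education": 1.0}),
]
shares = topic_share_by_period(documents)
```

Each inner dictionary sums to 100 and gives the relative stream widths for that period. Plotting the unnormalized totals instead would produce the overall graph envelope.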
We employed natural language processing to investigate the emotional trends in LGBTQ+ teens’ anonymous online conversations during the COVID-19 pandemic. Results revealed that the overall emotional tone of posts decreased sharply during the 2020-2021 COVID-19 crisis relative to prior years, which suggests that this emotional trend was specific to the COVID-19 crisis. Findings also revealed that the emotional trajectory of LGBTQ+ youth fluctuated more drastically in response to impactful events during the COVID-19 crisis (eg, widespread school closures and Black Lives Matter protests) than the emotional trajectory of more neutral subreddit spaces [
Findings revealed that the trajectory of LGBTQ+ teens’ overall emotional tone (positive vs negative) was more affected by lifestyle stressors during the COVID-19 crisis than that of the general population of r/Teenagers users. Results are consistent with those from previous research indicating that LGBTQ+ youth are disproportionately vulnerable to adverse mental health outcomes relative to their straight, cisgender peers [
While this study did not find pre- and postlockdown differences in LGBTQ+ youth anger and sadness, results revealed that anxiety increased after government-mandated social distancing measures. In addition, further analysis revealed a list of 10 anxiety-provoking topics discussed during the pandemic: attraction to a friend, coming out, coming out to family, discrimination, education, exploring sexuality, gender pronouns, love/relationship advice, starting a new relationship, and struggling with mental health. These conversation topics were anxiety-provoking for LGBTQ+ youth both before and during the pandemic. However, the increase in the frequency of these conversations coincided with the emergence of lifestyle disruptors related to the pandemic, reflecting LGBTQ+ teens’ increased reliance on anonymous discussion forums as outlets for discussing lifestyle stressors during COVID-19 disruptions (eg, school closures).
Findings revealing LGBTQ+ teens’ increased reliance on an anonymous forum as a discussion outlet during the COVID-19 pandemic were consistent with those from previous studies showing that individuals are likely to turn to social media in times of crisis to seek psychological support and build community resilience [
This study also shed light on the specific sources of anxiety for LGBTQ+ youth during the COVID-19 pandemic. Research has revealed links between LGBTQ+ youth anxiety disorders and self-harm and suicidal behavior—in part due to stigma and discrimination [
Additionally, this study’s findings suggest that mental health professionals should consider anonymous online supplements or alternatives to in-person treatment of LGBTQ+ youth anxiety, especially during school closures. Despite mental health professionals’ adaptation to web-based counseling, LGBTQ+ youth report that treatment cost and parental consent are barriers to accessing mental health resources outside of school [
Although this study provides valuable insight into LGBTQ+ youth mental health during the COVID-19 pandemic, the study had some limitations. First, using computerized coding tools such as LIWC does not allow for sophisticated coding that could be achieved with human coders. Previous studies using LIWC have found that LIWC may overidentify emotional expression [
The COVID-19 crisis has caused a concurrent mental health pandemic, and LGBTQ+ youth are especially vulnerable to adverse mental health outcomes [
Polynomial regression of Linguistic Inquiry and Word Count anxiety levels in r/LGBTeens posts from January 1, 2015 through January 31, 2021.
Perplexity metric.
Latent Dirichlet allocation results.
Intertopic distance.
Topic model data processing.
Point biserial correlation analysis.
Anxiety-related topics over time.
Histogram of the percentage of positive sentiment in r/LGBTeens posts over time and average level of positive emotion in positive r/LGBTeens posts over time.
LDA: latent Dirichlet allocation
LGBTQ+: Lesbian, Gay, Bisexual, Transgender, Queer/Questioning, and Others
LIWC: Linguistic Inquiry and Word Count program
None declared.