Health Information on Pre-Exposure Prophylaxis From Search Engines and Twitter: Readability Analysis

Background: Pre-exposure prophylaxis (PrEP) is proven to prevent HIV infection. However, PrEP uptake to date has been limited and inequitable. Analyzing the readability of existing PrEP-related information is important to understand the potential impact of available PrEP information on PrEP uptake and identify opportunities to improve PrEP-related education and communication. Objective: We examined the readability of web-based PrEP information identified using search engines and on Twitter. We investigated the readability of web-based PrEP documents, stratified by how the PrEP document was obtained on the web, information source, document format and communication method, PrEP modality, and intended audience. Methods: Web-based PrEP information in English was systematically identified using search engines and the Twitter API. We manually verified and categorized results and described the method used to obtain information, information source, document format and communication method, PrEP modality, and intended audience. Documents were converted to plain text for the analysis and readability of the collected documents was


Introduction
Pre-exposure prophylaxis (PrEP) is proven to prevent HIV infections [1][2][3], and using PrEP to prevent new HIV infections among vulnerable populations has been identified as a critical strategy in the Ending the HIV Epidemic initiative [4].PrEP use increased more than 10-fold in the United States in just 5 years after its approval in 2012 [5].However, inequitable PrEP uptake persists in the United States [6][7][8], especially in the southern United States and among Black and Hispanic residents in the United States.When those who need PrEP the most do not receive it, we lose opportunities to maximize reductions in new HIV infections.One potential explanation for lower uptake among vulnerable populations for HIV infections is the lower level of health literacy in these groups (ie, people in the southern United States, people living in poverty, younger people, and people of Black race or Hispanic ethnicity) [9,10].Thus, analyzing the readability (ie, how difficult a text is to understand) of web-based information about PrEP would help understand the effectiveness of available web-based PrEP information and identify opportunities to improve the understandability of information for people who might benefit from PrEP.
Health literacy is an individual's ability to obtain and understand health information to make informed health decisions [11].The US Department of Health and Human Services (HHS) [12,13], the National Institutes of Health (NIH) [13,14], the American Medical Association (AMA) [15], and the US Food and Drug Administration (FDA) [16] recommend health information for the public be written at the ninth-grade reading level or lower to ensure that most people in the US population (ie, the general public) can make informed health decisions.However, researchers have consistently found evidence that text-based health information resources, including web-based resources, are too complex for the general public [17][18][19][20][21][22][23][24][25].For example, patient education materials for electronic cigarettes [22], mental health [17], pediatrics [23], medical specialties [24], diabetes mellitus [18], clinical orthopedics [19], human papillomavirus immunizations [20], and cardiovascular diseases [21] were found to require higher literacy levels than that recommended by the NIH, AMA, FDA, and HHS.Moreover, health information available from commercially funded sources was significantly more difficult to read than the information available from government-funded sources for some health conditions, such as diabetes, hypertension, depression, high cholesterol, arthritis, asthma, heartburn, obesity, influenza, and erectile dysfunction [26], but not for electronic cigarette information [22].This complexity often led to comprehension errors [27,28].
The readability of web-based PrEP information has also been reported [25], but the previous assessment was limited to 100 unique websites and did not include information circulated on social media.Additionally, the previous assessment did not stratify the assessment, which limits our understanding of how to improve the readability of educational materials.For information related to PrEP use, the increasing complexity of educational materials has been associated with lower uptake of PrEP [29,30].There are other factors that can impact the understandability of PrEP information and accessibility to certain populations.For example, the format of the information (eg, brochure and website), the information source (eg, US government entities and for-profit organizations), and the intended audience could impact the readability of and trust in materials.Thus, different forms and sources of information should be examined for readability; these data should be interpreted in light of intended audiences who might require different levels of complexity.Analyzing updated content is also important because new PrEP delivery methods continue to gain approval and become available to consumers.For example, with the recent approval of injectable PrEP [31], readability studies of PrEP information need to be updated to include this new delivery method.
Assessments of readability should also include information disseminated through social media [32,33], which is an important source of passive information for many young people.However, many recent studies on the readability of health-related content on the web [17][18][19][20][21][22][23][24][25] have either focused solely on information obtained actively through search engines [17,18,[20][21][22]24,25] or within popular websites for consumers [19,23].It is important to examine health information that is obtained both actively via search engines and passively via social media channels.The latter is critical because >80% of Americans use at least one social media platform [34] and major health agencies, such as World Health Organization (WHO) and Centers for Disease Control and Prevention (CDC), regularly communicate via Twitter feeds.Social media content can include news releases as well as discussions on various health topics (eg, sexually transmitted infections, emergency situations [35,36], and preventive measures [33]).
We aimed to improve our understanding of the readability of publicly available web-based and social media-based PrEP information by analyzing it based on its information source, document format, communication method, PrEP modality, and intended audience.Our findings can be used to develop communication strategies that are tailored to the needs of different objectives and improve health literacy and the understandability of PrEP information for the general public.

Data Collection
We collected examples of both passively (eg, social media posts) and actively (information sought through user-initiated web searches) obtained information.

Collecting Passively Obtained Information
For passively obtained information, we collected PrEP-related information exchanged through Twitter.Twitter was chosen because many health organizations currently use Twitter to communicate health messages, including information about PrEP and HIV prevention, to the public [37,38].About 22% of US adults use Twitter (45% of 18-to 24-year-olds); many (42%) check the platform more than once a day [39].Tweets are short, being limited to 280 characters, but often refer readers to longer text information through embedded URLs.Thus, we investigated patient-level information sources (ie, brochures and websites) that were disseminated by URLs in tweets.We assessed and compared the readability of PrEP information based on the method used to obtain information (ie, passively or actively), information source (ie, organization type), document format (eg, brochures and websites), PrEP modality (ie, oral and injectable), and the intended audience (ie, patient, provider, and general).
To assess passively obtained information on PrEP, we used PrEP-related tweets with URLs as a starting point.Using Twitter API [40], we first collected PrEP-related tweets from July 2012, when the FDA first approved the PrEP indication for a pharmaceutical medication [41], to January 31, 2022, based on PrEP-related keywords that were used in a previous study [42] and a new set of keywords related to injectable PrEP [31] (Textbox 1).To emulate how users might search for information about PrEP on the web, we started from the Twitter URLs and further assessed the contents of external URLs that were embedded in the original URL (ie, manual web crawling).For each URL visited, PrEP information was manually and systematically collected (eg, text about PrEP) from the URLs.We repeated this process until all the relevant information linked to the original URLs and their referred external URLs were collected.
PrEP keywords: [(truvada) OR (hiv AND PrEP) OR (preexposure AND prophylaxis AND hiv) OR (pill AND prevent AND hiv) OR (pill AND protect AND hiv) OR (pill AND protect AND AIDS)] [42] Injectable PrEP keywords: [(Apretude) OR (cabotegravir) OR (inject AND prevent AND hiv) OR (inject AND protect AND hiv)]

Collecting Actively Obtained Information
To assess actively obtained information, a systematic search of PrEP was conducted using 5 popular search engines (ie, Google, Yahoo, Bing, DuckDuckGo, and Ask) from April to June 2022.We simulated the behavior of general consumers using the following various search terms and phrases: "PrEP," "PrEP information," "What is PrEP," "How to use PrEP," "Pre-exposure Prophylaxis frequently asked questions," "how does PrEP work in HIV prevention," "PrEP brochure," "PrEP facts," "HIV and PrEP," "Truvada PrEP," "PrEP side effects," "how much does PrEP cost," or "how to take prep" until a search result saturation was reached (ie, new information was not obtained).Then for comparison purposes, we specifically searched for "medication guide," "patient information," "info for people," "information for the patient," "information for the user," and "patient version" of known PrEP medications, (eg, Truvada, Descovy, Apretude, Vocabria, and cabotegravir).
For each of the 5 search engines, search results were collected from the first page for each search engine because most users rarely read past the first page of search results [43] and because our focus was to analyze the most frequently accessed web-based PrEP information.We did not include sponsored listings, as we wanted to focus on the most relevant and unbiased information.We manually verified search results to ensure that we only included web pages with PrEP information.We then systematically and manually web crawled these pages, using links from the search results as a starting point to emulate user behavior.Several false-positive links were excluded from the search results that were related to different acronyms or abbreviations (eg, Professional Research Experience Program).We also excluded documents without any PrEP-related information (eg, content about drugs used only for HIV treatment).We also excluded papers published in peer-reviewed journals that are not intended for the general public and would skew the readability results.Several documents were part of a much bigger document (eg, product monograph and prescribing information).Despite their relevance, if the overall PrEP information document exceeded 32,767 characters, we only included sections on PrEP that were written for the prescription drug prescribers in our analyses.We felt that this would most closely emulate how lay people would read this text.Duplicate sources obtained from more than one search path, educational videos, and descriptive figures and images were excluded and removed, and the web pages were converted to plain text for analysis.

Ethical Considerations
We only analyzed publicly available documents in this study and did not analyze identifiable private information or involve any direct or indirect interactions with individuals.Per UNC Charlotte's policy (citation 45 CFR 46 Definitions), the study is exempt from institutional review board requirements because it does not meet the regulatory definitions of human subject research.

Information Source
We categorized information sources according to the originator of the document.The originator type was first determined with manual Google searches about organizations.This information was supplemented with information from the "about us" section of websites.We also used information in the web-based sources about affiliations, funding sources, and the Crunchbase [44] to finalize their organization types.Several websites still had no explicit indication of their affiliations or funding sources; we have denoted them as "N/A" and categorized them separately in our comparison analyses.
The information sources for this study included national and state governmental agencies (ie, US-located government entities; eg, CDC and California Department of Public Health), non-US governmental agencies (eg, National Health Service of the United Kingdom), other public health organizations (eg, WHO and Joint United Nations Programme on HIV/AIDS), nongovernment organizations (eg, Mayo Clinic), and for-profit organizations (eg, Gilead Sciences).

Document Format and Communication Method
Communication methods were categorized as brochures, information sheets, or websites by manually assessing each document.To identify brochures, we first searched for the keyword "brochure" in the URLs, and then, we manually verified other documents with similar presentations of information.Brochures were distinguished from information sheets by their inclusion of images, figures, or pictures.To identify the information sheets, we searched for the keywords "factsheet" and "sheet" from the remaining URLs.Then, we manually labeled other information sheets by searching for similar presentations of information, which contained only text (ie, no images, figures, and pictorial descriptions).Other resources were categorized as websites because they were hosted on a web page and did not meet the criteria for either brochure or information sheet.

PrEP Modality
All documents were manually categorized based on PrEP modality (ie, delivery methods) to allow comparison by modality.We classified documents with one or more keywords-"oral," "tablet," "pill," "Truvada," "Descovy," "daily PrEP," "PrEP 2-1-1," and "PrEP on demand"-in the title as "oral PrEP."We classified documents with the keywords "Cabotegravir," "Apretude," "long-acting," "injection PrEP," and "injectable" as "injectable PrEP."If a document had both types of keywords in the title, we categorized it as "oral and injectable PrEP."For documents without these keywords in the title, the modality was categorized based on the content of the document.Documents without PrEP modality descriptions were categorized as general PrEP.

Intended Audience
The intended audience was manually determined for each document, that is, patient, provider, or general public.We used document titles and the first paragraph to determine the target audience.To identify documents written for patients, we looked for the following titles: "Patient Information," "Medication Guide," "Patient Medication Information," "Information for the user," "Information for the patient," and "Overview for Patients."Then, we read the first paragraph to ensure its intended audience was patient and checked if it had statements similar to the following."Read this carefully before you start taking [Drug name] and each time you get a refill.[…]."Similarly, documents written for providers were identified if they had these or similar titles: "Clinical Guide," "Provider Frequently Asked Questions," "Provider FAQs," "Clinician Guidance," and "Guide for Medical Providers."We also read the first paragraph to check phrases similar to "Review this guide when prescribing [Drug name]."The intended audience for documents not meeting the criteria for patient or provider documents was determined as intended for the general public.

Measuring Readability
We used the Flesch-Kincaid grade level [45], Simple Measure of Gobbledygook Index [46], Coleman and Liau Index [47], and Automated Readability Index [48] to calculate the text readability, which is defined here as the grade level required to comprehend the text in the US education system.These indices are widely used in previous readability studies [23,24,26,[49][50][51][52][53], and we applied them to the entire data set.To computationally perform readability analysis, the readability scores were first calculated based on the 4 readability indices using the open-source Python textstat package [54].Given that different readability indices can generate a range of results, we used the mean of the 4 readability indices in our analyses to increase the reliability of our results but also report results generated by all 4 indices (Multimedia Appendix 1).We then conducted 5 sets of pairwise independent sample t tests to compare readability scores based on the method of obtaining the information, information source, document format and communication method, PrEP modality, and intended audience.When necessary, P values were adjusted using the prespecified Hommel procedure [55] to account for multiple comparisons within each set.

Information Obtained Method
We collected a total of 463 documents, 194 from Twitter (ie, passively obtained information) and 269 from search engines (ie, actively obtained information).A total of 17 documents were duplicates.For analyses, we removed 1 set of the 17 duplicate documents.The documents were excluded in both sets, a total of 34 documents, when comparing the readability of documents based on methods for obtaining information.Using the average of the 4 reading level indices, the overall average reading level for web-based PrEP materials was a 10.20 grade level with an SE of 0.11.This is significantly above the recommended grade level of lower than ninth grade for the comprehension of health materials.The readability scores of materials retrieved from Twitter (9.8 grade level, SE 0.16) and search engines (10.5 grade level, SE 0.15) were significantly different (P=.002 by independent sample t tests).
We found that the distribution of readability scores for all documents was approximately normally distributed (ie, unimodal) with a slight skew to the right and a mode between the ninth and eleventh reading levels as shown in Figure 1.About 74% (n=328) of web-based sources required reading levels higher than ninth-grade.

Information Source
We found 172 documents from 45 for-profit entities (eg, Gilead Sciences and Nurx), 157 documents from 49 nongovernment organizations (eg, Mayo Clinic and Black AIDS Institute), 88 documents from 28 US governmental entities (eg, CDC and NIH), and 16 documents from 7 non-US governmental entities (eg, National Health Service of the United Kingdom and Health Service Executive-Ireland) or other public health organizations (eg, WHO and Joint United Nations Programme on HIV/AIDS).A total of 13 documents from organizations had no explicit indication of their affiliations or funding sources; they were excluded from our information source comparison analyses.Some organizations and entities developed multiple PrEP-related documents, but they differed in content and topic.For example, some documents provided details about a particular PrEP drug (eg, Truvada and Descovy), while others provided information about accessing PrEP or PrEP cost.Still, others were different document types (eg, brochures).In our analyses, we treated each document as a separate data point.
Complete readability scores for each document are presented in Multimedia Appendix 1.The overall average grade reading level required to understand web-based PrEP information (10.2 grade level, SE 0.1) is considerably higher than the target grade level recommended by health agencies for health information.The mean grade reading levels and SEs (ie, of the 4 indices) from these organizations are displayed below and compared in Table 1.There was no significant difference in readability among organization types.

Document Format and Communication Method
The most common type of document was websites (n=384), including content from health organizations and health care professionals.Websites typically provide a comprehensive range of information on multiple related topics, such as HIV/AIDS and PrEP access, at a reading level of 10.2 grade (SE 0.46).
In our data set, information sheets (n=29) were in PDF file format and had titles containing phrases such as "Patient Medication," "Medication Guide," "Information for the user," "Information sheet," "Fact sheet," "Facts," and "Information booklet."Compared to other document formats, they were also lengthy and detailed, including medication information and instructions for patient use.Most medication guides were written by the drug manufacturers (eg, Gilead Sciences and ViiV Healthcare) and FDA.Five information sheets were provided by nonprofit and non-US government health organizations.Information sheets were written at a reading level of 11.2 grade (SE 0.46).
Examples of each type of document are provided in Figure 2. Brochures (n=33) included user guides and leaflets containing commonly asked questions such as "What is PrEP" and "How to get PrEP."Brochures were prepared by federal and state governmental organizations (n=15), non-US governmental organizations (n=2), nongovernmental organizations (n=10), and for-profit organizations (n=6).The 3 document types had significantly different readability levels: the readability of brochures was easier than both information sheets (adjusted P<.001) and websites (adjusted P=.003), with an average readability level at the recommended ninth grade level (SE 0.31).

PrEP Modality
We identified 220 documents relating to oral PrEP only, 14 documents relating to injectable PrEP only, and 14 documents mentioning both oral and injectable PrEP.The rest of the documents (n=198, 44%) did not mention a specific PrEP delivery method (ie, "general PrEP"); these latter documents were excluded from analyses.On average, the content on injectable PrEP (12.7 grade level, SE 0.9) was more difficult to comprehend than the content on oral PrEP (10.2 grade level, SE 0.2).Information on both forms of PrEP was intermediate in readability (10.8 grade level, SE 0.6).The comparison analysis results are shown in Table 2.

Intended Audience
Most documents (n=426, 95.5%) were categorized as "general public" because they did not specify their intended audience.We found 14 documents specifically written for PrEP users and 6 specifically written for providers.
The readability grade level indices of content intended for PrEP providers (15.6 grade level, SE 1.59) were significantly higher when compared to the content intended for patients (10.2 grade level, SE 0.3; adjusted P<.001) and the general public (10.1 grade level, SE 0.1; adjusted P<.001).The reading level of content written for PrEP users was similar to that of materials for the general public (adjusted P=.84).

Principal Findings
Our study highlights the challenging task of effectively educating diverse audiences about PrEP.Our results are consistent with previous work [25], which found that web-based PrEP information is often difficult to understand.We add to this previous work by including data disseminated in social media and stratifying web-based PrEP documents by how the PrEP document was obtained on the web, information source, document format and communication method, PrEP modality, and intended audience.We found PrEP materials from Twitter were significantly easier to comprehend than those identified by web searches, despite the fact Twitter was merely posting URLs of web-based information.It is possible that organizations selectively identify more comprehendible materials to promote through Twitter.The readability of web-based PrEP information from different organizational groups was all higher than recommended, and there was no significant difference between them (Table 1).This contrasts with a previous study that focused on different health conditions [26].Informational documents intended for providers had significantly higher reading levels than those intended for patients or the general public.However, the materials intended for the patients and the public were still higher than the recommended ninth-grade reading level.
Brochures were the only type of PrEP materials that achieved the HHS and NIH target of ninth-grade reading level [13].Brochures have been used in successful public health campaigns for facilitating behavior change, knowledge increase, and self-efficacy for other health conditions [56].Brochures may be effective means of PrEP communication for most people who are not medical providers.However, brochures were not a common mode of conveying information: there were only 33 brochures in 446 unique documents that we identified.
As PrEP regimens become more complex (eg, injectable PrEP and 2-1-1 PrEP), educational materials might become more difficult to comprehend.For example, we identified that a higher reading level was required to understand materials about injectable PrEP.Yet, there is great interest in injectable PrEP: in a National HIV Behavioral survey of 314 people, injectable PrEP was preferred by 3 times more respondents compared to oral PrEP [31].This calls for attention to improving the readability of materials describing injectable PrEP because difficulty in understanding injectable PrEP information could discourage people who would otherwise be interested in the new PrEP route of administration.
Our findings suggest that there is a need for health communication strategies and policies that support the development and dissemination of clear and concise PrEP information.Our finding suggests that health communication strategies can be enhanced by using brochures and plain language, which may include avoiding jargon.Policies can also be improved to support this effort.For instance, governments could increase funding for research into PrEP literacy and the development of PrEP communication guidelines.Additionally, governments could mandate that health care providers receive training on how to communicate about PrEP in a clear and easy-to-understand way for a variety of audiences.Together, these communication strategies and policies can help individuals to have access to clear and easy-to-understand information about PrEP, which can help to increase PrEP uptake and reduce HIV transmission.

Limitations
Our findings should be understood in the light of their limitations.First, individuals accessing PrEP information on the web may be a select subgroup of those interested in PrEP and might have higher or lower health literacy than those who seek information from other sources.However, given that the internet has become an increasingly popular resource for health information [32,57] and groups potentially eligible for PrEP [58] tend to have high internet use [59], it is likely that a substantial proportion of potential and current PrEP users would seek PrEP information on the web.Second, current general-purpose readability indices alone may not be a perfect measurement of comprehension and reading level [60,61], especially for assessing documents with nontext content, such as charts, graphs, and videos.For instance, we noted that all brochures in our data set contained images.Pictorial and graphic representations of information have been shown to be more effective than text-only messages [62].Similarly, we found that some websites also contain PrEP-related informational videos; videos have been shown to be effective in modifying health behaviors, including promoting HIV testing [63].In this study, we focused on textual information as text remains the primary medium for health communication on the web [64] and relied on the average scores of 4 readability indices to minimize the lack of sensitivity or breadth of measurements from using a single metric.Third, our analysis, although systematic, was not exhaustive.The web-based PrEP information included in our study was collected from a variety of sources, yet it is possible that some important information was missed.We limited our search to English language materials, and the first page of the search result imitated users' web-based behavior [43].Lastly, we did not use generic names of PrEP medications, thus potentially not including PrEP information that only might use the generic names.

Future Directions
Despite public guidelines for writing health education materials, readability levels of web-based health materials, even those written by public health organizations, have either become more difficult [65] or reflected no improvement [19] with few exceptions [66].AMA guidelines suggest addressing health literacy concerns by avoiding unnecessary details and lengthy background information [15].This recommendation is consistent with our finding that documents in the brochure format, which were assessed as most readable, were concise, focused more on patient needs, and provided less background information.Readability is only one measure of the impact of health information, for example, measures of health literacy are evolving to include measures of the extent to which materials increase decision-making ability [67].Thus, future research on health literacy should extend to whether educational materials are helpful to clients in making informed decisions.Social media platforms have been widely used for public health communication and education, but research focused on measuring the effectiveness of social media as a channel for public health information is limited (with a few notable exceptions [68]).A better understanding of social media platforms' effectiveness should improve public health communication and the development of an impactful campaign for facilitating behavior change, knowledge increase, and self-efficacy related to public health needs-including PrEP uptake.Inequitable PrEP uptake persists among racial, ethnic, and gender minority communities [6,7].Follow-up studies should (1) optimize the effectiveness of web-based educational materials in making informed decisions among health equity populations and (2) tailor messaging to address the information needs of specific populations to address inequitable PrEP uptake.
Future studies of the readability of web-based PrEP information should consider the following points.PrEP use and HIV prevention are global concerns, and it is important to understand how people in other languages are accessing and understanding information about PrEP.Future studies should examine materials in other languages.To provide a more comprehensive picture of the range of available web-based PrEP information, future work can expand the data collection process by expanding the number of result pages, search keywords, search engines, and search locations and by applying various filters to narrow and rank the search results.In addition, future work should consider exploring other measures of comprehension, especially for materials that include images, videos, and other nontext elements.We believe that these measures will strengthen the findings and provide a more comprehensive picture of how people are understanding information about PrEP.

Conclusions
The overall reading level of PrEP information found on the web was above that recommended for most potential users; if unaddressed, other challenges to supporting PrEP use by those most likely to benefit might be made more difficult because of a gap in comprehension of web-based information.The readability of web-based PrEP information needs to be improved to comply with health communication recommendations, reduce the barrier of a health literacy gap, facilitate informed decisions by those with a need for PrEP, and counter the HIV epidemic.

Figure 1 .
Figure 1.The empirical distribution of the average readability level required for understanding web-based pre-exposure prophylaxis information.

Figure 2 .
Figure 2. Examples of document types used to disseminate pre-exposure prophylaxis (PrEP) information on the web: information sheet (top left), brochure (top right), website (bottom left), and website in a blog format (bottom right).More information about each document, including the source URL, can be found in Multimedia Appendix 1, rows 12, 272, 114, and 147.For a higher-resolution version of this figure, see Multimedia Appendix 2.

Table 1 .
Pairwise 2 sample t test of the average readability by source type.

Table 2 .
Pairwise 2 sample t test of the average readability by PrEP a modality.
b Versus oral PrEP.cVersus oral and injectable PrEP.