Published on in Vol 11 (2025)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/58630, first published .
Population Size Estimation of Men Who Have Sex With Men in Low- and Middle-Income Countries: Google Trends Analysis


1Division of Global HIV and TB, Centers for Disease Control and Prevention, 1600 Clifton Rd, Atlanta, GA, United States

2CDC Global Health Fellowship Program, Public Health Institute, Oakland, CA, United States

*all authors contributed equally

Corresponding Author:

Carly M Malburg, MPH


Background: Population size estimation (PSE) for key populations is needed to inform HIV programming and policy.

Objective: This study aimed to examine the utility of applying a recently proposed method using Google Trends (GT) internet search data to generate PSE (Google Trends Population Size Estimate [GTPSE]) for men who have sex with men (MSM) in 54 countries in Africa, Asia, the Americas, and Europe.

Methods: We examined GT relative search volumes (representing the relative internet search frequency of specific search terms) for “porn” and, as a comparator term, “gay porn” for the year 2020. We assumed “porn” represents “men” (denominator) while “gay porn” represents a subset of “MSM” (numerator) in each country, resulting in a proportional size estimate for MSM. We multiplied the proportional GTPSE values by the countries’ male adult population (15‐49 years) to obtain absolute size estimates. Separately, we produced subnational MSM PSE limited to countries’ (commercial) capitals. Using linear regression analysis, we examined the effect of countries’ levels of urbanization, internet penetration, criminalization of homosexuality, and stigma on national GTPSE results. We conducted a sensitivity analysis in a subset of countries (n=14) examining the effect of alternative English search terms, different language search terms (Spanish, French, and Swahili), and alternative search years (2019 and 2021).

Results: One country was excluded from our analysis as no GT data could be obtained. Of the remaining 53 countries, all national GTPSE values exceeded the World Health Organization’s recommended minimum PSE threshold of 1% (range 1.2%‐7.5%). For 44 out of 49 (89.8%) countries, GTPSE results were higher than the Joint United Nations Programme on HIV/AIDS (UNAIDS) Key Population Atlas values but largely consistent with the regional UNAIDS Global AIDS Monitoring results. Substantial heterogeneity across same-region countries was evident in GTPSE, although it was smaller than that based on Key Population Atlas data. Subnational GTPSE values were obtained in 51 out of 53 (96%) countries; all subnational GTPSE values exceeded 1% but often did not match or exceed the corresponding countries’ national estimates. None of the covariates examined had a substantial effect on the GTPSE values (R2 values 0.01‐0.28). Alternative (English) search terms in 12 out of 14 (85%) countries produced GTPSE>1%. Using non-English language terms often produced markedly lower same-country GTPSE values compared with English, with 10 out of 14 (71%) countries showing national GTPSE exceeding 1%. GTPSE based on search data from 2019 and 2021 yielded results similar to those of the reference year 2020. Due to a lack of absolute search volume data, credibility intervals could not be computed. The validity of key assumptions, especially who (males and females) searches for porn and gay porn, could not be assessed.

Conclusions: GTPSE for MSM provides a simple, fast, essentially cost-free method. Limitations that impact the certainty of our estimates include a lack of validation of key assumptions and an inability to assign credibility intervals. GTPSE for MSM may provide an additional data source, especially for estimating national-level PSE.

JMIR Public Health Surveill 2025;11:e58630

doi:10.2196/58630




The Joint United Nations Programme on HIV/AIDS (UNAIDS) estimated that in 2022, about 39 million people were living with HIV worldwide [1]. HIV burden is higher among men who have sex with men (MSM), people who inject drugs, sex workers, and transgender persons, who together are often described as key populations (KP) [1]. KPs and their paying or nonpaying sexual partners may account for 70% of new HIV infections worldwide, with an estimated 80% of new HIV infections outside sub-Saharan Africa (SSA) and 55% of all new infections within SSA [1,2].

Key population size estimation (PSE) is needed to estimate the number of individuals belonging to a KP in a given geographical area [3,4]. PSEs provide the denominator values to inform KP programming and policy [5]. However, PSE is a difficult field: its methods often lack rigor in design or implementation, and the many methods available reflect the lack of an acceptable gold standard [3,6]. Challenges to PSE include lack of sampling frames, mobility, and nondisclosure of KP-defining behaviors [3,4]. Further, most PSE methods produce local estimates, whereas national PSEs are often obtained through “expert opinion,” simple projection, or modeling and less often through national-level empirical data such as direct survey questions or the network scale-up method, both used in general population-based surveys [6,7]. Direct survey questions about KP-defining traits are subject to reporting bias and require a major effort unless they can be added to an already planned general population survey. The frequent lack of reliable national-level PSE constitutes an even larger challenge compared with the availability of local PSE and complicates national, regional, and global HIV estimation work [3,8-10].

The rise of the internet facilitates web-based activities to improve public health, including in the field of digital epidemiology and infoveillance [11]. Recently, a new PSE method using Google Trends (GT) internet search data was proposed in a proof-of-concept paper by Card et al [12]. GT is a free cloud-based app that displays the relative frequency of user-specified Google search terms as trends across time and user-selected geographical areas [12-14]. Card et al [12] used GT and Canadian census data to estimate the local PSE of MSM in urban and rural locations throughout Canada. They related search terms presumed to be representative of MSM (“gay porn”) to those presumed to be representative of the general (male) population (“porn”). By relating these 2 sets of values, Card et al [12] estimated the relative size of the MSM population in these Canadian towns. To date, no other published PSE exists using this method.

The literature on pornography consumption by sex and sexual orientation is limited, and often the MSM population is not represented. However, a major porn website reported that about a third of its visitors globally in 2021 were women [15,16]. Further, women, regardless of sexual orientation, may also watch gay porn, possibly in substantial numbers [17]. Beyond this, we found no meaningful gray literature or peer-reviewed articles about internet pornography consumption in low- and middle-income countries (LMICs) or pornography consumption by MSM in LMICs. We are also not aware of (gray) literature about the proportion of heterosexual and homosexual men searching Google for (gay) porn in LMICs.

We expanded the literature search to include high-income settings. A study conducted in the United States reported that more men than women consume pornography (92%:68%, respectively) over the span of a year [18]. The study did not report the type of pornography consumed or disaggregate male respondents by sexual orientation or practice [18]. A separate study from Norway with a sample of some 2300 male and female participants suggested that more men than women consume some pornography (94% of men and 68% of women) [19]. However, only 5% (n=106) of participants identified as gay/lesbian/bisexual, no breakdown of sexual orientation by sex was given, and no information on the type of pornography consumed by participants was available [19].

The aim of this study was to examine the potential utility of using GT data to obtain MSM PSE in selected LMICs.


Preliminary Literature Search

A nonsystematic literature search was conducted to better understand the behavior of pornography consumption of the general population and sexual minorities, by sex, as well as the relative frequencies with which these populations search for (gay) porn in general (via Google) or by directly accessing specific porn sites.

Selection of Countries

We analyzed GT data for a selected set of 54 countries that receive support from the US President’s Emergency Plan for AIDS Relief, the US Government’s initiative to support global HIV responses, for which information on MSM PSE has been sought [2,20]. These countries are located in SSA (n=29), Asia (n=13), the Americas (n=11), and Ukraine (Tables 1 and 2).

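Table 1. National men who have sex with men (MSM) population size estimation (PSE) for US President’s Emergency Plan for AIDS Relief supported countries (n=53) using Google Trends (GT) data for the year 2020.^a,b

| Region and country | GT (number of MSM), n | GT, % | UNAIDS^c GAM^d regional %, median (IQR)^e | UNAIDS KP^f Atlas, % |
|---|---|---|---|---|
| East Africa^g | | | 1.67 | |
| Burundi | 48,500 | 1.77 | | 0.34 |
| Ethiopia | 365,000 | 1.28 | | ^h |
| Kenya | 276,000 | 1.99 | | 0.24 |
| Rwanda | 51,300 | 1.54 | | 0.15 |
| Tanzania | 243,000 | 1.73 | | 0.35 |
| Uganda | 154,000 | 1.47 | | 0.23 |
| Southern Africa^i | | | 1.67 | |
| Angola | 106,000 | 1.44 | | |
| Botswana | 13,000 | 2.12 | | 0.43 |
| Eswatini | 4500 | 1.57 | | 1.38 |
| Lesotho | 10,000 | 1.71 | | 1.05 |
| Malawi | 52,500 | 1.16 | | 0.94 |
| Mozambique | 134,000 | 1.87 | | 0.22 |
| Namibia | 16,500 | 2.60 | | |
| South Africa | 393,000 | 2.46 | | 1.94 |
| Zambia | 51,800 | 1.18 | | 0.15 |
| Zimbabwe | 53,000 | 1.64 | | 0.71 |
| West Central Africa^j | | | 1.28 (IQR 0.45‐1.50) | |
| Benin | 34,000 | 1.18 | | 0.20 |
| Burkina Faso | 88,000 | 1.80 | | 0.07 |
| Cameroon | 148,000 | 2.29 | | 0.11 |
| Cote d’Ivoire | 166,000 | 2.68 | | 0.90 |
| DRC | 33,000 | 1.66 | | 0.98 |
| Ghana | 112,000 | 1.40 | | 0.69 |
| Liberia | 22,600 | 1.83 | | 6.04 |
| Mali | 70,500 | 1.55 | | 0.09 |
| Nigeria | 614,000 | 1.26 | | 0.49 |
| Senegal | 73,600 | 1.94 | | 1.38 |
| Sierra Leone | 27,000 | 1.36 | | 0.16 |
| Togo | 53,100 | 2.65 | | 0.30 |
| Asia^k | | | 1.63 (IQR 0.26‐3.10) | |
| Burma | 664,000 | 4.53 | | 1.72 |
| Cambodia | 258,000 | 5.67 | | 1.93 |
| India | 6,460,000 | 1.18 | | 0.06 |
| Indonesia | 1,180,000 | 1.61 | | 1.03 |
| Kazakhstan | 137,000 | 2.99 | | 1.35 |
| Kyrgyz Rep. | 53,000 | 3.10 | | 0.99 |
| Lao PDR | 53,000 | 2.73 | | 2.96 |
| Nepal | 83,000 | 1.19 | | 0.86 |
| PNG | 31,000 | 1.30 | | 1.58 |
| Tajikistan | 52,000 | 2.14 | | 0 |
| Thailand | 215,000 | 1.25 | | 3.08 |
| Philippines | 1,260,000 | 4.27 | | 2.33 |
| Viet Nam | 1,953,000 | 7.46 | | 0.98 |
| Europe | | | | |
| Ukraine | 366,000 | 3.48 | 2.11 (IQR 1.75‐2.49) | 1.71 |
| Caribbean^l | | | 2.71 | |
| Dominican Rep. | 124,000 | 4.26 | | 4.90 |
| Guyana | 8200 | 3.60 | | 1.45 |
| Haiti | 108,000 | 3.60 | | 1.03 |
| Jamaica | 24,000 | 2.91 | | 5.15 |
| Trinidad and Tobago | 11,000 | 3.04 | | |
| Central and South America^m | | | 3.37 | |
| Brazil | 2,960,000 | 5.18 | | 3.50 |
| El Salvador | 85,000 | 5.20 | | 3.31 |
| Guatemala | 245,000 | 5.09 | | 2.42 |
| Honduras | 147,000 | 5.32 | | 1.48 |
| Nicaragua | 114,000 | 6.32 | | 1.97 |
| Panama | 81,000 | 7.23 | | 2.65 |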

aThese estimates are for descriptive purposes only, to examine issues related to the potential utility of the method proposed by Card et al [12]. They represent the national MSM population size estimates (percentage of MSM) for the year 2020. The percentage of MSM was calculated by taking the average relative search volume score produced by Google Trends for “gay porn” and dividing it by the average relative search volume score produced by Google Trends for “porn.” The MSM population size estimate (number of MSM) was calculated by taking the percentage of MSM population size estimate and multiplying it by the total male population (ages 15‐49 years). The key populations (KPs) Atlas percentage of MSM population size estimate was calculated by dividing the absolute MSM population size estimate taken from the United Nations Programme on HIV/AIDS (UNAIDS) KPs Atlas dashboard by the total adult male population (ages 15‐49 years), and then multiplying by 100. The absolute value difference was calculated by subtracting the GT absolute MSM population size estimate value from the KPs Atlas MSM population size estimate absolute value. All absolute values under 10,000 are rounded to the nearest 100. All other absolute values are rounded to the nearest 1000. UNAIDS Global AIDS Monitoring system (GAM) values are regional values transcribed from the UNAIDS open-source Spectrum 6 guide. The countries used to create these regions and respective values may not be in full alignment with the countries included in the population size estimate analysis; therefore, direct 1:1 comparisons should not be made. Max:Min ratio: the ratio of the largest to the smallest PSE % value in each region.

b Absolute values are not provided as Google Trends does not provide absolute search frequency values.

cUNAIDS: United Nations Programme on HIV/AIDS.

dGAM: Global AIDS Monitoring system.

eIQR values were included for available regions. Regions without an IQR listed did not have one available.

fKP: key population.

gMax:Min ratio: 1.6 (GT) and 2.3 (UNAIDS KP).

hNot available (data missing for the country).

iMax:Min ratio: 2.2 (GT) and 12.9 (UNAIDS KP).

jMax:Min ratio: 2.3 (GT) and 86.3 (UNAIDS KP).

kMax:Min ratio: 6.3 (GT) and 51.3 (UNAIDS KP).

lMax:Min ratio: 1.5 (GT) and 5 (UNAIDS KP).

mMax:Min ratio: 1.2 (GT) and 2.4 (UNAIDS KP).
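
The Max:Min ratios in the footnotes above can be reproduced with a few lines of code. The following is a minimal sketch, not the authors' code: the function name `max_min_ratio` and the use of `None` for missing values are assumptions made for illustration, and the input values are the East Africa rows of Table 1.

```python
def max_min_ratio(values):
    """Ratio of the largest to the smallest estimate; None entries (no data) are skipped."""
    present = [v for v in values if v is not None]
    # Assumption: a zero minimum is treated as an error rather than silently dropped.
    if not present or min(present) == 0:
        raise ValueError("need at least one value and a non-zero minimum")
    return max(present) / min(present)


# East Africa rows from Table 1: (GT %, KP Atlas %); None marks a missing value.
east_africa = {
    "Burundi": (1.77, 0.34),
    "Ethiopia": (1.28, None),
    "Kenya": (1.99, 0.24),
    "Rwanda": (1.54, 0.15),
    "Tanzania": (1.73, 0.35),
    "Uganda": (1.47, 0.23),
}

gt_ratio = max_min_ratio([gt for gt, _ in east_africa.values()])
kp_ratio = max_min_ratio([kp for _, kp in east_africa.values()])
print(f"GT {gt_ratio:.1f}, KP Atlas {kp_ratio:.1f}")  # GT 1.6, KP Atlas 2.3
```

The printed values agree with the East Africa ratios reported in footnote g (1.6 for GT and 2.3 for UNAIDS KP).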

Table 2. Regional median Google Trends Population Size Estimate, United Nations Programme on HIV/AIDS (UNAIDS) Global AIDS Monitoring system (GAM), and key populations (KP) Atlas for men who have sex with men (MSM) populations for the year 2020.^a

Median regional percentage MSM population size estimation^b:

| Region | GT^c, % | UNAIDS GAM, % | UNAIDS KP Atlas, % |
|---|---|---|---|
| Eastern Africa | 1.64 | 1.67 | 0.24 |
| Southern Africa | 1.68 | 1.67 | 0.83 |
| West Central Africa | 1.73 | 1.28 | 0.40 |
| Asia | 2.86 | 1.63 | ^d |
| Europe | 2.86 | 2.11 | 1.47 |
| Caribbean | 3.60 | 2.71 | 3.17 |
| Central & South America | 5.26 | 3.37 | 2.54 |

aAbsolute values are not provided as Google Trends does not provide absolute search frequency values.

bGoogle Trends (GT) and KP Atlas regional estimates only include estimates from included countries with available data (Table 1). UNAIDS GAM data separate regions differently and include countries that vary from our GT or the KP Atlas regional data: UNAIDS GAM includes eastern and southern Africa in 1 estimate and separates Asia and Europe into 2 estimates (1.63% for Asia and the Pacific, 2.11% for Eastern Europe and Central Asia). Region names were not adjusted in the above table to align with GAM data.

cGT: Google Trends.

dNot available.
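
As an illustration of how a regional median in Table 2 relates to the country values in Table 1, the following minimal sketch recomputes the GT medians for 2 regions. It is not the authors' code; the dictionary layout and variable names are assumptions, and the values are copied from Table 1.

```python
from statistics import median

# GT % values copied from Table 1 for two regions.
gt_percent = {
    "East Africa": [1.77, 1.28, 1.99, 1.54, 1.73, 1.47],
    "Southern Africa": [1.44, 2.12, 1.57, 1.71, 1.16, 1.87, 2.60, 2.46, 1.18, 1.64],
}

# With an even number of countries, the median is the mean of the two middle values.
regional_median = {region: median(vals) for region, vals in gt_percent.items()}
for region, value in regional_median.items():
    print(region, value)
```

The unrounded medians are approximately 1.635 and 1.675, consistent with the 1.64 and 1.68 reported in Table 2.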

Ethical Considerations

No ethics or review board approval or informed consent was obtained or applicable for this work. All data used in this paper are anonymous, aggregate, and publicly available and sourced.

GT-Based Population Size Estimation

GT provides results based on exact search terms, unlike the “topical” search results that Google’s main search engine provides. GT does not provide absolute search frequency values; instead, it offers relative search volume (RSV) values across time (eg, 52 wk) in a specified area (eg, Kenya). That is, GT normalizes search frequencies for a specific search term (eg, porn) to a range from 0 to 100, where the term’s maximum frequency (for the specified geographic area and time frame) is set at 100 and 0 reflects no search for that term [11,13,14]. Importantly, GT allows users to add “comparator” terms (eg, gay porn) next to the main term (eg, porn); the RSV values for such comparator terms are normalized against the main term’s RSV values [13,21]. For the purpose of PSE calculation, the main term “porn” may represent all men, whereas the comparator term “gay porn” may be viewed as representing the subset of men who are gay or MSM. To generate an MSM PSE from the RSV values, we divided the comparator RSV value (gay porn) by the larger same-time, same-place RSV value (porn).

National Size Estimates

PSE data collection was carried out through GT’s application [13]. We applied this analytic approach for the year 2020 using “porn” and “gay porn” as the main and comparator search terms for each of the 54 countries. The time period for data collection was set as the year 2020, the most recent year for which we could obtain all necessary data for this analysis. Weekly RSV values for “porn” and “gay porn” for the year 2020 were exported and summed (or, equivalently, averaged), and proportional size estimates were obtained from the resulting ratio. For example, for Botswana, the average of the weekly RSV values for “porn” was 78.3 and the corresponding average for “gay porn” was 1.66; the proportional PSE was therefore calculated as 1.66/78.3=2.1%. This was repeated for all countries. We then calculated the absolute Google Trends Population Size Estimate (GTPSE) by multiplying the proportional GTPSE by the total male population aged 15‐49 years in each country, the most commonly used age range for KPs. The sizes of countries’ 15‐49 year-old male general population in 2020 were obtained through Spectrum (version 6.1, Avenir Health).
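
The national calculation can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the weekly series are constant placeholders held at the Botswana averages quoted above (real exports vary week to week), and the population figure is invented for illustration rather than taken from Spectrum.

```python
def proportional_gtpse(main_rsv, comparator_rsv):
    """Ratio of comparator ("gay porn") to main ("porn") weekly RSV values."""
    if len(main_rsv) != len(comparator_rsv):
        raise ValueError("both series must cover the same weeks")
    total_main = sum(main_rsv)
    if total_main == 0:
        raise ValueError("no search volume for the main term")
    # Summing or averaging gives the same ratio when both series span the same weeks.
    return sum(comparator_rsv) / total_main


def absolute_gtpse(proportion, male_population_15_49):
    """Absolute estimate: proportional GTPSE times the male population aged 15-49."""
    return proportion * male_population_15_49


# Placeholder series: 52 weeks held at the Botswana averages quoted in the text.
porn_weekly = [78.3] * 52
gay_porn_weekly = [1.66] * 52

share = proportional_gtpse(porn_weekly, gay_porn_weekly)
print(f"{share:.1%}")  # 2.1%

# Hypothetical population figure, for illustration only (not taken from Spectrum).
print(round(absolute_gtpse(share, 600_000)))
```

Because only the ratio of the 2 series matters, the result does not depend on the scale to which GT normalizes the values, provided both terms are exported in the same comparison.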

Local Size Estimates

GT data can be restricted to subnational areas. Separately from national estimates, for each country, we also attempted to obtain local GTPSE for the political (or, if different, commercial) capital city. Where data were unavailable for the political or commercial capital city, we used data from the district that contained the capital city. The calculation to obtain relative GTPSE was then the same as for the national level. We did not produce absolute subnational GTPSE.

Consistency of GTPSE Results With WHO-Recommended Minimum Estimate

We assessed whether the GTPSE results met the World Health Organization (WHO) and UNAIDS recommendation that national MSM PSE should represent at least 1% of the general adult male population [22,23].

Comparability

We compared the country-level GTPSE against 2 reference data sources used by UNAIDS: the KP Atlas database and the Global AIDS Monitoring system (GAM) [22,24,25]. The KP Atlas database stores countries’ self-reported absolute MSM size estimates, which are derived using a wide range of PSE methods, are often projected up to national scale from local estimates, and rest on primary data collected over different periods of time. Proportional KP Atlas PSE values were computed by dividing the absolute MSM PSE values from the KP Atlas by the male general population (15‐49 years). UNAIDS’ GAM is a global data warehousing system that informs policy and facilitates monitoring, including KP size estimates. Using GAM data, UNAIDS curated a table with regional relative MSM PSE (median and IQR) deemed reasonable.

Covariates Potentially Affecting GTPSE

Overview

We examined the potential effect of select covariates on the relative GTPSE values by performing regression analysis for each covariate. The country-specific covariates we examined included internet penetration, urbanization, stigma, and criminalization of homosexuality. The covariate data are provided in Table S1 in Multimedia Appendix 1; these data were not used to adjust GTPSE values.
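
A single-covariate regression of this kind can be sketched as follows. This is a minimal ordinary least squares illustration under stated assumptions, not the authors' analysis code: the function name is hypothetical and the data points are made up for demonstration; the actual covariate data are in Table S1 in Multimedia Appendix 1.

```python
def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a single predictor, R2 equals the squared Pearson correlation.
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


# Made-up covariate values (eg, internet penetration, %) and GTPSE values (%).
internet_penetration = [20.0, 35.0, 50.0, 65.0, 80.0]
gtpse_percent = [1.4, 2.9, 1.8, 3.6, 2.5]

slope, intercept, r2 = simple_linear_regression(internet_penetration, gtpse_percent)
print(f"slope={slope:.4f}, intercept={intercept:.2f}, R2={r2:.2f}")
```

Running the same fit once per covariate yields one R2 value per covariate, which is the form in which the covariate results are summarized.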

Internet Penetration and Urbanization

Internet penetration data were extracted from the World Development Indicators database through the World Bank and the Internet World Statistics database, indicating the percentage of each country’s total population with access to the internet. Urbanization data were obtained from the World Development Indicators database through the World Bank, indicating the percent of the total population in each country considered urban [26,27].

Stigma

Country-level stigma values were extracted from the Global Acceptance Index [28]. This index was developed using computer modeling informed by responses to questions from 11 different global surveys that measure attitudes toward lesbian, gay, bisexual, transgender, or intersex people, yielding a stigma score for each of 175 countries. The index scores countries on a scale of 1 to 10; higher scores indicate less stigma [28].

Criminalization

The State-Sponsored Homophobia International Lesbian, Gay, Bisexual, Trans, and Intersex Association report was used to evaluate the effects of criminalization of homosexual orientation or behavior on GTPSE [28]. The report classifies countries based on their level of legal protection or criminalization of sexual orientation and same-sex sexual acts. These classifications, ranging from most severe to most protected, include the death penalty, up to lifelong imprisonment, up to 8 years imprisonment, de facto criminalization, no criminalization or legal protections, limited protections, employment protections, broad protections, and constitutional protections. We converted these classifications into a quantitative ranking ranging from +4 to −4. The most severe classification (death penalty) was assigned the rank value “+4” and descended to the least severe/most protective classification (constitutional protection) with a rank value “−4.”
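
The conversion from classification to rank can be sketched as a simple lookup. This is an illustration only: the text fixes the 2 end points (+4 and −4) and the order of the 9 classifications, while the one-step spacing of the intermediate ranks and the function name `criminalization_rank` are assumptions.

```python
# Classifications in the order given in the text, most severe first.
CLASSIFICATIONS = [
    "death penalty",
    "up to lifelong imprisonment",
    "up to 8 years imprisonment",
    "de facto criminalization",
    "no criminalization or legal protections",
    "limited protections",
    "employment protections",
    "broad protections",
    "constitutional protections",
]

# Assumed one-step spacing: +4 for the first entry down to -4 for the last.
RANK = {label: 4 - position for position, label in enumerate(CLASSIFICATIONS)}


def criminalization_rank(classification):
    """Return the rank (+4 most severe, -4 most protective) for a classification label."""
    return RANK[classification.strip().lower()]


print(criminalization_rank("Death penalty"))               # 4
print(criminalization_rank("constitutional protections"))  # -4
```

Under this spacing, the middle classification (no criminalization or legal protections) receives a rank of 0.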

Sensitivity Analysis

Using a subset (n=14) of the 53 countries, we performed 3 sensitivity analyses at the national level. The 14 countries were randomly selected from among countries in which French, Spanish, or Swahili is a prominent language. The first sensitivity analysis probed the effect of select non-English search languages. The 14 countries comprised 4 using Swahili (Kenya, Tanzania, Uganda, and Democratic Republic of Congo [DRC]), 5 using French (Cote d’Ivoire, Senegal, Cameroon, Mali, and Haiti), and 5 using Spanish (Dominican Republic, Panama, El Salvador, Nicaragua, and Honduras) as their national or dominant language. We generated GTPSE using search terms in Swahili (“ngono” and “ngono za mashoga”), French (“porno” and “porno gay”), and Spanish (“porno” and “porno gay”) and compared them to the original relative GTPSE values. Using the same 14 countries, the second sensitivity analysis probed the effect of different English search terms on GTPSE, that is, “sex” and “gay sex” as well as “sex” and “anal sex,” and compared them to the original GTPSE (porn and gay porn). The third sensitivity analysis probed the effect of using different calendar years (2019 [pre-COVID] and 2021) and compared them to the original 2020 GTPSE values, using the original English language search terms.


GTPSE and Comparability

Of the 54 countries examined, 1 (South Sudan) was omitted for lack of RSV values. All remaining 53 countries had GTPSE exceeding 1% (Table 1), similar to GAM values (all exceeding 1% as well) and in contrast to KP Atlas values, for which 24 out of 53 (45%) countries showed values above 1%. GTPSE ranged from 1.16% to 7.46% (median 1.99%, IQR 1.54%‐3.48%), compared with 0.06% to 6.04% (median 0.99%, IQR 0.34%‐1.93%) in the KP Atlas, and 1.38% to 2.82% in GAM regions. In 48 out of 53 (91%) countries, relative GTPSE exceeded the KP Atlas values; KP Atlas values were larger in 5 countries (DRC, Liberia, Lao People’s Democratic Republic [PDR], Thailand, and Jamaica). Absolute differences between GTPSE and KP Atlas ranged from −312,900 (Thailand) to 6,221,800 (India). Table 2 displays regional median GTPSE, ranging from 1.64% (East Africa) to 5.26% (Central/South America), larger in all regions than the corresponding KP Atlas values and largely similar to GAM values in most regions. Table 1 also displays the ratios between the largest and smallest country-level %PSE for each region, separately for GT and KP Atlas values. While substantial variability is seen in all regions and for both data sources (GT and KP Atlas), the observed heterogeneity was consistently higher for KP Atlas values than for GT values in all regions.

Local GTPSE values pertaining to political or commercial capitals, or the larger subnational areas encompassing them, are displayed in Table 3. We could obtain local estimates for 51 out of 53 (96%) countries’ capital cities; GT did not provide data for Nairobi (Kenya) and Kathmandu (Nepal). Among the 51 cities with estimates, the GTPSE ranged from 0% to 13% (median 2.2%); most cities’ estimates (44/51, 86%) exceeded 1%. Five cities yielded noncredible GTPSE values of 0%: Bujumbura (Burundi), Dodoma (encompassing Dar es Salaam, Tanzania), Ouagadougou (Burkina Faso), Monrovia (Liberia), and Vientiane (Lao PDR). Of the 44 subnational GTPSE with values >1%, 18 (41%) were below the same-country national GTPSE.

Table 3. Reported local men who have sex with men (MSM) Google Trends Population Size Estimate (GTPSE) (n=53) in the year 2020.^a

| Region and country | Local area^b | Relative national GTPSE, % | Relative local GTPSE, % | Absolute percentage difference national and local GTPSE, % |
|---|---|---|---|---|
| East Africa | | | | |
| Burundi | Bujumbura | 1.77 | 0 | −1.77 |
| Ethiopia | Addis Ababa | 1.28 | 1.30 | 0.02 |
| Kenya | Nairobi | 1.99 | ^c | |
| Rwanda | Kigali | 1.54 | 1.70 | 0.16 |
| Tanzania | Dodoma | 1.73 | 0 | −1.73 |
| Uganda | Kampala | 1.47 | 1.60 | 0.13 |
| Southern Africa | | | | |
| Angola | Luanda | 1.44 | 2.04 | 0.60 |
| Botswana | Gaborone | 2.12 | 0 | −2.12 |
| Eswatini | Mbabane | 1.57 | 1.48 | −0.09 |
| Lesotho | Maseru | 1.71 | 1.01 | −0.70 |
| Malawi | Lilongwe | 1.16 | 2.24 | 1.08 |
| Mozambique | Maputo | 1.87 | 2.04 | 0.17 |
| Namibia | Windhoek | 2.60 | 2.53 | −0.07 |
| South Africa | Johannesburg (Gauteng) | 2.46 | 0.99 | −1.47 |
| Zambia | Lusaka | 1.18 | 1.56 | 0.38 |
| Zimbabwe | Harare | 1.64 | 1.56 | −0.08 |
| West Central Africa | | | | |
| Benin | Littoral (Cotonou) | 1.18 | 4.11 | 2.93 |
| Burkina Faso | Centre (Ouagadougou) | 1.80 | 0 | −1.80 |
| Cameroon | Littoral (Douala) | 2.29 | 2.47 | 0.18 |
| Cote d’Ivoire | Abidjan | 2.68 | 1.01 | −1.67 |
| DRC | Kinshasa | 1.66 | 2.04 | 0.38 |
| Ghana | Accra | 1.40 | 1.42 | 0.02 |
| Liberia | Monrovia | 1.83 | 0 | −1.83 |
| Mali | Bamako | 1.55 | 2.93 | 1.38 |
| Nigeria | Abuja (Federal Capital Territory) | 1.26 | 1.44 | 0.18 |
| Senegal | Dakar | 1.94 | 2.85 | 0.91 |
| Sierra Leone | Freetown | 1.36 | 1.01 | −0.35 |
| Togo | Lome | 2.65 | 2.04 | −0.61 |
| Asia | | | | |
| Burma | Yangon (Yangon Region) | 4.53 | 4.79 | 0.26 |
| Cambodia | Phnom Penh | 5.67 | 5.43 | −0.24 |
| India | New Delhi (Uttar Pradesh) | 1.18 | 1.15 | −0.03 |
| Indonesia | Jakarta | 1.61 | 2.20 | 0.59 |
| Kazakhstan | Almaty (Almaty Region) | 2.99 | 5.52 | 2.53 |
| Kyrgyz Rep. | Bishkek | 3.10 | 3.09 | −0.01 |
| Lao PDR | Vientiane | 2.73 | 0 | −2.73 |
| Nepal | Katmandu/Kantipur | 1.19 | | |
| PNG | Port Moresby | 1.30 | 1.01 | −0.29 |
| Tajikistan | Dushanbe | 2.14 | 1.01 | −1.13 |
| Thailand | Bangkok | 1.25 | 3.24 | 1.99 |
| Philippines | Manila | 4.27 | 5.51 | 1.24 |
| Viet Nam | Hanoi | 7.46 | 4.56 | −2.90 |
| Europe | | | | |
| Ukraine | Kyiv | 3.48 | 4.14 | 0.66 |
| Caribbean | | | | |
| Dominican Rep. | Santo Domingo | 4.26 | 3.99 | −0.27 |
| Guyana | Georgetown | 3.60 | 3.09 | −0.51 |
| Haiti | Port-au-Prince | 3.60 | 3.09 | −0.51 |
| Jamaica | Kingston (St. Andrew Parish) | 2.91 | 3.38 | 0.47 |
| Trinidad and Tobago | Port of Spain | 3.04 | 13 | 9.96 |
| Central and South America | | | | |
| Brazil | São Paulo (State of São Paulo) | 5.18 | 5.92 | 0.74 |
| El Salvador | San Salvador | 5.20 | 5.73 | 0.53 |
| Guatemala | Guatemala City (Guatemala Department) | 5.09 | 4.91 | −0.18 |
| Honduras | Tegucigalpa (Comayagua) | 5.32 | 8.13 | 2.81 |
| Nicaragua | Managua | 6.32 | 5.31 | −1.01 |
| Panama | Panama City | 7.23 | 7.44 | 0.21 |

aAbsolute values are not provided as Google Trends does not provide absolute search frequency values.

bLocal MSM GTPSE for 53 countries for the year 2020 was calculated by restricting the geographic entity to the desired capital city or commercial hub. Where Google Trends (GT) did not provide data for a given city, we substituted the place name with the largest city by population or the district that had data available in GT. This is noted by listing what was available in GT in parentheses next to the capital city. Kenya and Nepal were excluded from this analysis due to insufficient regional data available in GT.

cNot available (data missing for that country).
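
The difference column in Table 3 is the local estimate minus the national estimate, in percentage points. A minimal sketch under stated assumptions follows: the function name is hypothetical, `None` stands in for missing local data, and the example rows are copied from Table 3.

```python
def local_minus_national(national, local):
    """Difference in percentage points; None when no local estimate is available."""
    if local is None:
        return None
    return round(local - national, 2)


# (national %, local %) pairs copied from Table 3; None marks missing local data.
rows = {
    "Ethiopia (Addis Ababa)": (1.28, 1.30),
    "Burundi (Bujumbura)": (1.77, 0.0),
    "Kenya (Nairobi)": (1.99, None),
    "Thailand (Bangkok)": (1.25, 3.24),
}

for place, (national, local) in rows.items():
    print(place, local_minus_national(national, local))
```

A negative difference marks a local estimate that falls below the same-country national estimate.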

Effect of Covariates

Figure 1A-D displays the relationships between national-level GTPSE and urbanization, internet penetration, stigma, and criminalization. R2 values ranged from 0.01 (criminalization) to 0.28 (internet penetration).

Figure 1. (A) The linear relationship between the Google Trends national population size estimates and the rate of urbanization in each country (n=53). (B) The linear relationship between the Google Trends national population size estimates and the rate of internet penetration in each country (n=53). (C) The linear relationship between the Google Trends national population size estimates and the level of stigma against LGBTQ+ persons in each country (n=53). (D) The linear relationship between the Google Trends national population size estimates and the degree of criminalization of the men who have sex with men population in each country (n=53). LGBTQ+: lesbian, gay, bisexual, transgender, queer, and other identities; MSM: men who have sex with men; GTPSE: Google Trends Population Size Estimate.

Sensitivity Analysis

Table 4 displays how the GTPSE generated from the alternative search terms compares to the original search term GTPSE. In most countries, “porn/gay porn” produced higher PSE values compared with “sex/anal sex” (13/14, 93%) as well as compared with “sex/gay sex” (12/14, 86%). For “sex/gay sex,” all 14 countries produced estimates exceeding 1%. For “sex/anal sex,” 3 out of 14 (21%) countries did not produce estimates reaching the 1% threshold, including Mali, for which zero search results were reported for “anal sex.”

Table 4. Sensitivity analysis using alternative search terms in Google Trends to calculate national population size estimations (PSEs) for select US President’s Emergency Plan for AIDS Relief countries (n=14) in 2020.^a

| Country, % | Original GTPSE^b: porn/gay porn PSE | SA alternate search term GTPSE^c: sex/gay sex PSE | Absolute percentage difference (sex/gay sex) | SA alternate search term GTPSE^c: sex/anal sex PSE | Absolute percentage difference (sex/anal sex) |
|---|---|---|---|---|---|
| Kenya | 1.99 | 1.37 | 0.62 | 1.37 | 0.62 |
| Tanzania | 1.73 | 1.46 | 0.27 | 3.54 | −1.81 |
| Uganda | 1.47 | 1.38 | 0.09 | 1.26 | 0.21 |
| DRC | 1.66 | 1.55 | 0.11 | 1.15 | 0.51 |
| Cameroon | 2.29 | 1.28 | 1.01 | 0.90 | 1.39 |
| Mali | 1.55 | 1.65 | −0.10 | 0 | 1.55 |
| Cote d’Ivoire | 2.68 | 1.90 | 0.78 | 1.74 | 0.94 |
| Senegal | 1.94 | 1.50 | 0.44 | 0.88 | 1.06 |
| Haiti | 3.60 | 2.60 | 1 | 2.83 | 0.77 |
| Dominican Rep. | 4.26 | 3.36 | 0.90 | 1.83 | 2.43 |
| Panama | 7.23 | 5.17 | 2.06 | 3.71 | 3.52 |
| El Salvador | 5.20 | 5.34 | −0.14 | 4.19 | 1.01 |
| Nicaragua | 6.32 | 7.10 | −0.78 | 4.82 | 1.50 |
| Honduras | 5.32 | 4.85 | 0.47 | 2.96 | 2.36 |
| Median (IQR) | 2.49 (1.78-4.97) | 1.78 (1.47-4.48) | 0.45 (0.10-0.87) | 1.79 (1.18-3.40) | 1.03 (0.66-1.54) |

aAbsolute values are not provided as Google Trends does not provide absolute search frequency values.

bGTPSE: Google Trends Population Size Estimate.

cAlternative search terms were chosen based on words that represented the general male population and men who have sex with men subset population in each country (n=53) in the year 2020.

Table 5 shows how GTPSE generated using alternative language terms compares with the original GT search terms. For Swahili, only 1 country (Tanzania) yielded a PSE in that language. All countries using French (n=5) or Spanish (n=5) search terms yielded estimates, all exceeding 1%. All alternative language estimates were lower than the original “porn/gay porn” PSE values.

Table 5. Sensitivity analysis using alternate national language searches in Google Trends to calculate national population size estimation for select US President’s Emergency Plan for AIDS Relief countries (n=14) in 2020.^a

| Language and country | Original GTPSE^b (English), % | Alternate language term GTPSE, %^c | Absolute percentage difference, % |
|---|---|---|---|
| Swahili | | | |
| Kenya | 1.99 | 0 | 1.99 |
| Tanzania | 1.73 | 0.52 | 1.21 |
| Uganda | 1.47 | 0 | 1.47 |
| DRC | 1.66 | 0 | 1.66 |
| French | | | |
| Cameroon | 2.29 | 1.36 | 0.93 |
| Mali | 1.55 | 1.07 | 0.48 |
| Cote d’Ivoire | 2.68 | 1.35 | 1.33 |
| Senegal | 1.94 | 1.28 | 0.66 |
| Haiti | 3.60 | 2.23 | 1.37 |
| Spanish | | | |
| Dominican Rep. | 4.26 | 2.56 | 1.70 |
| Panama | 7.23 | 5.14 | 2.09 |
| El Salvador | 5.20 | 4.36 | 0.84 |
| Nicaragua | 6.32 | 4.13 | 2.19 |
| Honduras | 5.32 | 4.07 | 1.25 |

aAbsolute values are not provided as Google Trends does not provide absolute search frequency values.

bGTPSE: Google Trends Population Size Estimate.

cAlternative language search terms included “ngono/ngono za mashoga” (Swahili), “porno/porno gay” (French), “porno/porno gay” (Spanish).

Table 6 displays how GTPSE generated for alternative years (2019 and 2021) compares with the original 2020 GT searches. All 14 countries in both years produced estimates exceeding 1%. No large discrepancies in PSE between the years were observed; 13 out of 14 of the 2019 values were larger than the 2020 values, whereas the 2021 values were largely similar to the 2020 values.

Table 6. Sensitivity analysis for men who have sex with men population size estimates for select US President’s Emergency Plan for AIDS Relief supported countries (n=14) using Google Trends data in years 2019 and 2021 compared with the year 2020.^a,b

| Country | 2019 PSE^c, % | 2020 PSE, % | 2021 PSE, % |
|---|---|---|---|
| Kenya | 2.37 | 1.99 | 1.99 |
| Tanzania | 1.96 | 1.73 | 1.85 |
| Uganda | 1.73 | 1.47 | 1.69 |
| DRC | 1.95 | 1.66 | 1.60 |
| Cameroon | 2.70 | 2.29 | 2.25 |
| Mali | 2.30 | 1.55 | 2.17 |
| Cote d’Ivoire | 2.52 | 2.68 | 2.23 |
| Senegal | 2.54 | 1.94 | 1.90 |
| Haiti | 4.33 | 3.60 | 2.92 |
| Dominican Republic | 4.91 | 4.26 | 4.34 |
| Panama | 9.36 | 7.23 | 6.74 |
| El Salvador | 6.19 | 5.20 | 4.77 |
| Nicaragua | 7.31 | 6.32 | 4.93 |
| Honduras | 6.79 | 5.32 | 5.51 |

a2019 and 2021 values were computed in the same way as the reference 2020 estimates.

bAbsolute values are not provided as Google Trends does not provide absolute search frequency values.

cPSE: population size estimation.
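The year-to-year comparison summarized from Table 6 can be checked with a few lines of Python. This is a minimal sketch using the published values; the names used are illustrative assumptions only.

```python
# Table 6 values: country -> (2019 PSE %, 2020 PSE %, 2021 PSE %)
TABLE_6 = {
    "Kenya": (2.37, 1.99, 1.99),
    "Tanzania": (1.96, 1.73, 1.85),
    "Uganda": (1.73, 1.47, 1.69),
    "DRC": (1.95, 1.66, 1.60),
    "Cameroon": (2.70, 2.29, 2.25),
    "Mali": (2.30, 1.55, 2.17),
    "Cote d'Ivoire": (2.52, 2.68, 2.23),
    "Senegal": (2.54, 1.94, 1.90),
    "Haiti": (4.33, 3.60, 2.92),
    "Dominican Republic": (4.91, 4.26, 4.34),
    "Panama": (9.36, 7.23, 6.74),
    "El Salvador": (6.19, 5.20, 4.77),
    "Nicaragua": (7.31, 6.32, 4.93),
    "Honduras": (6.79, 5.32, 5.51),
}

# Countries whose 2019 estimate exceeds the 2020 reference estimate.
higher_in_2019 = [c for c, (y19, y20, _) in TABLE_6.items() if y19 > y20]

# Signed change from 2020 to 2021, in percentage points.
change_2021 = {c: round(y21 - y20, 2) for c, (_, y20, y21) in TABLE_6.items()}

# Plausibility check: every estimate in every year exceeds the 1% threshold.
all_plausible = all(v > 1.0 for values in TABLE_6.values() for v in values)

print(len(higher_in_2019), "of", len(TABLE_6), "countries were higher in 2019")
print("2020 to 2021 change:", change_2021)
print("all estimates above 1%:", all_plausible)
```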


Principal Findings

Our analysis suggests that national-level MSM GTPSE is feasible in almost all countries. Importantly, all estimates appeared plausible, that is, they exceeded the WHO/UNAIDS suggested minimum threshold of 1%. Heterogeneity of GTPSE across same-region countries was pronounced within all regions yet smaller than the ratios based on the UNAIDS KP Atlas values which contained numerous PSE values well below the 1% threshold.

Our analysis draws on several strengths. We successfully applied the GTPSE method to many low- and middle-income countries, suggesting that GTPSE has wide geographic applicability. We compared the values against 2 PSE data sources at UNAIDS, assessed the potential effect of various covariates on GTPSE values, and conducted a sensitivity analysis with varying English search terms, non-English search languages, and different calendar years. Google is the dominant search engine in all countries covered in this analysis, with a market share ranging between 84% and 99% (data shown in Table S1 in Multimedia Appendix 1) [27]. Although no absolute search volume data were available to us, “porn” was among the top 20 search terms globally in 2023, with about 65 million searches each month according to one source [29]; this is still well behind the largest porn site-specific searches. GTPSE may emerge as another example of digital public health and epidemiology, alongside real-time surveillance of disease outbreaks [30], assessing the impact of global public health days [31], informing health and health policy research [32], and understanding spatiotemporal patterns of dry eye disease [33].

While most local estimates were plausible (>1%), 14% (n=7) did not reach the WHO/UNAIDS minimum threshold, and 2 more locations did not produce a GTPSE value at all, owing to a lack of GT data and to how GT organized the subnational data, despite some of the affected cities’ large population sizes. This is not an uncommon finding, as other PSE methods in active use typically do not meet the WHO/UNAIDS minimum threshold. For a few other national or commercial capitals with no direct GT data available, such as Johannesburg (South Africa), we could obtain a subnational estimate using the larger district or province within which the cities (eg, Johannesburg and Pretoria) are located. This may limit the utility and comparability of such local estimates. About one-third of the local (relative) estimates did not reach or exceed the same country’s national-level estimate, somewhat contrary to our expectation that rural-to-urban migration among MSM may be more pronounced than that of other men, thus yielding higher GTPSE values [9]. In Card et al’s [12] study on Canadian towns and cities, the estimates ranged from 2% to 4% compared with 0% to 13% among our local estimates, whereas the Canadian national estimate was 2.8% compared with 1.2%‐7.5% across all countries we examined. While not a limitation, it is worth noting that weekly RSV data varied widely (data not shown), confirming the recommendation to use GT data for size estimation only over longer time periods, such as a full calendar year.
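The point about weekly variability can be illustrated with a short Python sketch. It assumes that the proportional estimate is the ratio of the mean RSV for “gay porn” to the mean RSV for “porn” over the chosen period. The weekly series and the population figure are invented purely for illustration, and the function name is hypothetical.

```python
from statistics import mean


def gtpse_proportion(rsv_gay_porn, rsv_porn):
    """Ratio of mean numerator-term RSV to mean denominator-term RSV (assumed definition)."""
    if len(rsv_gay_porn) != len(rsv_porn) or not rsv_porn:
        raise ValueError("both series must be non-empty and of equal length")
    denominator = mean(rsv_porn)
    if denominator == 0:
        raise ValueError("denominator term has no search volume")
    return mean(rsv_gay_porn) / denominator


# Hypothetical weekly RSV values for one calendar year (52 weeks); not real GT data.
weekly_porn = [80 + (week % 5) * 4 for week in range(52)]
weekly_gay_porn = [1 + (week % 4) for week in range(52)]

# Ratios computed week by week swing widely ...
weekly_ratios = [g / p for g, p in zip(weekly_gay_porn, weekly_porn)]

# ... whereas the full-year ratio is a single, more stable value.
annual_proportion = gtpse_proportion(weekly_gay_porn, weekly_porn)

# Absolute size estimate: proportion times the male population aged 15-49 years.
hypothetical_male_population_15_49 = 5_000_000  # assumed figure for illustration
absolute_estimate = round(annual_proportion * hypothetical_male_population_15_49)

print(f"weekly ratio range: {min(weekly_ratios):.3f} to {max(weekly_ratios):.3f}")
print(f"annual proportion: {annual_proportion:.4f}")
print(f"absolute estimate: {absolute_estimate}")
```

In this toy series, the weekly ratios span roughly a fivefold range, while the annual figure falls between the extremes, which is the motivation for aggregating over a full calendar year.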

Limitations

Like most PSE methods, GTPSE has limitations. In particular, the assumptions underlying the GTPSE method deserve close scrutiny: straight men only search for porn, MSM only search for gay porn, MSM and straight men search for (gay) porn in equal proportions, and women do not search for (gay) porn at all or do not affect the generated GTPSE for MSM. Violations of these assumptions will result in bias if they affect RSV for porn and gay porn to differing extents, hence altering the proportion of porn searches that are directed at gay porn. While the literature from LMIC settings on this topic is very sparse, reports and literature from high-income settings suggest that gay porn is also consumed by heterosexual men and women, suggesting that some bias may be present. Complicating speculations about the magnitude and direction of bias is the fact that specific porn websites’ user statistics may not accurately reflect searches for (gay) porn on Google. Women’s search behavior on Google regarding gay porn may increase or decrease the GTPSE estimates depending on the frequency relative to searches for just porn.
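The condition under which violated assumptions bias the estimate can be made concrete with a toy calculation. The Python sketch below treats the GTPSE as a simple ratio of two search volumes. All volumes and inflation factors are hypothetical and chosen only to show the direction of the effect, not its likely size.

```python
def gtpse_ratio(gay_porn_volume, porn_volume):
    """Proportional estimate as numerator-term volume over denominator-term volume."""
    if porn_volume <= 0:
        raise ValueError("denominator volume must be positive")
    return gay_porn_volume / porn_volume


# Hypothetical baseline volumes under the method's idealized assumptions.
baseline = gtpse_ratio(2.0, 100.0)

# Extra searches (eg, by women) that inflate both terms by the same factor
# leave the ratio unchanged, so no bias results.
proportional = gtpse_ratio(2.0 * 1.3, 100.0 * 1.3)

# Extra searches that inflate "gay porn" more than "porn" bias the estimate upward.
upward = gtpse_ratio(2.0 * 1.5, 100.0 * 1.1)

# Extra searches that inflate "porn" more than "gay porn" bias it downward.
downward = gtpse_ratio(2.0 * 1.1, 100.0 * 1.5)

print(f"baseline {baseline:.4f}, proportional {proportional:.4f}")
print(f"upward {upward:.4f}, downward {downward:.4f}")
```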

Regrettably, Google does not provide access to the algorithm that generates the RSV data, nor can users filter GT searches by age or gender. Inherent limitations of GT data include the lack of deduplication in the search data (although repeated searches by the same user within a short time period are not counted multiple times by Google) and the lack of absolute search volume data. Not having access to the absolute search volume data impedes the computation of uncertainty intervals (which in most national settings may be expected to be small due to the large search volumes involved). However, absolute search volume information may eventually be made available by Google and is already offered to some extent by select third-party companies. Absolute search volume data may also inform the choice of search language and even search terms and may facilitate composite GTPSE metrics by incorporating multiple GTPSE metrics stemming from different language search terms. Restricting GTPSE-relevant data to male users may further refine GTPSE values by excluding female users, a limitation our analysis could not overcome. VPN (virtual private network) use also has the potential to introduce errors if users select a country other than their place of residence. The adoption of VPN may vary considerably across time and by country; among US President’s Emergency Plan for AIDS Relief countries, according to one industry website in 2020, VPN use was highest in Ukraine (7.9%) and lowest in Kenya (0.5%) [34]. Additionally, GTPSE may not be feasible for a few countries, perhaps due to poor or little data availability on search terms and frequencies. Taken together, these limitations constitute a major source of uncertainty about the bias and precision of GTPSE. For that reason, GTPSE should be regarded as an approximate reference value. Clearly, GTPSE values do not attain the rigor or transparency of statistically principled estimation from accurately measured data, which the currently best available PSE methods do offer.
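If absolute search volumes were to become available, one possible way to form a composite GTPSE across languages would be to pool the counts, which is equivalent to weighting each language-specific ratio by its denominator volume. The Python sketch below illustrates this idea. It is an assumption about how such a composite might be built, not a method used in this study, and all counts are hypothetical.

```python
def pooled_gtpse(volumes):
    """Pooled ratio across languages.

    volumes maps a language to a tuple of hypothetical absolute counts:
    (searches for the numerator term, searches for the denominator term).
    """
    total_numerator = sum(n for n, _ in volumes.values())
    total_denominator = sum(d for _, d in volumes.values())
    if total_denominator == 0:
        raise ValueError("no denominator searches in any language")
    return total_numerator / total_denominator


# Hypothetical absolute monthly search counts for one country; not real data.
hypothetical_volumes = {
    "English": (2_000, 100_000),
    "French": (650, 50_000),
}

# Language-specific ratios, as they would be computed one language at a time.
per_language = {lang: n / d for lang, (n, d) in hypothetical_volumes.items()}

# Composite ratio; the language with more searches carries more weight.
composite = pooled_gtpse(hypothetical_volumes)

print(per_language)
print(f"composite: {composite:.4f}")
```

Pooling keeps the composite between the language-specific values and avoids giving a sparsely used search language the same influence as the dominant one.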

GTPSE seems infeasible for size estimation among transgender persons, sex workers, or people who inject drugs. Unlike (gay) porn, where the search is about a web-based product (visual depictions of porn), searches for sex work or clients, transgenderism, or injecting drug use are not directly tied to the internet, and may exhibit a more variable search terminology, and may lack fitting “denominator” search terms (analogous to “porn”).

Overall, the GTPSEs often were substantially higher than the KP Atlas estimates but were more closely aligned with the reported GAM regional estimates. The KP Atlas estimates are based on a broad range of PSE methods typically generating local PSE that may or may not be projected to national scale, or summed or averaged across multiple localities, and may refer to various time points (calendar years) and various age ranges. Many KP Atlas based MSM PSE were implausibly low (<1%), suggesting that substantial differences to GTPSE may often be due to KP Atlas underestimates. The regional GAM estimates are based on a more curated database of PSE after excluding estimates with subpar quality and hence of perhaps more trustworthy quality [22]. However, GAM regions do not exactly overlap with the regions we used for GTPSE and the KP Atlas estimates.

The national MSM GTPSE values were robust against varying levels of urbanization, internet penetration, stigma, and criminalization or protection of homosexuality, negating the need for adjustment and increasing comparability across different settings. The largest influence was seen with internet penetration, which can be expected to increase over time. In the sensitivity analysis, the largest differences from the original GTPSE values were seen using alternate English language search terms. Among the 14 examined countries, almost half (43%) of the alternate estimates were below 1% and hence considered implausibly low. This indicates that search term selection is important, especially for comparison across time and space. Further exploration may be warranted to evaluate whether country- or region-specific English or non-English slang terms may produce plausible estimates; however, the limited sensitivity analysis suggests that “Porn/Gay Porn” may be dependable and consistently produces plausible values. The use of similar search terms in French, Spanish, and Swahili yielded universally lower results; Swahili, not a nationally dominant language in most countries, appears particularly unsuitable as it frequently produced 0% PSE values. As most countries display prominent non-English language use, countries may want to consider using the predominant language (used for web searches) when applying this method while considering any language’s geographic scope in-country. The results also appeared robust across time (2 years affected by the COVID pandemic plus 1 pre-COVID year), as the 2 adjacent years produced plausible and (same country) consistent results. The lack of uncertainty intervals, however, impeded a more meaningful interpretation of the results from the sensitivity analyses.

Conclusions

Generating national-level PSEs for KPs is challenging for many countries. GTPSE is a simple method with the potential to address this problem efficiently without the need for additional resources. However, the lack of validation of key assumptions and the inability to generate credibility intervals suggest important uncertainty regarding the accuracy and precision of the estimates. Additional research, such as expanding or building on our sensitivity and covariate analyses, to address or better understand these limitations may further improve the quality and utility of GTPSE for MSM in LMICs.

Acknowledgments

We wish to acknowledge all authors who contributed to this paper. WH and AAQ contributed to the study concept and design. All authors substantially contributed to the study’s conduct, or to data analysis and interpretation. All authors contributed to writing, reviewing, or editing the manuscript. We also wish to acknowledge all colleagues within the Centers for Disease Control and Prevention who offered insights and alternative perspectives during the analysis and writing of this paper.

Disclaimer

The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the funding agencies. This research publication has been supported by the President's Emergency Plan for AIDS Relief (PEPFAR) through the Centers for Disease Control and Prevention (CDC) under the terms of project number 0900f3eb820619c0. This publication was also supported by Cooperative Agreement number NU2GGH002093 between the Centers for Disease Control and Prevention and the Public Health Institute.

Data Availability

All data generated or analyzed during this study are publicly accessible data (with the exception that only aggregate data can be obtained through Google Trends). Links to where the data can be accessed are included in this published article and its supplementary information files. Further, all relative search volume data points can be reproduced through Google Trends.

Conflicts of Interest

None declared.

Multimedia Appendix 1

Supplementary Table 1.

XLSX File, 18 KB

  1. Global HIV & AIDS statistics — fact sheet. UNAIDS. URL: https://www.unaids.org/en/resources/fact-sheet [Accessed 2024-12-31]
  2. Indicators and questions for monitoring progress on the 2021 Political Declaration on HIV and AIDS — Global AIDS Monitoring 2025. UNAIDS. 2024. URL: https://www.unaids.org/en/resources/documents/2024/global-aids-monitoring-guidelines [Accessed 2024-12-31]
  3. Neal JJ, Prybylski D, Sanchez T, Hladik W. Population size estimation methods: searching for the holy grail. JMIR Public Health Surveill. Dec 3, 2020;6(4):e25076.
  4. Stone J, Mukandavire C, Boily MC, et al. Estimating the contribution of key populations towards HIV transmission in South Africa. J Int AIDS Soc. Jan 2021;24(1):e25650.
  5. Viswasam N, Lyons CE, MacAllister J, et al. The uptake of population size estimation studies for key populations in guiding HIV responses on the African continent. PLoS One. 2020;15(2):e0228634.
  6. Xu C, Jing F, Lu Y, et al. Summarizing methods for estimating population size for key populations: a global scoping review for human immunodeficiency virus research. AIDS Res Ther. Feb 19, 2022;19(1):9.
  7. Teo AKJ, Prem K, Chen MIC, et al. Estimating the size of key populations for HIV in Singapore using the network scale-up method. Sex Transm Infect. Dec 2019;95(8):602-607.
  8. Sabin K, Zhao J, Garcia Calleja JM, et al. Availability and quality of size estimations of female sex workers, men who have sex with men, people who inject drugs and transgender women in low- and middle-income countries. PLoS One. 2016;11(5):e0155150.
  9. Edwards JK, Hileman S, Donastorg Y, et al. Estimating sizes of key populations at the national level: considerations for study design and analysis. Epidemiology. Nov 2018;29(6):795-803.
  10. Bozicevic I, Sharifi H, Haghdoost A, Sabry A, Hermez J. Availability of HIV surveillance data in key populations in the countries of the World Health Organization Eastern Mediterranean Region. Int J Infect Dis. Aug 2022;121:211-216.
  11. Mavragani A, Ochoa G. Google Trends in infodemiology and infoveillance: methodology framework. JMIR Public Health Surveill. May 29, 2019;5(2):e13439.
  12. Card KG, Lachowsky NJ, Hogg RS. Using Google Trends to inform the population size estimation and spatial distribution of gay, bisexual, and other men who have sex with men: proof-of-concept study. JMIR Public Health Surveill. Nov 29, 2021;7(11):e27385.
  13. Porn, gay porn - explore. Google Trends. URL: https://trends.google.com/trends/explore?date=2020-01-01%202020-12-31&geo=DZ&q=Porn,Gay%20Porn [Accessed 2024-12-31]
  14. FAQ about google trends data - trends help. Google. URL: https://support.google.com/trends/answer/4365533?hl=en [Accessed 2024-12-31]
  15. How many women watch porn? Fight the New Drug. URL: https://fightthenewdrug.org/how-do-men-and-womens-porn-site-searches-differ [Accessed 2024-12-31]
  16. Pornhub stats reveal women search for more hardcore genres than you might expect. Fight the New Drug. URL: https://fightthenewdrug.org/data-reveals-women-are-searching-hardcore-genres [Accessed 2024-12-31]
  17. When no one is looking, many women are watching gay porn. NBC News. URL: https://www.nbcnews.com/feature/nbc-out/when-no-one-looking-many-women-are-watching-gay-porn-n894266 [Accessed 2024-12-31]
  18. Solano I, Eaton NR, O’Leary KD. Pornography consumption, modality and function in a large internet sample. J Sex Res. Jan 2, 2020;57(1):92-103.
  19. Træen B, Daneback K. The use of pornography and sexual behaviour among Norwegian men and women of differing sexual orientation. Sexologies. Apr 2013;22(2):e41-e48.
  20. PEPFAR 2022 country and regional operational plan (COP/ROP) guidance for all PEPFAR-supported countries. US Department of State. PEPFAR; 2022. URL: https://www.state.gov/wp-content/uploads/2022/02/COP22-Guidance-Final_508-Compliant-3.pdf [Accessed 2025-01-02]
  21. Dangerous inequalities: world AIDS day report 2022. UNAIDS. 2022. URL: https://www.unaids.org/sites/default/files/media_asset/dangerous-inequalities_en.pdf [Accessed 2025-01-02]
  22. Quick start guide for Spectrum. UNAIDS. 2020. URL: https://www.unaids.org/sites/default/files/media_asset/QuickStartGuide_Spectrum_en.pdf [Accessed 2025-01-02]
  23. World Health Organization. Recommended population size estimates of men who have sex with men. UNAIDS. 2020. URL: https://www.unaids.org/sites/default/files/media_asset/2020-recommended-population-size-estimates-of-men-who-have-sex-with-men_en.pdf [Accessed 2025-01-02]
  24. The Key Populations Atlas. UNAIDS. 2022. URL: https://kpatlas.unaids.org/dashboard [Accessed 2024-12-31]
  25. Key Populations Atlas data sources. UNAIDS - Key Populations Atlas. 2017. URL: https://kpatlas.unaids.org/document/kp_data_sources.pdf [Accessed 2025-01-02]
  26. Internet world stats - usage and population statistics. ResearchGate. URL: https://www.researchgate.net/publication/258847361_Internet_World_Stats_-_Usage_and_Population_Statistics [Accessed 2025-01-02]
  27. World development indicators. World Bank Group. URL: https://databank.worldbank.org/source/world-development-indicators [Accessed 2024-12-31]
  28. Mendos LR, Botha K, Lelis RC, de la Peña EL, Savelev I, Tan D. State-Sponsored Homophobia report. ILGA World. 2020. URL: https://ilga.org/state-sponsored-homophobia-report/ [Accessed 2025-01-02]
  29. Most searched things on Google in 2024. SimilarWeb. URL: https://www.similarweb.com/blog/marketing/seo/top-keywords/ [Accessed 2024-12-31]
  30. Carneiro HA, Mylonakis E. Google trends: a web-based tool for real-time surveillance of disease outbreaks. Clin Infect Dis. Nov 15, 2009;49(10):1557-1564.
  31. Havelka EM, Mallen CD, Shepherd TA. Using Google Trends to assess the impact of global public health days on online health information seeking behaviour in Central and South America. J Glob Health. Jun 2020;10(1):010403.
  32. Arora VS, McKee M, Stuckler D. Google Trends: opportunities and limitations in health and health policy research. Health Policy. Mar 2019;123(3):338-341.
  33. Azzam DB, Nag N, Tran J, et al. A novel epidemiological approach to geographically mapping population dry eye disease in the United States through Google Trends. Cornea. Mar 1, 2021;40(3):282-291.
  34. NordVPN survey reveals: users still trust free vpns. NordVPN. URL: https://nordvpn.com/blog/nordvpn-usage-survey/ [Accessed 2025-01-02]


GAM: Global AIDS Monitoring system
GT: Google Trends
GTPSE: Google Trends Population Size Estimate
KP: key population
LMIC: low- and middle-income country
MSM: men who have sex with men
PSE: population size estimation
RSV: relative search volume
SSA: sub-Saharan Africa
UNAIDS: Joint United Nations Programme on HIV/AIDS
VPN: virtual private network
WHO: World Health Organization


Edited by Amaryllis Mavragani; submitted 20.03.24; peer-reviewed by Elysee Tuyishime, Kiffer Card; final revised version received 28.05.24; accepted 28.05.24; published 09.01.25.

Copyright

© Carly M Malburg, Steve Gutreuter, Horacio Ruiseñor-Escudero, Abu Abdul-Quader, Wolfgang Hladik. Originally published in JMIR Public Health and Surveillance (https://publichealth.jmir.org), 9.1.2025.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on https://publichealth.jmir.org, as well as this copyright and license information must be included.