Published on 27.01.2021 in Vol 7, No 1 (2021): January

Using the Novel Mortality-Prevalence Ratio to Evaluate Potentially Undocumented SARS-CoV-2 Infection: Correlational Study


Original Paper

Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan

Corresponding Author:

Chu-Lan Michael Kao, PhD

Institute of Statistics

National Chiao Tung University

Assembly Building I, 4th Floor

1001 University Road

Hsinchu, 30010


Phone: 886 35712121 ext 56822


Background: The high prevalence of COVID-19 had resulted in 200,000 deaths by early 2020. The corresponding mortality rate varies among countries and over time.

Objective: This study aims to investigate the relationship between the mortality rate and prevalence of COVID-19 within a country.

Methods: We collected data from the Johns Hopkins Coronavirus Resource Center, including the daily cumulative death count, recovered count, and confirmed count for each country. This study focused on 36 countries with at least 10,000 confirmed COVID-19 cases. Mortality was the main outcome and dependent variable, and it was computed by dividing the number of COVID-19 deaths by the number of confirmed cases.

Results: Our results showed a highly significant correlation between prevalence and mortality (ρ=0.8304; P<.001). In our global panel regression analysis, every increment of 1 confirmed COVID-19 case per 1000 individuals was associated with a 1.29268-percentage-point increase in mortality, after controlling for country-specific baseline mortality and time-fixed effects. Prevalence explained over 70% of the excess mortality, and the heterogeneity among countries' mortality-prevalence ratios was significant (P<.001). Further, China had an abnormally high and significant mortality-prevalence ratio compared to other countries (P<.001). This unusual deviation disappeared when the data collected from China after February 17, 2020 were removed. It is worth noting that the prevalence of a disease relies on accurate diagnoses and comprehensive surveillance, which can be difficult to achieve due to practical or political concerns.

Conclusions: The association between COVID-19 mortality and prevalence was observed and quantified as the mortality-prevalence ratio. Our results highlight the importance of constraining disease transmission to decrease mortality rates. The comparison of mortality-prevalence ratios between countries can be a powerful method for detecting, or even quantifying, the proportion of individuals with undocumented SARS-CoV-2 infection.

JMIR Public Health Surveill 2021;7(1):e23034



The first cluster of pneumonia cases later identified as COVID-19, the disease caused by the SARS-CoV-2 virus [1], was reported in Wuhan, China on December 31, 2019 [2]. The outbreak in China eventually developed into a pandemic, which forced widespread changes throughout the world and imposed a substantial disease and economic burden worldwide. As of May 2, 2020, more than 36 countries had reported at least 10,000 cases of COVID-19, and a total of around 4 million cases and 274,000 deaths had been reported [2,3]. Numerous studies have investigated the biological and epidemiologic characteristics of COVID-19 [4-6]. Most results have been derived from traditional epidemiological models, in which both COVID-19 mortality (ie, the “case fatality rate” in some literature) and recovery rates are assumed to be constant. However, Bialek et al [7] found heterogeneity in mortality rates among countries and cities, which has been attributed to assumed differences in underlying medical conditions across areas [8-10]. The trend in mortality over time is also controversial [11-13]. Although an exponential growth model has shown an overall exponential decay in mortality within China since the outbreak [13], there is also evidence that disease prevalence influences mortality to a considerable extent: a rapid increase in the number of infections may cause the health care system to collapse, leading to a sharp rise in mortality [11,12]. Beyond these inconsistencies between studies, previous analyses were performed with data collected before March 2020, when only a few countries had reported COVID-19 deaths and most areas were not yet heavily affected by COVID-19.

This study aims to rigorously quantify the relationship between COVID-19 prevalence and mortality by using data updated through May 2, 2020. We observed a linear relationship between prevalence and mortality and refer to its slope as the mortality-prevalence ratio. The global mortality-prevalence ratio was estimated after adjusting for country-specific baseline mortality and time-fixed effects. Country-specific mortality-prevalence ratios can serve as a powerful index for identifying countries with a substantial number of undocumented infections or overburdened health care systems.

COVID-19–related data [14] were downloaded from the Johns Hopkins Coronavirus Resource Center. These data included the cumulative numbers of confirmed cases ($C_{it}$), deaths ($D_{it}$), and recovered cases ($R_{it}$) in country $i$ at time $t$, from January 22 to May 2, 2020. We then matched each country with its national population, as provided by World Population Review [15], and excluded countries without a matched population. After exclusion, 174 countries remained in our dataset. We also aggregated the counts of the remaining countries to obtain the corresponding global counts.
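The JHU CSSE global time series is distributed in wide format, with one row per country or province and one column per date, so countries reported at the province level must be summed before any country-level metric is computed. The sketch below shows one way to do this with the Python standard library; it assumes the column layout `Province/State, Country/Region, Lat, Long, <date columns>` used by the repository's global CSV files, and the sample rows, the fictional country, and the population lookup are hypothetical placeholders rather than the study's data.

```python
import csv
import io
from collections import defaultdict

# Assumed layout of the JHU CSSE global time-series CSV (wide format):
# Province/State, Country/Region, Lat, Long, then one column per date (m/d/yy).
ID_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")


def country_series(csv_text):
    """Sum province-level rows into one cumulative series per country."""
    reader = csv.DictReader(io.StringIO(csv_text))
    date_columns = [c for c in reader.fieldnames if c not in ID_COLUMNS]
    totals = defaultdict(lambda: [0] * len(date_columns))
    for row in reader:
        series = totals[row["Country/Region"]]
        for k, column in enumerate(date_columns):
            series[k] += int(row[column])
    return date_columns, dict(totals)


def keep_matched(series_by_country, population):
    """Drop countries that have no population match."""
    return {c: s for c, s in series_by_country.items() if c in population}


def global_series(series_by_country):
    """Aggregate countries into global cumulative counts per date."""
    return [sum(day) for day in zip(*series_by_country.values())]


# Hypothetical two-day excerpt; the real file spans January 22 to May 2, 2020.
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
Hubei,China,30.97,112.27,444,444
Beijing,China,40.18,116.41,14,22
,Utopia,0.0,0.0,1,2
"""

dates, by_country = country_series(SAMPLE)
matched = keep_matched(by_country, {"China": 1_400_000_000})  # placeholder population
print(dates, matched)  # ['1/22/20', '1/23/20'] {'China': [458, 466]}
```

The same helpers apply unchanged to the deaths and recovered files, which share this layout in the repository.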

For each country and each time point, as well as for the global aggregate, we computed the following 3 metrics: (1) the number of cases still in treatment ($CT_{it}$), which represents the total number of COVID-19 cases still receiving medical care at time $t$; (2) the prevalence of COVID-19 in country $i$ at time $t$ ($P_{it}$); and (3) COVID-19 mortality in country $i$ at time $t$ ($M_{it}$). For the sake of model stability, the analyses were only performed on countries with $C_{it} \geq 10{,}000$. The metrics were calculated as follows:

$$CT_{it} = C_{it} - D_{it} - R_{it} \tag{1}$$
$$P_{it} = \frac{C_{it}}{\text{total population of country } i} \tag{2}$$
$$M_{it} = \frac{D_{it}}{C_{it}} \tag{3}$$
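Equations (1) to (3) are simple ratios of cumulative counts. The sketch below applies them to the United Kingdom row of Table 1 (May 2, 2020); the `Snapshot` container and function names are illustrative assumptions, not part of the original analysis code.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Cumulative counts for one country on one date (inputs to equations 1-3)."""
    confirmed: int   # C_it
    deaths: int      # D_it
    recovered: int   # R_it
    population: int  # total population of country i


def in_treatment(s):
    """Equation (1): CT_it = C_it - D_it - R_it."""
    return s.confirmed - s.deaths - s.recovered


def prevalence(s):
    """Equation (2): P_it = C_it / total population of country i."""
    return s.confirmed / s.population


def mortality(s):
    """Equation (3): M_it = D_it / C_it."""
    return s.deaths / s.confirmed


# United Kingdom on May 2, 2020, from Table 1.
uk = Snapshot(confirmed=183_500, deaths=28_205, recovered=896, population=67_886_011)
print(in_treatment(uk))                 # 154399
print(round(prevalence(uk) * 1000, 3))  # 2.703 confirmed cases per 1000 people
print(round(mortality(uk), 4))          # 0.1537
```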

To investigate the association between mortality and prevalence after adjusting for each country's baseline mortality and for regular fluctuations over time, we built the following panel regression model (Model 1):

$$M_{it} = \beta_{\text{country}} + \beta_t + \gamma P_{it} + \varepsilon_{it} \tag{4}$$

In this model, $\beta_{\text{country}}$ represents the country-specific baseline mortality; $\beta_t$ is the time-fixed effect on mortality; $\gamma$ represents the global association between $P_{it}$ and $M_{it}$, which we refer to as the global mortality-prevalence ratio; and $\varepsilon_{it}$ is the residual. To allow the mortality-prevalence ratio to vary across countries, we built a second panel regression model (Model 2), in which the global mortality-prevalence ratio is replaced with a country-specific mortality-prevalence ratio, $\gamma_{\text{country}}$. Model 2 is described as follows:

$$M_{it} = \beta_{\text{country}} + \beta_t + \gamma_{\text{country}} P_{it} + \varepsilon_{it} \tag{5}$$

In this model, $\gamma_{\text{country}}$ is the country-specific association between $P_{it}$ and $M_{it}$, which we refer to as the country-specific mortality-prevalence ratio. We used an analysis of variance test to examine whether $\gamma_{\text{country}}$ differed among countries, and the Shapiro-Wilk normality test to examine whether these differences could be treated as random effects drawn from a normal population. All analyses were conducted with R version 3.5.2. Institutional review board approval was not required because no individual-level or personal data were used.
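Models 1 and 2 are two-way fixed-effects regressions, so one way to fit them is ordinary least squares with country and date indicator columns. The NumPy sketch below is an illustration on synthetic data, not the authors' R code: it builds both design matrices, estimates a shared slope γ (Model 1) or one slope per country (Model 2), and compares the nested fits with an F statistic, which is one common way to carry out an analysis of variance comparison of this kind. The function names and the simulated panel are assumptions.

```python
import numpy as np


def design(country, day, prevalence, per_country_slope):
    """Design matrix for Model 1 (one shared gamma) or Model 2 (one gamma per country)."""
    countries = np.unique(country)
    days = np.unique(day)
    # beta_country: one intercept column per country (no separate global intercept).
    cols = [(country == c).astype(float) for c in countries]
    # beta_t: one column per day except the first, which serves as the reference day.
    cols += [(day == d).astype(float) for d in days[1:]]
    if per_country_slope:
        # gamma_country: prevalence interacted with each country's indicator.
        cols += [(country == c) * prevalence for c in countries]
    else:
        # gamma: a single slope on prevalence shared by all countries.
        cols.append(prevalence)
    return np.column_stack(cols)


def fit(X, y):
    """Ordinary least squares; returns coefficients and the residual sum of squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, float(resid @ resid)


def nested_f(rss_small, rss_big, p_small, p_big, n):
    """F statistic for H0: the extra parameters of the larger (nested) model are all zero."""
    return ((rss_small - rss_big) / (p_big - p_small)) / (rss_big / (n - p_big))


# Synthetic panel (NOT the study data): 4 countries observed on 30 days.
rng = np.random.default_rng(0)
n_countries, n_days = 4, 30
country = np.repeat(np.arange(n_countries), n_days)
day = np.tile(np.arange(n_days), n_countries)
prev = rng.uniform(0.0, 0.005, size=country.size)
true_gamma = np.array([5.0, 10.0, 15.0, 20.0])
baseline = np.array([0.01, 0.02, 0.03, 0.04])
y = (baseline[country] + 0.0005 * day + true_gamma[country] * prev
     + rng.normal(0.0, 1e-4, size=country.size))

X1 = design(country, day, prev, per_country_slope=False)
X2 = design(country, day, prev, per_country_slope=True)
coef1, rss1 = fit(X1, y)
coef2, rss2 = fit(X2, y)
gamma_hat = coef1[-1]                     # Model 1 estimate of the shared gamma
gamma_country_hat = coef2[-n_countries:]  # Model 2 estimates, one per country
F = nested_f(rss1, rss2, X1.shape[1], X2.shape[1], y.size)
print(round(gamma_hat, 2), np.round(gamma_country_hat, 2), round(F, 1))
```

Because Model 1 is the special case of Model 2 in which all country slopes are equal, a large F statistic here indicates that the country-specific slopes differ.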

Table 1 shows the population and the total number of confirmed cases, death cases, and recovered cases for countries that reported at least 10,000 confirmed cases by May 2, 2020. Figure 1 shows the association between COVID-19 prevalence and mortality among these countries. The Spearman correlation coefficient was 0.8304 (P<.001) and the Pearson correlation coefficient was 0.3385 (P=.04). These values indicated a significant positive correlation between prevalence and mortality. COVID-19 mortality and prevalence were relatively high in the United Kingdom and Belgium, while the United States had a high prevalence and a relatively low mortality compared to countries with similar prevalence levels, such as China and Canada.
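Spearman's coefficient is Pearson's coefficient computed on ranks, which is one reason the two values reported above (0.8304 vs 0.3385) can differ so much: a few countries with extreme prevalence or mortality can pull Pearson's r strongly, whereas they move their ranks by at most a few positions. The pure-Python sketch below (average ranks for ties, no P values) illustrates this on made-up numbers that are not the study data.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up illustration: a monotone relationship with one extreme value.
x = [1, 2, 3, 4, 5, 6]
y = [0.01, 0.02, 0.03, 0.04, 0.05, 1.00]
print(round(spearman(x, y), 4))  # 1.0
print(round(pearson(x, y), 2))   # about 0.68
```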

It is worth mentioning that the positive correlation between mortality and prevalence is not restricted to COVID-19. For example, for the prevalence and mortality of severe acute respiratory syndrome (SARS) on July 31, 2003, based on data from the World Health Organization, the Spearman correlation coefficient was 0.3915 (P=.03). Because far more countries were affected by the COVID-19 pandemic than by SARS, the correlation between COVID-19 mortality and prevalence is statistically more significant than that between SARS mortality and prevalence.

The relationship between global COVID-19 prevalence and mortality is also visible over time (Figure 2): both prevalence and mortality increased over the study period.

Table 1. Total population and the total number of confirmed cases, death cases, and recovered cases for countries that reported at least 10,000 confirmed cases by May 2, 2020.
| Country | Total population, N | Confirmed cases, n | Deaths, n | Recovered cases, n |
|---|---|---|---|---|
| Saudi Arabia | 34,813,871 | 25,459 | 176 | 3765 |
| United Arab Emirates | 9,890,402 | 13,599 | 119 | 2664 |
| United Kingdom | 67,886,011 | 183,500 | 28,205 | 896 |
| United States | 331,002,651 | 1,132,539 | 66,369 | 175,382 |
Figure 1. COVID-19 mortality and prevalence of all countries (ρ=0.8304; P<.001). Only the top 20 countries with the highest prevalence are shown.
Figure 2. Trends of global COVID-19 mortality and prevalence over time.

To estimate the relationship between mortality and prevalence more precisely, Model 1 adjusted for time-fixed effects and country-specific baseline mortality. The estimates of all coefficients are shown in Table 2. The global mortality-prevalence ratio, represented by γ in Model 1, was estimated to be 12.9268 (P<.001). This means that an increment of 1 COVID-19 case per 1000 people is associated with a 1.29268-percentage-point (ie, 12.9268 × 1/1000 × 100) increase in mortality. The R² value of Model 1 was 98.11%, and the partial R² value for γ was 70.41%. These values indicate that COVID-19 prevalence could explain roughly 70% of the heterogeneity in excess mortality that remained after controlling for country-specific baseline mortality and time-fixed effects. The analysis of variance test indicated significant heterogeneity in the mortality-prevalence ratios among countries (P<.001). Therefore, we also performed a panel regression analysis based on Model 2 (Table 2). Notably, the partial R² value for the mortality-prevalence ratio increased to 89.37% in Model 2.
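The interpretation of γ=12.9268 and the partial R² value reported above reduce to two short calculations. The sketch assumes that mortality and prevalence both enter Model 1 as proportions, so γ × ΔP is an absolute change in mortality that is multiplied by 100 to express it in percentage points (ie, 12.9268 × 1/1000 × 100). It also uses the usual definition of partial R² as the share of the residual sum of squares (RSS) of the model without prevalence that is removed by adding prevalence; the RSS values in the example are hypothetical.

```python
def mortality_change_pct_points(gamma, extra_cases_per_1000):
    """Change in mortality, in percentage points, for a rise in prevalence."""
    delta_prevalence = extra_cases_per_1000 / 1000  # cases per person
    return gamma * delta_prevalence * 100           # proportion -> percentage points


def partial_r2(rss_without_term, rss_with_term):
    """Share of the reduced model's residual variation explained by adding the term."""
    return (rss_without_term - rss_with_term) / rss_without_term


print(mortality_change_pct_points(12.9268, 1))  # approximately 1.29268
print(partial_r2(rss_without_term=1.0, rss_with_term=0.2959))  # approximately 0.7041 (hypothetical RSS values)
```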

Table 2. Estimation of all coefficients for Model 1 and Model 2.

| Model and coefficient | Estimate | P value | Partial R² |
|---|---|---|---|
| Model 1ᵃ | | | |
| Mortality-prevalence ratio (ie, $\gamma$) | 12.9268 | <.001 | 0.7041 |
| Model 2ᵇ | | | |
| Country-specific mortality-prevalence ratio (ie, $\gamma_{\text{country}}$) | | | 0.8937 |
| All data | | | |
| Saudi Arabia | −26.3058 | .09 | |
| United Arab Emirates | −14.6387 | .60 | |
| United Kingdom | 14.4444 | <.001 | |
| United States | −4.1818 | <.001 | |
| All data excluding those collected from China after February 17, 2020 | | | |
| Saudi Arabia | −12.5797 | .42 | |
| United Arab Emirates | −1.1999 | .97 | |
| United Kingdom | 27.0183 | <.001 | |
| United States | 7.5391 | <.001 | |

ᵃThe R² value for Model 1 was 0.9811 (P<.001).

ᵇThe R² value for Model 2 was 0.9931 (P<.001).

Estimated country-specific mortality-prevalence ratios for the 36 countries included in our analysis ranged from −1205 to 348 (Figure 3). Five countries (Indonesia, India, Poland, Japan, and China) had absolute mortality-prevalence ratios above 100, and China was the only one of these whose ratio was significant (348; P<.001). The Shapiro-Wilk normality test rejected the hypothesis that all significant country-specific mortality-prevalence ratios came from a normal distribution (P<.001). When we further investigated China's mortality-prevalence ratio over time, we noted that the correlation turned from positive to negative after February 17, 2020 (Figure 4). This disparity was not observed when China's data collected after February 17, 2020 were excluded (Figure 3B; Shapiro-Wilk normality test: P=.78).
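The sensitivity analysis amounts to refitting Model 2 after dropping one slice of the panel. A minimal sketch of that filter follows, assuming the panel is held as (country, date, prevalence, mortality) records and that "after February 17, 2020" means strictly later than that date; the record type and example rows are hypothetical placeholders.

```python
from datetime import date
from typing import NamedTuple


class Obs(NamedTuple):
    """One hypothetical panel record."""
    country: str
    day: date
    prevalence: float
    mortality: float


CUTOFF = date(2020, 2, 17)


def exclude_late_china(panel):
    """Drop China's observations dated after February 17, 2020; keep everything else."""
    return [o for o in panel if not (o.country == "China" and o.day > CUTOFF)]


panel = [
    Obs("China", date(2020, 2, 17), 0.0, 0.0),  # placeholder values
    Obs("China", date(2020, 2, 18), 0.0, 0.0),
    Obs("Japan", date(2020, 2, 18), 0.0, 0.0),
]
kept = exclude_late_china(panel)
print([(o.country, o.day.isoformat()) for o in kept])
# [('China', '2020-02-17'), ('Japan', '2020-02-18')]
```

The filtered records can then be passed to the same Model 2 fitting step as the full panel.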

Figure 3. Countries with significant country-specific mortality-prevalence ratios based on (A) all data and (B) all data excluding those collected from China after February 17, 2020.
Figure 4. COVID-19 prevalence and mortality reported by China over time.

To our knowledge, this is the first study to assess the correlation between COVID-19 prevalence and mortality after adjusting for time-fixed effects and country-specific baseline mortality. We proposed the mortality-prevalence ratio as a novel characteristic of an infectious disease pandemic because of the strong association between disease mortality and prevalence. In addition, 5 countries had absolute mortality-prevalence ratios above 100, and China was the only one of them whose ratio was significant (348; P<.001). China's disparity was driven by the data reported after February 17, 2020. Although mortality was proportional to prevalence, the mortality-prevalence ratio was relatively robust to changes in prevalence (Figure 3). An unusually high mortality-prevalence ratio could be explained by a high proportion of undocumented infections within a country, which might be attributed to a limited number of diagnostic kits or to changes in surveillance policies. An alternative explanation for the sudden rise in mortality could be that China's health care system was relatively weak after February 17. However, this argument is inconsistent with the observation that China's baseline country-specific mortality was typically followed by a steady increase in disease prevalence after February 17. Changes in the pathogenicity and transmissibility of SARS-CoV-2 within China during this period could be another explanation for the disparity. Further studies are required to determine the underlying cause of this sharp increase in the mortality-prevalence ratio.

This study highlights the importance of public policies that aim to prevent disease transmission, such as social distancing, travel restrictions, encouraging the wearing of face masks and hand washing, and cancelling large events. Although the mortality rate of an infectious disease is traditionally assumed to be constant in infectious disease dynamic models [16], it is conceivable that a highly infectious disease degrades the quality and availability of health care. Rapid depletion of ventilators and declining nurse-to-patient ratios can increase mortality. Prevention policies therefore not only lower the financial burden of COVID-19 diagnosis and treatment but may also reduce COVID-19 mortality. When future cost-effectiveness analyses weigh economic recovery against public health, it is crucial to account for the positive association between disease prevalence and mortality and for the costs that come with it.

This study has several limitations. First, all results were based on ecological and panel data, which lack individual-level information; inferring causality at the individual level from such data would risk the ecological fallacy [17]. The temporal effects of prevalence on mortality should also be confirmed to verify country-level causality. Second, although the prevalence of COVID-19 can generally be interpreted as an acute burden on health care, this relationship could be better verified with data on the actual shortfalls of health care systems. Third, disease prevalence relies on accurate diagnoses and comprehensive surveillance, which can be difficult to achieve due to practical or political concerns. This was especially true at the beginning of the COVID-19 pandemic, when tests for COVID-19 were not accurate and some deaths from COVID-19 may not have been captured. Although disparities in mortality-prevalence ratios can partially identify countries with undocumented infections, a more direct index merits further study.

In conclusion, we observed the relationship between COVID-19 mortality and prevalence and quantified this relationship as mortality-prevalence ratios. Our results highlight the benefit of constraining disease transmission to reduce mortality. Disparities in mortality-prevalence ratios can also be a powerful tool to detect, or even quantify, the proportion of undocumented infections.


We thank Ms Kai-Fen Wong for editing the figures. This study was supported by the Ministry of Science and Technology, Taiwan (grant numbers 107-2118-M-009-003-MY2 and 108-2636-B-009-001).

Authors' Contributions

SHL and CLMK conceived the original idea and wrote the first version of the manuscript. SCF edited the manuscript. CLMK built the model, wrote all the software, and conducted the data analysis.

Conflicts of Interest

None declared.


  1. Zhu N, Zhang D, Wang W, Li X, Yang B, Song J, China Novel Coronavirus Investigating and Research Team. A Novel Coronavirus from Patients with Pneumonia in China, 2019. N Engl J Med 2020 Feb 20;382(8):727-733.
  2. Coronavirus disease (COVID-19) Situation Report–105. World Health Organization. URL: https://www.docs/default-source/coronaviruse/situation-reports/20200504-covid-19-sitrep-105.pdf?sfvrsn=4cdda8af_2 [accessed 2021-01-05]
  3. Johns Hopkins Coronavirus Resource Center. Johns Hopkins University & Medicine. URL: [accessed 2021-01-05]
  4. Li R, Pei S, Chen B, Song Y, Zhang T, Yang W, et al. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2). Science 2020 May 01;368(6490):489-493.
  5. Weitz JS, Beckett SJ, Coenen AR, Demory D, Dominguez-Mirazo M, Dushoff J, et al. Modeling shield immunity to reduce COVID-19 epidemic spread. Nat Med 2020 Jun;26(6):849-854.
  6. Prem K, Liu Y, Russell TW, Kucharski AJ, Eggo RM, Davies N, Centre for the Mathematical Modelling of Infectious Diseases COVID-19 Working Group, et al. The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: a modelling study. Lancet Public Health 2020 May;5(5):e261-e270.
  7. CDC COVID-19 Response Team. Geographic Differences in COVID-19 Cases, Deaths, and Incidence - United States, February 12-April 7, 2020. MMWR Morb Mortal Wkly Rep 2020 Apr 17;69(15):465-471.
  8. Chen N, Zhou M, Dong X, Qu J, Gong F, Han Y, et al. Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study. Lancet 2020 Feb 15;395(10223):507-513.
  9. Huang C, Wang Y, Li X, Ren L, Zhao J, Hu Y, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet 2020 Feb 15;395(10223):497-506.
  10. Lescure FX, Bouadma L, Nguyen D, Parisey M, Wicky PH, Behillil S, et al. Clinical and virological data of the first cases of COVID-19 in Europe: a case series. Lancet Infect Dis 2020 Jun;20(6):697-706.
  11. Lai CC, Wang CY, Wang YH, Hsueh SC, Ko WC, Hsueh PR. Global epidemiology of coronavirus disease 2019 (COVID-19): disease incidence, daily cumulative index, mortality, and their association with country healthcare resources and economic status. Int J Antimicrob Agents 2020 Apr;55(4):105946.
  12. Ji Y, Ma Z, Peppelenbosch MP, Pan Q. Potential association between COVID-19 mortality and health-care resource availability. Lancet Glob Health 2020 Apr;8(4):e480.
  13. Zhang Z, Yao W, Wang Y, Long C, Fu X. Wuhan and Hubei COVID-19 mortality analysis reveals the critical role of timely supply of medical resources. J Infect 2020 Jul;81(1):147-178.
  14. COVID-19/time_series_covid19_confirmed_global.csv. GitHub. URL: https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv [accessed 2021-01-06]
  15. Countries by Density 2020. World Population Review. URL: [accessed 2021-01-06]
  16. Wang XS, Wu J, Yang Y. Richards model revisited: validation by and application to infection dynamics. J Theor Biol 2012 Nov 21;313:12-19.
  17. Rothman KJ. Modern epidemiology. Boston, MA: Little Brown & Co; 1986.

SARS: severe acute respiratory syndrome

Edited by G Eysenbach; submitted 31.07.20; peer-reviewed by LA Lee, W Zhang; comments to author 20.11.20; revised version received 26.11.20; accepted 14.12.20; published 27.01.21


©Sheng-Hsuan Lin, Shih-Chen Fu, Chu-Lan Michael Kao. Originally published in JMIR Public Health and Surveillance, 27.01.2021.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication, as well as this copyright and license information must be included.