Using the Novel Mortality-Prevalence Ratio to Evaluate Potentially Undocumented SARS-CoV-2 Infection: Correlational Study

Background: The high prevalence of COVID-19 had resulted in 200,000 deaths as of early 2020. The corresponding mortality rate varies among countries and over time.
Objective: This study aims to investigate the relationship between the mortality rate and prevalence of COVID-19 within a country.
Methods: We collected data from the Johns Hopkins Coronavirus Resource Center. These data included the daily cumulative death count, recovered count, and confirmed count for each country. This study focused on a total of 36 countries with over 10,000 confirmed COVID-19 cases. Mortality was the main outcome and dependent variable, and it was computed by dividing the number of COVID-19 deaths by the number of confirmed cases.
Results: The results of our global panel regression analysis showed that there was a highly significant correlation between prevalence and mortality (ρ=0.8304; P<.001). We found that every increment of 1 confirmed COVID-19 case per 1000 individuals was associated with a 1.29268% increase in mortality, after controlling for country-specific baseline mortality and time-fixed effects. Over 70% of excess mortality could be attributed to prevalence, and the heterogeneity among countries' mortality-prevalence ratios was significant (P<.001). Further, our results showed that China had an abnormally high and significant mortality-prevalence ratio compared to other countries (P<.001). This unusual deviation in the mortality-prevalence ratio disappeared with the removal of the data that were collected from China after February 17, 2020. It is worth noting that the prevalence of a disease relies on accurate diagnoses and comprehensive surveillance, which can be difficult to achieve due to practical or political concerns.
Conclusions: The association between COVID-19 mortality and prevalence was observed and quantified as the mortality-prevalence ratio. Our results highlight the importance of constraining disease transmission to decrease mortality rates. The comparison of mortality-prevalence ratios between countries can be a powerful method for detecting, or even quantifying, the proportion of individuals with undocumented SARS-CoV-2 infection.
(JMIR Public Health Surveill 2021;7(1):e23034) doi: 10.2196/23034


Introduction
The first cluster of cases of pneumonia, which was later identified as COVID-19, a disease caused by the SARS-CoV-2 virus [1], was reported in Wuhan, China on December 31, 2019 [2]. The disease outbreak in China eventually developed into a pandemic, which forced widespread changes throughout the world and added substantial disease and economic burden worldwide. As of May 2, 2020, more than 36 countries have reported at least 10,000 cases of COVID-19. A total of around 4 million cases and 274,000 deaths have been reported [2,3]. Numerous studies have been conducted to investigate the biological and epidemiologic characteristics of COVID-19 [4-6].
Most results have been derived from traditional epidemiological models, in which both COVID-19 mortality (ie, the "case fatality rate" in some literature) and recovery rates were assumed to be constant. However, Bialek et al [7] found heterogeneity in mortality rates among countries and cities, which has been attributed to the assumed underlying medical conditions within an area [8-10]. The trend in mortality over time is also controversial [11-13]. Although results from an exponential growth model have shown an overall exponential decay in mortality within China since the disease outbreak [13], there is evidence that disease prevalence influences disease mortality to a considerable extent: a rapid increase in the number of infections may result in the collapse of the health care system, leading to a sharp rise in mortality [11,12]. Moreover, these inconsistent findings were based on data that were collected before March 2020. Up until then, only a few countries had reported COVID-19 deaths, and most areas had not yet been substantially affected by COVID-19.
This study aims to rigorously quantify the relationship between COVID-19 prevalence and mortality by using data updated through May 2, 2020. We observed a linear relationship between prevalence and mortality, which we refer to as the mortality-prevalence ratio. The global mortality-prevalence ratio was estimated after adjusting for country-specific baseline mortality and time-fixed effects. Country-specific mortality-prevalence ratio values can be used as a powerful index for identifying countries with a substantial number of undocumented infections or overburdened health care systems.

Methods
COVID-19-related data [14] were downloaded from the Johns Hopkins Coronavirus Resource Center. These data included the cumulative number of confirmed cases ($C_{it}$), deaths ($D_{it}$), and recovered cases ($R_{it}$) of the $i$th country at time $t$, from January 22 to May 2, 2020. We then matched each country with its national population data, which were provided by World Population Review [15]. Countries without a matched population were excluded from this study. After exclusion, 174 countries remained in our dataset. We later aggregated the remaining countries to obtain the corresponding global counts.
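The matching and aggregation steps described above can be sketched as follows. This is a minimal illustration rather than the authors' code (the analysis itself was run in R); the tuple layout, the function name `match_and_aggregate`, and the toy counts are all assumptions made for the example.

```python
from collections import defaultdict


def match_and_aggregate(counts, population):
    """Drop countries without population data, then sum counts per day.

    counts: iterable of (country, day, confirmed, deaths, recovered) tuples
            holding cumulative counts (hypothetical layout).
    population: dict mapping country name to national population.
    """
    # Keep only rows whose country has a matched population.
    matched = [row for row in counts if row[0] in population]

    # Aggregate the remaining countries into global counts for each day.
    totals = defaultdict(lambda: [0, 0, 0])
    for _country, day, confirmed, deaths, recovered in matched:
        totals[day][0] += confirmed
        totals[day][1] += deaths
        totals[day][2] += recovered
    return matched, {day: tuple(t) for day, t in totals.items()}


# Toy input: country "C" has no population entry and is excluded.
counts = [
    ("A", "2020-05-01", 100, 5, 40),
    ("B", "2020-05-01", 200, 10, 50),
    ("C", "2020-05-01", 999, 99, 9),
]
population = {"A": 1_000_000, "B": 2_000_000}
matched, global_counts = match_and_aggregate(counts, population)
```

Excluding unmatched countries before aggregation keeps the global counts consistent with the population denominator used later for prevalence.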
For each country and each time point, we computed the following 3 metrics, along with the global data: (1) the number of cases still in treatment ($CT_{it}$), which represents the total number of COVID-19 cases that involved medical assistance at time $t$; (2) the prevalence of COVID-19 in country $i$ at time $t$ ($P_{it}$); and (3) COVID-19 mortality in country $i$ at time $t$ ($M_{it}$). For the sake of model stability, the analyses were only performed on countries with a $C_{it}$ of ≥10,000. Equations 1-3 were used to calculate these metrics; mortality, the main outcome, was computed by dividing the number of COVID-19 deaths by the number of confirmed cases ($M_{it} = D_{it}/C_{it}$).
To investigate the association between mortality and prevalence after adjusting for the baseline mortality in each country and the effect of regular fluctuation over time, we built the following panel regression model (ie, Model 1):
$$M_{it} = \beta_{\mathrm{country}} + \beta_{t} + \gamma P_{it} + \varepsilon_{it} \qquad (4)$$
In this model, $\beta_{\mathrm{country}}$ represents the country-specific baseline mortality; $\beta_{t}$ is the time-fixed effect on the mortality; $\gamma$ represents the global association between $P_{it}$ and $M_{it}$, which we referred to as the global mortality-prevalence ratio; and $\varepsilon_{it}$ is the residual.
To allow the mortality-prevalence ratio to vary by country, we built a second panel regression model (ie, Model 2), in which the global mortality-prevalence ratio was replaced with the country-specific mortality-prevalence ratio, $\gamma_{\mathrm{country}}$. Model 2 is described as follows:
$$M_{it} = \beta_{\mathrm{country}} + \beta_{t} + \gamma_{\mathrm{country}} P_{it} + \varepsilon_{it} \qquad (5)$$
In this model, $\gamma_{\mathrm{country}}$ is the country-specific association between $P_{it}$ and $M_{it}$, which we referred to as the country-specific mortality-prevalence ratio. Furthermore, we tested whether $\gamma_{\mathrm{country}}$ differed among countries with an analysis of variance test. We also used the Shapiro-Wilk normality test to assess whether the differences could be treated as random effects drawn from a normal population.
All analyses were conducted with R version 3.5.2. The approval of an institutional review board was not required because no individual-level/personal data were used.
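To make the structure of Model 1 concrete, the sketch below computes the three metrics and estimates the global mortality-prevalence ratio with a two-way within (demeaning) transformation. It is a hedged illustration, not the authors' implementation: the forms assumed for the cases still in treatment and for prevalence are guesses (only mortality as deaths divided by confirmed cases is stated in the text), the demeaning shortcut is exact only for a balanced panel, and all function names and numbers are hypothetical.

```python
def metrics(confirmed, deaths, recovered, population):
    """Return (CT, P, M) for one country on one day.

    Assumed forms; only M is stated explicitly in the text:
      CT = C - D - R  (cases still in treatment)
      P  = CT / N     (per-capita prevalence; the numerator is an assumption)
      M  = D / C      (deaths divided by confirmed cases)
    """
    in_treatment = confirmed - deaths - recovered
    return in_treatment, in_treatment / population, deaths / confirmed


def two_way_demean(values, countries, times):
    """Remove country means and time means from a balanced panel."""
    grand = sum(values.values()) / len(values)
    c_mean = {c: sum(values[c, t] for t in times) / len(times) for c in countries}
    t_mean = {t: sum(values[c, t] for c in countries) / len(countries) for t in times}
    return {(c, t): values[c, t] - c_mean[c] - t_mean[t] + grand
            for c in countries for t in times}


def within_gamma(prevalence, mortality):
    """Estimate gamma in M = beta_country + beta_t + gamma * P + error."""
    countries = sorted({c for c, _ in prevalence})
    times = sorted({t for _, t in prevalence})
    p = two_way_demean(prevalence, countries, times)
    m = two_way_demean(mortality, countries, times)
    # After demeaning, gamma is the ordinary least squares slope through the origin.
    return sum(p[k] * m[k] for k in p) / sum(p[k] ** 2 for k in p)


# Synthetic, noise-free panel with a known slope (illustrative values only).
TRUE_GAMMA = 12.9268
base = {"A": 0.01, "B": 0.03, "C": 0.05}   # hypothetical country baselines
drift = [0.000, 0.001, 0.002, 0.003]        # hypothetical time effects
prevalence, mortality = {}, {}
for i, c in enumerate(sorted(base)):
    for t, b_t in enumerate(drift):
        prevalence[c, t] = 1e-4 * (i + 1) * (t + 1) ** 2
        mortality[c, t] = base[c] + b_t + TRUE_GAMMA * prevalence[c, t]

gamma_hat = within_gamma(prevalence, mortality)
print(round(gamma_hat, 4))
```

Because the synthetic panel contains no noise, the estimator recovers the slope used to generate it. With real data, where countries enter the panel on different days once they pass the 10,000-case threshold, the panel is unbalanced and a dummy-variable regression or a dedicated panel package would be needed instead of this shortcut.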

Results
Table 1 shows the population and the total number of confirmed cases, death cases, and recovered cases for countries that reported at least 10,000 confirmed cases by May 2, 2020. Figure 1 shows the association between COVID-19 prevalence and mortality among these countries. The Spearman correlation coefficient was 0.8304 (P<.001) and the Pearson correlation coefficient was 0.3385 (P=.04). These values indicated a significant positive correlation between prevalence and mortality. COVID-19 mortality and prevalence were relatively high in the United Kingdom and Belgium, while the United States had a high prevalence and a relatively low mortality compared to countries with similar prevalence levels, such as China and Canada.
It is worth mentioning that the positive correlation between mortality and prevalence is not restricted to COVID-19. For example, when considering the prevalence and mortality of severe acute respiratory syndrome (SARS) on July 31, 2003, based on data from the World Health Organization, the Spearman correlation coefficient was 0.3915 (P=.03). Because far more countries were involved in the COVID-19 pandemic than in the SARS outbreak, the correlation between COVID-19 mortality and prevalence is statistically more significant than the correlation between SARS mortality and prevalence.
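The two correlation coefficients used in this section measure different things: Pearson captures linear association, whereas Spearman is the Pearson correlation of the ranks and therefore captures any monotonic association. The sketch below is a standard-library illustration of both, with invented numbers; it is not the code used in this study and does not reproduce the reported values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented data: mortality rises monotonically but not linearly with prevalence.
prevalence = [0.1, 0.2, 0.4, 0.8, 1.6, 3.2]
mortality = [0.010, 0.011, 0.012, 0.014, 0.020, 0.090]
rho = spearman(prevalence, mortality)
r = pearson(prevalence, mortality)
```

A monotonic but nonlinear pattern, or a few extreme countries, is one general reason a Spearman coefficient can exceed the Pearson coefficient on the same data.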
The relationship between global COVID-19 prevalence and mortality can also be observed when time is considered (Figure 2). Both prevalence and mortality increased over time.
To estimate the relationship between mortality and prevalence more rigorously, we adjusted for time and country-specific baseline mortality in Model 1. The estimates for all coefficients are shown in Table 2. The global mortality-prevalence ratio, which was represented by $\gamma$ in Model 1, was estimated to be 12.9268 (P<.001). This number can be interpreted as follows: an increment of 1 COVID-19 case per 1000 people is coupled with a 1.29268% (ie, 12.9268 × 1/1000 × 100) increase in mortality. The $R^2$ value that was calculated from Model 1 was 98.11%, and the partial $R^2$ value for $\gamma$ was 70.41%. These values indicated that COVID-19 prevalence could explain roughly 70% of the heterogeneity in excess mortality after controlling for country-specific baseline mortality and time-fixed effects.
The analysis of variance test showed potential heterogeneity in the mortality-prevalence ratios among different countries (P<.001). Therefore, we performed a panel regression analysis based on Model 2, as shown in Table 2. It should be noted that the partial $R^2$ value for the mortality-prevalence ratio increased to 89.37% in Model 2. The estimated country-specific mortality-prevalence ratios for the 36 countries included in our analysis ranged from −1205 to 348 (Figure 3). Absolute mortality-prevalence ratio values of >100 were found in 5 countries (ie, Indonesia, India, Poland, Japan, and China), of which China was the only country that had a significantly different mortality-prevalence ratio (348; P<.001). Based on the Shapiro-Wilk normality test, we could reject the hypothesis that all significant country-specific mortality-prevalence ratios came from a normal distribution (P<.001).
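The interpretation of the global mortality-prevalence ratio given above is a unit conversion, which can be checked directly. The sketch below also includes the textbook definition of partial R² (the share of residual variation removed by adding a term); whether the reported partial R² was computed in exactly this way is an assumption, and the sums of squared errors in the example are invented.

```python
def mortality_change_pct_points(gamma, cases_per_1000=1.0):
    """Change in mortality, in percentage points, for a given prevalence change.

    gamma is the slope of mortality (a fraction) on per-capita prevalence,
    so a change of k cases per 1000 people is a prevalence change of k / 1000.
    """
    delta_prevalence = cases_per_1000 / 1000
    return gamma * delta_prevalence * 100


def partial_r2(sse_without_term, sse_with_term):
    """Textbook partial R^2: share of residual error removed by adding a term."""
    return (sse_without_term - sse_with_term) / sse_without_term


change = mortality_change_pct_points(12.9268)

# Invented sums of squared errors, chosen only to illustrate the formula.
example_partial = partial_r2(10.0, 2.959)
```

Read this way, the increase is in absolute percentage points of mortality, not a relative change in the mortality rate.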
As we further investigated the pattern of China's mortality-prevalence ratio over time, we noted that the correlation turned from positive to negative after February 17, 2020 (Figure 4). This disparity was not observed when the data that were collected after February 17, 2020 were excluded (Figure 3) (Shapiro-Wilk normality test: P=.78).
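The kind of check described above, comparing the mortality-prevalence slope before and after a cutoff date, can be sketched as follows. This is a deliberate simplification: it fits a plain least squares slope for a single hypothetical country without the country and time effects of Models 1 and 2, and the rows are invented rather than taken from the study data.

```python
from datetime import date


def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def split_slopes(rows, cutoff):
    """Return (slope up to and including cutoff, slope after cutoff).

    rows: list of (day, prevalence, mortality) tuples for one country.
    """
    before = [(p, m) for d, p, m in rows if d <= cutoff]
    after = [(p, m) for d, p, m in rows if d > cutoff]
    return (ols_slope(*zip(*before)), ols_slope(*zip(*after)))


# Invented rows: prevalence peaks at the cutoff while mortality keeps rising.
rows = [
    (date(2020, 2, 14), 1.0e-5, 0.020),
    (date(2020, 2, 15), 2.0e-5, 0.022),
    (date(2020, 2, 16), 3.0e-5, 0.024),
    (date(2020, 2, 17), 4.0e-5, 0.026),
    (date(2020, 2, 18), 3.5e-5, 0.030),
    (date(2020, 2, 19), 3.0e-5, 0.034),
    (date(2020, 2, 20), 2.5e-5, 0.038),
]
slope_before, slope_after = split_slopes(rows, date(2020, 2, 17))
```

In this toy series the slope changes sign at the cutoff, which mirrors the qualitative pattern described for Figure 4 without reproducing its values.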

Discussion
This is the first study to assess the correlation between COVID-19 prevalence and mortality after adjusting for time-fixed effects and country-specific baseline mortality. We proposed the mortality-prevalence ratio as a novel characteristic for an infectious disease pandemic because of the high association between disease mortality and prevalence. In addition, a disparity in the mortality-prevalence ratios of 5 countries was observed; China was the only country with a significantly different mortality-prevalence ratio (348; P<.001). The disparity in China's mortality-prevalence ratio was driven by the data reported after February 17, 2020. Although the mortality was proportional to the prevalence, the mortality-prevalence ratio was relatively robust to changes in prevalence (Figure 3). A high peak in mortality-prevalence ratios could be explained by a high proportion of undocumented infections within a country, which might be attributed to the limited number of diagnostic kits or changes in surveillance policies. An alternative explanation for the sudden rise of mortality could be that the health care system in China was relatively weak after February 17. However, this argument contradicts the fact that China's overall baseline country-specific mortality was typically followed by a steady increase in disease prevalence after February 17. The evolution of the pathogenicity and transmissibility of SARS-CoV-2 within China during this period could be another alternative reason for the disparity in mortality-prevalence ratios. Further studies are required to determine the underlying cause of this sharp increase in the mortality-prevalence ratio.
This study revealed the importance of public policies that aim to prevent disease transmission. These policies include social distancing, restricting travel, encouraging the wearing of face masks and hand washing, and cancelling large events. Although the mortality rate of an infectious disease is traditionally assumed to be constant in an infectious dynamic model [16], it is conceivable that a highly infectious disease affects the quality and availability of a health care system. The rapid depletion of ventilators and the decline of nurse-to-patient ratios accelerate mortality. Prevention policies not only lower the financial burden of COVID-19 diagnosis and treatment, but also reduce COVID-19 mortality. Therefore, when future cost-effectiveness analyses are performed with respect to the balance between economic recovery and public health, it is crucial to consider the positive association between disease prevalence and mortality and the costs that come with it.
There are several limitations in this study. First, all results were based on ecological and panel data, which lack individual-level information. Therefore, an ecological fallacy could occur when trying to infer causality at the individual level [17]. The temporal effects of prevalence on mortality should also be confirmed to verify country-level causality. Second, although the prevalence of COVID-19 can generally be interpreted as an acute burden on health care, this relationship can be better verified when data on the actual insufficiencies of health care systems are available. Third, disease prevalence relies on accurate diagnoses and comprehensive surveillance, which can be difficult to achieve due to practical or political concerns. This was especially true at the beginning of the COVID-19 pandemic, when tests for COVID-19 were not accurate and data on people who died from COVID-19 may not have been captured. In this study, although countries with undocumented infections can be partially inferred from disparities in mortality-prevalence ratios, a more direct index merits further study.
In conclusion, we observed the relationship between COVID-19 mortality and prevalence and quantified this relationship as mortality-prevalence ratios. Our results highlight the benefit of constraining disease transmission to reduce mortality. Disparities in mortality-prevalence ratios can also be a powerful tool to detect, or even quantify, the proportion of undocumented infections.