Original Paper
Abstract
Background: Despite the available evidence on its severity, COVID-19 has often been compared with seasonal flu by some conspiracy theorists and even scientists. Various public discussions arose about a possible noncausal correlation between COVID-19 and the deaths observed during the pandemic period in Italy.
Objective: This paper aimed to search for endogenous reasons for the mortality increase recorded in Italy during 2020 to test this controversial hypothesis. Furthermore, we provide a framework for epidemiological analyses of time series.
Methods: We analyzed deaths by age, sex, region, and cause of death in Italy from 2011 to 2019. Ordinary least squares (OLS) linear regression analyses and autoregressive integrated moving average (ARIMA) were used to predict the best value for 2020. A Grubbs 1-sided test was used to assess the significance of the difference between predicted and observed 2020 deaths/mortality. Finally, a 1-sample t test was used to compare the population of regional excess deaths to a null mean. The relationship between mortality and predictive variables was assessed using OLS multiple regression models. Since there is no uniform opinion on multicomparison adjustment and false negatives imply great epidemiological risk, the less-conservative Siegel approach and more-conservative Holm-Bonferroni approach were employed. By doing so, we provided the reader with the means to carry out an independent analysis.
Results: Both ARIMA and OLS linear regression models predicted the number of deaths in Italy during 2020 to be between 640,000 and 660,000 (range of 95% CIs: 620,000-695,000) against the observed value of above 750,000. We found strong evidence supporting that the death increase in all regions (average excess=12.2%) was not due to chance (t21=7.2; adjusted P<.001). Male and female national mortality excesses were 18.4% (P<.001; adjusted P=.006) and 14.1% (P=.005; adjusted P=.12), respectively. However, we found limited significance when comparing male and female mortality residuals using the Mann-Whitney U test (P=.27; adjusted P=.99). Finally, mortality was strongly and positively correlated with latitude (R=0.82; adjusted P<.001). In this regard, the significance of the mortality increases during 2020 varied greatly from region to region. Lombardy recorded the highest mortality increase (38% for men, adjusted P<.001; 31% for women, P<.001; adjusted P=.006).
Conclusions: Our findings support the absence of historical endogenous reasons capable of justifying the mortality increase observed in Italy during 2020. Together with the current knowledge on SARS-CoV-2, these results provide decisive evidence on the devastating impact of COVID-19. We suggest that this research be leveraged by government, health, and information authorities to furnish proof against conspiracy hypotheses that minimize COVID-19–related risks. Finally, given the marked concordance between ARIMA and OLS regression, we suggest that these models be exploited for public health surveillance. Specifically, meaningful information can be deduced by comparing predicted and observed epidemiological trends.
doi:10.2196/36022
Introduction
Background
SARS-CoV-2 is a new beta coronavirus first identified in December 2019 in Wuhan, China. The related pathology, called COVID-19, has raged worldwide, claiming millions of victims and throwing economic and health systems into severe crises. In such a dramatic scenario, Europe is one of the most affected areas: As of December 2021, it accounted for over 30% of global official deaths (ie, approximately 1,600,000). Because the risk factors are multiple, including environmental conditions, pollution, age, gender, ethnicity, crowding, poverty, and medical comorbidities, mortality varies substantially from country to country as well as intranationally. Indeed, the peaks in daily deaths per million inhabitants ranged from 1 (Ukraine) to over 40 (Belgium), with a median of 3.5 (IQR 2-13). The first European nation to suffer the devastating effects of COVID-19 was Italy, with mortality peaks much higher than the European median (over 15). In particular, the regions of northern Italy, especially the provinces of Bergamo and Brescia, faced a harsh first wave, reaching the highest number of deaths globally. To date, despite a substantial reduction in mortality thanks to a massive vaccination campaign, Italy is still the second-ranking European country for official COVID-19 deaths. Nonetheless, the debate over COVID-19 mortality has been intense during the pandemic. In the early stages, given the low testing capabilities, the calculation of mortality was subject to numerous uncertainties, which led to both overestimates and underestimates. For this reason, researchers focused their efforts on comparing 2020 data with historical death series.
Ordinary least squares (OLS) regression models are among the models most widely adopted by scientists due to their simplicity and efficacy. Specifically, OLS multiple and simple regressions have often been used to predict the course of COVID-19 cases and deaths, both individually and in conjunction with other epidemiological models such as Susceptible-Infected-Recovered (SIR). The literature shows that linear regression analyses are valuable short-term forecasting tools when the necessary assumptions are satisfied. However, it is not unusual for requirements such as normality of the residuals or homoskedasticity to be violated when dealing with actual epidemiological data. In these cases, the use of corrective procedures or alternative models should be considered. Among the latter, autoregressive integrated moving average (ARIMA) and SARIMA (ARIMA + seasonal component) models have shown excellent predictive capabilities. In particular, a recent study by Abolmaali and Shirzaei demonstrated that the ARIMA approach could outperform other classical models such as logistic function, linear regression, and SIR. Similar findings were obtained by Alabdulrazzaq et al, who reported that the accuracy of the prediction of COVID-19 spread provided by their ARIMA model was both appropriate and satisfactory.
Despite more than 135,000 official deaths nationwide, some Italian conspiracy movements argue that COVID-19 is a nondangerous disease and that these numbers have been deliberately exaggerated. Unfortunately, it was not uncommon even for eminent Italian scientists or other prominent personalities to have recklessly downplayed the risks of COVID-19 or favored the spread of fake news. Thus, the infodemic question “Dead from COVID or with COVID?” soon filled social networks. Indeed, such a question arises from the hypothesis that COVID-19 was noncausally correlated with the deaths recorded in Italy in 2020.
Objective
Based on this premise, this study aimed to estimate the difference between the observed and predicted numbers of deaths in Italy during 2020. In particular, we modelled all mortality trends by cause of death, sex, and age group from 2011 to 2019, predicting the best values for 2020. By doing so, causal evidence will be provided on the impact of a nonendogenous mortality factor, such as COVID-19. The results of this paper have epidemiological and infodemiological relevance since (1) 2 models widely adopted by the scientific community such as OLS linear regression and ARIMA are compared; (2) to the best of our knowledge, this is the most detailed historical and forecasting survey regarding mortality in Italy; and (3) an estimate of the statistical significance of the increase in mortality in Italy during 2020 is provided. Finally, we investigate 2 essential, but often overlooked, aspects of epidemiological and public health surveillance, namely the possible emergence of nonlinear subtrends (capable of invalidating predictions of models trained on historical global data) and the problem of multicomparison adjustment (capable of dangerously inflating false negatives).
Methods
Data Collection
For this study, we used data from the national agencies and portals of demographic and statistical research in Italy (details and references are provided in the following paragraphs). Specifically, the annual number of deaths (including deaths by sex and age group), deaths per cause of death (including deaths by sex), and mortality (including mortality by sex and age group) were extracted from the platforms and annual reports of the National Institute of Statistics (ISTAT) and National Health Observatory for the years 2011 to 2020. Demographic data (ie, population per age group and population number and density per region) were gathered from Tuttitalia.it. This portal contains all ISTAT demographic information relating to municipalities, provinces, and regions. Although the investigated period ranged from 2011 to 2020, cause-of-death statistics were available only until 2017, as the official evaluation process takes 3 years. More details on the data collection process are described in the supplementary material.
Procedure and Statistical Analysis Key Points
Here, we provide a summary of the procedure adopted. A more detailed description is reported in the supplementary material. We modeled regional trends in annual deaths and mortality from 2011 to 2019 through OLS linear regression. We called Δ* the residuals’ data set from 2011 to 2019 and Δ the residuals’ data set from 2011 to 2020. Through the Grubbs 1-sided test, we searched for high outliers in Δ* and Δ. The Grubbs test was performed using RStudio v.4.1.2 software (library: outliers). We also performed a 1-sample t test to assess if the regional death increases were due to chance. This was done by comparing the 2020 excess death population to a fixed null mean (ie, the expected residual). Furthermore, we calculated the difference between the model prediction and the observed value. To confirm or rule out any statistical anomalies in the number of deaths during 2020, we checked all the trends of the following annual statistics within the 2011-2019 time frame: male deaths by age group, female deaths by age group, male mortality by age group, female mortality by age group, deaths by cause of death, male deaths by cause of death, and female deaths by cause of death. Specifically, we searched for anomalous nonlinear subtrends capable of distorting the interpretation of the cumulative data (indeed, the sum of linear trends is linear). An example of this phenomenon is shown in Figure S1 in the supplementary material. Concerning male and female deaths by age group, we also calculated the 2020 forecast for each age group through an ARIMA (p, d, q) model using RStudio v.4.1.2 software (libraries: forecast and tseries). To facilitate the reproducibility of the analysis, we have provided all the ARIMA models in the supplementary material. Finally, we used OLS multiple linear regression to verify any correlations with demographic and geographic statistics such as population, population density, and latitude.
Concerning Multicomparison Adjustment Problem
P value adjustment for multiple comparisons is motivated by the risk of unintentionally inflating the number of false positives. However, as shown by Greenland, the indiscriminate and unthinking implementation of this method can lead to conclusions that are erroneous, misleading, and, when sensitive topics are touched (eg, public health), even dangerous. Indeed, a scientist is called upon to consider both the consequences and the likelihood of incurring false results. For instance, some authors suggest it is advisable to adjust the P values in exploratory investigations since the chances of spurious correlations due to the look-elsewhere effect are high. Conversely, adjusting P values can be counterproductive when hypotheses are well-targeted and false negatives carry a serious risk (eg, airport metal detector). Nonetheless, Bender and Lange highlighted that it is challenging to perform a multiple test adjustment in exploratory analyses due to the possible lack of a clear structure in the multiple tests; ergo, they recommend this procedure only for well-targeted hypotheses. Such a scenario spotlights the absence of a clear consensus. Additional critical issues lie in the fact that the P value is not the probability that the test hypothesis is true nor that chance alone produced the observed association. Ergo, adopting an (un)adjusted dichotomous threshold is not suitable for assessing the statistical significance of an outcome, as P values should be used, at best, as graded measures of the strength of evidence against the test hypothesis. Finally, other authors have raised further concerns about adjusted P values. For example, Brandt pointed out the medical unreasonableness of evaluating a patient's test results based on how many tests the patient had that day. With this provocation, Brandt also questioned the scientific community about the possibility of dividing the results into different studies to bypass the problem of multicomparison.
In conclusion, Greenland stressed that proposing a single null hypothesis represents a bias in the analysis, and P values test not only the degree of data compatibility with the null hypothesis but all the test’s assumptions. Hence, it must be admitted that every statistical interpretation or adjustment is strongly influenced by the authors’ prejudices and uncertainties on the assumptions made. This is also true of the so-called “robust analyses,” whose complexity adds further confusion. For these reasons, a scientist cannot do anything else beyond showing how methods and results vary under different conditions.
Our Approach
This manuscript aimed to test statistical methods to identify epidemiologically relevant anomalies in a time series and provide near-definitive evidence on COVID-19 impact on mortality in Italy. Based on the evidence summarized in the previous subsection, we concluded that the best option was to give the reader the means to conduct an independent evaluation showing how results changed under different assumptions. Specifically, we used 2 approaches: The first, proposed by Siegel, involves the evaluation of the significance of a global test (ie, national population by sex) and then the implementation of other subtests (ie, regional population by sex) without corrections. In particular, we believe this approach is the most suitable for the purpose of this manuscript and denote it with A1. The second approach, denoted with A2, is the more conservative Holm-Bonferroni method with number of hypotheses m=47.
Results
Overall Death Excess During 2020
Compared with the OLS linear regression model prediction, the observed number of deaths in Italy during 2020 was substantially larger (excess=89,287; % excess=13.6 [SE 5.3]). The detailed report is presented in Tables S1 and S2 in the supplementary material. We found strong evidence supporting that the death increase in all regions was not due to chance (mean % excess=12.2 [SD 1.7]; t21=7.2; adjusted P<.001). The values predicted by the OLS linear regression and the ARIMA (0,2,2) model were also in close agreement; this constitutes further proof of the goodness of the linear interpolation.
Male Mortality Rate During 2020
For A1, when the male mortality rate is considered, the 2020 excesses were large and highly significant in 13 of 21 regions (all P≤.007). Moderately significant increases were observed in 5 other regions (.02≤P≤.10). A low significance was obtained only in Molise, Basilicata, and Calabria (all P≥.20). Overall, the excess male mortality in Italy during 2020 was high and markedly significant (P<.001; excess=18.8 per 10,000; % excess=18.4 [SE 5.4]). Moreover, all regions recorded an excess male mortality between 5% (Basilicata) and 38% (Lombardy). Details of each region are provided in the table below. Further information on the model goodness is provided in Table S3 in the supplementary material. For A2, adjusted P≤.006 was reached nationally and in 6 regions (Piedmont, Lombardy, Trento, Veneto, Liguria, Emilia Romagna), while .02≤adjusted P≤.08 was reached in 5 regions (Bolzano, Friuli Venezia Giulia, Marche, Abruzzo, Apulia). Campania and Sardinia also registered a moderate significance (adjusted P≤.16). Adjusted P≥.40 was obtained in the remaining regions. The adjusted values are reported in the same table.
Italian region | Predicted value | Predicted value SE | Observed value | Excess, % (SE) | P value^a | Adjusted P value^a |
Italy | 102.1 | 4.6 | 120.9 | 18.4 (5.4) | <.001 | .006 |
Piemonte | 105.7 | 5.2 | 132.3 | 25.1 (6.2) | <.001 | <.001 |
Valle d’Aosta | 114.4 | 11.3 | 136.3 | 19.1 (12.3) | .02 | .40 |
Lombardia | 98.5 | 4.1 | 136.2 | 38.3 (5.8) | <.001 | <.001 |
Bolzano | 93.6 | 4.9 | 110 | 17.5 (6.2) | .001 | .02 |
Trento | 90.7 | 4.7 | 121 | 33.4 (7) | <.001 | <.001 |
Veneto | 97.2 | 3.7 | 114.7 | 18 (4.5) | <.001 | .002 |
Friuli Venezia Giulia | 99.6 | 5.9 | 116.3 | 16.8 (7) | .002 | .06 |
Liguria | 103.4 | 5.6 | 126.5 | 22.3 (6.7) | <.001 | .006 |
Emilia-Romagna | 97.5 | 4.5 | 116.1 | 19.1 (5.6) | <.001 | .006 |
Toscana | 97.2 | 6 | 108.5 | 11.6 (7) | .03 | .43 |
Umbria | 94.5 | 6.3 | 105.4 | 11.5 (7.6) | .04 | .62 |
Marche | 96.3 | 5.4 | 111.1 | 15.3 (6.6) | .003 | .07 |
Lazio | 100.1 | 5.2 | 110.1 | 10 (5.7) | .02 | .42 |
Abruzzo | 101.9 | 4.5 | 114.6 | 12.5 (5) | .002 | .06 |
Molise | 107.3 | 7.2 | 113.8 | 6 (7.2) | .50 | .99 |
Campania | 116.1 | 5.5 | 129.9 | 11.9 (5.4) | .005 | .12 |
Puglia | 100.1 | 5.8 | 115.6 | 15.5 (6.7) | .003 | .08 |
Basilicata | 107.2 | 6.1 | 112.9 | 5.3 (6) | .40 | .99 |
Calabria | 106.5 | 6.1 | 113.9 | 6.9 (6.2) | .20 | .99 |
Sicilia | 112.7 | 7.2 | 122.9 | 9.1 (7) | .10 | .99 |
Sardegna | 101.2 | 5.3 | 113.7 | 12.3 (5.9) | .007 | .16 |
^a Grubbs test.
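The columns of the table above can be retraced with a short sketch: an OLS trend fitted on 2011-2019, a 2020 prediction, the percent excess, and the 1-sided Grubbs statistic on the residual sets Δ* and Δ. This is only an illustration under stated assumptions. The annual series `rates` is hypothetical (it is not the ISTAT series), the function names are ours, and only the Grubbs statistic G is computed; the P value itself requires the t distribution, which the authors obtained through the R `outliers` library.

```python
from statistics import mean, stdev

def ols_fit(xs, ys):
    """Simple least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def percent_excess(observed, predicted):
    """Excess of the observed value over the prediction, in percent."""
    return 100.0 * (observed - predicted) / predicted

def grubbs_g_high(values):
    """1-sided Grubbs statistic for the largest value in the sample."""
    return (max(values) - mean(values)) / stdev(values)

# Values taken from the table (male mortality per 10,000 in 2020).
italy_excess = percent_excess(120.9, 102.1)
lombardia_excess = percent_excess(136.2, 98.5)

# HYPOTHETICAL annual male mortality for 2011-2019, for illustration only.
years = list(range(2011, 2020))
rates = [100.0, 101.5, 99.8, 102.3, 103.9, 101.2, 102.8, 102.0, 103.1]

slope, intercept = ols_fit(years, rates)
residuals_star = [y - (intercept + slope * x) for x, y in zip(years, rates)]  # Δ*
predicted_2020 = intercept + slope * 2020
residuals_full = residuals_star + [120.9 - predicted_2020]                     # Δ

g_star = grubbs_g_high(residuals_star)
g_full = grubbs_g_high(residuals_full)

print(f"Italy excess: {italy_excess:.1f}%, Lombardia excess: {lombardia_excess:.1f}%")
print(f"Grubbs G on 2011-2019 residuals: {g_star:.2f}; including 2020: {g_full:.2f}")
```

A large G for Δ together with an unremarkable G for Δ* is the pattern the procedure looks for: the historical residuals contain no high outlier, while the 2020 residual stands apart.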
Female Mortality Rate During 2020
For A1, highly significant excess female mortality was found in the northern regions and Sardinia (P≤.01, except Valle d’Aosta, P=.02). Moderately significant excesses were recorded in Tuscany, Marche, Molise, and Apulia (.04≤P≤.07). Scarcely significant differences were recorded in the rest of Italy (all P≥.40). Nevertheless, all regions experienced an excess female mortality between 4% (Basilicata) and 31% (Lombardy). Details of each region are provided in the table below. Further information on the model goodness is provided in Table S4 in the supplementary material. For A2, Lombardy (adjusted P=.006) and Trento (adjusted P=.001) reached the greatest statistical significance. Moderate significance (.01≤adjusted P≤.06) was reached in 5 regions (Piedmont, Bolzano, Friuli Venezia Giulia, Liguria, and Emilia Romagna). Sardinia (adjusted P=.19) and Veneto (adjusted P=.20) also registered a modest significance. Low significance was observed in the remaining regions (all adjusted P≥.38). The adjusted values are reported in the same table.
Italian region | Predicted value | Predicted value SE | Observed value | Excess, % (SE) | P value^a | Adjusted P value^a |
Italy | 68.3 | 3.9 | 77.9 | 14.1 (6.6) | .005 | .12 |
Piemonte | 70.8 | 4 | 84.1 | 18.8 (6.8) | .001 | .02 |
Valle d’Aosta | 69.9 | 9.8 | 88.9 | 27.1 (19.3) | .02 | .38 |
Lombardia | 64.3 | 3.5 | 84.2 | 30.9 (7.2) | <.001 | .006 |
Bolzano | 60.5 | 3.6 | 73.9 | 22.1 (7.4) | <.001 | .01 |
Trento | 59.4 | 2.8 | 73.4 | 23.6 (5.9) | <.001 | .001 |
Veneto | 64.2 | 3.8 | 72.8 | 13.4 (6.9) | .009 | .20 |
Friuli Venezia Giulia | 64.2 | 2.8 | 72.6 | 13 (5) | .001 | .05 |
Liguria | 67.4 | 4.3 | 79.3 | 17.7 (7.6) | .002 | .06 |
Emilia-Romagna | 66.1 | 3.3 | 75.6 | 14.4 (5.7) | .002 | .05 |
Toscana | 65.4 | 3.7 | 71.2 | 8.9 (6.3) | .07 | .80 |
Umbria | 63.2 | 4 | 67 | 6.1 (6.8) | .43 | .99 |
Marche | 64 | 4.7 | 71.8 | 12.2 (8.5) | .05 | .68 |
Lazio | 67.8 | 4.7 | 71.9 | 6 (7.5) | .56 | .99 |
Abruzzo | 67.6 | 4.6 | 72 | 6.5 (7.4) | .43 | .99 |
Molise | 66.4 | 5.1 | 74.7 | 12.6 (8.9) | .06 | .73 |
Campania | 80.3 | 5.7 | 85.1 | 6 (7.7) | N/A^b | N/A |
Puglia | 68.5 | 4.7 | 76.5 | 11.6 (7.8) | .04 | .65 |
Basilicata | 71.5 | 4.7 | 74.4 | 4.1 (7) | .40 | .99 |
Calabria | 71.7 | 4.4 | 75.1 | 4.8 (6.5) | .79 | .99 |
Sicilia | 78 | 5.9 | 83.4 | 7 (8.3) | .47 | .99 |
Sardegna | 64 | 3.4 | 71.5 | 11.6 (5.9) | .008 | .19 |
^a Grubbs test.
^b N/A: not available.
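The adjusted P values in the tables come from approach A2, the Holm-Bonferroni method with m=47 hypotheses. As a minimal sketch of how such a step-down adjustment works (not the authors' script), the function below sorts the raw P values, multiplies the smallest by m, the next by m-1, and so on, and enforces monotonicity. The input list in the example is hypothetical; the exact raw P values behind entries reported as P<.001 are not available in the text, so the tables cannot be reproduced from it.

```python
def holm_bonferroni(p_values):
    """Return Holm-Bonferroni adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest P value (rank = k - 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must not decrease as the raw P values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# HYPOTHETICAL raw P values, for illustration only.
raw = [0.01, 0.04, 0.03]
print(holm_bonferroni(raw))
```

Because the multiplier depends on the rank of each P value among all m tests, the same raw P value can receive different adjusted values in different families of tests, which is part of the concern raised in the Methods.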
Relationship Between Deaths and Geographical-Demographic Statistics
The linear multiregression model among the log-transformed statistics, the regional number of inhabitants (X1), regional population density (X2), regional latitude (X3), and 2020 regional excess deaths (Y), returned the following equation:
$$Y = f(X_3) = k \cdot X_3^{a},$$
with $k = 2.6 \times 10^{-7}$, $a = 9.9$, R=0.82, adjusted P<.001.
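A power law of this form is what a linear regression on log-transformed variables returns: taking logarithms gives ln Y = ln k + a ln X3, so the slope of the log-log line is the exponent a and the intercept is ln k. The sketch below illustrates that relationship only. The latitude values are hypothetical and the Y values are generated noise-free from the reported k and a, so the code shows that the fit recovers the coefficients; it is not a reanalysis of the regional data.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = k * x**a by least squares on (ln x, ln y); return (k, a)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    a = sxy / sxx               # slope of the log-log line = exponent
    k = math.exp(my - a * mx)   # intercept of the log-log line = ln k
    return k, a

# Coefficients reported in the text.
K_REPORTED, A_REPORTED = 2.6e-7, 9.9

# HYPOTHETICAL latitudes (degrees north); Y is synthetic and noise-free.
latitudes = [38.0, 39.5, 41.0, 42.5, 44.0, 45.5, 46.5]
synthetic_y = [K_REPORTED * lat ** A_REPORTED for lat in latitudes]

k_hat, a_hat = fit_power_law(latitudes, synthetic_y)
print(f"k = {k_hat:.3g}, a = {a_hat:.3f}")
```

With an exponent near 10, a change of a few degrees of latitude corresponds to a severalfold change in the fitted Y, which is why latitude dominates the other predictors in the model.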
Retrospective Analysis of Deaths
Deaths per cause of death in Italy were examined from 2012 to 2017 (2018 and 2019 data were not available). Tumors and diseases of the circulatory system always accounted for over 60% of total deaths (also considering the projections for 2020). The percentages of male (female) deaths for tumors ranged from 55.6% (44.4%) to 56.3% (43.7%), while deaths related to the circulatory system were 43.1% (56.9%) to 43.7% (56.3%). All trends were markedly linear.
Finally, male and female deaths by age group were analyzed from 2011 to 2019. Explicitly calculating each trend for each age and sex group and summing the predictions for 2020, we obtained the best value of 648,733 deaths. All trends were markedly linear (Figures S3 and S4 in the supplementary material). Summing up all the forecasts of the ARIMA models for each age and sex group, we obtained a total of 637,534 deaths. A similar result was obtained by summing the global trends for men and women (640,508 deaths).
Comparison Between Male and Female Mortality
The increases in mortality were 18% (P<.001; adjusted P=.006) for men and 14% (P=.005; adjusted P=.12) for women at the national level and 16% for men compared with 13% for women, on average, at the regional level. However, we found limited significance when comparing the residuals' populations using the Mann-Whitney U test (P=.27; adjusted P=.99).
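To make the comparison concrete, the following sketch computes the Mann-Whitney U statistic with average ranks for ties. As an assumption, it is applied to the regional % excess columns of the two mortality tables, which may differ from the residual populations the authors actually compared, so the resulting U is illustrative and no P value is derived. The function name is ours.

```python
from statistics import mean

def mann_whitney_u(x, y):
    """Return (U_x, U_y); tied observations receive their average rank."""
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    rank_sum_x = 0.0
    i, n = 0, len(pooled)
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n_x, n_y = len(x), len(y)
    u_x = rank_sum_x - n_x * (n_x + 1) / 2.0
    return u_x, n_x * n_y - u_x

# Regional % excess mortality copied from the two tables (21 regions each).
male = [25.1, 19.1, 38.3, 17.5, 33.4, 18, 16.8, 22.3, 19.1, 11.6, 11.5,
        15.3, 10, 12.5, 6, 11.9, 15.5, 5.3, 6.9, 9.1, 12.3]
female = [18.8, 27.1, 30.9, 22.1, 23.6, 13.4, 13, 17.7, 14.4, 8.9, 6.1,
          12.2, 6, 6.5, 12.6, 6, 11.6, 4.1, 4.8, 7, 11.6]

u_male, u_female = mann_whitney_u(male, female)
print(f"mean male excess: {mean(male):.1f}%, mean female excess: {mean(female):.1f}%")
print(f"U (male) = {u_male}, U (female) = {u_female}")
```

The two means reproduce the regional averages quoted in the text (about 16% for men and 13% for women), while the two U values always sum to the number of male-female pairs, here 21 × 21.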
Discussion
Principal Findings
This paper provides strong evidence in favor of an anomalous mortality event during 2020 in Italy, which was not predictable based on endogenous causes such as deaths and mortality trends between 2011 and 2019. Notably, the number of total deaths observed in 2020 exceeded the linear regression model prediction by more than 89,000 (a value nearly 3 times greater than the prediction standard error) and the ARIMA prediction by more than 86,000. Grubbs and t tests confirmed that this figure was unexpected. At the national level, the increase in mortality was 18% for the male population and 14% for the female population. Nonetheless, the statistical significance of this difference was low. The total excess mortality was positively correlated with latitude, which explained the data set variability much better than demographic statistics like population number and density. All the trends in deaths by cause of death from 2012 to 2017 were appreciably linear or stationary; this rules out the existence of anomalous subtrends linked to the causes of death. Moreover, summing up all the 2020 death predictions by age group, we obtained a value ranging from 640,000 to 660,000 deaths, far below the observed one (750,000). In conclusion, these findings confirm the absence of any confounding inner subtrends capable of explaining the excess deaths during 2020 in Italy.
Comparison With Prior Work
To the best of our knowledge, the most comprehensive and detailed study examining excess mortality during 2020 in Italy was the report drafted by the ISTAT and National Institute of Health (ISS). Their research focuses on comparing the March-December 2015-2019 and 2020 periods, starting from the assumption that COVID-19 is the cause of the discrepancies observed. On the contrary, our analysis has been more impartial since we have not introduced any hypothesis about the reasons that caused this phenomenon. Therefore, our findings provide evidence of statistical and epidemiological significance that had not been considered before. Specifically, excluding internal causes gives further strength to the theories that identify COVID-19 as the principal cause of such a tragic scenario. COVID-19 dangerousness is confirmed at the molecular-genetic level. The strong positive correlation we found between excess mortality and latitude is compatible with greater virulence and mortality of COVID-19 in northern Italy depicted by other literature. In this regard, an increasing number of mathematical-statistical investigations classify COVID-19 as a seasonal low-temperature infection, although the effect size of the environmental factors is still debated. However, it is a fact that low temperatures can have indirect effects on the spread of infections, like the creation of indoor gatherings with insufficient air circulation and the weakening of the immune defenses. Since average temperatures in northern regions are lower than the rest of the peninsula, this phenomenon could partially explain the Italian epidemiological scenario. A large amount of literature has also identified pollution as a relevant COVID-19 risk factor. For instance, NO2, PM10, and PM2.5 were causally connected with more serious situations, as they can drastically reduce the immune response and compromise respiratory functions. This type of pollutant is widespread in the Po Valley. Contrary to other literature, our paper did not detect a high significance in the difference between male and female national mortalities.
Nonetheless, this result is not conclusive and deserves further investigation as such a discrepancy could be more evident by considering the most affected and exposed age groups. Moreover, the COVID-19 course is influenced by numerous comorbidities, such as cancer, chronic kidney diseases, diabetes mellitus, hypertension, chronic obstructive pulmonary diseases, asthma, chronic respiratory diseases, immunocompromised state, HIV infection, heart conditions, overweight and obesity, dementia or other neurological conditions, and mental health conditions. The majority of these pathologies are more common in older age groups, which helps explain the greater aggressiveness of the infection in some regions. Hence, it is necessary to consider that the prepandemic epidemiological scenario has contributed to enhancing the disease damage in Italy. Nevertheless, it would be incorrect to consider only the older population as vulnerable: Phenomena such as long COVID (ie, the onset of medical complications that last weeks to months after initial recovery) are increasing in younger age groups, including children and adolescents. The most common symptoms of long COVID are fatigue, weakness, cough, chest tightness, breathlessness, palpitations, myalgia, and difficulty focusing; their appearance is not related to the severity of the COVID-19 course. Moreover, new variants of concern, favored by the uncontrolled spread of the virus, continuously pose new threats to all age groups. In this regard, strategies such as vaccinations and nonpharmaceutical containment measures have been and continue to be fundamental to control COVID-19 diffusion, avoid hospital overcrowding, and slow down the epidemiological peaks.
Indeed, although this paper has provided evidence in favor of a high number of deaths due to COVID-19 in Italy during 2020 (before the administration of COVID-19 vaccines), lockdowns, social distancing, and masks have prevented the death toll from being many times higher.
Limitations
Our approach has limitations to be considered. Since statistical significance is a measure of data compatibility with the null hypothesis (including the model’s assumptions), the evidence provided in this paper could vary under different initial hypotheses. However, the degree of uncertainty was reduced by targeting the tested hypotheses well. Furthermore, causal relationships have not been directly investigated. Therefore, these findings must be contextualized in light of the results of other literature. Finally, the discrepancies between the model predictions and the observed data were not weighted on the clinical characteristics of the patients.
Conclusions
This paper provides strong evidence on the absence of historical endogenous reasons capable of explaining the anomalous mortality increase recorded in Italy during 2020. Weighing these statistical results on the numerous molecular-genetic, medical, biological, virological, and epidemiological-based publications that confirmed high COVID-19 virulence, we conclude that the pandemic impact on excess deaths in Italy constitutes a scientific fact. This answers the question “Died from COVID or died with COVID?” Specifically, this manuscript can be adopted by health authorities and disclosure agencies to discredit fake news that minimizes the COVID-19 risk. Moreover, given the marked concordance between ARIMA and OLS regression models, we suggest that these methods be exploited for public health surveillance aims. In particular, considering their efficiency and effectiveness, it is possible to derive meaningful information regarding current and future epidemiological situations from the comparison between the predicted and observed trends.
Conflicts of Interest
None declared.
COVID-19 Mortality Italy, Rovetta.
DOCX File, 814 KB
Autoregressive integrated moving average (ARIMA) models.
ZIP File (Zip Archive), 1717 KB
References
- Coronavirus (COVID-19) Dashboard. World Health Organization. URL: https://covid19.who.int/ [accessed 2021-12-11]
- Assessing Risk Factors for Severe COVID-19 Illness. Centers for Disease Control and Prevention. 2020 Nov 30. URL: https://www.cdc.gov/coronavirus/2019-ncov/covid-data/investigations-discovery/assessing-risk-factors.html [accessed 2021-12-11]
- Bourdrel T, Annesi-Maesano I, Alahmad B, Maesano CN, Bind M. The impact of outdoor air pollution on COVID-19: a review of evidence from in vitro, animal, and human studies. Eur Respir Rev 2021 Mar 31;30(159):1
- Jabłońska K, Aballéa S, Toumi M. Factors influencing the COVID-19 daily deaths' peak across European countries. Public Health 2021 May;194:135-142
- Bontempi E. First data analysis about possible COVID-19 virus airborne diffusion due to air particulate matter (PM): The case of Lombardy (Italy). Environ Res 2020 Jul;186:109639
- Epidemia COVID-19: Aggiornamento nazionale 1 dicembre 2021 - ore 12. Istituto Superiore di Sanità. 2021 Dec 3. URL: https://www.epicentro.iss.it/coronavirus/bollettino/Bollettino-sorveglianza-integrata-COVID-19_1-dicembre-2021.pdf [accessed 2021-12-11]
- Meyerowitz-Katz G, Merone L. A systematic review and meta-analysis of published research data on COVID-19 infection fatality rates. Int J Infect Dis 2020 Dec;101:138-148
- Rath S, Tripathy A, Tripathy AR. Prediction of new active cases of coronavirus disease (COVID-19) pandemic using multiple linear regression model. Diabetes Metab Syndr 2020;14(5):1467-1474
- Peng Z, Ao S, Liu L, Bao S, Hu T, Wu H, et al. Estimating unreported COVID-19 cases with a time-varying SIR regression model. Int J Environ Res Public Health 2021 Jan 26;18(3):1
- Melik-Huseynov DV, Karyakin NN, Blagonravova AS, Klimko VI, Bavrina AP, Drugova OV, et al. Regression models predicting the number of deaths from the new coronavirus infection. Sovrem Tekhnologii Med 2020;12(2):6-11
- Abolmaali S, Shirzaei S. A comparative study of SIR model, linear regression, logistic function and ARIMA model for forecasting COVID-19 cases. AIMS Public Health 2021;8(4):598-613
- Alabdulrazzaq H, Alenezi MN, Rawajfih Y, Alghannam BA, Al-Hassan AA, Al-Anzi FS. On the accuracy of ARIMA based prediction of COVID-19 spread. Results Phys 2021 Aug;27:104509
- De Vogli R. Morti con o per il coronavirus? L'apice degli errori sul tema è stato raggiunto in questi giorni. Il Fatto Quotidiano. 2021 Nov 11. URL: https://www.ilfattoquotidiano.it/2021/10/25/morti-con-o-per-il-coronavirus-lapice-degli-errori-sul-tema-e-stato-raggiunto-in-questi-giorni/6366640/ [accessed 2021-11-11]
- Rovetta A, Castaldo L. Influence of mass media on Italian web users during the COVID-19 pandemic: infodemiological analysis. JMIRx Med 2021 Oct 18;2(4):e32233
- Rovetta A. The impact of COVID-19 on conspiracy hypotheses and risk perception in Italy: infodemiological survey study using Google Trends. JMIR Infodemiology 2021;1(1):e29929
- Tavole di mortalità. Istituto nazionale di statistica. URL: http://dati.istat.it/Index.aspx?DataSetCode=DCIS_MORTALITA1 [accessed 2021-11-01]
- C04: Sanità e salute. Istituto nazionale di statistica. URL: https://www.istat.it/it/files//2020/12/C04.pdf [accessed 2021-11-01]
- COVID-19. Osservatorio sulla salute. URL: https://www.osservatoriosullasalute.it/wp-content/uploads/2021/05/ro-2020-isc-covid.xlsx [accessed 2021-11-01]
- Popolazione per età, sesso e stato civile 2019. Tutti Italia. URL: https://www.tuttitalia.it/statistiche/popolazione-eta-sesso-stato-civile-2019/ [accessed 2021-11-01]
- Regioni italiane per densità. Tutti Italia. URL: https://www.tuttitalia.it/regioni/densita/ [accessed 2021-11-01]
- Multiple Linear Regression Calculator. Statistics Kingdom. URL: https://www.statskingdom.com/410multi_linear_regression.html [accessed 2021-11-03]
- Jafari M, Ansari-Pour N. Why, when and how to adjust your P values? Cell J 2019 Jan;20(4):604-607
- Greenland S. Invited commentary: the need for cognitive science in methodology. Am J Epidemiol 2017 Sep 15;186(6):639-645
- Greenland S, Hofman A. Multiple comparisons controversies are about context and costs, not frequentism versus Bayesianism. Eur J Epidemiol 2019 Sep;34(9):801-808
- Bender R, Lange S. Multiple test procedures other than Bonferroni's deserve wider use. BMJ 1999 Feb 27;318(7183):600-601 [FREE Full text] [CrossRef] [Medline]
- Reito A. Problem of multiplicity in clinical studies and inferences made when it is present: letter to the editor. Am J Sports Med 2020 Jan;48(1):NP13. [CrossRef] [Medline]
- Greenland S, Senn SJ, Rothman KJ, Carlin JB, Poole C, Goodman SN, et al. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol 2016 Apr 21;31(4):337-350 [FREE Full text] [CrossRef] [Medline]
- Amrhein V, Korner-Nievergelt F, Roth T. The earth is flat (p > 0.05): significance thresholds and the crisis of unreplicable research. PeerJ 2017;5:e3544 [FREE Full text] [CrossRef] [Medline]
- Brandt J. 2005 INS Presidential Address: neuropsychological crimes and misdemeanors. Clin Neuropsychol 2007 Jul;21(4):553-568. [CrossRef] [Medline]
- Siegel AF. Multiple t tests: some practical considerations. TESOL Quarterly 1990;24(4):773. [CrossRef]
- Holm S. A simple sequential rejective multiple test procedure. Scand J Statist 1979;6(2):65-70 [FREE Full text] [CrossRef]
- Impact Of Covid-19 Epidemic On Total Mortality Of Resident Population Year 2020. Istituto nazionale di statistica. 2021 Mar 05. URL: https://www.istat.it/it/files//2021/03/Report-_ISS_Istat__5-marzo-2021_en.pdf [accessed 2021-11-10]
- Scudellari M. How the coronavirus infects cells - and why Delta is so dangerous. Nature 2021 Jul;595(7869):640-644. [CrossRef] [Medline]
- Nguyen HL, Lan PD, Thai NQ, Nissley DA, O'Brien EP, Li MS. Does SARS-CoV-2 bind to human ACE2 more strongly than does SARS-CoV? J Phys Chem B 2020 Aug 27;124(34):7336-7347 [FREE Full text] [CrossRef] [Medline]
- Shang J, Ye G, Shi K, Wan Y, Luo C, Aihara H, et al. Structural basis of receptor recognition by SARS-CoV-2. Nature 2020 May;581(7807):221-224 [FREE Full text] [CrossRef] [Medline]
- Chen F, Zhang Y, Li X, Li W, Liu X, Xue X. The impact of ACE2 polymorphisms on COVID-19 disease: susceptibility, severity, and therapy. Front Cell Infect Microbiol 2021;11:753721 [FREE Full text] [CrossRef] [Medline]
- Cegolon L, Pichierri J, Mastrangelo G, Cinquetti S, Sotgiu G, Bellizzi S, et al. Hypothesis to explain the severe form of COVID-19 in Northern Italy. BMJ Glob Health 2020 Jun;5(6):1 [FREE Full text] [CrossRef] [Medline]
- Fortunato F, Martinelli D, Lo Caputo S, Santantonio T, Dattoli V, Lopalco PL, et al. Sex and gender differences in COVID-19: an Italian local register-based study. BMJ Open 2021 Oct 07;11(10):e051506 [FREE Full text] [CrossRef] [Medline]
- Fontal A, Bouma MJ, San-José A, López L, Pascual M, Rodó X. Climatic signatures in the different COVID-19 pandemic waves across both hemispheres. Nat Comput Sci 2021 Oct 21;1(10):655-665. [CrossRef]
- Christophi CA, Sotos-Prieto M, Lan F, Delgado-Velandia M, Efthymiou V, Gaviola GC, et al. Ambient temperature and subsequent COVID-19 mortality in the OECD countries and individual United States. Sci Rep 2021 Apr 22;11(1):8710 [FREE Full text] [CrossRef] [Medline]
- Chen S, Prettner K, Kuhn M, Geldsetzer P, Wang C, Bärnighausen T, et al. Climate and the spread of COVID-19. Sci Rep 2021 Apr 27;11(1):9042 [FREE Full text] [CrossRef] [Medline]
- Sera F, Armstrong B, Abbott S, Meakin S, O'Reilly K, von Borries R, MCC Collaborative Research Network, CMMID COVID-19 Working Group, et al. A cross-sectional analysis of meteorological factors and SARS-CoV-2 transmission in 409 cities across 26 countries. Nat Commun 2021 Oct 13;12(1):5968 [FREE Full text] [CrossRef] [Medline]
- Fares A. Factors influencing the seasonal patterns of infectious diseases. Int J Prev Med 2013 Feb;4(2):128-132 [FREE Full text] [Medline]
- Mourtzoukou EG, Falagas ME. Exposure to cold and respiratory tract infections. Int J Tuberc Lung Dis 2007 Sep;11(9):938-943. [Medline]
- Rovetta A, Castaldo L. Relationships between demographic, geographic, and environmental statistics and the spread of novel coronavirus disease (COVID-19) in Italy. Cureus 2020 Nov 09;12(11):e11397 [FREE Full text] [CrossRef] [Medline]
- Pegoraro V, Heiman F, Levante A, Urbinati D, Peduto I. An Italian individual-level data study investigating on the association between air pollution exposure and Covid-19 severity in primary-care setting. BMC Public Health 2021 May 12;21(1):902 [FREE Full text] [CrossRef] [Medline]
- Kasioumi M, Stengos T. The effect of pollution on the spread of COVID-19 in Europe. Econ Disaster Clim Chang 2021 Oct 22:1-12 [FREE Full text] [CrossRef] [Medline]
- Pluchino A, Biondo AE, Giuffrida N, Inturri G, Latora V, Le Moli R, et al. A novel methodology for epidemic risk assessment of COVID-19 outbreak. Sci Rep 2021 Mar 05;11(1):5304 [FREE Full text] [CrossRef] [Medline]
- Raimondi F, Novelli L, Ghirardi A, Russo FM, Pellegrini D, Biza R, HPG23 Covid-19 Study Group. Covid-19 and gender: lower rate but same mortality of severe disease in women-an observational study. BMC Pulm Med 2021 Mar 20;21(1):96 [FREE Full text] [CrossRef] [Medline]
- Galasso V, Pons V, Profeta P, Becher M, Brouard S, Foucault M. Gender differences in COVID-19 attitudes and behavior: Panel evidence from eight countries. Proc Natl Acad Sci U S A 2020 Nov 03;117(44):27285-27291 [FREE Full text] [CrossRef] [Medline]
- L'uso E L'abuso Di Alcol In Italia. Istituto nazionale di statistica. 2015 Apr 16. URL: https://www.istat.it/it/files//2015/04/statistica_report_alcol_2014.pdf [accessed 2022-04-02]
- Aspetti della vita quotidiana: Abitudine al fumo - età, titolo di studio. Istituto nazionale di statistica. URL: http://dati.istat.it/Index.aspx?QueryId=15513 [accessed 2022-04-02]
- Bwire GM. Coronavirus: why men are more vulnerable to Covid-19 than women? SN Compr Clin Med 2020 Jun 4;2(7):874-876 [FREE Full text] [CrossRef] [Medline]
- Dai M, Tao L, Chen Z, Tian Z, Guo X, Allen-Gipson DS, et al. Influence of cigarettes and alcohol on the severity and death of COVID-19: a multicenter retrospective study in Wuhan, China. Front Physiol 2020;11:588553 [FREE Full text] [CrossRef] [Medline]
- Zhang H, Ma S, Han T, Qu G, Cheng C, Uy JP, et al. Association of smoking history with severe and critical outcomes in COVID-19 patients: A systemic review and meta-analysis. Eur J Integr Med 2021 Apr;43:101313 [FREE Full text] [CrossRef] [Medline]
- Patanavanich R, Glantz SA. Smoking is associated with worse outcomes of COVID-19 particularly among younger adults: a systematic review and meta-analysis. BMC Public Health 2021 Aug 16;21(1):1554 [FREE Full text] [CrossRef] [Medline]
- Wehbe Z, Hammoud SH, Yassine HM, Fardoun M, El-Yazbi AF, Eid AH. Molecular and biological mechanisms underlying gender differences in COVID-19 severity and mortality. Front Immunol 2021;12:659339 [FREE Full text] [CrossRef] [Medline]
- Hachim IY, Hachim MY, Talaat IM, López-Ozuna VM, Saheb Sharif-Askari N, Al Heialy S, et al. The molecular basis of gender variations in mortality rates associated with the novel coronavirus (COVID-19) outbreak. Front Mol Biosci 2021;8:728409 [FREE Full text] [CrossRef] [Medline]
- Ng WH, Tipih T, Makoah NA, Vermeulen J, Goedhals D, Sempa JB, et al. Comorbidities in SARS-CoV-2 patients: a systematic review and meta-analysis. mBio 2021 Feb 09;12(1):1 [FREE Full text] [CrossRef] [Medline]
- Gülsen A, König IR, Jappe U, Drömann D. Effect of comorbid pulmonary disease on the severity of COVID-19: A systematic review and meta-analysis. Respirology 2021 Jun;26(6):552-565 [FREE Full text] [CrossRef] [Medline]
- COVID-19: People with Certain Medical Conditions. Centers for Disease Control and Prevention. URL: https://www.cdc.gov/coronavirus/2019-ncov/need-extra-precautions/people-with-medical-conditions.html [accessed 2021-11-11]
- Lopez-Leon S, Wegman-Ostrosky T, Perelman C, Sepulveda R, Rebolledo PA, Cuapio A, et al. More than 50 long-term effects of COVID-19: a systematic review and meta-analysis. Sci Rep 2021 Aug 09;11(1):16144 [FREE Full text] [CrossRef] [Medline]
- Asadi-Pooya AA, Nemati H, Shahisavandi M, Akbari A, Emami A, Lotfi M, et al. Long COVID in children and adolescents. World J Pediatr 2021 Oct;17(5):495-499 [FREE Full text] [CrossRef] [Medline]
- Raveendran AV, Jayadevan R, Sashidharan S. Long COVID: An overview. Diabetes Metab Syndr 2021;15(3):869-875 [FREE Full text] [CrossRef] [Medline]
- Miyakawa K, Jeremiah SS, Kato H, Yamaoka Y, Go H, Yajima S, et al. Rapid detection of neutralizing antibodies to SARS-CoV-2 variants in post-vaccination sera. J Mol Cell Biol 2022 Jan 29;13(12):918-920 [FREE Full text] [CrossRef] [Medline]
- Choi JY, Smith DM. SARS-CoV-2 variants of concern. Yonsei Med J 2021 Nov;62(11):961-968 [FREE Full text] [CrossRef] [Medline]
- Lopez Bernal J, Andrews N, Gower C, Gallagher E, Simmons R, Thelwall S, et al. Effectiveness of Covid-19 vaccines against the B.1.617.2 (Delta) variant. N Engl J Med 2021 Aug 12;385(7):585-594. [CrossRef]
- Bian L, Gao Q, Gao F, Wang Q, He Q, Wu X, et al. Impact of the Delta variant on vaccine efficacy and response strategies. Expert Rev Vaccines 2021 Oct;20(10):1201-1209 [FREE Full text] [CrossRef] [Medline]
- Fan Y, Chan K, Hung IF. Safety and efficacy of COVID-19 vaccines: a systematic review and meta-analysis of different vaccines at phase 3. Vaccines (Basel) 2021 Sep 04;9(9):1 [FREE Full text] [CrossRef] [Medline]
- Pozzetto B, Legros V, Djebali S, Barateau V, Guibert N, Villard M, Covid-Ser study group, et al. Immunogenicity and efficacy of heterologous ChAdOx1-BNT162b2 vaccination. Nature 2021 Dec;600(7890):701-706. [CrossRef] [Medline]
- Epidemia COVID-19: Aggiornamento nazionale 3 novembre 2021 - ore 12. Istituto Superiore di Sanità. 2021 Nov 5. URL: https://www.epicentro.iss.it/coronavirus/bollettino/Bollettino-sorveglianza-integrata-COVID-19_3-novembre-2021.pdf [accessed 2021-11-11]
- Grosso FM, Presanis AM, Kunzmann K, Jackson C, Corbella A, Grasselli G, Covid-19 Lombardy Working Group. Decreasing hospital burden of COVID-19 during the first wave in Regione Lombardia: an emergency measures context. BMC Public Health 2021 Sep 03;21(1):1612 [FREE Full text] [CrossRef] [Medline]
- Safety of COVID-19 vaccines. European Medicines Agency. URL: https://www.ema.europa.eu/en/human-regulatory/overview/public-health-threats/coronavirus-disease-covid-19/treatments-vaccines/vaccines-covid-19/safety-covid-19-vaccines [accessed 2021-11-11]
- Xylogiannopoulos KF, Karampelas P, Alhajj R. COVID-19 pandemic spread against countries' non-pharmaceutical interventions responses: a data-mining driven comparative study. BMC Public Health 2021 Sep 01;21(1):1607 [FREE Full text] [CrossRef] [Medline]
- Askitas N, Tatsiramos K, Verheyden B. Estimating worldwide effects of non-pharmaceutical interventions on COVID-19 incidence and population mobility patterns using a multiple-event study. Sci Rep 2021 Jan 21;11(1):1972 [FREE Full text] [CrossRef] [Medline]
- Riccardo F, Ajelli M, Andrianou XD, Bella A, Del Manso M, Fabiani M, COVID-19 working group. Epidemiological characteristics of COVID-19 cases and estimates of the reproductive numbers 1 month into the epidemic, Italy, 28 January to 31 March 2020. Euro Surveill 2020 Dec;25(49):1 [FREE Full text] [CrossRef] [Medline]
- Alfano V, Ercolano S. The Efficacy of lockdown against COVID-19: a cross-country panel analysis. Appl Health Econ Health Policy 2020 Aug 03;18(4):509-517 [FREE Full text] [CrossRef] [Medline]
Abbreviations
ARIMA: autoregressive integrated moving average
ISS: National Institute of Health
ISTAT: National Institute of Statistics
OLS: ordinary least squares
SARIMA: ARIMA + seasonal component
SIR: Susceptible-Infected-Recovered
Edited by T Sanchez, A Mavragani; submitted 28.12.21; peer-reviewed by K Nagar, H Nguyen; comments to author 25.01.22; revised version received 31.01.22; accepted 03.03.22; published 07.04.22
Copyright © Alessandro Rovetta, Akshaya Srikanth Bhagavathula. Originally published in JMIR Public Health and Surveillance (https://publichealth.jmir.org), 07.04.2022.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on https://publichealth.jmir.org, as well as this copyright and license information must be included.