Prediction of the COVID-19 Pandemic for the Top 15 Affected Countries: Advanced Autoregressive Integrated Moving Average (ARIMA) Model

Background: The coronavirus disease (COVID-19) pandemic has affected more than 200 countries and had infected more than 2,800,000 people as of April 24, 2020. It was first identified in Wuhan City in China in December 2019.
Objective: The aim of this study is to identify the top 15 countries with spatial mapping of the confirmed cases. A comparison was done between the identified top 15 countries for confirmed cases, deaths, and recoveries, and an advanced autoregressive integrated moving average (ARIMA) model was used for predicting the COVID-19 disease spread trajectories for the next 2 months.
Methods: The comparison of recent cumulative and predicted cases was done for the top 15 countries with confirmed cases, deaths, and recoveries from COVID-19. The spatial map is useful to identify the intensity of COVID-19 infections in the top 15 countries and the continents. The recent reported data for confirmed cases, deaths, and recoveries for the last 3 months were represented and compared between the top 15 infected countries. The advanced ARIMA model was used for predicting future data based on time series data. The ARIMA model weights past values and past errors to correct the model prediction, so it was preferred over other basic regression and exponential methods.
Results: The top 15 countries with a high number of confirmed cases were stratified to include the data in a mathematical model. The identified top 15 countries with cumulative cases, deaths, and recoveries from COVID-19 were compared. The United States, the United Kingdom, Turkey, China, and Russia saw a relatively fast spread of the disease. There was a fast recovery ratio in China, Switzerland, Germany, Iran, and Brazil, but a slow recovery ratio in the United States, the United Kingdom, the Netherlands, Russia, and Italy.
JMIR Public Health Surveill 2020 | vol. 6 | iss. 2 | e19115 | p. 1 http://publichealth.jmir.org/2020/2/e19115/ (page number not for citation purposes) Singh et al JMIR PUBLIC HEALTH AND SURVEILLANCE


Background
At the World Health International Conference in Geneva in January 2020, the World Health Organization (WHO) announced an outbreak of the new coronavirus. The novel coronavirus (severe acute respiratory syndrome [SARS] coronavirus 2) from Wuhan, China has continued to spread around the world since January 2020 and has turned into a pandemic of the coronavirus disease (COVID-19) [1,2]. Due to its rapid spread and the absence of vaccines and drugs, the contagious COVID-19 has devastated normal life around the world. Currently, COVID-19 has infected more than half a million people, has killed more than 25,000 people, and has forced more than 3 billion to stay in their homes [3]. Many people developed pneumonia without any apparent cause, and most of the cases were linked to the Wuhan Seafood Market, where fish are sold and live animals are traded. The new coronavirus spreading around the world is disrupting daily life, and fear and panic are increasing. This has also affected the cryptocurrency market [4,5]. After China, the country in which the coronavirus has caused the most devastation is Italy, where hundreds of people are dying every day due to this deadly virus. The coronavirus is 900 times smaller than a human hair. Despite its size, this small virus has scared the whole world. The first case of COVID-19 was reported in Wuhan City in China in December 2019 [6][7][8].
During the Chinese New Year migration, the virus spread to other Chinese provinces in early and mid-January 2020. The WHO [3] reported that cases then began to be detected in other countries among international travelers. Due to a lack of knowledge about this virus, the COVID-19 pandemic has placed tremendous strain on everyone around the world. To prevent further transmission, strong preventive measures have been intensified week by week; however, the number of infected cases is consistently increasing around the world, even under lockdown. Mathematical approaches have been widely used to infer critical epidemiological transitions and parameters of COVID-19. Epidemic curve fitting, surveillance data from the early transmission period, and other epidemic models have been frequently applied to generate forecasts of the COVID-19 pandemic across the world [9][10][11]. This study aims to identify the top 15 countries with the most confirmed cases using spatial mapping. A comparison was done between the identified top 15 countries for confirmed cases, deaths, and recoveries, and an advanced autoregressive integrated moving average (ARIMA) model was used for predicting the COVID-19 spread trajectories for the next 2 months (until July 7, 2020).

Study Area
Various studies have been presented for forecasting many epidemic diseases. This study uses dynamic models to generate 20-day forecasts of cumulative confirmed deaths and recoveries from COVID-19 cases by country, territory, or conveyance, generated on April 24, 2020. The United States, Spain, Italy, France, Germany, the United Kingdom, Turkey, Iran, China, Russia, Brazil, Canada, Belgium, the Netherlands, and Switzerland were selected from the top 20 countries based on cumulative data. The ARIMA model assigns weights to the selected past values and to past errors to correct the model, whereas other basic regression and exponential models use all past values to predict future values; the ARIMA model was therefore preferred. This study analyzed worldwide data using an advanced time series prediction approach based on the ARIMA model for the top 15 COVID-19-infected countries.

Data
We used data from Worldometer, which reports approximate cumulative case data for more than 170 countries worldwide, including state- or province-level cases for some countries [12]. We collected the case data for each day at fixed times from January 21, 2020, to April 24, 2020. We then preprocessed the top 15 countries' data together with their spatial locations to create spatial attributes for the available data sets and to forecast the trajectory of COVID-19 cases. Because complete worldwide data were not available at the stipulated times, we did not create a worldwide pandemic forecast. Confirmed COVID-19 cases on selected dates, along with the total cumulative recovered cases and deaths, were analyzed statistically along with their spatial extent. We used the ARIMA model in R (R Foundation for Statistical Computing) and validated it using the Akaike information criterion (AIC). The projected data up to July 2, 2020, were used to create a projected trajectory for each category: confirmed cases, recoveries, and deaths.
Recently reported cumulative data of confirmed cases, deaths, and recoveries of COVID-19 from January 21 to April 26, 2020, were obtained from Worldometer. The reported data were used to predict more than 60 days ahead and to understand the positive effects in the near future as well as the projected trends over the trajectories. Different statistical phenomenological models on the R platform were used to analyze the disease trajectories for prediction purposes. Four models were used to analyze the aggregate data set for time series analysis. These include the ARIMA model, which combines two different models: the autoregression (AR) model and the moving average (MA) model [13]. This model also used AIC statistics and coverage of regression analysis.
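The AIC-based comparison of candidate models can be illustrated with a minimal Python sketch. It is not the R code used in this study: it assumes Gaussian errors and uses the least-squares form AIC = n ln(RSS/n) + 2k, which drops an additive constant, so its values are not comparable with the AIC values reported here. The function names, residuals, and parameter counts are hypothetical.

```python
import math


def gaussian_aic(residuals, n_params):
    """AIC of a least-squares fit with Gaussian errors, up to an additive constant.

    AIC = n * ln(RSS / n) + 2 * k, where k is the number of estimated parameters.
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    if n == 0 or rss <= 0:
        raise ValueError("need at least one nonzero residual")
    return n * math.log(rss / n) + 2 * n_params


def pick_lowest_aic(candidates):
    """candidates maps a model label to (residuals, n_params); returns the best label."""
    return min(candidates, key=lambda name: gaussian_aic(*candidates[name]))


# Hypothetical residuals from two candidate fits to the same series.
candidates = {
    "small_model": ([1.0, -1.2, 0.8, -0.9, 1.1, -1.0], 2),
    "large_model": ([0.9, -1.1, 0.8, -0.9, 1.0, -1.0], 4),
}
best = pick_lowest_aic(candidates)
```

In this made-up example the larger model fits slightly better, but the penalty of 2 per extra parameter outweighs the gain, so the smaller model has the lower AIC and is preferred.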
Another coronavirus disease similar to COVID-19, SARS, was analyzed without examining the current situation or predicting the future outlook [14]. The vector auto-average model was used to predict the spatial extent while using remote sensing data to create a worldwide geographic information system (GIS) map for three different variables [15]. These three variables were used in the GIS environment to create a map of cumulative confirmed cases by country as well as maps of recoveries and deaths [16]. Another statistical approach was the generalized logistic growth model, which generally is depicted as a scaling parameter for integrating an additional result-oriented value put method [17]. Some epidemic models used in disease epidemic conditions measure oscillations, which are multiple peak parameters inferred in subepidemic and pandemic conditions to determine the projected outcomes [18].
After standardizing all the models, the data of the top 20 countries were included to analyze the forecasting models of differential spatial adjacency and projected trajectories, which were analyzed up to July 2, 2020. We used GIS and remote sensing to map the pandemic and analyze the upcoming effects of COVID-19.

ARIMA
The MA model expresses the present value of a series as a linear combination of past errors. Assuming the errors to be independently and normally distributed [13,19], the MA model of order q is defined as:
$$y_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q} \tag{1}$$
Where:
• $\varepsilon_t$ = white noise
• $\varepsilon_{t-1}$ and $\varepsilon_{t-2}$ = lagged errors
The order q of the MA process is obtained from the autocorrelation function (ACF) plot; this is the lag after which the ACF crosses the upper confidence interval for the first time. We combined differencing with the MA and AR models, and the combined model can be expressed as:
$$y'_t = c + \phi_1 y'_{t-1} + \phi_2 y'_{t-2} + \cdots + \phi_p y'_{t-p} + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t \tag{2}$$
Here, $y'_t$ is the differenced series. The "predictors" on the right-hand side include both lagged values of $y_t$ and lagged errors. We call this an ARIMA(p, d, q) model, where:
• p = order of the AR part
• d = degree of differencing
• q = order of the MA part
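Reading the MA order from the ACF can be sketched in Python as follows. This is an illustrative sketch rather than the procedure used in the study: it assumes the common approximate 95% bound of ±1.96/√n and takes q as the number of leading lags whose autocorrelation lies outside that bound, which is only one reading of the rule described above. The function names are hypothetical.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant")
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def suggest_ma_order(series, max_lag):
    """Count the leading lags whose |r_k| exceeds the approximate 95% bound."""
    bound = 1.96 / math.sqrt(len(series))
    r = acf(series, max_lag)
    q = 0
    for k in range(1, max_lag + 1):
        if abs(r[k]) <= bound:
            break  # first lag inside the band: stop counting
        q = k
    return q
```

For a real analysis the series passed in would be the differenced case counts, and the suggested order would normally be checked against the AIC of the fitted candidates.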

Results
The top 15 countries were identified using mapping of cumulative confirmed COVID-19 cases from January to April 24, 2020, for 200 nations as presented in Figure 1. The top 15 countries with a high number of confirmed cases were stratified to include the data in a mathematical model. The cumulative cases, deaths, and recoveries from COVID-19 of the top 15 countries (the United States, Spain, Italy, France, Germany, the United Kingdom, Turkey, Iran, China, Russia, Brazil, Canada, Belgium, the Netherlands, and Switzerland) were compared in Figure 2. The United States, the United Kingdom, Turkey, China, and Russia saw a relatively fast spread of the disease. There was a fast recovery ratio in China, Switzerland, Germany, Iran, and Brazil, but a slow recovery ratio in the United States, the United Kingdom, the Netherlands, Russia, and Italy, as shown in Figure 2. In addition, there were higher death rate ratios in Italy and the United Kingdom, and lower death rate ratios in Russia, Turkey, China, and the United States (Figure 2). Furthermore, data smoothing was applied to stabilize the data by removing changes in the level of the time series, thereby eliminating (or reducing) trend and seasonality. After this, the forecast model was applied using the AR and MA models to generate plots of the different trends in the upcoming days. The ARIMA model was validated against the currently available data using the AIC, which estimates the out-of-sample prediction error; lower values are preferable. Its values were around 20, 14, and 16 for cumulative confirmed cases, deaths, and recoveries from COVID-19, respectively, which represents less error. The outcome of these predictions is presented in Figure 3.
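The stabilize-then-forecast step can be illustrated with a small Python sketch. It is a simplified stand-in, not the fitted models of this study: it assumes first-order differencing and an AR(1) model on the differences with no MA term, that is, ARIMA(1,1,0), estimated by ordinary least squares. The function names and the input series are hypothetical.

```python
def difference(series):
    """First-order differences: daily increments of a cumulative series."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_{t-1}; returns (c, phi)."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxx = sum((a - mean_prev) ** 2 for a in prev)
    if sxx == 0:
        # Constant increments: phi is not identified, fall back to the mean.
        return mean_curr, 0.0
    sxy = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(prev, curr))
    phi = sxy / sxx
    return mean_curr - phi * mean_prev, phi


def forecast_arima_110(cumulative, steps):
    """Forecast a cumulative series by modelling its differences as AR(1)."""
    diffs = difference(cumulative)
    c, phi = fit_ar1(diffs)
    level, last_diff = cumulative[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = c + phi * last_diff  # predicted next increment
        level += last_diff               # integrate back to the cumulative scale
        forecasts.append(level)
    return forecasts
```

Calling `forecast_arima_110(counts, 60)` on a list of cumulative counts would give a 60-step point forecast under these assumptions; a full ARIMA fit would also estimate the MA terms and the forecast uncertainty.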
Our findings revealed linearity in the confirmed cumulative cases and showed a rapid exponential growth phase in the world, which might occur roughly from April 8 to April 24, 2020, when the number of COVID-19 cases may rise steeply to nearly 1 million in the United States, 220,000 in Spain, 200,000 in Italy, 180,000 in France, and 190,000 in Germany. Other countries that have a smaller number of cases but show a declining upward trend include Switzerland, Germany, and Italy (Figure 2). However, the cases of COVID-19 in China remain stable (Figure 2). The ARIMA model predicted confirmed cases, deaths, and recoveries for the next 2 months from April 24 to July 7, 2020, using the past 3 months of data, as shown in Figure 3 (cyan), Figure 4 (brown), and Figure 5 (green) with 95% confidence intervals. Along with the 95% confidence band after April 24, the 80% and 70% confidence bands are shown in light grey and light yellow, respectively. The wide confidence intervals help to accommodate sudden changes in the predicted COVID-19 cases.
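How the 95%, 80%, and 70% bands relate to one another can be shown with a short Python sketch. It assumes normally distributed forecast errors, which is the usual default for ARIMA prediction intervals but is an assumption here; the point forecast and standard error in the usage note are hypothetical, not values from this study.

```python
from statistics import NormalDist


def z_value(level):
    """Two-sided standard normal multiplier for a central interval of the given level."""
    return NormalDist().inv_cdf(0.5 + level / 2)


def interval(point_forecast, std_error, level):
    """Symmetric prediction interval around a point forecast."""
    half_width = z_value(level) * std_error
    return point_forecast - half_width, point_forecast + half_width


def nested_intervals(point_forecast, std_error, levels=(0.70, 0.80, 0.95)):
    """Intervals for several confidence levels around the same forecast."""
    return {level: interval(point_forecast, std_error, level) for level in levels}
```

For example, `nested_intervals(1000.0, 50.0)` returns three intervals centered on 1000. A higher confidence level gives a larger multiplier, so the 95% band is the widest and contains the 80% and 70% bands.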
For the next 2 months, between April 24 and July 7, 2020, the model predicted that confirmed cases, deaths, and recoveries would double in all countries except China, Switzerland, and Germany (Figures 3-5). It was also observed that deaths and recoveries will increase faster than confirmed cases during the next 2 months. The associated mortality rate will be much higher in the United States, Spain, and Italy, followed by France, Germany, and the United Kingdom. The recovery rates will stay slow at first but then rapidly increase in the United States, Italy, Germany, and France by the end of June 2020 (Figure 5).

Principal Findings
The daily COVID-19 data were collected and represented cumulatively as a spatial map for more than 170 countries and territories. The spatial map is useful to identify the intensity of COVID-19 infections in the top 15 countries and the continents. The recently reported data for confirmed cases, deaths, and recoveries for the last 3 months, from January to April 2020, were represented and compared between the top 15 infected countries. The ARIMA model was used to predict confirmed cases, deaths, and recoveries for the top 15 countries from April 24 to July 7, 2020. The predicted values were represented with 95%, 80%, and 70% confidence intervals, and the 95% confidence intervals were shown as the median interval between the 80% and 70% wide values. The ARIMA model was validated using the AIC on the available recent data; its values were about 20, 14, and 16 for cumulative confirmed cases, deaths, and recoveries from COVID-19, respectively, which represents acceptable results. The predicted values showed that the confirmed cases, deaths, and recoveries will double in all countries except China, Switzerland, and Germany. It was also observed that deaths and recoveries will increase faster than confirmed cases during the next 2 months. The associated mortality rate will be much higher in the United States, Spain, and Italy, followed by France, Germany, and the United Kingdom. A limitation of the ARIMA model is that it does not account for volatility or intermediate changes within the prediction period. The study also relies on the accuracy of the country data collected from Worldometer.
The forecast analysis of COVID-19 dynamics showed a different picture for the whole world, and it looks more alarming than imagined. Interestingly, the recovery numbers also look promising, with resistance starting by July 2020. Thus, a slowdown in the surge of the COVID-19 pandemic during the coming months depends upon various administrative interventions and public awareness about the spread of the COVID-19 pandemic.