This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on http://publichealth.jmir.org, as well as this copyright and license information must be included.
The highly infectious coronavirus disease (COVID-19) was first detected in Wuhan, China in December 2019 and subsequently spread to 212 countries and territories around the world, infecting millions of people. In India, a large country of about 1.3 billion people, the disease was first detected on January 30, 2020, in a student returning from Wuhan. The total number of confirmed infections in India as of May 3, 2020, is more than 37,000 and is currently growing fast.
Most of the prior research and media coverage focused on the number of infections in the entire country. However, given the size and diversity of India, it is important to look at the spread of the disease in each state separately, because the situations differ considerably from state to state. In this paper, we aim to analyze data on the number of infected people in each Indian state (restricted to only those states with enough data for prediction) and predict the number of infections for that state in the next 30 days. We hope that such statewise predictions will help the state governments better channel their limited health care resources.
Since predictions from any one model can potentially be misleading, we considered three growth models, namely, the logistic, the exponential, and the susceptible-infectious-susceptible models, and finally developed a data-driven ensemble of predictions from the logistic and the exponential models using functions of the model-free maximum daily infection rate (DIR) over the last 2 weeks (a measure of recent trend) as weights. The DIR is used to measure the success of the nationwide lockdown. We jointly interpreted the results from all models along with the recent DIR values for each state and categorized the states as severe, moderate, or controlled.
We found that 7 states, namely, Maharashtra, Delhi, Gujarat, Madhya Pradesh, Andhra Pradesh, Uttar Pradesh, and West Bengal are in the severe category. Among the remaining states, Tamil Nadu, Rajasthan, Punjab, and Bihar are in the moderate category, whereas Kerala, Haryana, Jammu and Kashmir, Karnataka, and Telangana are in the controlled category. We also tabulated actual predicted numbers from various models for each state. All the
States with nondecreasing DIR values need to immediately ramp up preventive measures to combat the COVID-19 pandemic. On the other hand, states with decreasing DIR can maintain the same status until the DIR slowly becomes zero or negative for 14 consecutive days, at which point they can declare the end of the pandemic.
The world is now facing an unprecedented crisis due to the novel coronavirus, first detected in Wuhan, China in December 2019 [
The WHO declared the coronavirus disease (COVID-19) as a global pandemic on March 11, 2020 [
Bar chart of daily infected cases (blue) in India. Red bars denote deaths. The black curve is a smooth curve fitted to the daily cases.
There are four stages of COVID-19 depending on the types of virus transmission [
Many news agencies are repeatedly saying or questioning whether India is now at stage 3 [
In this paper, we first discuss the importance of statewise consideration, contemplating all the states together. Second, we focus on the number of infected people in each state (considering only those states with enough data for prediction) and build growth models to predict the number of infected people for that state in the next 30 days.
India is a vast country with a geographic area of 3,287,240 square kilometers and a total population of about 1.3 billion [
In
Date of the first case in each state, with the associated travel history. UAE: United Arab Emirates.
Cumulative number of infected people over time in states with at least 10 infected cases.
In
Health screenings at airports and border crossings
Introduction of quarantine policies: gradually for passengers coming from different countries
Visa restrictions: gradually for different countries
Limit public gatherings (closure of some selected public institutions like museums, religious places, and postponing of several local elections to stop public gatherings)
Border checks
Border closure
Limit public gatherings (ban on all sorts of public gatherings and meetings, and stopping people from making any congregation)
Travel restrictions
Testing for the coronavirus disease (before this point, only people who had traveled from abroad were tested; this point onwards, testing was also introduced for symptomatic contacts of laboratory-confirmed cases, symptomatic health care workers, and all hospitalized patients with severe acute respiratory illness)
Flight suspensions
Cancellation of passenger train services until March 31, 2020
Suspension of domestic airplane operations
21-day lockdown of entire country
Cancellation of passenger train services extended to April 14, 2020
Increase of quarantine/isolation facilities
Extension of lockdown until May 3, 2020
Extension of lockdown until May 17, 2020
We have used Indian COVID-19 data available publicly. The three primary sources of the data are the Ministry of Health and Family Welfare, India [
In this paper, we consider the exponential model, the logistic model, and the susceptible-infectious-susceptible (SIS) model for COVID-19 pandemic prediction at the state level. These models have already been used to predict epidemics like COVID-19 around the world, including in China, and for the Ebola outbreak in Bomi, Liberia in 2014 [
The previously mentioned three models provide different prediction perspectives for each state. The exponential model–based prediction gives a picture of what the cumulative number of infected people could be in the next 30 days if no preventive measures are taken. We can therefore treat the forecast from the exponential model as an estimate of the upper bound of the total number of infected people in the next 30 days. The logistic model–based prediction captures the effect of preventive measures that have already been taken by the respective state governments as well as the central government. The logistic model assumes that the infection rate will slow down in the future, giving an overall “S”-shaped growth curve. In other words, the logistic model explores a situation where there is a full lockdown in the country, leading to an extreme restraint on people’s movement and hence a considerable reduction in the rate of infection. Under effective implementation of the lockdown, it is appropriate to use a logistic model. In this scenario, once many people have already been infected, the virus finds it harder to reach additional susceptible people. Thus, the spread slows down, causing the flattening of the S-curve at a later stage. Several research papers have used the logistic model in the context of COVID-19 [
The purpose of the SIS model is to reflect the effect of a major preventive measure such as the nationwide 21-day lockdown from March 25 to April 14, 2020. The lockdown was extended in two phases: (1) until May 3 and (2) then until May 17, 2020, with some relaxation [
Kumar et al [
The DIR takes a positive value when the number of active COVID-19 cases increases from the previous day, a value of zero when the number of active cases does not change, and a negative value when the number of active cases decreases from the previous day. A DIR value can also exceed 1, particularly during the initial days of infection in a state. For example, when the total number of active cases increases from 5 yesterday to 20 today, the DIR value is (20 – 5) / 5 = 3. The visual trends in infection rates can indicate whether the COVID-19 situation is under control in a specific state. DIRs that have been declining over the last few days indicate that the situation in that state is improving. However, a sudden jump in infection rates could indicate that there are underreported cases of COVID-19, in which case infected clusters need to be found as quickly as possible.
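The DIR calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `daily_infection_rates` is hypothetical, and returning `None` when the previous day's count is zero is an assumption made here, since the text does not say how that case is handled.

```python
def daily_infection_rates(active_cases):
    """Return the DIR for each day after the first.

    DIR = (active cases today - active cases yesterday) / active cases yesterday.
    `active_cases` is a list of daily active-case counts in date order.
    """
    rates = []
    for yesterday, today in zip(active_cases, active_cases[1:]):
        if yesterday == 0:
            # Assumption: the ratio is undefined when there were no active
            # cases on the previous day, so we record None.
            rates.append(None)
        else:
            rates.append((today - yesterday) / yesterday)
    return rates


# The worked example from the text: 5 active cases yesterday, 20 today.
print(daily_infection_rates([5, 20]))  # [3.0]
```

A series such as `[5, 20, 20, 15]` gives a positive, a zero, and a negative DIR in turn, matching the three cases described in the paragraph.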
India implemented a nationwide lockdown on March 25, 2020. We first considered the incubation period of the novel coronavirus to study the effect of the lockdown. The incubation period of an infectious disease is defined as the time between infection and the first appearance of signs and symptoms [
In this section, we rely on inputs from the exponential, logistic, and SIS models along with DIRs for each state. Remembering the words of the famous statistician George Box, “All models are wrong, but some are useful,” we interpreted the results from different models jointly. We consider states with at least 300 cumulative infected cases. For each state, we present four graphs. We have used the state-level data until May 1, 2020. The first and second graphs are based on the logistic and the exponential models, respectively, with the next 30-day predictions. The third graph is the plot of DIRs for a state. Finally, the fourth graph shows the growth of the active infected patients using SIS model prediction (
Data-driven assessment and 30-day prediction using the logistic and exponential models, and their linear combination.
State | Observed cumulative cases (May 1, 2020) | Maximum DIRa in the last 2 weeks | Estimated R0b from SISc model (data until May 1, 2020) | Data-driven assessment of COVID-19d situation | 30-day prediction (May 31, 2020): logistic | 30-day prediction (May 31, 2020): linear combination of logistic and exponential (LCpred) | 30-day prediction (May 31, 2020): exponential (applicable only if the situation is severe) | Observed cumulative cases (May 31, 2020) | Assessment of observed cumulative cases with respect to (LCprede, exponential) |
Andhra Pradesh | 1463 | 0.17 | 3.22 | Severe | 2313 | 4725 | 16,502 | 3571 | Below |
Bihar | 426 | 0.39 | 3.08 | Moderate | 16,452 | 16,472 | 16,502 | 3807 | Below |
Delhi | 3515 | 0.17 | 2.94 | Severe | 4262 | 9650 | 35,957 | 19,844 | Between |
Gujarat | 4395 | 0.27 | 3.50 | Severe | 5206 | 33,736 | 110,874 | 16,794 | Below |
Haryana | 313 | 0.18 | 1.82 | Controlled | 321 | 590 | 1815 | 2091 | Above |
Jammu and Kashmir | 614 | 0.09 | 2.66 | Controlled | 724 | 1124 | 5170 | 2446 | Between |
Karnataka | 576 | 0.06 | 2.38 | Controlled | 3711 | 3711 | 3713 | 3221 | Below |
Kerala | 497 | 0.18 | 1.96 | Controlled | 455 | 740 | 2040 | 1270 | Between |
Madhya Pradesh | 2719 | 0.10 | 3.36 | Severe | 3030 | 6521 | 37,935 | 8089 | Between |
Maharashtra | 10,498 | 0.15 | 3.50 | Severe | 17,115 | 43,963 | 196,103 | 67,655 | Between |
Punjab | 357 | 0.14 | 2.52 | Moderate | 419 | 713 | 2517 | 2263 | Between |
Rajasthan | 2584 | 0.12 | 2.94 | Moderate | 2821 | 6125 | 30,356 | 8831 | Between |
Tamil Nadu | 2323 | 0.12 | 3.22 | Moderate | 2241 | 3967 | 16,624 | 22,333 | Above |
Telangana | 1039 | 0.09 | 2.66 | Controlled | 1063 | 1631 | 7373 | 2698 | Between |
Uttar Pradesh | 2281 | 0.13 | 2.52 | Severe | 3016 | 6566 | 30,326 | 8075 | Between |
West Bengal | 795 | 0.17 | 3.22 | Severe | 1261 | 3225 | 12,815 | 5501 | Between |
aDIR: daily infection rate.
bR0: basic reproduction number.
cSIS: susceptible-infectious-susceptible.
dCOVID-19: coronavirus disease.
eLCpred: linear combination prediction.
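The last column of the table places the observed May 31 count relative to the interval formed by the LCpred and the exponential prediction. A minimal sketch of that comparison follows. The function name `assess_observed` is hypothetical, and treating a value exactly equal to either bound as "Between" is an assumption, since no such tie occurs in the table.

```python
def assess_observed(observed, lc_pred, exponential):
    """Place an observed count relative to the (LCpred, exponential) interval.

    Assumes lc_pred <= exponential, as holds for every row of the table.
    """
    if observed < lc_pred:
        return "Below"
    if observed > exponential:
        return "Above"
    # Assumption: values equal to either bound count as "Between".
    return "Between"


# Rows taken from the table: (observed on May 31, LCpred, exponential).
print(assess_observed(3571, 4725, 16502))    # Andhra Pradesh -> Below
print(assess_observed(19844, 9650, 35957))   # Delhi -> Between
print(assess_observed(2091, 590, 1815))      # Haryana -> Above
```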
The situation in Maharashtra is currently very severe with respect to the active number of cases (see
Graphs for the state of Maharashtra. SIS: susceptible-infectious-susceptible.
Delhi, being a state of high population density, has already observed 3515 confirmed COVID-19 cases (see
Graphs for the state of Delhi. SIS: susceptible-infectious-susceptible.
The cumulative number of infected cases in Tamil Nadu is 2323 (see
Graphs for the state of Tamil Nadu. SIS: susceptible-infectious-susceptible.
This state currently has 2719 cumulative COVID-19 cases (see
Graphs for the state of Madhya Pradesh. SIS: susceptible-infectious-susceptible.
The western state of India, Rajasthan, reported 2584 cumulative infected COVID-19 cases (see
Graphs for the state of Rajasthan. SIS: susceptible-infectious-susceptible.
The state is currently experiencing exponential growth with 4395 as the cumulative number of COVID-19 cases (see
Graphs for the state of Gujarat. SIS: susceptible-infectious-susceptible.
This northern state of India has experienced 2281 cumulative COVID-19 cases (see
Graphs for the state of Uttar Pradesh. SIS: susceptible-infectious-susceptible.
The southern Indian state of Telangana has reported 1039 cumulative infected cases until now (see
Graphs for the state of Telangana. SIS: susceptible-infectious-susceptible.
This state has observed 1463 confirmed cumulative infected cases so far (see
Graphs for the state of Andhra Pradesh. SIS: susceptible-infectious-susceptible.
The southern state of Kerala is one of the few states of India where the effect of the lockdown is observed strongly. The state reported the first COVID-19 case in India. However, Kerala has been able to control the spread of the virus to a large extent to date. The cumulative number of cases reported until now is 497 (see
Graphs for the state of Kerala. SIS: susceptible-infectious-susceptible.
The state has managed to restrict the cumulative infected cases to 576 until now (see
Graphs for the state of Karnataka. SIS: susceptible-infectious-susceptible.
The northernmost state of Jammu and Kashmir has seen 614 cumulative infected cases so far (see
Graphs for the state of Jammu and Kashmir. SIS: susceptible-infectious-susceptible.
The state of West Bengal is standing at 795 cumulative infected cases as of now (see
Graphs for the state of West Bengal. SIS: susceptible-infectious-susceptible.
The state of Haryana has observed 313 cumulative infected COVID-19 cases so far (see
Graphs for the state of Haryana. SIS: susceptible-infectious-susceptible.
The state of Punjab has reported 357 cumulative infected cases until now (see
Graphs for the state of Punjab. SIS: susceptible-infectious-susceptible.
The state has reported 426 cumulative infected cases until now (see
Graphs for the state of Bihar. SIS: susceptible-infectious-susceptible.
We consider a data-driven assessment of the COVID-19 situation based on the growth of active cases in recent times (red line, fourth panel in each state plot) along with the DIR values for each state (see
With this choice of the tuning parameter λ, the LCpred equals the logistic prediction when DIRmax is negative (λ=0) and equals the exponential prediction when DIRmax is more than 1 (λ=1). When DIRmax is between 0 and 1, the LCpred is a combination of the predictions from the logistic and the exponential models. Given the situation across the entirety of India, we recommend that the LCpred, along with the exponential predictions (particularly for states in severe condition), be used for assessment purposes in each state.
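The weighting described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the specific form λ × exponential + (1 − λ) × logistic, with λ equal to DIRmax truncated to the interval [0, 1], is inferred from the limiting cases in the text. It does reproduce the tabulated LCpred values for the rows checked here.

```python
def truncated_weight(dir_max):
    """Truncate DIRmax to [0, 1] to obtain the weight lambda."""
    return min(max(dir_max, 0.0), 1.0)


def lc_pred(logistic, exponential, dir_max):
    """Linear combination of the logistic and exponential predictions.

    Assumed form: lambda * exponential + (1 - lambda) * logistic.
    """
    lam = truncated_weight(dir_max)
    return lam * exponential + (1.0 - lam) * logistic


# Andhra Pradesh row of the table: logistic 2313, exponential 16,502, DIRmax 0.17.
print(round(lc_pred(2313, 16502, 0.17)))  # 4725
```

A negative DIRmax returns the logistic prediction unchanged, and a DIRmax above 1 returns the exponential prediction, which are the two limiting cases stated in the paragraph.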
Extensive testing may not be logistically feasible given India’s large population and limited health care budget. Undertesting can significantly affect the logistic prediction, and less so the exponential prediction, since the former underforecasts and the latter overforecasts. The DIR indirectly captures the undertesting phenomenon. Thus, the LCpred with (a truncated version of) DIR as the weight (λ) can be thought of as a treatment for undertesting, albeit in a limited fashion.
From
India, a country of approximately 1.3 billion people, has reported 17,615 confirmed COVID-19 cases in the 80 days since the first reported case in Kerala on January 30, 2020 [
Note that India may have seen fewer COVID-19 cases until now, but the war is not over yet. Many states, such as Maharashtra, Delhi, Madhya Pradesh, Rajasthan, Gujarat, Uttar Pradesh, and West Bengal, are still at high risk. These states may see a significant increase in confirmed COVID-19 cases in the coming days if preventive measures are not implemented properly. On the positive side, Kerala has shown how to effectively “flatten” or even “crush the curve” of COVID-19 cases. We hope India can limit the spread and impact of COVID-19 through the strong determination in policy already shown by the central and state governments.
There are a few other works that are based explicitly on Indian COVID-19 data. Das [
A report based on one particular model can mislead us. Here, we have considered the exponential, the logistic, and the SIS models along with the DIR. We have interpreted the results jointly from all models rather than individually. We expect the DIR to be zero or negative before concluding that COVID-19 is not spreading in a certain state. Even a small positive DIR such as 0.01 indicates that the virus is still spreading in the community, and the DIR can potentially increase at any time. The states without a decreasing trend in DIR and with near exponential growth in active infected cases are Maharashtra, Delhi, Gujarat, Madhya Pradesh, Andhra Pradesh, Uttar Pradesh, and West Bengal. The states with an almost decreasing trend in DIR and nonincreasing growth in active infected cases are Tamil Nadu, Rajasthan, Punjab, and Bihar. The states with a decreasing trend in DIR and decreasing growth in active infected cases in the last few days are Kerala, Haryana, Jammu and Kashmir, Karnataka, and Telangana. States with nondecreasing DIR need to do much more in terms of preventive measures immediately to combat the COVID-19 pandemic. On the other hand, states with decreasing DIR can maintain the same status until the DIR becomes zero or negative for 14 consecutive days, at which point they can declare the end of the pandemic.
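The 14-day criterion mentioned above can be checked mechanically from a state's DIR series. The sketch below is illustrative only: the function name `end_criterion_met` is hypothetical, and applying the check to the most recent 14 values of the series is an assumption about how the rule would be used in practice.

```python
def end_criterion_met(dirs, window=14):
    """Return True if the last `window` DIR values are all zero or negative.

    `dirs` is a list of daily DIR values in date order.
    """
    if len(dirs) < window:
        # Not enough days observed to evaluate the criterion.
        return False
    return all(d <= 0 for d in dirs[-window:])


# Even a small positive DIR such as 0.01 within the window fails the check.
recent = [0.0] * 13 + [0.01]
print(end_criterion_met(recent))  # False
```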
Based on the modeling approaches presented in this paper, we have developed a web application [
Supplementary material.
COVID-19: coronavirus disease
DIR: daily infection rate
DIRmax: maximum value of daily infection rate over the last 2 weeks
LCpred: linear combination prediction
R0: basic reproduction number
SIS: susceptible-infectious-susceptible
WHO: World Health Organization
None declared.