Forecasting COVID-19 Hospital Census: A Multivariate Time-series Model Based on Local Infection Incidence

Background: COVID-19 has been one of the most serious global health crises in world history. During the pandemic, health care systems require accurate forecasts for key resources to guide preparation for patient surges. Forecasting the COVID-19 hospital census is among the most important planning decisions to ensure adequate staffing, number of beds, intensive care units, and vital equipment.

Objective: The goal of this study was to explore the potential utility of local COVID-19 infection incidence data in developing a forecasting model for the COVID-19 hospital census.

Methods: The study data comprised aggregated daily COVID-19 hospital census data across 11 Atrium Health hospitals plus a virtual hospital in the greater Charlotte metropolitan area of North Carolina, as well as the total daily infection incidence across the same region during the May 15 to December 5, 2020, period. Cross-correlations between hospital census and local infection incidence lagging up to 21 days were computed. A multivariate time-series framework, called the vector error correction model (VECM), was used to simultaneously incorporate both time series and account for their possible long-run relationship. Hypothesis tests and model diagnostics were performed to test for the long-run relationship and examine model goodness of fit. The 7-days-ahead forecast performance was measured by mean absolute percentage error (MAPE), with time-series cross-validation. The forecast performance was also compared with an autoregressive integrated moving average (ARIMA) model in the same cross-validation time frame. Based on different scenarios of the pandemic, the fitted model was leveraged to produce 60-days-ahead forecasts.

Results: The cross-correlations were uniformly high, falling between 0.7 and 0.8. There was sufficient evidence that the two time series have a stable long-run relationship at the .01 significance level. The model had very good fit to the data. The out-of-sample MAPE had a median of 5.9% and a 95th percentile of 13.4%. In comparison, the MAPE of the ARIMA had a median of 6.6% and a 95th percentile of 14.3%. Scenario-based 60-days-ahead forecasts exhibited concave trajectories with peaks lagging 2 to 3 weeks later than the peak infection incidence. In the worst-case scenario, the COVID-19 hospital census can reach a peak over 3 times greater than the peak observed during the second wave.

Conclusions: When used in the VECM framework, the local COVID-19 infection incidence can be an effective leading indicator to predict the COVID-19 hospital census. The VECM model had a very good 7-days-ahead forecast performance and outperformed the traditional ARIMA model. Leveraging the relationship between the two time series, the model can produce realistic 60-days-ahead scenario-based projections, which can inform health care systems about the peak timing and volume of the hospital census for long-term planning purposes.


In the VECM representation, $\Delta Y_t$ is a $K \times 1$ vector of the differenced series $Y_t - Y_{t-1}$, $\Pi = -(I - A_1 - \cdots - A_p)$, and $\Gamma_i = -(A_{i+1} + \cdots + A_p)$ (for $i = 1, \ldots, p-1$). The model has the following assumptions:
• Assumption 1: The components of $Y_t$ are at most $I(1)$, i.e., integrated of order 1.
• Assumption 2: $0 \le r = \mathrm{rank}(\Pi) \le K$
• Assumption 3: $\varepsilon_t$ are identically and independently distributed $(0, \Sigma)$ random vectors with covariance matrix $\Sigma$.
For assumption 2, if $r = K$, then it can be shown that the VECM becomes a standard VAR model. If $r = 0$, then $\Pi$ is the zero matrix and there is no cointegration relationship between the series. The VECM then becomes a VAR model for differenced time-series. If $0 < r < K$, then $\Pi$ can be factored into $\Pi = \alpha\beta'$, where $\alpha$ and $\beta$ are both $K \times r$ matrices. From assumption 1, the differenced series $\Delta Y_t$, and its lags $\Delta Y_{t-1}, \ldots, \Delta Y_{t-p+1}$, are stationary. It follows that $\Pi Y_{t-1} = \alpha\beta' Y_{t-1}$, also called the error correction term, is (trend-)stationary, depending on the specification of the deterministic components. The linearly independent columns of $\beta$ are the cointegrating vectors and the rank $r$ is equal to the cointegration rank of the system of time series.
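As a quick check of the relations $\Pi = -(I - A_1 - \cdots - A_p)$ and $\Gamma_i = -(A_{i+1} + \cdots + A_p)$, the following minimal Python sketch converts level (VAR) coefficient matrices into VECM terms and confirms that one step of either form gives the same next value. The 2 x 2 matrices and the history vectors are made-up illustrative numbers, not estimates from the study, and the constant and seasonal terms are left out for brevity.

```python
# Sketch: map VAR(p) level matrices A_1..A_p to the VECM terms Pi and Gamma_i.
# All numbers below are hypothetical; deterministic terms are omitted.

def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mat_scale(X, c):
    return [[c * x for x in row] for row in X]

def mat_vec(X, v):
    return [sum(x * vi for x, vi in zip(row, v)) for row in X]

def identity(k):
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]

def zeros(k):
    return [[0.0] * k for _ in range(k)]

def var_to_vecm(A):
    """Return (Pi, [Gamma_1..Gamma_{p-1}]) for level matrices A = [A_1..A_p]."""
    k = len(A[0])
    total = zeros(k)
    for Ai in A:
        total = mat_add(total, Ai)
    # Pi = -(I - A_1 - ... - A_p)
    Pi = mat_scale(mat_add(identity(k), mat_scale(total, -1.0)), -1.0)
    Gammas = []
    for i in range(1, len(A)):
        tail = zeros(k)
        for Aj in A[i:]:          # A[i:] holds A_{i+1}..A_p (0-based list)
            tail = mat_add(tail, Aj)
        Gammas.append(mat_scale(tail, -1.0))
    return Pi, Gammas

def level_step(A, hist):
    """Y_t from the level equation; hist = [Y_{t-1}, ..., Y_{t-p}]."""
    out = [0.0] * len(hist[0])
    for Ai, y in zip(A, hist):
        out = [o + v for o, v in zip(out, mat_vec(Ai, y))]
    return out

def vecm_step(Pi, Gammas, hist):
    """Y_t = Y_{t-1} + Pi Y_{t-1} + sum_i Gamma_i * (Y_{t-i} - Y_{t-i-1})."""
    d = mat_vec(Pi, hist[0])
    for i, G in enumerate(Gammas):
        lagged_diff = [a - b for a, b in zip(hist[i], hist[i + 1])]
        d = [x + g for x, g in zip(d, mat_vec(G, lagged_diff))]
    return [y + x for y, x in zip(hist[0], d)]

A = [[[0.6, 0.1], [0.2, 0.5]], [[0.3, 0.0], [0.0, 0.2]]]  # hypothetical A_1, A_2
hist = [[10.0, 4.0], [9.0, 5.0]]                          # Y_{t-1}, Y_{t-2}
Pi, Gammas = var_to_vecm(A)
y_level = level_step(A, hist)
y_vecm = vecm_step(Pi, Gammas, hist)
print(y_level, y_vecm)
```

Both representations describe the same process; the VECM form simply isolates the long-run term $\Pi Y_{t-1}$ from the short-run dynamics in the lagged differences.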

Forecast performance
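We used mean absolute percentage error (MAPE) to evaluate the 7-day-ahead forecasts of Census:
$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{a_t - f_t}{a_t} \right|$$
where $f_t$ is the forecast value and $a_t$ is the actual value on forecast day $t$, and $n$ is the number of forecast days.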
The sampling distribution of out-of-sample MAPE is obtained by time-series cross-validation.
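The evaluation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the study's implementation: the expanding-window (rolling-origin) scheme is one common form of time-series cross-validation, the `naive_forecast` function is a hypothetical stand-in for the fitted model, and the census values are invented.

```python
from statistics import median, quantiles

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)

def naive_forecast(history, horizon):
    """Hypothetical placeholder model: repeat the last observed value."""
    return [history[-1]] * horizon

def rolling_origin_mape(series, min_train, horizon, forecaster):
    """At each origin, train on all data so far and score the next `horizon` days."""
    scores = []
    for origin in range(min_train, len(series) - horizon + 1):
        train = series[:origin]
        test = series[origin:origin + horizon]
        scores.append(mape(test, forecaster(train, horizon)))
    return scores

# Invented daily census values, for illustration only.
census = [100, 104, 110, 108, 115, 120, 126, 130, 128, 135, 140,
          150, 155, 160, 158, 165, 170, 180, 185, 190, 200]

scores = rolling_origin_mape(census, min_train=7, horizon=7,
                             forecaster=naive_forecast)
med = median(scores)
# With n=20 the last cut point is the 95th percentile.
p95 = quantiles(scores, n=20, method="inclusive")[-1]
print(len(scores), round(med, 1), round(p95, 1))
```

Each fold yields one out-of-sample MAPE, so the collection of fold scores approximates the sampling distribution from which a median and a 95th percentile can be read.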

Long-range scenario-based forecasting
Leveraging epidemiologically informed scenarios of the future infection incidence, we attempted to use the model to create realistic projections of hospital census. On January 9, 2021, we expected the winter surge to reach peak infection prevalence around February 5, 2021 based on an extension of an epidemiological model called the Susceptible-Infected-Removed model [2]. We linearly extrapolated Incidence with positive trend up to the expected pandemic peak. The severity of a scenario was controlled by a trend-dampening parameter [3]. After the peak, the descent path was initially symmetric to its ascent and then eventually became linear (Figure 2).
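One way such a scenario path could be constructed is sketched below in Python. The exact functional form used in the study is not given here, so everything in the sketch is an assumption: the damped-trend rule (each step adds `slope * phi**h`), the parameter names, and the starting values are hypothetical, and the eventual linear tail of the descent is omitted.

```python
def damped_ascent(last_value, slope, phi, days_to_peak):
    """Extrapolate upward; phi = 1 is a pure linear trend, phi < 1 dampens it."""
    path, value = [], last_value
    for h in range(1, days_to_peak + 1):
        value += slope * phi ** h
        path.append(value)
    return path

def scenario_path(last_value, slope, phi, days_to_peak, days_after_peak):
    """Ascent to the peak followed by a descent that mirrors the ascent."""
    ascent = damped_ascent(last_value, slope, phi, days_to_peak)
    mirrored = ascent[-2::-1]              # ascent reversed, peak excluded
    descent = mirrored[:days_after_peak]   # truncated if the mirror is shorter
    return ascent + descent

# Hypothetical inputs: 1000 cases on the last observed day, +50 cases per day.
severe = scenario_path(1000.0, 50.0, phi=1.0, days_to_peak=5, days_after_peak=3)
mild = scenario_path(1000.0, 50.0, phi=0.9, days_to_peak=5, days_after_peak=3)
print(severe)
print(max(mild) < max(severe))
```

A smaller dampening value flattens the ascent and lowers the peak, which is how a single parameter can span milder and more severe scenarios.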

Background
Our work is motivated by hospital leaders' need for timely and accurate forecasts to guide planning for surges in hospital census, i.e., bed capacity, due to the COVID-19 pandemic. Adequate preparation can help prevent or mitigate the strains on hospital resources that result when hospitals exceed their historical capacity.

Objective
We want to explore whether the local COVID-19 infection incidence and the COVID-19 hospital census can be successfully incorporated within a multivariate time-series model to deliver satisfactory 7-day-ahead forecast performance, and to examine the application of this model to scenario-based long-term forecasting.

Study data
The study data are aggregated daily COVID-19 hospital census across 11 Atrium Health hospitals plus a virtual hospital in the greater Charlotte metropolitan area of North Carolina, as well as the total daily infection incidence across the same region during the May 15, 2020 to December 5, 2020 period (Figure 1). Appropriate transformations were applied to the data to linearize their relationship.

Model estimation
• The level equation requires 7 lags (p = 7) to capture all temporal dependencies.
• Strong evidence for a cointegration relationship (P<.01); $ect_{t-1}$ denotes the (lagged) error correction term in the estimated equation.
• Long-run effect: the error correction term had a negative and statistically significant effect on Census change (P<.01).
• Short-run effects: past Incidence changes at lags 1, 2, 4, 5, and 6, as well as past Census change at lag 2, had significant effects on Census change.

Forecast performance
The typical value (median) of MAPE was 5.9% and the 95th percentile of MAPE was 13.4% (Figure 3). For the sake of comparison, the corresponding values from an Autoregressive Integrated Moving Average (ARIMA) model using the COVID-19 hospital census only were 6.6% and 14.3%.

Model
A Vector Error Correction model (VECM) is a vector autoregressive (VAR) model used for nonstationary multivariate time-series and accounts for stable long-run relationships, i.e., cointegration, between the time series. A time-series vector is said to be cointegrated if there is at least one linear combination of the vector that is trend-stationary.
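Following [1], we first describe the VAR representation of the model, i.e., the level equation:
$$Y_t = A_1 Y_{t-1} + \cdots + A_p Y_{t-p} + \mu + \Phi D_t + \varepsilon_t$$
for time $t = 1, \ldots, T$, where $A_i$ (for $i = 1, \ldots, p$) are $K \times K$ coefficient matrices of the lagged series at lag $i$, $\mu$ is a $K \times 1$ vector of constants, $D_t$ is a $6 \times 1$ vector of weekly seasonal indicators, $\Phi$ is a $K \times 6$ coefficient matrix for seasonal indicators, and $\varepsilon_t$ is a $K \times 1$ vector of random errors.
The VECM representation, i.e., difference equation, can be derived from above:
$$\Delta Y_t = \Gamma_1 \Delta Y_{t-1} + \cdots + \Gamma_{p-1} \Delta Y_{t-p+1} + \Pi Y_{t-1} + \mu + \Phi D_t + \varepsilon_t$$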

Long-range scenario-based forecasting
In all scenarios, due to cointegration, the hospital census followed corresponding concave trajectories with peaks occurring approximately 2-3 weeks later than Incidence depending on the scenario. In the worst-case scenario, the hospital census was projected to peak on February 16, 2021 (11 days later than Incidence) with approximately 850 patients at the 80% forecast interval upper bound (Figure 4). In hindsight, by evaluating different scenarios of peak resource demand against our resource capacity, we have correctly assured our leaders of our capability to handle even the worst-case scenario, alleviated uncertainty, and effectively guided long-term planning of adequate staffing, bed capacity, and equipment supplies through the pandemic.
This research protocol was submitted to the Atrium Health Institutional Review Board (IRB) prior to execution and the study was deemed exempt from IRB oversight. In compliance with HIPAA regulations, individual patient information is not disclosed; all data have been deidentified and reported as aggregates.