The Relationship Between Population-Level SARS-CoV-2 Cycle Threshold Values and Trend of COVID-19 Infection: Longitudinal Study

Background: The distribution of population-level real-time reverse transcription-polymerase chain reaction (RT-PCR) cycle threshold (Ct) values as a proxy of viral load may be a useful indicator for predicting COVID-19 dynamics.
Objective: The aim of this study was to determine the relationship between the daily trend of average Ct values and COVID-19 dynamics, calculated as the daily number of hospitalized patients with COVID-19, daily number of new positive tests, daily number of COVID-19 deaths, and number of hospitalized patients with COVID-19 by age. We further sought to determine the lag between these data series.
Methods: The samples included in this study were collected from March 21, 2021, to December 1, 2021. Daily Ct values of all patients who were referred to the Molecular Diagnostic Laboratory of Iran University of Medical Sciences in Tehran, Iran, for RT-PCR tests were recorded. The daily number of positive tests and the number of hospitalized patients by age group were extracted from the COVID-19 patient information registration system in Tehran province, Iran. An autoregressive integrated moving average (ARIMA) model was constructed for the time series of variables. Cross-correlation analysis was then performed to determine the best lag and correlations between the average daily Ct value and other COVID-19 dynamics-related variables. The best-selected lag of Ct identified through cross-correlation was incorporated as a covariate into the autoregressive integrated moving average with exogenous variables (ARIMAX) model to calculate the coefficients.
Results: Daily average Ct values showed a significant negative correlation with the daily number of newly hospitalized patients (23-day time delay; P=.02), the daily number of new positive tests (30-day time delay; P=.02), and the daily number of COVID-19 deaths (30-day time delay; P=.02). The daily average Ct value with a 30-day delay could impact the daily number of positive tests for COVID-19 (β=-16.87, P<.001) and the daily number of deaths from COVID-19 (β=-1.52, P=.03). There was a significant association between Ct lag (23 days) and the number of COVID-19 hospitalizations (β=-24.12, P=.005). Cross-correlation analysis showed significant time delays between the average Ct values and daily hospitalized patients aged 18-59 years (23-day time delay, P=.02) and patients over 60 years old (23-day time delay, P<.001). No statistically significant relation was detected for the number of daily hospitalized patients under 5 years old (9-day time delay, P=.27) or aged 5-17 years (13-day time delay, P=.39).
Conclusions: It is important for surveillance of COVID-19 to find a good indicator that can predict epidemic surges in the community. Our results suggest that the average daily Ct value with a 30-day delay can predict increases in the number of positive confirmed COVID-19 cases, which may be a useful indicator for the health system.


Introduction
Coronaviruses are zoonotic pathogens that can be transmitted to humans after acquiring particular mutations [1]. SARS-CoV-2, which causes COVID-19, is mainly transmitted via airborne respiratory droplets. Although ocular secretions and oral-fecal transmission have also been indicated, these transmission methods remain uncertain [2,3].
A real-time reverse transcription-polymerase chain reaction (RT-PCR) test is used for detecting SARS-CoV-2 in respiratory samples as routine surveillance worldwide. The RT-PCR test has high sensitivity and specificity for diagnosing COVID-19 and offers faster turnaround times than the viral culture method; thus, this test has become the main method for diagnosing COVID-19. RT-PCR presents both qualitative and quantitative results with respect to the viral load [4]. The RT-PCR cycle threshold (Ct) value is defined as the number of amplification cycles needed to detect the target gene in samples [5]. The Ct value is a semiquantitative result of RT-PCR that reflects the amount of viral nucleic acids in a sample, and can thus be used as a proxy for viral load and may help decision-making in epidemic control. The Ct value has an inverse relationship with viral load, such that each increase of 3.3 in the Ct value corresponds to a 10-fold decrease in viral load [6]; the highest viral burden occurs on the first day of symptom onset [7]. A COVID-19 RT-PCR test is considered positive when its Ct value is lower than the recommended cutoff. In the United States, the Food and Drug Administration considers a Ct value <37 as the cutoff for a positive result of COVID-19 [8]. In more than 70% of samples with a Ct value <25, SARS-CoV-2 can be cultured, whereas only 3% of samples with a Ct value >35 can be cultured [9]. Several studies have reported that the Ct value is also associated with disease severity and mortality, and that Ct values are low in patients who have more severe symptoms [5,10-12]. In addition, hospitalized patients who died from COVID-19 had lower Ct values [13]. A systematic review showed a significant correlation between Ct value and disease severity in hospitalized patients but not in nonhospitalized COVID-19 patients [5]. There is controversy among studies on the use of Ct values at an individual level for the prognosis of the disease or treatment planning.
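The rule of thumb quoted above (an increase of about 3.3 cycles corresponds to a 10-fold decrease in viral load) can be turned into a small calculation. The following is a minimal sketch, not part of the study's analysis; the function name and the default of 3.3 cycles per log10 are illustrative assumptions taken from that rule.

```python
def relative_viral_load(delta_ct, cycles_per_log10=3.3):
    """Fold change in viral load implied by a change in Ct value.

    Assumes the rule of thumb that each increase of about 3.3 cycles
    corresponds to a 10-fold decrease in viral load. A positive
    delta_ct (higher Ct) therefore gives a fold change below 1.
    """
    return 10 ** (-delta_ct / cycles_per_log10)


if __name__ == "__main__":
    # Illustrative Ct changes only; these are not values from the study.
    for delta in (-6.6, -3.3, 0.0, 3.3):
        fold = relative_viral_load(delta)
        print(f"Ct change {delta:+.1f} -> viral load x {fold:.3g}")
```

Under this assumption, a drop of a few cycles in the population average Ct corresponds to a several-fold rise in the typical viral load of the samples tested.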
The Ct value may vary due to the collection method among laboratories [14] or the target gene selected for RT-PCR [15]. Moreover, the RT-PCR test can detect any viral material and does not distinguish between live viruses and viral debris, which may persist for a long time beyond the point of infectiousness [12].
To the best of our knowledge, few studies have examined the use of population-level Ct values as a measure of COVID-19 dynamics in communities. As Ct values have a significant relationship with disease severity and infectivity, a lower average Ct value in daily testing samples from a population may predict epidemic growth in a community. Hay et al [16] analyzed simulation and surveillance data and found that a decrease in the Ct values observed in a population may indicate a local increase in transmission or in the number of new patients. In addition, the median Ct value may be an effective measure for forecasting a pandemic surge.
To resolve these issues, the aims of this study were to determine the relationships between the daily trend of average Ct value and COVID-19 dynamics, including the daily number of hospitalized patients with COVID-19, daily number of new positive tests, daily number of COVID-19 deaths, and number of hospitalized patients with COVID-19 by age. We further aimed to determine the lag between these series.

Samples and RT-PCR
The samples included in this study were collected from March 21, 2021, to December 1, 2021. Samples were included if they were obtained from individuals who were suspected of having COVID-19 and were referred to a laboratory in Tehran, Iran, to confirm the diagnosis. Daily Ct values of all patients referred to the laboratory for RT-PCR tests were recorded. The daily number of positive cases and the number of hospitalized people by age group for 9 months were extracted from the COVID-19 patient information registration system in Tehran province, Iran.
This study included samples of the upper respiratory tract (both nasopharyngeal and anterior nares swab samples) taken using a sterile Dacron thin swab with a plastic or aluminum handle as the main test specimen. The samples were collected by a physician, nurse, laboratory expert, and other staff with sufficient training and experience. All biological samples were sent to the Molecular Diagnostic Laboratory of Iran University of Medical Sciences in Tehran, Iran. All samples were analyzed using the Pishtazteb One-step RT-PCR COVID-19 Kit (dual-target gene diagnosis), and RNA extraction was performed using a Zybio nucleic acid extraction kit (magnetic bead method). To confirm the diagnosis, the target genes were the SARS-CoV-2 nucleocapsid gene and RdRp gene [17]. For each sample, the Ct value was recorded. The samples that produced a positive result in the RT-PCR test and had a Ct value ≤37 were recorded to determine the daily average Ct values.
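The construction of the daily average Ct series described above can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout `(date, is_positive, ct_value)` and all sample values are hypothetical, and only the Ct ≤37 rule comes from the text.

```python
from collections import defaultdict
from statistics import mean

CT_CUTOFF = 37.0  # positive samples with Ct <= 37 were retained in the study


def daily_average_ct(records, cutoff=CT_CUTOFF):
    """Return {date: mean Ct} using only positive samples with Ct <= cutoff.

    `records` is a hypothetical iterable of (date_string, is_positive, ct_value)
    tuples; ct_value may be None when no amplification was detected.
    """
    by_day = defaultdict(list)
    for day, is_positive, ct in records:
        if is_positive and ct is not None and ct <= cutoff:
            by_day[day].append(ct)
    # Sort by date string (ISO format sorts chronologically).
    return {day: mean(values) for day, values in sorted(by_day.items())}


if __name__ == "__main__":
    # Made-up records for illustration only.
    sample = [
        ("2021-03-21", True, 20.0),
        ("2021-03-21", True, 30.0),
        ("2021-03-21", False, None),
        ("2021-03-22", True, 37.0),
    ]
    print(daily_average_ct(sample))
```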

Overview
The daily median Ct value among all patients referred to the laboratory and the daily number of hospitalized patients with COVID-19 by age group were plotted over time. The autoregressive integrated moving average (ARIMA) and autoregressive integrated moving average with exogenous variables (ARIMAX) models were used to determine significant associations between the daily average Ct value and the daily number of COVID-19 hospitalizations by age, daily number of COVID-19 deaths, and daily number of positive tests in Tehran province, Iran.

ARIMA Model
Time-series analyses are appropriate when dealing with a set of data that has a time trend [18]. The Box-Jenkins time-series approach, especially the ARIMA model, is one of the best methods in time-series analysis of autocorrelated data [19], such as the daily average Ct value. In autoregressive models, the outcome ($Y_t$) is a linear function of the previous values and a random component. The parameters of a nonseasonal ARIMA model are written as (p, d, q), where p is the order of autoregression (AR), d is the degree of trend differencing, and q is the order of the moving average (MA). To perform time-series analysis, it is first necessary to check the stability of the mean and variance. For this purpose, the augmented Dickey-Fuller (ADF) test [20] is used to check the stability of the mean and the Box-Cox test is used to check the stability of the variance. Logarithmic transformation and differencing were used to establish stability in the variance and mean, respectively. The first difference can be expressed as:

$$Y'_t = Y_t - Y_{t-1} \tag{1}$$

where $Y_t$ represents the nonstationary time-series data and $Y'_t$ is the time series after first differencing. If the time series has a seasonal trend, seasonal differences are used to stabilize the series. The AR parameter p represents the linear correlation of the current value of the time series $Y_t$ with the previous values $Y_{t-1}, Y_{t-2}, \ldots$ and the current residual $\varepsilon_t$ [21]. The MA parameter q shows the linear correlation of the current value of the time series $Y_t$ with the current and previous residuals of the time series $\varepsilon_t, \varepsilon_{t-1}, \ldots$ [22]. The general formulas of the AR(p) and MA(q) models are represented in equations (2) and (3), respectively:

$$Y_t = C + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \cdots + \beta_p Y_{t-p} + \varepsilon_t \tag{2}$$

$$Y_t = C + \varepsilon_t + \phi_1 \varepsilon_{t-1} + \phi_2 \varepsilon_{t-2} + \cdots + \phi_q \varepsilon_{t-q} \tag{3}$$

where C is a constant; $\beta_1, \beta_2, \ldots, \beta_p$ are AR model terms; and $\phi_1, \phi_2, \ldots, \phi_q$ are MA model terms. The number of AR and MA parameters was determined by the autocorrelation function and partial autocorrelation function.
The general form of the ARIMA model can be written as:

$$Y'_t = C + \beta_1 Y'_{t-1} + \cdots + \beta_p Y'_{t-p} + \varepsilon_t + \phi_1 \varepsilon_{t-1} + \cdots + \phi_q \varepsilon_{t-q} \tag{4}$$

where $Y'_t$ is the series after differencing. Four main steps for the development of the ARIMA model include checking mean and variance stability (see Table S1 in Multimedia Appendix 1), and identifying p and q terms (see Figure S1 in Multimedia Appendix 1).
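Two of the building blocks named in this section, first differencing (Y'_t = Y_t - Y_{t-1}) and the sample autocorrelation function used to choose the AR and MA orders, can be sketched in a few lines. This is an illustrative sketch using only the Python standard library, not the Stata workflow used in the study; the function names are hypothetical, and the partial autocorrelation function is not covered.

```python
def first_difference(series):
    """Return the first-differenced series Y'_t = Y_t - Y_{t-1}."""
    return [current - previous for previous, current in zip(series, series[1:])]


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative integer lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean_value = sum(series) / n
    denominator = sum((y - mean_value) ** 2 for y in series)
    numerator = sum(
        (series[t] - mean_value) * (series[t - lag] - mean_value)
        for t in range(lag, n)
    )
    return numerator / denominator


if __name__ == "__main__":
    # Made-up trending series for illustration only.
    trending = [1, 3, 6, 10, 15, 21]
    differenced = first_difference(trending)
    print("differenced:", differenced)
    print("lag-1 autocorrelation:", autocorrelation(trending, 1))
```

A slowly decaying autocorrelation in the raw series, compared with a quickly decaying one after differencing, is the usual informal sign that differencing has removed the trend.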

Model Parameter Estimation
The maximum-likelihood approach was used to estimate the model parameters. To determine the best ARIMA model, among the models that passed the residual test (normality and stability in the variance), the model with the lowest Bayesian information criterion (BIC) and Akaike information criterion (AIC) was selected as the final model. The BIC and AIC formulae are represented as follows:

$$\mathrm{BIC} = k \ln(m) - 2 \ln(L) \tag{5}$$

$$\mathrm{AIC} = 2k - 2 \ln(L) \tag{6}$$

where m is the number of observations, k is the total number of parameters in the model, and L is the likelihood function, so that ln(L) is the log-likelihood.
The ARIMA model was fitted to the time series of the daily average Ct value, daily number of hospitalized patients with COVID-19, new number of daily positive tests, daily number of COVID-19 deaths, and number of hospitalized patients with COVID-19 by age. The detailed method for derivation of the ARIMA model is described in Multimedia Appendix 1.
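The selection rule described here can be sketched as a comparison of information criteria across candidate models, using the standard definitions AIC = 2k - 2 ln(L) and BIC = k ln(m) - 2 ln(L). The candidate names, log-likelihoods, and parameter counts below are made-up illustrative numbers, not results from this study.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, m):
    """Bayesian information criterion: k ln(m) - 2 ln(L), with m observations."""
    return k * math.log(m) - 2 * log_likelihood


def select_lowest_bic(candidates, m):
    """Return the name of the candidate with the lowest BIC.

    `candidates` maps a model label to (log_likelihood, number_of_parameters).
    """
    return min(candidates, key=lambda name: bic(*candidates[name], m))


if __name__ == "__main__":
    # Hypothetical fitted models; the values are for illustration only.
    fitted = {
        "ARIMA(1,0,1)": (-500.0, 3),
        "ARIMA(2,0,2)": (-499.5, 5),
    }
    for name, (loglik, k) in fitted.items():
        print(name, "AIC:", round(aic(loglik, k), 2), "BIC:", round(bic(loglik, k, 256), 2))
    print("selected:", select_lowest_bic(fitted, 256))
```

Because BIC penalizes each extra parameter by ln(m) rather than 2, it tends to favor the smaller model when the gain in log-likelihood is slight, as in this toy comparison.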

Cross-correlation Function
To evaluate the time delay between the daily average Ct value and the daily number of hospitalized patients with COVID-19, daily number of new positive tests, daily number of COVID-19 deaths, and number of hospitalized patients with COVID-19 by age, the cross-correlation function was used. The independent variable (daily average Ct value) and dependent variables (daily number of hospitalized patients with COVID-19, new number of daily positive tests, daily number of COVID-19 deaths, and number of hospitalized patients with COVID-19 by age) were preprocessed by the previously fit ARIMA models. The cross-correlation coefficient is mathematically represented as follows:

$$r_{\alpha\beta}(k) = \frac{C_{\alpha\beta}(k)}{\sigma_\alpha \sigma_\beta} \tag{7}$$

where $C_{\alpha\beta}(k)$ is the value of the covariance between the preprocessed input time series and the preprocessed output time series at lag k, $\sigma_\alpha$ is the standard deviation of the preprocessed input time series, and $\sigma_\beta$ is the standard deviation of the preprocessed output time series [23]. Three indicators, the Schwarz Bayesian information criterion (SBIC), Hannan-Quinn information criterion (HQIC), and AIC, were used to select the best lag:

$$\mathrm{SBIC} = \log(n)\,k - 2 \log(L(\hat{\theta})) \tag{8}$$

$$\mathrm{HQIC} = -2 \ln(L(\hat{\theta})) + 2k \log(\log n) \tag{9}$$

In equations (8) and (9), n is the sample size, k is the number of estimated parameters, $\theta$ is the set of all parameter values, and $L(\hat{\theta})$ is the maximized likelihood of the model.
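The cross-correlation coefficient at a positive lag k (covariance between the input series at time t and the output series at time t+k, divided by the product of the two standard deviations) can be sketched as below. This is a simplified illustration: it picks the lag with the largest absolute correlation, whereas the study selected the lag using SBIC, HQIC, and AIC on ARIMA-preprocessed series. The function names and the simulated data are assumptions made for the example.

```python
import math
import random


def cross_correlation(x, y, lag):
    """Cross-correlation between x_t and y_{t+lag} for a lag >= 0 (x leads y)."""
    n = len(x)
    if len(y) != n or not 0 <= lag < n:
        raise ValueError("x and y must have equal length and 0 <= lag < n")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sd_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / n)
    sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    covariance = sum(
        (x[t] - mean_x) * (y[t + lag] - mean_y) for t in range(n - lag)
    ) / n
    return covariance / (sd_x * sd_y)


def strongest_positive_lag(x, y, max_lag):
    """Lag in 1..max_lag with the largest absolute cross-correlation."""
    return max(range(1, max_lag + 1), key=lambda k: abs(cross_correlation(x, y, k)))


if __name__ == "__main__":
    # Simulated data: y mirrors x with opposite sign three steps later.
    rng = random.Random(0)
    x = [rng.gauss(0, 1) for _ in range(200)]
    y = [0.0] * 3 + [-v for v in x[:-3]]
    lag = strongest_positive_lag(x, y, 10)
    print("lag:", lag, "correlation:", round(cross_correlation(x, y, lag), 3))
```

Only positive lags are scanned, matching the interest in whether the Ct series precedes the outcome series; for preprocessed (approximately white-noise) series, correlations beyond roughly ±1.96/√n are conventionally treated as notable.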

ARIMAX Model
The ARIMAX model is an extension of the ARIMA model that adds an explanatory independent variable. The ARIMAX model combines multiple regression analysis and time-series analysis; therefore, it can determine the impact coefficient of the relationship between different lags of Ct values and other study variables. The ARIMAX model formula is as follows:

$$Y_t = C + \beta x_t + \beta_1 Y_{t-1} + \cdots + \beta_p Y_{t-p} + \varepsilon_t + \phi_1 \varepsilon_{t-1} + \cdots + \phi_q \varepsilon_{t-q} \tag{10}$$

where $x_t$ is an independent variable at time t and $\beta$ is its associated coefficient, $Y_{t-1}, \ldots, Y_{t-p}$ are the previous values of the dependent variable, and $\varepsilon_t, \ldots, \varepsilon_{t-q}$ are the residuals of the time series.
To determine the association and the coefficient of the association between the lags of the $x_{t+m}$ time series and the series $Y_t$, the ARIMAX model was used. The cross-correlation function was used to find the linear correlation between $x_{t+m}$ and $Y_t$ for different lags, which can help to find the best lags of the independent variable that might be used to predict the dependent variable [24]. The lags of Ct values that were selected through the cross-correlation function were incorporated as covariates into ARIMAX models for the dependent variables, namely the daily number of hospitalized patients with COVID-19, number of new daily positive cases, daily number of COVID-19 deaths, and number of hospitalized patients with COVID-19 by age. The maximum-likelihood method was used for estimation of the parameters. The Ljung-Box Q test was applied to evaluate whether the residual series was white noise. Data were analyzed using Stata software version 14. Figure 1 shows the steps of building the best ARIMAX model.
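To make the role of the covariate coefficient β concrete, the sketch below evaluates an ARIMAX-type equation for one time step: a constant, plus β times the (lagged) covariate, plus the AR terms on previous outcomes, plus the MA terms on previous residuals. It assumes the additive sign convention for the MA terms and sets the current noise term to zero; all coefficient values are hypothetical and are not estimates from the study.

```python
def arimax_one_step(constant, beta, x_value, ar_coefs, y_prev, ma_coefs, eps_prev):
    """One-step-ahead value of Y_t from an ARIMAX-type equation.

    y_prev[i] is Y_{t-1-i} and eps_prev[j] is the residual at t-1-j.
    The current noise term is taken as 0 (its expected value).
    """
    if len(ar_coefs) != len(y_prev) or len(ma_coefs) != len(eps_prev):
        raise ValueError("each coefficient list must match its history list")
    ar_part = sum(b * y for b, y in zip(ar_coefs, y_prev))
    ma_part = sum(p * e for p, e in zip(ma_coefs, eps_prev))
    return constant + beta * x_value + ar_part + ma_part


if __name__ == "__main__":
    # Hypothetical ARIMAX(1,0,1) coefficients, for illustration only.
    params = dict(constant=1.0, beta=-2.0, ar_coefs=[0.5], ma_coefs=[0.25])
    history = dict(y_prev=[4.0], eps_prev=[2.0])
    base = arimax_one_step(x_value=3.0, **params, **history)
    lower_x = arimax_one_step(x_value=2.0, **params, **history)
    print("prediction:", base)
    print("change when covariate drops by 1 unit:", lower_x - base)
```

With a negative β, lowering the covariate by one unit raises the predicted outcome by |β| units, which is how a negative coefficient on a lagged Ct value would be read in a linear model of this form.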
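The Ljung-Box Q statistic mentioned above has a closed form, Q = n(n+2) Σ r_k²/(n−k) summed over lags k = 1..h, where r_k is the residual autocorrelation at lag k. The sketch below computes only the statistic; comparing it against a chi-square critical value (with degrees of freedom reduced by the number of fitted ARMA parameters) is left to a statistics package. The function name and the simulated residuals are assumptions for illustration.

```python
import random


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag for a residual series."""
    n = len(residuals)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(residuals)")
    mean_value = sum(residuals) / n
    denominator = sum((e - mean_value) ** 2 for e in residuals)
    total = 0.0
    for k in range(1, max_lag + 1):
        numerator = sum(
            (residuals[t] - mean_value) * (residuals[t - k] - mean_value)
            for t in range(k, n)
        )
        r_k = numerator / denominator  # sample autocorrelation at lag k
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


if __name__ == "__main__":
    rng = random.Random(1)
    noise = [rng.gauss(0, 1) for _ in range(300)]   # simulated white noise
    alternating = [1.0, -1.0] * 50                  # strongly autocorrelated
    print("Q (simulated noise):", round(ljung_box_q(noise, 10), 2))
    print("Q (alternating series):", round(ljung_box_q(alternating, 10), 2))
```

A small Q is consistent with white-noise residuals, whereas a very large Q, as for the alternating series, indicates autocorrelation that the fitted model has not captured.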

Ethics Considerations
Since individual data were not used in this study, no formal ethical assessment or informed consent was required. This study was approved by the Ethics Committee of Iran University of Medical Sciences (ethical code: IR.IUMS.REC.1400.799).

Results
Table 2 shows the best ARIMA models for the study variables. The ARIMA (1,0,1) model, which had the lowest BIC value in comparison with other models, was the best model for the daily average Ct value, daily number of hospitalized patients, and daily count of positive COVID-19 tests. The ARIMA (1,0,2) model was the best model for the daily number of COVID-19 deaths. All models had the lowest number of significant estimated parameters, and the residual analysis showed a good fit (normality and stability in the variance) for the selected ARIMA models using the AIC. There was no seasonal pattern in the study variables. The ADF test was used for evaluating stability in the mean and the Box-Cox test was used to test the time-series stability in the variance. The time series of the daily number of hospitalized patients by age did not show stability for the variance, and therefore log transformation was applied to this variable.
Figure 3 shows the cross-correlations between the study variables and Ct value. In this figure, negative lags were not considered because a negative lag indicates that the study variables affect the average Ct value at a later point in time; therefore, only positive lags were used to show the effect of the Ct value on the study variables in the future. A cross-correlation function was performed between the preprocessed input and output series. Table 3 shows the best lag difference between the Ct value and the study variables. Indicators such as AIC, SBIC, and HQIC were used to examine the selected lag.
There was no statistically significant (all P>.05) lag (time delays) between the average Ct value and the daily number of hospitalized patients under 5 years old and the number of hospitalized patients aged 5-17 years. However, a significant 23-day lag was found between the average Ct value and number of hospitalized patients. The daily count of positive COVID-19 tests as well as the daily number of COVID-19 deaths had a significant 30-day lag with the average Ct value.

Impact of the Ct Value on Study Variables (ARIMAX Model)
After obtaining the best lag between the daily Ct value and other variables using cross-correlation analysis (Table 3), ARIMAX was used to calculate the impact coefficients of the selected lags. Table 4 shows that a Ct value with a 30-day delay could affect the daily number of positive COVID-19 tests and the daily number of deaths from COVID-19. Specifically, each 1-unit decrease in the daily average Ct value was associated with an increase of approximately 16.87 in the daily number of new positive tests for COVID-19 after 30 days (β=-16.87). In addition, each 1-unit decrease in the Ct value was associated with an increase of approximately 1.52 in the daily number of deaths from COVID-19 after 30 days (β=-1.52). There was a significant coefficient between Ct lag (23 days) and the number of COVID-19 hospitalizations. There was also a significant association between the Ct value with a 23-day delay and the number of COVID-19 hospitalizations for patients aged 18-59 years and patients aged more than 60 years.

Principal Findings
The Ct value is a good proxy for viral load, which makes it possible to isolate people who have a higher viral load (lower Ct value), along with those who have been in contact with them during the past 5 days, to reduce the transmission rate [11]. Therefore, the Ct value can be a good indicator for predicting the future course of the disease. This study investigated the relationship between the population distribution of Ct values obtained from SARS-CoV-2-positive RT-PCR tests and COVID-19 dynamics. The results showed that the daily average Ct value has a significant negative relationship with three study variables of COVID-19 dynamics: daily number of hospitalized patients, daily count of positive COVID-19 tests, and daily COVID-19 deaths. The Ct value can predict the peak of the epidemic curve of new positive COVID-19 cases approximately 30 days in advance.

Comparison With Prior Work
This result is consistent with the results of a study by Walker et al [21] showing that a declining population-level Ct value preceded increases in SARS-CoV-2 test positivity. Another study showed a negative association between individual Ct values and severity of symptoms of COVID-19 [25]. A few studies have focused on the effect of the population-level Ct value as an indicator for predicting pandemic surges. Consistent with this study, Tso et al [26] showed that daily median Ct values have a negative correlation with the daily count of positive tests, daily transmission rates, and daily number of COVID-19 hospitalizations in the greater El Paso area; they also showed a significant 33-day time delay between daily median Ct values and the daily number of COVID-19 hospitalizations. In this study, we found a significant 23-day time delay between the daily average Ct value and the number of hospitalized COVID-19 patients aged 18-59 years and aged more than 60 years. The former age group represents the major workforce, and is thus more likely to be exposed to and become infected with SARS-CoV-2. Buchan et al [27] showed that the average Ct values were statistically similar among age groups, but patients in the age group of 80-89 years had slightly lower Ct values. According to an epidemiology study in Iran, the majority of hospitalized COVID-19 patients were in the age group of 50-60 years [28]. The relationship between the daily average Ct value and the number of COVID-19 patients aged under 5 years was not significant in this study.
Hay et al [16] estimated the epidemic trajectory in Massachusetts, United States, using a mathematical model for population-level Ct values, and also found that an increasing epidemic wave will be accompanied by a high frequency of recently infected patients with high viral loads (lower Ct values), whereas a declining epidemic wave occurs when the number of patients with older infections is high. Therefore, Ct values obtained from the disease care system during the epidemic of SARS-CoV-2 can determine the course of the epidemic process at short intervals [16]. In this study, the ARIMAX model was used to find the effect of Ct value delay time on the number of positive COVID-19 tests, and a 30-day delay was found between the average population-level Ct value and the number of positive COVID-19 cases.

Limitations
Differences in how Ct values are measured, or in the quality assurance of the data sets used to calculate population-level Ct values in different geographical areas, may affect the power of the Ct value for predicting local COVID-19 epidemic waves. Previous studies have indicated that changes in the population-level Ct values of surveillance samples may signal a disease outbreak [16,29]. There is a hypothesis that if only patients with clinical symptoms who had positive tests were used to calculate the daily average Ct value, the association between the daily Ct value and COVID-19 cases would be more readily detected; thus, a decrease in Ct values may be more closely associated with the increasing number of COVID-19 patients. To investigate this hypothesis, only the Ct value of patients with symptoms was used to calculate the daily average Ct value in this study.

Conclusions
The daily average population-level Ct value is associated, after a time delay, with the number of positive SARS-CoV-2 tests. The number of new COVID-19 cases is expected to increase 30 days after a decrease in the daily average Ct value. It is important to find a good indicator that can predict epidemic surges in the community for improved COVID-19 surveillance. Earlier prediction of a new wave of disease will help health policymakers to initiate appropriate public health policies, such as lockdowns, to mitigate an anticipated pandemic surge, and will give health systems an opportunity to secure the medicines and facilities needed to support additional patients.

Conflicts of Interest
None declared.