This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on http://publichealth.jmir.org, as well as this copyright and license information must be included.
Recently, three randomized clinical trials on coronavirus disease (COVID-19) treatments were completed: one for lopinavir-ritonavir and two for remdesivir. One trial reported that remdesivir was superior to placebo in shortening the time to recovery, while the other two showed no benefit of the treatment under investigation.
The aim of this paper is, from a statistical perspective, to identify several key issues in the design and analysis of the three COVID-19 trials and to reanalyze the data from the cumulative incidence curves in the three trials using more appropriate statistical methods.
The lopinavir-ritonavir trial enrolled 39 additional patients due to nonsignificant results after the sample size reached the planned number, which led to inflation of the type I error rate. The remdesivir trial of Wang et al failed to reach the planned sample size due to a lack of eligible patients, and the bootstrap method was used to predict the quantity of clinical interest conditionally and unconditionally if the trial had continued to reach the originally planned sample size. Moreover, we used a terminal (or cure) rate model and a model-free metric known as the restricted mean survival time or the restricted mean time to improvement (RMTI) to analyze the reconstructed data. The remdesivir trial of Beigel et al reported the median recovery time of the remdesivir and placebo groups, and the rate ratio for recovery, while both quantities depend on a particular time point representing local information. We used the restricted mean time to recovery (RMTR) as a global and robust measure for efficacy.
For the lopinavir-ritonavir trial, with the increase of sample size from 160 to 199, the type I error rate was inflated from 0.05 to 0.071. The difference of RMTIs between the two groups evaluated at day 28 was –1.67 days (95% CI –3.62 to 0.28;
Based on the statistical issues and lessons learned from the three recent clinical trials on COVID-19 treatments, we suggest more appropriate approaches for the design and analysis of ongoing and future COVID-19 trials.
The novel coronavirus disease (COVID-19) has spread all over the world at an unprecedented rate since its outbreak in December 2019. More than 200 countries or territories have confirmed cases, and over 8.4 million individuals have been infected, leading to more than 450,000 deaths as of June 18, 2020. COVID-19 was declared a Public Health Emergency of International Concern by the World Health Organization (WHO) on January 30, 2020, and a pandemic on March 11, 2020.
As recommended by the WHO R&D Blueprint expert group, clinical improvements for patients with COVID-19 can be classified in a seven-category ordinal scale [
Not hospitalized with resumption of normal activities
Not hospitalized, but unable to resume normal activities
Hospitalized, not requiring supplemental oxygen
Hospitalized, requiring supplemental oxygen
Hospitalized, requiring nasal high-flow oxygen therapy, noninvasive mechanical ventilation, or both
Hospitalized, requiring extracorporeal membrane oxygenation, invasive mechanical ventilation, or both
Death
So far, only eight clinical trials for COVID-19 have been completed with published results. Among them, two trials were for hydroxychloroquine with relatively small sample sizes (30 patients for the trial of Chen et al [
The Lopinavir Trial for Suppression of Severe Acute Respiratory Syndrome Coronavirus 2 in China [
Wang et al [
Beigel et al [
So far, only one treatment, remdesivir, has been shown to be effective by a randomized clinical trial, but the other remdesivir trial failed to demonstrate its superiority over the placebo. As the COVID-19 pandemic will not be controlled anytime soon, the aforementioned three clinical trials [
The log-rank test [
Let
For clinical studies with a survival end point, we are interested in the distribution of event time
Restricted mean survival time (RMST) [
Although the HR is the most popular statistic to quantify the survival difference in randomized clinical trials, it is no longer an interpretable quantity if the proportional hazards (PH) assumption is violated [
Clinical trials during the epidemic of an infectious disease might fail to reach the planned sample size due to a lack of eligible patients if the outbreak can be quickly controlled [
In the original analysis of Cao et al [
We carried out an in-depth and comprehensive investigation of the trial design in Cao et al [
In terms of the primary end point, defining clinical improvement as a two-level increment from baseline on a seven-category ordinal scale is ad hoc because the clinical differences between adjacent categories are uneven. For example, it is ambiguous whether a patient changing from point 5 to point 3 is equivalent to one changing from point 6 to point 4. In addition, live discharge from the hospital may occur from point 3 to point 2 or from point 4 to point 2, which cannot be considered equivalent either. Thus, a 2-point improvement on the clinical outcome scale is not a precise end point: it ignores a 1-point improvement as well as the difference between a 2-point and a 3-point improvement. Instead, we recommend death as a single and clean end point for such trials, given that the mortality rate was not low among patients hospitalized with severe COVID-19 (19.2% in the lopinavir-ritonavir group and 25.0% in the standard care group).
The original analysis [
The upper panel of
Moreover, the crossings of the cumulative event curves for the lopinavir-ritonavir and standard care groups at days 10 and 16 in the second figure of Cao et al [
Comparisons of estimates from the mixture terminal (or cure) model and the RMTI based on the reconstructed data from the second figure in Cao et al [
Terminal rate model^{b}  Lopinavir-ritonavir  Standard care  Difference  P value  Hazard ratio (95% CI)  P value
Terminal rate, % (95% CI)  21.17 (15.77-28.42)  29.91 (4.40-36.66)  –8.74 (–21.04 to 3.55)  .16  1.05 (0.78-1.42)  .74
RMTI^{c}, days (95% CI)
Day 7  6.91 (6.79-7.00)  6.98 (6.94-7.00)  –0.07 (–0.19 to 0.05)  .26  N/A^{d}  N/A
Day 14  12.58 (12.11-13.04)  13.25 (12.92-13.58)  –0.67 (–1.24 to –0.11)  .02  N/A  N/A
Day 28  17.19 (15.78-18.60)  18.86 (17.51-20.21)  –1.67 (–3.62 to 0.28)  .09  N/A  N/A
^{a}Cumulative incidence curves were extracted and reconstructed from the second figure in Cao et al [
^{b}The mixture terminal rate model was performed using the “smcure” package.
^{c}The RMTI (restricted mean time to improvement) was estimated by calculating the area above the cumulative incidence curve using the “survRM2” package.
^{d}Not applicable.
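As an illustration of footnote c, the RMTI up to a horizon τ can be viewed as the area between the cumulative incidence curve and 1 on [0, τ], which is the area under the Kaplan-Meier curve of "not yet improved" truncated at τ. The Python sketch below is not the "survRM2" implementation used for the table. The function names `km_curve` and `restricted_mean` and the toy data are hypothetical, and competing risks such as death are ignored for simplicity.

```python
from typing import List, Sequence, Tuple


def km_curve(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of P(not yet improved) at each distinct event time.

    events[i] is 1 if improvement was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose observed time equals t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            surv *= 1.0 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_leaving
    return curve


def restricted_mean(times: Sequence[float], events: Sequence[int], tau: float) -> float:
    """Area under the Kaplan-Meier step curve on [0, tau]."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in km_curve(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)


# Toy data (hypothetical, not from any trial): days to improvement, 1 = improved.
treated = ([3, 5, 8, 12, 20, 28], [1, 1, 1, 1, 1, 0])
control = ([6, 9, 14, 21, 28, 28], [1, 1, 1, 1, 0, 0])
diff = restricted_mean(*treated, tau=28) - restricted_mean(*control, tau=28)
print(f"RMTI difference at day 28: {diff:.2f} days")
```

A negative difference means the first group spent fewer days, on average, waiting for improvement within the 28-day window. Unlike the hazard ratio, this quantity does not rely on the proportional hazards assumption.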
The restricted mean time to improvement corresponding to the area under the curves for the lopinavir-ritonavir group and the standard care group evaluated at days 7, 14, and 28 in Cao et al [
Counts of deaths for the earlier stage (≤12 days after onset of symptoms) and later stage (>12 days after onset of symptoms), and survivors.
Treatment  Deaths, earlier stage, n  Deaths, later stage, n  Survivors, n
Lopinavir-ritonavir  8  11  80
Standard care  13  12  75
Counts of clinical improvement cases in days 1-7, 8-14, and 15-28, and nonimprovement cases.
Treatment  Clinical improvement, days 1-7, n  Clinical improvement, days 8-14, n  Clinical improvement, days 15-28, n  No improvement, n
Lopinavir-ritonavir  6  39  33  22
Standard care  2  28  40  30
Wang et al [
Similar to the trial by Cao et al [
The upper panel of
Due to the competing risk from death, the end point might not be observed, and thus, the standard hazard concept is ambiguous, and the HR does not have a meaningful interpretation anymore [
The trial was terminated without reaching the originally planned sample size of 453 due to a lack of eligible patients. With only 236 patients in the ITT analysis, the estimated HR was 1.23 (95% CI 0.87-1.75), numerically favoring remdesivir, but this estimate might not be reliable because the study was underpowered. Using the bootstrap method, we can predict what would happen if the trial had continued to reach the full sample size or double the planned sample size.
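The bootstrap prediction can be sketched as follows, under assumptions that are ours rather than the paper's. We read "unconditional" as resampling every patient of the hypothetical larger trial from the observed data, and "conditional" as keeping the observed patients fixed and resampling only the additional ones. The data are synthetic, the event is clinical improvement, and a ratio of events per person-day stands in for the Cox hazard ratio, which is only reasonable under constant hazards. All function names are hypothetical.

```python
import math
import random


def simulate_arm(n, hazard, rng, follow_up=28.0):
    """Synthetic (time, event) pairs: exponential times censored at follow_up."""
    arm = []
    for _ in range(n):
        t = rng.expovariate(hazard)
        arm.append((min(t, follow_up), 1 if t <= follow_up else 0))
    return arm


def log_rate_ratio(treat, ctrl):
    """Log ratio of events per person-day (a crude stand-in for the log HR).

    Assumes each arm contains at least one event.
    """
    def rate(arm):
        return sum(e for _, e in arm) / sum(t for t, _ in arm)
    return math.log(rate(treat) / rate(ctrl))


def bootstrap_hr(treat, ctrl, n_treat, n_ctrl, conditional, n_boot=1000, seed=0):
    """Bootstrap median and 95% percentile interval of the rate ratio
    for a trial enlarged to n_treat and n_ctrl patients."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        if conditional:
            # Keep the observed patients; resample only the extra ones.
            t = treat + rng.choices(treat, k=n_treat - len(treat))
            c = ctrl + rng.choices(ctrl, k=n_ctrl - len(ctrl))
        else:
            # Resample the whole enlarged trial from the observed data.
            t = rng.choices(treat, k=n_treat)
            c = rng.choices(ctrl, k=n_ctrl)
        stats.append(log_rate_ratio(t, c))
    stats.sort()
    lo = stats[int(0.025 * n_boot)]
    hi = stats[int(0.975 * n_boot) - 1]
    return math.exp(stats[n_boot // 2]), math.exp(lo), math.exp(hi)


# Synthetic stand-in for the observed arms (158 vs 78); hazards are arbitrary.
rng = random.Random(1)
treat = simulate_arm(158, 0.06, rng)
ctrl = simulate_arm(78, 0.05, rng)
for label, cond in [("unconditional", False), ("conditional", True)]:
    hr, lo, hi = bootstrap_hr(treat, ctrl, 302, 151, conditional=cond)
    print(f"{label}: ratio {hr:.2f} (95% interval {lo:.2f}-{hi:.2f})")
```

In this sketch the conditional interval is narrower because the observed patients contribute no resampling variability, leaving only the additional patients as a source of uncertainty.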
Comparisons of the estimates from the mixture terminal (or cure) rate model and the RMTI based on the reconstructed data from the second figure in Wang et al [
Terminal rate model  Remdesivir  Placebo  Difference  P value  Hazard ratio (95% CI)  P value
Terminal rate, % (95% CI)  0.31 (0.27-0.37)  0.41 (0.32-0.51)  –9.22 (–22.9 to 4.45)  .19  0.92 (0.63-1.35)  .67
RMTI^{a}, days (95% CI)
Day 7  6.95 (6.90-7.00)  6.97 (6.92-7.00)  –0.03 (–0.10 to 0.05)  .49  N/A^{b}  N/A
Day 14  13.09 (12.78-13.40)  13.29 (12.92-13.67)  –0.20 (–0.69 to 0.29)  .42  N/A  N/A
Day 28  20.42 (19.26-21.57)  21.31 (19.73-22.88)  –0.89 (–2.84 to 1.06)  .37  N/A  N/A
^{a}RMTI: restricted mean time to improvement.
^{b}Not applicable.
The restricted mean time to improvement corresponding to the area under the curves for the remdesivir group and the placebo group evaluated at days 7, 14, and 28 in Wang et al [
Predicted hazard ratios (with 95% CIs) and
Sample size  Remdesivir, n  Placebo, n  Unconditional prediction, HR^{a} (95% CI)  Unconditional prediction, P value  Conditional prediction, HR (95% CI)  Conditional prediction, P value
Actual  158  78  1.23 (0.87-1.75)  .24  N/A^{b}  N/A
Target  302  151  1.24 (0.96-1.60)  .10  1.24 (1.03-1.48)  .02
Target×2  604  302  1.24 (1.03-1.48)  .02  1.24 (1.06-1.44)  .01
^{a}HR: hazard ratio.
^{b}Not applicable.
Beigel et al [
The remdesivir trial of Beigel et al [
The upper panel of
The RMTR and percentiles of the time to recovery based on the reconstructed data from the second figure in Beigel et al [
Statistical measure  Remdesivir  Placebo  Difference (95% CI)  P value
RMTR^{a} (up to day 30)  14.5 (13.6-15.5)  17.2 (16.1-18.2)  –2.7 (–4.0 to –1.2)  <.001
Percentiles of the time to recovery
25th  5 (4-5)  6 (6-7)  –1 (–3 to 0)  .65
30th  6 (5-6)  8 (7-9)  –2 (–4 to –1)  .002
40th  8 (7-9)  11 (9-13)  –3 (–5 to –1)  .007
50th (median)  11 (9-12)  15 (13-19)  –4 (–9 to –2)  .01
60th  15 (13-19)  22 (20-27)  –7 (–12 to –3)  .004
^{a}RMTR: restricted mean time to recovery.
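To make the "local" nature of a percentile concrete, the short Python sketch below reads a percentile of the time to recovery off a cumulative incidence step curve as the first day on which the curve reaches the target level. The helper `percentile_time` is hypothetical, and the curve is only a coarse step function passing through the remdesivir point estimates in the table above, not the reconstructed patient-level data.

```python
from typing import List, Optional, Tuple


def percentile_time(curve: List[Tuple[float, float]], p: float) -> Optional[float]:
    """First day on which the cumulative incidence reaches level p.

    curve holds (day, cumulative incidence) pairs sorted by day.
    Returns None if the curve never reaches p within follow-up.
    """
    for day, cum_inc in curve:
        if cum_inc >= p:
            return day
    return None


# Coarse curve through the remdesivir point estimates (illustrative only).
remdesivir_curve = [(5, 0.25), (6, 0.30), (8, 0.40), (11, 0.50), (15, 0.60)]
median_days = percentile_time(remdesivir_curve, 0.50)
print(f"Median time to recovery: {median_days} days")
```

Each percentile depends on where the curve crosses a single level, so two curves can share a median while differing everywhere else. The RMTR instead integrates over the whole follow-up window.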
When designing and conducting a clinical trial for a new treatment, particularly for the COVID-19 pandemic without knowing much about the clinical outcomes, many things can go wrong if the design is not well thought out, the trial is not carefully conducted following the protocol, or the analysis is not properly carried out. Critical issues with such trials include but are not limited to the end point selection, the type I error rate control, double blinding or open label, early termination of a trial, the validity of the PH assumption in a Cox model, and assumptions for statistical tests and models. In contrast to searching for a needle in a haystack, the trial design should be more targeted, focused, and tailored for specific needs of patients with COVID-19 and particular disease characteristics and severities [
Given the emergency and the fast spread of the coronavirus around the world, it is crucial to design the right clinical trial and accelerate the development of a new treatment. With the high speed of enrollment and urgency of the trial outcome, it appears to be difficult to carry out any adaptation during the trial conduct. The trial outcomes unfold so fast that any adaptation may not be able to catch up with the speed of recruitment.
In summary, our recommendations for COVID-19 trials are:
Adopt death as a single end point for patients hospitalized with severe COVID-19 or live discharge from the hospital for patients with moderately severe COVID-19
Conduct the gold standard trial scheme: a randomized, double-blind, controlled trial with equal randomization, or with a 1:2 or 1:3 allocation ratio for control vs treatment
With multiple agents tested in one trial, allow the trial to drop certain treatment due to futility or toxicity
Adopt the RMST as the metric to quantify the treatment effect when the PH assumption is not satisfied; otherwise, standard approaches using the HRs and log-rank tests should be used
Control the type I error rate: Any sample size alteration during the trial must be planned and evaluated in advance with a strict control of the false-positive rate.
ITT analysis (or its modified version) is recommended for the final analysis.
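The recommendation on type I error control can be illustrated with the unplanned extension from 160 to 199 patients. The Python sketch below assumes a normal test statistic whose information is proportional to the sample size, a two-sided 0.05 test at each look, and enrollment of extra patients only when the first test is not significant. These are our simplifying assumptions, not necessarily the calculation behind the reported 0.071, and `overall_type1` is a hypothetical helper.

```python
import math


def normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def overall_type1(n1: int, n2: int, crit: float = 1.959964, steps: int = 4000) -> float:
    """P(reject at n1, or at n2 after a nonsignificant first look) under the null.

    Requires n1 < n2. Z1 ~ N(0, 1) and Z2 = rho * Z1 + sqrt(1 - rho^2) * E,
    with rho = sqrt(n1 / n2) and E an independent standard normal.
    """
    rho = math.sqrt(n1 / n2)
    sd = math.sqrt(1.0 - rho * rho)
    h = 2.0 * crit / steps
    total = 0.0
    # Trapezoidal rule for P(|Z1| <= crit and |Z2| <= crit).
    for i in range(steps + 1):
        z = -crit + i * h
        inner = normal_cdf((crit - rho * z) / sd) - normal_cdf((-crit - rho * z) / sd)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * normal_pdf(z) * inner
    return 1.0 - total * h


alpha = overall_type1(160, 199)
print(f"Overall type I error rate: {alpha:.3f}")
```

Under these assumptions the overall false-positive rate comes out at roughly 0.07 rather than the nominal 0.05, because the second test gives the null hypothesis another chance to be rejected.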
Although adaptive design has gained much popularity and is playing an increasingly important role in clinical trials, particularly in oncology, the advantages of adaptive design may be mitigated to a large extent under such a fast patient enrollment because the impact of any adaptation may be too slow to manifest before the trial is completed. In such cases, the CONSORT (Consolidated Standards of Reporting Trials) statement [
CONSORT: Consolidated Standards of Reporting Trials
COVID-19: coronavirus disease
HR: hazard ratio
ITT: intent-to-treat
PH: proportional hazards
RMST: restricted mean survival time
RMTI: restricted mean time to improvement
RMTR: restricted mean time to recovery
WHO: World Health Organization
We would like to thank the referees, associate editor, and editor for their helpful comments that greatly improved the paper. The research was supported by a grant No 17307318 for GY from the Research Grants Council of Hong Kong.
None declared.