This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on https://publichealth.jmir.org, as well as this copyright and license information must be included.
Modelling COVID-19 transmission at live events and public gatherings is essential to controlling the probability of subsequent outbreaks and to communicating personalized risk to participants. Yet, despite the fast-growing body of literature on COVID-19 transmission dynamics, current risk models either neglect contextual information, such as vaccination rates or disease prevalence, or do not attempt to model transmission quantitatively.
This paper attempted to bridge this gap by providing informative risk metrics for live public events, along with a measure of their uncertainty.
Building upon existing models, our approach ties together 3 main components: (1) reliable modelling of the number of infectious cases at the time of the event, (2) evaluation of the efficiency of pre-event screening, and (3) modelling of the event’s transmission dynamics and their uncertainty using Monte Carlo simulations.
We illustrated the application of our pipeline for a concert at the Royal Albert Hall and highlighted the risk’s dependency on factors such as prevalence, mask wearing, and event duration. We demonstrated that this event, if held on 3 different dates (August 20, 2020; January 20, 2021; and March 20, 2021), would likely lead to mean numbers of transmission events similar to those expected from community transmission (0.06 vs 0.07, 2.38 vs 2.39, and 0.67 vs 0.60, respectively). However, the differences between event and background transmission widened substantially in the upper tail of the distribution of the number of infections (as denoted by their respective 99th percentiles: 1 vs 1, 19 vs 8, and 6 vs 3, respectively, for our 3 dates), further suggesting that relying solely on vaccination and antigen testing to gain entry would likely significantly underestimate the tail risk of the event.
Despite the unknowns surrounding COVID-19 transmission, our estimation pipeline opens the discussion on contextualized risk assessment by combining the best tools at hand to assess the order of magnitude of the risk. Our model can be applied to any future event and is presented in a user-friendly R Shiny interface. Finally, we discussed our model’s limitations as well as avenues for model evaluation and improvement.
More than a year after a global and unprecedented cancellation of live events in March 2020, the future of live events and the entertainment industry remains uncertain despite increasing vaccination rates and low community prevalence levels (at the time of writing). The main concern raised by these gatherings is their susceptibility to “superspreading,” a scenario in which a few contagious participants inadvertently infect a disproportionately large number of others [
The answer to these questions is inherently tied to the estimation of 2 quantities: the number of infections occurring at the event and the post-event secondary attack rate, or number of subsequent infections in the participants’ social circles. Evaluating the safety (or lack thereof) of large public gatherings can then be reframed as quantifying the significance and magnitude of their effect on the distribution of the number of primary and secondary COVID-19 cases. Yet, despite the growing body of literature on COVID-19 risk evaluation and recent efforts to evaluate the safety of live events, this effect remains ill-characterized. Nevertheless, over the past several months, several calculators have been developed to estimate this risk [
These estimators typically rank events on a scale ranging from “low” risk to “high” risk based on the feedback of medical experts [
These calculators estimate the probability of encountering at least 1 COVID-19 case based on the number of people attending an event [
Stemming from physics or fluid dynamics, these calculators focus on modelling the aerosolization and spread of microdroplets, typically in a closed or indoor environment [
Regardless of their category, most of these models rely on a large number of input parameters, including (but not restricted to) the prevalence of the disease. While certain calculators attempt to bridge the gap between expert heuristics and physical models [
Meanwhile, with the increasing vaccination rates in several countries around the world, a few initiatives have begun to evaluate the outbreak risk associated with live events empirically [
To understand and illustrate the challenges that arise in risk estimation for the CAPACITY study, we considered as an example a concert at the Royal Albert Hall (RAH) and demonstrated how to estimate the associated risk, assuming a near-capacity attendance of 5000 in the main concert hall, which has a volume of 86,650 m³ [
The objectives of our modelling approach were threefold: (1) enable the quantitative comparison of different activities and event characteristics, (2) estimate the efficacy of various safety protocols, and (3) provide a predictive risk assessment (ie, the risk associated with a scheduled future event). To this end, we divided our approach into 3 sequential steps (see
Summary of our modelling pipeline.
Quantiles of the number of transmission events for the Royal Albert Hall concert, by event date, assuming that all participants were wearing masks, so that the exhalation of particles is reduced by 70% and inhalation by 50%.
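Statistic | August 20, 2020: Event | August 20, 2020: Null | January 20, 2021: Event | January 20, 2021: Null | March 20, 2021: Event | March 20, 2021: Null
Median | 0 | 0 | 1 | 2 | 0 | 0
Mean | 0.06 | 0.07 | 2.38 | 2.49 | 0.67 | 0.63
1st percentile | 0 | 0 | 0 | 0 | 0 | 0
2.5th percentile | 0 | 0 | 0 | 0 | 0 | 0
97.5th percentile | 1 | 1 | 10 | 7 | 3 | 3
99th percentile | 1 | 1 | 19 | 8 | 6 | 3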
Step 1a in our risk modelling procedure was determining the projected incidence by predicting the number of infectious cases attending a given future event. COVID-19 forecasting is a notoriously involved task, as reflected by the extensive literature on the topic (eg, agent-based models or susceptible-exposed-infectious-removed models [
Projected incidence (average and 95% prediction interval) using a 100-nearest neighbor approach, which provides good coverage (the observed trajectory lies within the 95% prediction interval). The black line denotes observed incidence rates, while the red line denotes the predicted rates, based on an initial observation period of 14 days; the prediction interval for the predicted incidence over the next 4 weeks is highlighted in dark grey.
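The analog idea behind Step 1a (match the last 14 days of incidence against past trajectories, then use what followed the 100 closest matches to form a 4-week mean and 95% prediction interval) can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes a library of past trajectories in the same units (eg, per 1,000,000), Euclidean distance on raw values, and empirical 2.5% and 97.5% quantiles for the interval. The function names are hypothetical.

```python
import math
from statistics import mean


def empirical_quantile(values, q):
    """Smallest sample value with at least a fraction q of the sample at or below it."""
    ordered = sorted(values)
    idx = min(len(ordered) - 1, max(0, math.ceil(q * len(ordered)) - 1))
    return ordered[idx]


def knn_forecast(history, library, k=100, window=14, horizon=28):
    """Analog forecast of daily incidence from the k most similar past windows.

    history: observed daily incidence; its last `window` days are matched.
    library: past incidence trajectories (same units), each at least
             window + horizon days long.
    Returns (mean, lower, upper), each a list of length `horizon`.
    """
    recent = list(history[-window:])
    candidates = []
    for traj in library:
        # Every window in every past trajectory that has a full horizon after it.
        for start in range(len(traj) - window - horizon + 1):
            segment = traj[start:start + window]
            future = traj[start + window:start + window + horizon]
            candidates.append((math.dist(recent, segment), future))
    if not candidates:
        raise ValueError("library trajectories are too short for window + horizon")
    candidates.sort(key=lambda c: c[0])
    neighbours = [future for _, future in candidates[:k]]

    # Summarize the neighbors' continuations day by day.
    by_day = list(zip(*neighbours))
    avg = [mean(day) for day in by_day]
    lower = [empirical_quantile(day, 0.025) for day in by_day]
    upper = [empirical_quantile(day, 0.975) for day in by_day]
    return avg, lower, upper
```

Matching on raw values keeps the sketch short; a real pipeline might instead match on log incidence or rescale neighbors to the current level.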
Step 1b was determining the under-ascertainment bias. The estimated number of new cases based on official incidence data must then be corrected for under-ascertainment, that is, the downward bias of the reported prevalence in the population due, for instance, to limited testing capacity, low test sensitivity, or people being unwilling or unable to take a test. To this end, we compared the ratio of the number of deaths to reported cases (with deaths lagged by 3 weeks) to an expected, age-stratified infection-fatality ratio [
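The comparison in Step 1b amounts to one ratio: if deaths 3 weeks later divided by the infection-fatality ratio (IFR) estimates the true number of infections, then the ascertainment rate is reported cases divided by that estimate. The sketch below assumes a single aggregated window and a single IFR value; the helper names are hypothetical and the age stratification is left to whoever supplies `ifr`.

```python
def ascertainment_rate(cases, deaths, ifr, lag_days=21):
    """Fraction of infections that appear in official case counts.

    cases[i], deaths[i]: reported daily counts on day i.
    ifr: expected (age-stratified) infection-fatality ratio, as a fraction.
    Deaths are shifted by `lag_days` to line up with the cases they follow.
    """
    n = len(cases) - lag_days
    if n <= 0 or len(deaths) < lag_days + n:
        raise ValueError("need more than lag_days of cases and matching deaths")
    reported = sum(cases[:n])
    lagged_deaths = sum(deaths[lag_days:lag_days + n])
    if lagged_deaths == 0:
        raise ValueError("no deaths in the lagged window; the ratio is undefined")
    implied_infections = lagged_deaths / ifr  # infections needed to produce these deaths
    return reported / implied_infections


def corrected_incidence(reported_incidence, ascertainment):
    """Scale reported incidence to an estimate of true incidence."""
    return reported_incidence / ascertainment
```

For example, 30 days of 1000 reported cases per day followed 3 weeks later by 20 deaths per day, with an IFR of 1%, gives an ascertainment of 0.5, so reported incidence would be doubled. Nothing in the formula caps the estimate at 1, which is consistent with the observation that ascertainment may appear to exceed 100% during widespread testing and low prevalence.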
Step 1c was determining the number of infectious participants at the event. Having predicted the background daily incidence rate, we turned to estimating the number of infectious participants who will attend the event despite the screening protocols. For an infectious individual to attend the event in spite of the CAPACITY study’s screening protocol, they must (1) have no COVID-19-like symptoms or fail to report them on the morning of the event, (2) receive a (false) negative result from the antigen test taken 2 days before the event, and (3) be contagious (rather than simply infected) at the time of the event. We evaluate the joint probability of these events as follows and, for the sake of clarity, refer the reader to
Regarding symptom-check failure, one of the main challenges associated with the COVID-19 crisis is the number of asymptomatic cases, that is, infected individuals who do not express symptoms and are thus unaware of their potential infectiousness. This group includes individuals who are either presymptomatic or completely asymptomatic during the course of their illness; the latter are estimated to represent roughly 25% of all cases [
(A) Density of the COVID-19 incubation time and percentage culture-positive and (B) probability that an individual is infectious (light grey), that the screening protocol will miss them (black), and that they will be missed and thus attend the event (red) as a function of days since infection. The shaded regions denote the uncertainty of this estimate arising from the uncertainty in the test’s sensitivity.
Regarding antigen test failure, the sensitivity of COVID-19 tests, whether the gold-standard polymerase chain reaction (PCR) or lateral flow antigen assays, depends heavily on the time since infection [
where
The infectiousness of the participants, that is, the propensity of an infected ticket holder to infect others, is a function of time since infection. To estimate this relationship, we built upon the existing literature studying the link between reverse-transcription PCR cycle thresholds and cultivable virus [
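The three conditions of Step 1c can be combined per day since infection and then summed over recent infection dates. The sketch below is one way to do that, assuming the three screening stages are independent given the day since infection (a simplification) and a constant, ascertainment-corrected daily infection rate over the previous 2 weeks. The curve arguments stand in for the published symptom, sensitivity, and culture-positivity curves and are not provided here; `p_lie=0.5` mirrors the default 50% symptom-check failure probability described in the limitations. All function names are hypothetical.

```python
from typing import Callable

Curve = Callable[[int], float]  # maps days since infection to a probability


def p_attends_infectious(
    d: int,
    p_symptomatic: Curve,
    test_sensitivity: Curve,
    p_infectious: Curve,
    p_lie: float = 0.5,
    test_days_before: int = 2,
) -> float:
    """Probability that someone infected `d` days before the event passes
    screening and is contagious at the event (stages assumed independent)."""
    # (1) Symptom check: no symptoms, or symptoms present but not reported.
    p_sym = p_symptomatic(d)
    p_pass_symptoms = (1 - p_sym) + p_sym * p_lie
    # (2) Antigen test taken `test_days_before` days before the event;
    #     someone infected after the test cannot be caught by it.
    days_at_test = d - test_days_before
    p_pass_test = 1.0 if days_at_test < 0 else 1 - test_sensitivity(days_at_test)
    # (3) Contagious (not merely infected) on the day of the event.
    return p_pass_symptoms * p_pass_test * p_infectious(d)


def expected_infectious_attendees(
    n_attendees: float,
    daily_infection_rate: float,
    p_attend: Callable[[int], float],
    max_days: int = 14,
) -> float:
    """Sum over infection dates, assuming a constant daily infection rate."""
    return sum(n_attendees * daily_infection_rate * p_attend(d) for d in range(max_days))
```

A `p_symptomatic` curve that never exceeds about 0.75 would reflect the roughly 25% of cases that remain asymptomatic throughout.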
Step 1d was determining the number of participants at risk. Finally, the last quantity that we needed to infer before turning to the specifics of the transmission mechanisms was the number of attendees at risk of being infected. This requires knowledge of the participants’ COVID-19 susceptibility status (ie, has the participant already had COVID-19 in the previous year, or has the participant been vaccinated?). While previous history could be imputed through additional questions (eg, previous positive test for COVID-19 and symptoms combined in a model such as in [
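The bookkeeping in Step 1d is a subtraction: attendees who are already infected or are treated as protected are removed from the pool at risk. The sketch below assumes the protected count is supplied from elsewhere (for instance, from vaccination coverage); the function name is hypothetical.

```python
def susceptible_participants(n_attendees, n_infected, n_protected=0.0):
    """Attendees who can still be infected at the event.

    n_infected: expected number of attendees already infected.
    n_protected: expected number treated as immune (eg, vaccinated or
                 previously infected); its estimation is an assumption here.
    """
    n_susceptible = n_attendees - n_infected - n_protected
    if n_susceptible < 0:
        raise ValueError("infected plus protected attendees exceed attendance")
    return n_susceptible
```

The August 20, 2020 and January 20, 2021 columns of the comparison table match 5000 minus the infected count (4996.4 and 4700.7). For March 20, 2021, the same arithmetic implies that about 1089.4 attendees were removed as protected (5000 - 50.2 - 3860.4).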
We discuss in
For the RAH example, we present a comparison of each quantity for 3 different dates (see
Comparison of the efficiency of the screening protocol and the number of infectious participants at the event by date.
Measurement | August 20, 2020 | January 20, 2021 | March 20, 2021 (a)
Projected incidence (per 1,000,000) | 20 | 1286 | 188
Number of infected participants | 3.6 | 299.3 | 50.2
Number of infectious participants at the event | 0.22 | 7.96 | 2.00
Cases caught by screening, % | 94 | 97 | 96
Number of susceptible participants | 4996.4 | 4700.7 | 3860.4
(a) By this date, vaccinated individuals made up a substantial proportion of the British public, so the number of susceptible participants plus the number of infected participants no longer equals 5000.
Having estimated the number of infectious participants at the event, the second major component of our model consists of estimating the number of transmission events during the event itself.
More than a year after the start of the epidemic, the precise mechanisms by which COVID-19 is transmitted are still unclear. Aside from direct physical contact, experts continue to debate the relative importance of 2 main routes of infection: droplet transmission and airborne transmission.
In droplet transmission, infection occurs through the inhalation of droplets (particles 5-10 µm in diameter [
Increasing concerns around airborne transmission have been raised by a number of experts over the past few months [
While droplet emission is undeniably a major source of transmission, simple safety precautions such as mask wearing have been shown to control this transmission source efficiently [
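The aerosol component rests on a homogeneous (well-mixed) room. A minimal sketch of such a box model, assuming constant emission, first-order removal, and a Wells-Riley exponential dose-response, is shown below. It is not a reproduction of the Jimenez and Peng estimator. Only the hall volume (86,650 m³) and the mask efficiencies (70% on exhalation, 50% on inhalation) come from the text; every other input in the example is hypothetical, and the function name is illustrative.

```python
import math


def well_mixed_infection_probability(
    n_infectious: float,
    quanta_per_hour: float,
    volume_m3: float,
    loss_rate_per_hour: float,
    breathing_m3_per_hour: float,
    duration_hours: float,
    mask_exhale_eff: float = 0.0,
    mask_inhale_eff: float = 0.0,
) -> float:
    """Per-susceptible infection probability in a well-mixed room.

    The room starts with no airborne virus; emission is constant and removal
    (ventilation, deposition, decay) is first order at `loss_rate_per_hour`.
    """
    emission = n_infectious * quanta_per_hour * (1 - mask_exhale_eff)  # quanta/h
    lam, dur = loss_rate_per_hour, duration_hours
    # C(t) = E / (lam V) * (1 - exp(-lam t)), averaged over [0, dur].
    steady_state = emission / (lam * volume_m3)
    avg_conc = steady_state * (1 - (1 - math.exp(-lam * dur)) / (lam * dur))
    # Inhaled dose in quanta, reduced by the susceptible person's mask.
    dose = avg_conc * breathing_m3_per_hour * dur * (1 - mask_inhale_eff)
    # Wells-Riley: probability of at least one infection event given the dose.
    return 1 - math.exp(-dose)


# Hypothetical example: about 2 infectious attendees (close to the
# March 20, 2021 estimate), 3 hours, everyone masked.
p = well_mixed_infection_probability(
    n_infectious=2, quanta_per_hour=25, volume_m3=86_650,
    loss_rate_per_hour=3.0, breathing_m3_per_hour=0.5, duration_hours=3,
    mask_exhale_eff=0.7, mask_inhale_eff=0.5,
)
expected_infections = 3860 * p  # expected new infections among susceptible attendees
```

The inhaled dose grows with duration and scales with (1 - exhalation efficiency) × (1 - inhalation efficiency), which is how shorter events and universal masking lower the expected number of infections.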
To estimate the uncertainty associated with this model, we used Monte Carlo simulations. We simulated random input parameters (number of infectious and susceptible individuals) using the distributions and uncertainty estimates discussed in the previous section. To model the uncertainty associated with the aerosol transmission model, we added a sampling step at the end of the Jimenez and Peng pipeline. This allowed us to account for individual variations in infectious participants’ ability to spread the disease and to remain consistent with the extensive literature on the heavy-tailed (Pareto) nature of COVID-19 transmission and superspreading [
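The Monte Carlo step can be illustrated with a simplified sampler. It is not the authors' code, which is published separately. This sketch assumes that the number of infectious attendees is Poisson around its estimated mean, that each infector's emission is scaled by a Pareto multiplier normalized to mean 1 (the tail index `pareto_alpha=2.0` is hypothetical), and that infections among susceptible attendees are binomial given the resulting per-person probability. Only the Python standard library is used; all function names are hypothetical.

```python
import math
import random


def sample_poisson(mean, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample_binomial(n, p, rng):
    """Inverse-CDF binomial draw; assumes n * p is modest so (1 - p) ** n does not underflow."""
    if p <= 0:
        return 0
    if p >= 1:
        return n
    u = rng.random()
    pmf = (1 - p) ** n
    cdf, k = pmf, 0
    ratio = p / (1 - p)
    while u > cdf and k < n:
        pmf *= (n - k) / (k + 1) * ratio
        k += 1
        cdf += pmf
    return k


def simulate_event(mean_infectious, n_susceptible, dose_per_infector,
                   n_sims=10_000, pareto_alpha=2.0, seed=None):
    """Monte Carlo draws of the number of infections at one event.

    dose_per_infector: expected inhaled dose (quanta) per susceptible attendee
    from one average infectious attendee (eg, from a box model).
    """
    rng = random.Random(seed)
    scale = (pareto_alpha - 1) / pareto_alpha  # paretovariate has mean alpha / (alpha - 1)
    draws = []
    for _ in range(n_sims):
        n_inf = sample_poisson(mean_infectious, rng)
        # Heavy-tailed individual infectiousness: mean-1 Pareto multipliers.
        total_emission = sum(scale * rng.paretovariate(pareto_alpha) for _ in range(n_inf))
        p = 1 - math.exp(-dose_per_infector * total_emission)
        draws.append(sample_binomial(n_susceptible, p, rng))
    return draws
```

Sorting the returned draws and reading off the 50th, 97.5th, and 99th percentiles gives summaries of the same kind as those in the quantile table. Uncertainty in `mean_infectious` itself could be added by drawing it once per simulation from its own distribution.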
The code for the model can be found online on the authors’ GitHub [
To quantify the effect of the event, it is necessary to put it in the context of the background rate of infections: even if the participants had not attended the event, they could have been infected elsewhere. In this null model, the number of infections $Y$ is binomially distributed: $Y \sim \mathrm{Binom}(n_{\text{susceptible}}, \pi)$, where $\pi$ is the background probability of infection for a susceptible participant.
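Because the null model is a plain binomial, its mean and quantiles can be computed exactly rather than simulated. The sketch below does this with a recursive probability mass function, assuming the expected (non-integer) number of susceptible participants is rounded to an integer. The value of $\pi$ must come from the background incidence estimate; the function names are hypothetical.

```python
import math


def binomial_quantile(n, p, q):
    """Smallest k such that P(Y <= k) >= q for Y ~ Binom(n, p)."""
    if p <= 0:
        return 0
    if p >= 1:
        return n
    pmf = math.exp(n * math.log1p(-p))  # P(Y = 0) = (1 - p) ** n
    cdf, k = pmf, 0
    ratio = p / (1 - p)
    while cdf < q and k < n:
        pmf *= (n - k) / (k + 1) * ratio  # P(Y = k + 1) from P(Y = k)
        k += 1
        cdf += pmf
    return k


def null_model_summary(n_susceptible, pi, quantiles=(0.5, 0.975, 0.99)):
    """Mean and selected quantiles of background infections among susceptible attendees."""
    n = round(n_susceptible)  # the pipeline yields expected, non-integer counts
    return n * pi, {q: binomial_quantile(n, pi, q) for q in quantiles}
```

Comparing these values with the same quantiles of the event-model draws reproduces the Event vs Null layout of the quantile table.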
We present the results for the RAH example in
It is likely, although not inevitable, that the event will increase transmission irrespective of the prevalence level. However, this increase becomes substantially smaller at low prevalence and high vaccination rates. Having computed the expected number of transmission events, we can then compute several complementary metrics of interest, for example, the secondary attack rate (SAR), that is, the number of COVID-19 cases in the participants’ community under both the null and event models. SAR can be calculated from the predicted reproductive rate (R) in the regions where the ticket holders live. In the United Kingdom, R rates are updated weekly at the regional level (eg, East Midlands, London) and are available from the Office for National Statistics, or they can be derived from the kNN modelling previously described. An opportunity for further research would be to estimate SAR within households by gathering contextual data from ticket holders. Similarly, hospitalizations and deaths might be estimated from individual characteristics and comorbidities; however, this is beyond the scope of the current article.
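One simple way to turn infections at the event into community cases using R is a branching-process approximation: each case causes on average R further cases per generation, with no depletion of susceptibles. This is an assumption made for illustration and not necessarily the calculation used for SAR here; the function names are hypothetical.

```python
def expected_secondary_cases(primary_cases, r, generations=1):
    """Expected onward cases over `generations` generations, assuming each
    case causes on average `r` further cases (no depletion of susceptibles)."""
    return primary_cases * sum(r ** g for g in range(1, generations + 1))


def excess_secondary_cases(event_infections, null_infections, r, generations=1):
    """Onward cases attributable to the event relative to the null model."""
    return expected_secondary_cases(event_infections - null_infections, r, generations)
```

With a regional R of 0.8 and one generation, for instance, an event causing 2 more infections than the null model would be expected to add about 1.6 community cases.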
This risk modelling pipeline also allows comparison of different protocols and situations. For example, this pipeline highlights (1) the importance of event duration (the longer the dwell time at the event, the more at risk the participants) and (2) the importance of wearing masks.
Effect of different input parameters on the quantiles of the number of infections for an event at the Royal Albert Hall across all 3 dates.
Event | August 20, 2020: median, mean (99% CI) | January 20, 2021: median, mean (99% CI) | March 20, 2021: median, mean (99% CI)
No mask wearing, 3 hours, n=5000 | 0, 0.3 (0-4) | 5, 9.9 (0-76) | 1, 2.4 (0-21)
50% mask wearing, 3 hours, n=5000 | 0, 0.2 (0-3) | 3, 5.5 (0-40) | 1, 1.3 (0-13)
100% mask wearing, 3 hours, n=5000 | 0, 0.1 (0-1) | 1, 2.4 (0-19) | 0, 0.7 (0-6)
100% mask wearing, 1.5 hours, n=5000 | 0, 0.04 (0-1) | 0, 1.4 (0-10) | 0, 0.4 (0-3)
100% mask wearing, 3 hours, n=2500 | 0, 0.2 (0-1) | 0, 0.9 (0-8) | 0, 0.2 (0-3)
Boxplots showing the distribution of the number of infections across different scenarios for our Royal Albert Hall event held on March 20, 2021. Unless otherwise stated, the number of attendees is 5000, the duration is 3 hours, and the proportion of attendees wearing masks is 100%.
In addition to the aggregated risk that a live event presents, the individual risk of transmission can be estimated and communicated to ticket holders so that they can gauge whether the risk of attending the event outweighs their desire to attend. For the first person to purchase a ticket, risk of transmission will be calculated based on their own immunity status (eg, vaccination, regional prevalence) and a synthetic population based on national prevalence at that time. As more tickets are assigned to ticket holders, the reliance on the synthetic population decreases, because the number of susceptible and potentially infectious individuals attending the event becomes better known. Therefore, the confidence in the risk score increases as the event draws closer and as the proportion of tickets sold increases. This can be reflected in the updated risk scores provided to ticket holders as the event approaches. The individual risk scores can also be recomputed under alternative scenarios input into the risk algorithm. For example, for an individual not yet vaccinated, their risk could also be presented as if they had been vaccinated, offering the individual an opportunity to appreciate how vaccination could modify their risk. Such an approach could form the basis of behavior change intervention studies promoting health literacy and tackling vaccine hesitancy (see
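The gradual replacement of the synthetic population by known ticket holders can be sketched as follows. Each sold ticket contributes its holder's own probability of attending while infectious, and each unsold ticket contributes a synthetic value based on national prevalence. Everything here is an assumption for illustration: the function name, the relative-susceptibility input, and the plug-in use of the expected number of infectors in an exponential dose-response.

```python
import math


def individual_risk(own_susceptibility, sold_infectious_probs, capacity,
                    synthetic_infectious_prob, dose_per_infector):
    """Hypothetical individual risk score for one ticket holder.

    own_susceptibility: relative susceptibility (1 = fully susceptible,
        lower if vaccinated or previously infected).
    sold_infectious_probs: for each ticket sold so far, the probability that
        its holder attends while infectious.
    synthetic_infectious_prob: the same probability for a synthetic attendee
        based on national prevalence, used for unsold tickets.
    dose_per_infector: expected inhaled dose per susceptible attendee from
        one infectious attendee (eg, from a box model).
    """
    n_unsold = capacity - len(sold_infectious_probs)
    if n_unsold < 0:
        raise ValueError("more tickets sold than capacity")
    # Known ticket holders replace synthetic attendees as sales progress.
    expected_infectious = sum(sold_infectious_probs) + n_unsold * synthetic_infectious_prob
    # Plug-in approximation: expected number of infectors in the exponent.
    return own_susceptibility * (1 - math.exp(-dose_per_infector * expected_infectious))
```

A "what if vaccinated" score for the same person would be obtained by calling the function again with a lower `own_susceptibility`, for example 1 minus an assumed vaccine effectiveness.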
The modelling we propose uses prevalence estimates and screening protocols to calculate the number of infectious and susceptible individuals attending the event, and models transmission dynamics at the venue to predict the number of new infections. Our paper demonstrates the value of estimating attack rates from live events so that they can be appropriately managed. We also demonstrate how individual ticket holders can receive personalized risk scores for contracting COVID-19 at the event, which would, for the first time, enable genuine informed consent to be obtained. Although this methodology provides clear benefits to event organizers, local public health authorities, and individual ticket holders, our approach is based on several assumptions that fall into 2 categories: modelling assumptions and parameter sensitivity.
Because it combines data and tools from different sources, our pipeline relies on assumptions at 3 main levels: predicting COVID-19 prevalence, assessing the efficiency of the screening protocol, and modelling transmission at the event.
To predict future COVID-19 incidence, we chose a kNN approach, as it yields more robust predictions and better uncertainty quantification than most existing parametric methods. One downside of this approach is that it might not generalize well to entirely novel behaviors or viral variants, in which case well-parameterized methods may outperform it as knowledge of transmission, vaccination, and other relevant model parameters continues to improve. While prevalence predictions are important for event planners and attendees alike, on the day of the event, the more important metric is whether official case rates reflect actual cases (ie, the ascertainment rate). Historically, this rate has been low due to limited testing facilities, and our method for determining ascertainment using cases, deaths, and infection-fatality rates reflects this, but it also indicates that ascertainment may exceed 100% in times of widespread testing and low prevalence. It was beyond the scope of this paper to investigate ascertainment further, but we expect future research to clarify how different test types, their false-negative and false-positive rates, and their frequency of use affect the ascertainment rate.
Our modelling framework assumes that events will screen participants with COVID-19 tests, such as virtually witnessed lateral flow antigen tests. Assessing the efficiency of this screening step requires estimating (1) the sensitivity of the test, (2) the probability of having symptoms, and (3) the probability of being infectious, all of which are functions of days since infection. Our estimates of these quantities are based on published data, with the exception of the probability of symptom-check failure (ie, the probability that a participant lies about their symptoms to get in). By default, we set this probability to 50%, a choice that will be refined as the CAPACITY and other similar studies gather behavioral data. However, as shown in
The airborne transmission model that we use relies on a homogeneous (well-mixed) air hypothesis for an indoor environment. While several other models have been proposed (either breaking the room into compartments or using a distance index) to relax this hypothesis, we highlight (following the discussion by Jimenez and Peng [
While we try to limit the number of input parameters in our pipeline, the sensitivity of the estimates to these inputs (namely, the mask efficiency and population of interest) has to be studied. We refer the reader to
Finally, one of the main current hurdles for developing risk estimators is the absence of quality data with which to validate and benchmark different transmission models, which makes validating our transmission pipeline rather daunting. Indeed, while we can (and have, see
For model checking, we begin by validating the behavior of our model estimates on documented SSEs [
For model validation on (scarce) existing data, we also consider 2 documented live indoor concert events [
Finally, for prospective data gathering, to overcome the lack of available data, we propose using the R Shiny app [
This validation and model assessment step is further described in
A nuanced, data-driven system is required to assess risk at each event informed by the characteristics of all ticket holders and the background risk of transmission concurrent to the event, so that proportionate and specific action can be taken by event organizers and public health authorities. We have detailed our attempt to create such a system and have outlined its predictions and limitations. Our end-to-end risk model is provided in the form of an R Shiny interface. At times of high prevalence, this type of system will ensure events likely to increase transmission can be halted. At times of low prevalence, this will ensure events can potentially continue to operate. Learning to live with SARS-CoV-2 will be about implementing systems that support hyperlocal, data-driven decisions so that far-reaching and highly damaging sector-specific lockdowns can be avoided as much as possible.
Prediction.
Model and assumptions of the Jimenez aerosol transmission model.
Sensitivity analysis.
Risk communication.
Model validation.
ARC: Applied Research Collaboration
kNN: k-nearest neighbor
NIHR: National Institute for Health Research
PCR: polymerase chain reaction
RAH: Royal Albert Hall
SAR: secondary attack rate
SSE: superspreader event
The work of TE has been supported by the Estonian Research Council Grant PRG1291. MH and AEO are supported in part by the National Institute for Health Research (NIHR) Applied Research Collaboration (ARC) Northwest London. Imperial College London is grateful for support from NIHR ARC Northwest London and Imperial NIHR Biomedical Research Centre. The views expressed in this article are those of the authors and not necessarily those of NIHR or the Department of Health and Social Care. JK is currently Director of Health Optimisation at the Center for Health and Human Performance (London, UK), as well as the co-founder and Medical Director of Certific.
JK is the medical director and co-founder of Certific. None of the remaining authors have any competing interests.