Published in Vol 6, No 4 (2020): Oct-Dec

COVID-19 Surveillance in a Primary Care Sentinel Network: In-Pandemic Development of an Application Ontology


Original Paper

1 Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, United Kingdom

2 General Practice and Primary Care, Institute of Health and Wellbeing, University of Glasgow, Glasgow, United Kingdom

3 University Children's Hospital Basel, University of Basel, Basel, Switzerland

4 PRIMIS, University of Nottingham, Nottingham, United Kingdom

5 Department of General Practice, Royal College of Surgeons, Ireland, Dublin, Ireland

6 Department of Veterinary Medicine and Animal Productions, University of Naples Federico II, Naples, Italy

Corresponding Author:

Simon de Lusignan, MD

Nuffield Department of Primary Care Health Sciences

University of Oxford

Radcliffe Primary Care Building

Radcliffe Observatory Quarter, Woodstock Rd

Oxford, OX2 6GG

United Kingdom

Phone: 44 1865617283


Background: Creating an ontology for COVID-19 surveillance should help ensure transparency and consistency. Ontologies formalize conceptualizations at either the domain or application level. Application ontologies cross domains and are specified through testable use cases. Our use case was an extension of the role of the Oxford Royal College of General Practitioners (RCGP) Research and Surveillance Centre (RSC) to monitor the current pandemic and become an in-pandemic research platform.

Objective: This study aimed to develop an application ontology for COVID-19 that can be deployed across the various use-case domains of the RCGP RSC research and surveillance activities.

Methods: We described our domain-specific use case. The actor was the RCGP RSC sentinel network, the system was the course of the COVID-19 pandemic, and the outcomes were the spread and effect of mitigation measures. We used our established 3-step method to develop the ontology, separating ontological concept development from code mapping and data extract validation. We developed a coding system–independent COVID-19 case identification algorithm. As there were no gold-standard pandemic surveillance ontologies, we conducted a rapid Delphi consensus exercise through the International Medical Informatics Association Primary Health Care Informatics working group and extended networks.

Results: Our use-case domains included primary care, public health, virology, clinical research, and clinical informatics. Our ontology supported (1) case identification, microbiological sampling, and health outcomes at an individual practice and at the national level; (2) feedback through a dashboard; (3) a national observatory; (4) regular updates for Public Health England; and (5) transformation of a sentinel network into a trial platform. We have identified a total of 19,115 people with a definite COVID-19 status, 5226 probable cases, and 74,293 people with possible COVID-19, within the RCGP RSC network (N=5,370,225).

Conclusions: The underpinning structure of our ontological approach has coped with multiple clinical coding challenges. At a time when there is uncertainty about international comparisons, clarity about the basis on which case definitions and outcomes are made from routine data is essential.

JMIR Public Health Surveill 2020;6(4):e21434

Introduction



The COVID-19 pandemic has many features of a complex system [1,2]. Complexities include repeated name changes of both the causative organism and associated disease [3-6], evolving understanding of core clinical features at presentation [7,8], and differing rates of testing and approaches to outcome reporting between countries [9,10]. This complexity presents a significant challenge for consistent clinical coding within computerized medical records (CMR) systems [11].

Creating an ontology for COVID-19 surveillance should help to facilitate reproducibility and interoperability between various key stakeholders, from clinicians and epidemiologists to data scientists and software developers. Ontologies are formalizations of conceptualizations and exist in reference or application formats [12]. Reference ontologies are at a domain level and describe a group of related concepts. Application ontologies are more specific and are used when modeling across multiple domains [13]. Application ontologies should be evaluated against a testable use case, which represents the scope and requirements of the specific application [14,15]. The emergence of a new disease means that corresponding original ontologies need to be developed.

We report the development of a COVID-19 application ontology using the Oxford Royal College of General Practitioners (RCGP) Research and Surveillance Centre (RSC) network’s adaptations to COVID-19 as its use case. The RCGP RSC is an established primary care sentinel network, which extracts pseudonymized data from a nationally representative sample of over 500 general practices twice weekly (N=5,370,225) [16]. RCGP RSC has collaborated with Public Health England (PHE) for over 50 years, conducting influenza and respiratory disease surveillance and vaccine effectiveness studies [17,18]. The RCGP RSC has extended these routine surveillance activities to include monitoring the spread of COVID-19, assessing the effectiveness of containment measures, and becoming a platform for an in-pandemic COVID-19 trial [19,20].

Building on our previous experience of developing ontologies [12,21,22], we created an application ontology for extended COVID-19 surveillance.

Methods


Our application ontology was developed in 3 stages: stage 1, creating and testing our use case; stage 2, developing the COVID-19 surveillance ontology; and stage 3, external validation using a rapid Delphi consensus exercise. We classified this as an application ontology because the system crosses a range of domains to deliver specific goals. The data source was routine primary care data from the RCGP RSC sentinel network, combined with virology and serology sampling data from PHE. Additionally, a practice dashboard and a practice liaison team helped ensure data quality [23]. Episode type (whether a case is a first or incident case, or a follow-up) is important for surveillance; although we can infer this from records, it is better collected as primary information [24].

Stage 1: Creating and Testing the Use Case

We created a testable narrative use case for COVID-19 surveillance, using previously described methods [25,26]. The primary actor was the RCGP RSC, the system it interacts with was the national response to the COVID-19 pandemic, and its outcomes entailed monitoring spread and effect of mitigation measures.

The use case has been progressively implemented in-pandemic. We report our implementation across the domains identified. As our ontology developed and formalized, post hoc checking was done to ensure extracts were ontology compliant.

Stage 2: Developing the COVID-19 Surveillance Ontology

We developed an application ontology to support extended surveillance using routine CMR data. The terminology for COVID-19, the clinical understanding of it, and the response to it were all changing rapidly during the development period. The ontology has built-in flexibility to accommodate these and further changes.

We used our 3-step ontological process to identify codes to meet our requirements [12,21,22]:

  • Step 1: the ontology layer; defines relevant COVID-19 surveillance concepts and may include exposure, investigations, diagnoses, or other “processes of care.” Part of our ontological process is to iterate whether cases identified are definite, probable, or possible, based on the specificity of the codes used (an approach developed in diabetes research [27]);
  • Step 2: the coding layer; applies concepts of the ontology layer to the specific coding system used in the CMR. Individual codes are classified as having direct, partial, or no clear mapping to the criteria considered [28]. In this case, we extended this to exclude suspected cases where there was a subsequent negative test. Post hoc data validation was done largely via our practice liaison team. One member of the team was entirely dedicated to ensuring data quality, providing anticipatory training, coding aids, and a responsive service;
  • Step 3: the logical data extract model; systematically tests the codes identified to ensure that data outputs are consistent with study requirements.
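
The separation between the three steps above can be sketched in a few lines of Python. This is an illustration only, not the RCGP RSC implementation: the concept names, the `HYPO-…` code identifiers, and the functions `codes_for` and `unmapped_concepts` are all hypothetical, and the step 3 check shown (flagging concepts that have no usable code) is just one example of testing an extract against requirements.

```python
from dataclasses import dataclass
from enum import Enum


class Mapping(Enum):
    """Strength of the mapping between a code and an ontology concept."""
    DIRECT = "direct"
    PARTIAL = "partial"
    NONE = "no clear mapping"


# Step 1 (ontology layer): concepts are named without reference to any coding system.
CONCEPTS = frozenset(
    {"covid19_definite", "covid19_probable", "covid19_possible", "covid19_excluded"}
)


@dataclass(frozen=True)
class CodeMapping:
    code: str          # hypothetical placeholder, not a real clinical code
    concept: str
    strength: Mapping


# Step 2 (coding layer): codes from one coding system are attached to concepts.
CODING_LAYER = (
    CodeMapping("HYPO-001", "covid19_definite", Mapping.DIRECT),
    CodeMapping("HYPO-002", "covid19_probable", Mapping.PARTIAL),
    CodeMapping("HYPO-003", "covid19_possible", Mapping.NONE),
)


def codes_for(concept, allowed=(Mapping.DIRECT,)):
    """Return the codes mapped to a concept at one of the allowed strengths."""
    if concept not in CONCEPTS:
        raise KeyError(f"unknown concept: {concept}")
    return sorted(
        m.code for m in CODING_LAYER
        if m.concept == concept and m.strength in allowed
    )


# Step 3 (logical data extract layer): check that every concept can be extracted.
def unmapped_concepts(allowed=(Mapping.DIRECT, Mapping.PARTIAL)):
    """List concepts that have no code at an acceptable mapping strength."""
    return sorted(c for c in CONCEPTS if not codes_for(c, allowed))


print(codes_for("covid19_definite"))
print(unmapped_concepts())
```

Because the concepts in `CONCEPTS` never mention a code, replacing `CODING_LAYER` with the mappings for a different coding system leaves the ontology layer and the extract check unchanged.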

We wanted our resource to be findable, accessible, interoperable, and reusable (FAIR) [29], so we used standard tools in its development, namely the Protégé ontology development environment [30] and Web Ontology Language (OWL) [31].

The scope of the ontology included:

  • Demographic details, including age, gender, ethnicity, deprivation, rurality, and linking key identifiers;
  • Recording of monitored conditions and key clinical features (ie, symptoms and signs);
  • Relevant comorbidities and risk factors;
  • Tests and test results (ie, COVID-19–specific and test results that might imply susceptibility or resilience);
  • Key outcome measures including hospitalization, oxygen therapy, intensive care admission, and mortality.

Stage 3: External Evaluation of the Ontology

We carried out a rapid Delphi consensus exercise by inviting a panel (n=9) of international primary care clinicians and informaticians through the International Medical Informatics Association Primary Health Care Informatics working group and extended networks [32,33]. The consensus exercise consisted of 3 rounds:

  1. We shared our initial ontology and requested panel members to inform us about additional concepts that were not present in the ontology but present in their clinical workflows. In order to facilitate rapid consensus, we used email correspondence for this stage.
  2. We shared the revised ontology with panel members, who were asked to indicate their level of agreement, on a 5-point Likert scale, with statements related to the coverage of concepts and the applicability of the ontology to their primary care system. This was delivered through an online survey (see panel members and questions in Multimedia Appendix 1). Consensus was defined as ≥80% agreement. Statements not meeting 80% agreement were modified according to the feedback provided by the expert panel and redistributed to panelists for round 3.
  3. We conducted an online discussion to review and approve the final ontology.
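
The round 2 consensus rule can be expressed as a short calculation. The sketch below assumes that "agreement" means the share of responses that are "agree" or "strongly agree" (as in the heading of Table 4); the function names and the example counts are illustrative.

```python
from typing import Sequence

CONSENSUS_THRESHOLD = 80.0  # percent agreement required for consensus


def percent_agreement(counts: Sequence[int]) -> float:
    """Percentage of agree/strongly agree responses.

    counts is ordered: strongly disagree, disagree, neither, agree, strongly agree.
    """
    if len(counts) != 5:
        raise ValueError("expected counts for five Likert categories")
    total = sum(counts)
    if total == 0:
        raise ValueError("no responses recorded")
    return 100.0 * (counts[3] + counts[4]) / total


def reached_consensus(counts: Sequence[int]) -> bool:
    """True when agreement meets or exceeds the consensus threshold."""
    return percent_agreement(counts) >= CONSENSUS_THRESHOLD


# Illustrative counts for a 9-member panel.
example = [0, 1, 0, 3, 5]
print(round(percent_agreement(example), 1), reached_consensus(example))
```

A statement that falls below the threshold would, under the process described above, be revised and returned to the panel in round 3.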

Ethical Considerations

COVID-19 surveillance is carried out by RCGP RSC in collaboration with PHE, and approved under Regulation 3 of The Health Service (Control of Patient Information) Regulations 2002 by PHE’s Caldicott Guardian [34]. No specific permissions were needed for our ontology development as no additional processing of data was required.

Results

Stage 1: Creating and Testing the Use Case

We developed a summary narrative use case (Table 1). The success scenarios listed are goals we want to achieve.

The success scenarios and extensions reflect the cross-domain activities within the use case. We list the outcomes across 5 domains: primary care, public health, virology, clinical research, and clinical informatics (Table 2). We implemented our ontology through practical activities across these domains.

Table 1. Summary narrative use case.
  • Oxford Royal College of General Practitioners Research and Surveillance Centre
  • Delivery of COVID-19 surveillance and research
  • Health care system wide
Stakeholders and interests

Patients and public
  • Safe and timely guidance through the pandemic

General practices
  • Professional interest; payment; providing high-quality, evidence-based care

Public Health England
  • Need data to predict transmission
  • Monitor the effectiveness of interventions

Royal College of General Practitioners
  • Care for/protect members
  • Contribute to pandemic response

Primary care clinical trials unit
  • Data governance policies control which data can be viewed
  • Recruit to trial to mitigate COVID-19
  • Legal basis, permissions for data extracts, data extraction, and analytics capability within the network
Minimal guarantee
  • Delivery of data and analytics at prepandemic scale
Success guarantee
  • Larger network with high-quality data
  • Outputs to meet changed requirements during the pandemic
  • Authoritative source of primary care data, evidenced by academic publication
Main success scenario
  • High-quality primary care data, feedback to practices via customized dashboards
  • Representative sampling of virology and serology by collecting the specified number of samples (900 virology, 1000 serology per week)
  • Twice weekly data feeds to Public Health England to meet their data requirements
  • National observatories and weekly return that represent the impact of COVID-19
  • Ensure that we fully recruit to the PRINCIPLEa and other trials through the Oxford–RCGP RSC system
  • High-quality publication of lessons from surveillance
  • Trebling the number of virology practices (we have gone from 100 to 300 virology sampling practices, from 10 to 200 serology sampling practices)
  • Adjusting to the effect of lockdowns on:
    • Extending the network to over 1000 practices to support large-scale clinical trials, embedded in clinical practice; eg, recruitment into the PRINCIPLE trial
    • Sampling all eligible patients due to the reduced number seen on surgery premises
  • Postconvalescent serology; we will collect convalescent serology at 28 days from a wide range of practices
  • Managing unforeseen problems:
    • Refusal of some post offices to allow sample postage
    • Postage delays
    • Swab supply problems
  • Piloting new methods of swab delivery to patients
  • Add resilience to the surveillance system
    • Human resilience - extending data team and support
    • System resilience - direct feeds from major CMRb suppliers
  • Other studies: large numbers of study requests that need managing

aPRINCIPLE: Platform Randomised Trial of Interventions Against COVID-19 in Older People.

bCMR: computerized medical record.

Table 2. Application use-case outcomes by domain.
Primary care
  • COVID-19 Observatory - temporal and geographic surveillance
  • COVID-19 dashboard - practice-level data quality
  • Data quality feedback to practices
  • Feedback from practices
Public health
  • COVID-19 - supplementary report
  • Public health policy - containment measures

  • Trends of community transmission after social distancing ends
  • Estimates of COVID-19–related community morbidity and mortality
  • Swabbing - investigation
    • Virology
    • Serology

  • Virologically confirmed incidence
  • Representative collection of serology for sero-epidemiology
  • Ordering stock control and swab and virology container supply
Clinical research
  • Recruitment to clinical trials
  • Health outcomes: chest infections, hospitalization, intensive care unit, mechanical ventilation, oxygen therapy, and death
Clinical informatics
  • IGa—legal basis, data sharing agreements, contracts
  • Hardware and its resilience
  • Semantic interoperability across domains
  • Data quality, usability, FAQsb—continuous improvement of our interface
  • Adaptability with changing clinical knowledge
  • Ontology with annotations to clinical terms/codes

aIG: information governance.

bFAQ: frequently asked question.

Primary Care Domain: Data Quality and Feedback to General Practices (COVID-19 Dashboard)

Our COVID-19 dashboard presented weekly data on respiratory conditions to practices within the RCGP RSC sentinel network. Data were presented on COVID-19 incidence for the individual practice, and at the regional and national levels for reference, along with rates of other respiratory infections (Figure 1). Postimplementation feedback had to keep pace with multiple data changes and different timetables of code releases between CMR system providers. It included constant updating of coding prompt cards [35]. It also required liaison with computer template developers to change design to incorporate episode type.

Figure 1. COVID-19 dashboard for each RCGP RSC network practice [36]. The column starting P35398 shows that practice's data; “South” is its region; RSC is the rate across the whole network. Dates are presented in the DD/MM/YYYY format throughout. RCGP RSC: Oxford Royal College of General Practitioners (RCGP) Research and Surveillance Centre; URTI: upper respiratory infection; LRTI: lower respiratory infection.
Public Health Domain: Data Visualization With COVID-19 Observatory

Our ontology ensured consistency in our classic weekly return, which now includes COVID-19 surveillance. In addition, we developed customized outputs for epidemiologists at PHE and an observatory to present data on the incidence of COVID-19 across the network (Figure 2). The observatory is based on the coding described in the ontological layer and presents the COVID-19 incidence rate per 10,000. Up to the week commencing September 21, 2020, we had identified a total of 19,115 people with definite COVID-19, 5226 probable cases, and 74,293 people with possible COVID-19 within the RCGP RSC network (N=5,370,225).
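
For orientation, a rate per 10,000 can be derived from counts like those above with a one-line calculation. The sketch below uses the whole network population as the denominator for the cumulative counts; this is an assumption made for illustration, as the weekly denominators used by the observatory are not described here, and `rate_per_10000` is a hypothetical helper.

```python
def rate_per_10000(cases: int, population: int) -> float:
    """Cases per 10,000 people in the given population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 10_000 * cases / population


NETWORK_POPULATION = 5_370_225  # N reported for the RCGP RSC network
CUMULATIVE_COUNTS = {           # counts reported up to the week of September 21, 2020
    "definite": 19_115,
    "probable": 5_226,
    "possible": 74_293,
}

for status, count in CUMULATIVE_COUNTS.items():
    rate = rate_per_10000(count, NETWORK_POPULATION)
    print(f"{status}: {rate:.1f} per 10,000 (cumulative, whole-network denominator)")
```

These are cumulative proportions, not the week-by-week incidence rates shown in the observatory.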

Figure 2. Oxford RCGP RSC interactive COVID-19 observatory. Users can select the cumulative or week-by-week view of the data, and visualize data by age-band, region, risk group, and COVID-19 status (definite, probable, possible, and excluded) [37]. RCGP RSC: Oxford Royal College of General Practitioners Research and Surveillance Centre.

The biggest area of challenge was attribution of codes to certainty of diagnosis. We have had to evolve this with coding system changes (see Multimedia Appendix 1 for our final SNOMED CT [SNOMED Clinical Terms] concept list).

Virology Domain: Weekly Virologic Surveillance Reports

Similarly, our ontology drove the consistent extension of our virology reporting. Sound data structures have also been important because the number of participating virology sampling practices trebled from 100 to 300 to provide more data. The weekly virology report provides a visualization of the absolute number and rate per 10,000 by week of the swabs taken, combined with the matched week from the previous year’s figures for background context (Figure 3). There is a similar observatory for serology (included in Multimedia Appendix 1).

Figure 3. Oxford RCGP RSC interactive virology swabbing report. Users can look at the cumulative or weekly report or compare with the previous year, and view results by infecting organism or region. “Unknown” is used where no testing is done; currently, samples are only tested for COVID-19. RCGP RSC: Oxford Royal College of General Practitioners Research and Surveillance Centre; ISO: International Organization for Standardization.
Clinical Research Domain: Participation in Observational and Interventional Studies

The COVID-19 surveillance application ontology supported consistent reporting of findings in observational and interventional clinical research. We have a series of ongoing observational studies, the first of which has reported results [38]. The network is also supporting the PRINCIPLE (Platform Randomised Trial of Interventions Against COVID-19 in Older People) trial, a UK platform randomized controlled trial of interventions for COVID-19 in primary care. The study is assessing the effectiveness of trial treatments in reducing the need for hospital admission and death in patients with suspected COVID-19 infection aged ≥50 years with serious comorbidity, and aged ≥65 years with or without comorbidity [20]. To date, 830 practices have signed up, with 415 patients randomized; 468 (56.4%) of these are RCGP RSC practices, and they have recruited 342 (82.4%) of the included patients so far.

Clinical Informatics Domain: Creating the COVID-19 Ontology

The annotated application ontology was published on the BioPortal Ontology Repository [39] and will continue to be developed as our understanding of COVID-19 advances and new interventions (eg, vaccination) are introduced. The detail of the ontological development is set out in stage 2 of our 3-step process.

Stage 2: Developing the COVID-19 Surveillance Ontology

Step 1: Ontological Layer

We reviewed emerging case definitions of COVID-19 to identify key concepts used for case ascertainment and their relationships. Concepts included in the ontology were consistent with the WHO data dictionary for COVID-19 case–based reporting [40].

We have limited our presentation of results to the case definition of COVID-19. This involved grouping concepts into 4 categories: (1) definite, which includes definitive codes for a laboratory-confirmed case of COVID-19; (2) probable, which includes a clinical diagnosis of COVID-19 and the use of out-of-date codes created during the previous SARS (severe acute respiratory syndrome) outbreak; (3) possible, which includes a range of coding alternatives for suspected COVID-19, investigation without a recorded result, and exposure codes; and (4) excluded, where a requested test is reported as negative (Figure 4). At the individual level, the categories are applied hierarchically, with the most specific code driving the categorization.
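
A minimal sketch of this hierarchical categorization follows. The precedence order (definite, then probable, then possible or excluded) and the date rule for exclusion (a negative result recorded on or after the latest possible-case record) are assumptions made for illustration; `Event` and `classify` are hypothetical names.

```python
from datetime import date
from typing import Iterable, NamedTuple, Optional

VALID_STATUSES = {"definite", "probable", "possible", "excluded"}


class Event(NamedTuple):
    recorded: date
    status: str  # one of VALID_STATUSES, taken from the concept the code maps to


def classify(events: Iterable[Event]) -> Optional[str]:
    """Return one COVID-19 status for a person, or None if no relevant codes exist."""
    events = list(events)
    for event in events:
        if event.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {event.status}")

    statuses = {event.status for event in events}
    # The most specific evidence wins (assumed order: definite, then probable).
    if "definite" in statuses:
        return "definite"
    if "probable" in statuses:
        return "probable"

    possible_dates = [e.recorded for e in events if e.status == "possible"]
    excluded_dates = [e.recorded for e in events if e.status == "excluded"]
    if possible_dates:
        # Assumed rule: a suspected case is excluded by a subsequent negative test.
        if excluded_dates and max(excluded_dates) >= max(possible_dates):
            return "excluded"
        return "possible"
    if excluded_dates:
        return "excluded"
    return None


person = [Event(date(2020, 3, 1), "possible"), Event(date(2020, 3, 5), "excluded")]
print(classify(person))
```

In this sketch, a suspected case with a later negative test is reported as excluded, whereas a new suspicion recorded after an earlier negative test remains possible.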

Figure 4. Foundational ontological concepts used for COVID-19 surveillance.
Step 2: Coding Layer

We completed a dynamic process of mapping clinical terminology codes to concepts that emerged from our ontological layer (Table 3).

The National Health Service (NHS) uses the UK SNOMED CT system of coding, which is normally only updated twice yearly. In early February 2020, there were no clinical codes specific to COVID-19. Initially, CMR suppliers created 5 new system-wide local codes to support essential COVID-19–related recording within a week of being requested [11,19]. Subsequently, 2 emergency releases of novel COVID-19–related UK SNOMED CT codes were developed through a rapid consultation process conducted by the NHS Digital Information Representation Service [41], as greater clinical insight into COVID-19 and stability around nomenclature emerged. These UK SNOMED CT concepts were developed independently of international SNOMED CT terminology development; however, this open-source ontology can be mapped to international terms with ease. We iteratively annotated the ontological concepts with these stepwise-released COVID-19 SNOMED CT clinical concepts.

Table 3. Migration across SNOMED CT (SNOMED Clinical Terms) concepts released from February to May 2020.
Columns: clinical concepts that should be coded in CMR [a,b]; temporary codes [c]; final SNOMED CT description.

COVID-19 definite
  • Temporary codes: Confirmed 2019 nCoV (Wuhan) infection OR Confirmed 2019 nCoV (novel coronavirus) infection
  • Final SNOMED CT description: COVID-19 confirmed by laboratory test; SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) detected

COVID-19 probable
  • Temporary codes: No specific codes
  • Final SNOMED CT description: COVID-19; COVID-19 confirmed by clinical diagnostic criteria

COVID-19 possible

Exposure to infectious agent
  • Temporary codes: Exposure to 2019 nCoV (Wuhan) infection OR Exposure to 2019 nCoV (novel coronavirus) infection
  • Final SNOMED CT description: Exposure to SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) infection

Suspected infection
  • Temporary codes: Suspected 2019 nCoV (Wuhan) infection OR Suspected 2019 nCoV (novel coronavirus) infection
  • Final SNOMED CT description: Suspected COVID-19

Test for infectious agent offered or taken
  • Temporary codes: No specific codes
  • Final SNOMED CT description: Swab for SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) taken by health care professional; Self-taken swab for SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) offered; Self-taken swab for SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) completed

  • Temporary codes: Tested for 2019 nCoV (Wuhan) infection OR Tested for 2019 nCoV (novel coronavirus) infection

COVID-19 excluded
  • Temporary codes: Excluded 2019 nCoV (Wuhan) infection OR Excluded 2019 nCoV (novel coronavirus) infection
  • Final SNOMED CT description: COVID-19 excluded; COVID-19 excluded by laboratory test; SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) not detected

aCMR: computerized medical record.

bFrom ontological layer.

cUsed until replacement with SARS-CoV-2/COVID-19 concepts.

dNot applicable.
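
The migration in Table 3 can be illustrated with a lookup in which temporary and final term descriptions resolve to the same ontology concept, so that records coded before and after a terminology release are counted together. The sketch below uses a subset of the term descriptions printed in Table 3 as keys (real extracts would key on code identifiers, which are not reproduced here); the dictionary and function names are hypothetical.

```python
from typing import Optional

# Temporary terms (used until replaced by SARS-CoV-2/COVID-19 concepts).
TEMPORARY_TERMS = {
    "Confirmed 2019 nCoV (Wuhan) infection": "definite",
    "Confirmed 2019 nCoV (novel coronavirus) infection": "definite",
    "Exposure to 2019 nCoV (Wuhan) infection": "possible",
    "Suspected 2019 nCoV (Wuhan) infection": "possible",
    "Excluded 2019 nCoV (Wuhan) infection": "excluded",
}

# Final SNOMED CT descriptions.
FINAL_TERMS = {
    "COVID-19 confirmed by laboratory test": "definite",
    "COVID-19 confirmed by clinical diagnostic criteria": "probable",
    "Suspected COVID-19": "possible",
    "COVID-19 excluded by laboratory test": "excluded",
}


def _normalize(term: str) -> str:
    """Collapse whitespace and ignore case so minor recording differences still match."""
    return " ".join(term.split()).casefold()


LOOKUP = {
    _normalize(term): concept
    for table in (TEMPORARY_TERMS, FINAL_TERMS)
    for term, concept in table.items()
}


def concept_for(term: str) -> Optional[str]:
    """Return the ontology concept for a term description, or None if unmapped."""
    return LOOKUP.get(_normalize(term))


print(concept_for("Confirmed 2019 nCoV (Wuhan) infection"))
print(concept_for("COVID-19 confirmed by laboratory test"))
```

When a new release adds terms, only the dictionaries change; the concepts they point to stay fixed.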

Step 3: Logical Data Extract Layer

We incorporated the annotated ontology into the routine surveillance platform of the RCGP RSC. The ontology identified the various states of COVID-19 diagnosis in the incoming data feeds used for surveillance. We conducted a week-by-week analysis of incoming data, modifying our outputs to take account of supplier-specific changes in reporting. We are planning for cloud-based extracts and customized extracts from individual CMR vendors; to support this, we are creating the Oxford RCGP Clinical Informatics Digital Hub (ORCHID) (Figure 5) [42].
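
A week-by-week tally of this kind can be sketched as follows. The record layout (an event date paired with an ontology status) and the use of ISO weeks as the grouping key are assumptions for illustration; the actual extract is SQL-based (Figure 5) and is not reproduced here.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Tuple


def weekly_counts(records: Iterable[Tuple[date, str]]) -> Counter:
    """Count records per (ISO year, ISO week, status)."""
    counts: Counter = Counter()
    for event_date, status in records:
        iso = event_date.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        counts[(iso[0], iso[1], status)] += 1
    return counts


# Hypothetical records: two in the week commencing September 21, 2020, one in the next.
records = [
    (date(2020, 9, 21), "definite"),
    (date(2020, 9, 27), "definite"),
    (date(2020, 9, 28), "possible"),
]
for key, n in sorted(weekly_counts(records).items()):
    print(key, n)
```

Comparing successive weekly tallies by status is one way to spot an abrupt change that coincides with a supplier's code release.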

Figure 5. Use of the COVID-19 surveillance ontology across the RCGP RSC processes to achieve semantic consistency in data extraction, visualizations, and surveillance reports. EMR: electronic medical record; GP: general practitioner; ORCHID: Oxford RCGP Clinical Informatics Digital Hub; RCGP RSC: Oxford Royal College of General Practitioners Research and Surveillance Centre; SQL: Structured Query Language.
External Evaluation of the Ontology

Although we obtained good consensus in our Delphi exercise, important lessons and development priorities were flagged. Consensus was obtained for 7 of the 8 (87.5%) statements related to coverage of concepts under the upper-level headings of the COVID-19 ontology. All panel members except one agreed with statements relating to the applicability of the ontology for case-finding activities in their local primary care setting (Table 4). Input from panel members guided expansion of the concepts related to statements not reaching consensus, and this was reviewed by panel members in round 3 of the Delphi exercise.

Table 4. Number of responses and % agreement (strongly agree/agree) to statements relating to the applicability of the ontology for case finding activities in panel members’ local primary care setting.
Statement | Strongly disagree, n | Disagree, n | Neither agree nor disagree, n | Agree, n | Strongly agree, n | % Agreement
Please indicate your level of agreement with the coverage of concepts given under each upper level heading of the COVID-19 surveillance ontology:
Symptoms and signs | 0 | 1 | 0 | 3 | 5 | 88.9
Past medical history/at-risk conditions | 0 | 0 | 3 | 2 | 4 | 66.7
COVID-19 case status | 0 | 0 | 0 | 1 | 8 | 100
Process of care | 0 | 0 | 1 | 1 | 7 | 88.9
The COVID-19 ontology in its current format is suitable for COVID-19 case ascertainment in my local primary care setting. | 0 | 0 | 1 | 5 | 3 | 88.9

Important softer discussion points also emerged; for example, our symptom collection is relatively poorly developed, and there remains uncertainty about risk and protective factors. One expert felt strongly that vaccination and exposures should be part of the ontology; these were subsequently added.

Discussion

Principal Findings

We rapidly developed an application ontology in-pandemic to support extended surveillance and research activities across the 5 clinical and informatics domains described in our use case. This application ontology has provided a framework, which we have used to help ensure the reliability and consistency of our outputs at a time of change. This iterative ontological approach is flexible and robust enough to match the pace and direction of the evolving clinical landscape of COVID-19.

The focus of our work has been on case identification and associated test results, as these are the foundations on which epidemiological and interventional studies are based. We felt it appropriate to flag the certainty with which a diagnosis is made. We have already used this ontology in observational and interventional studies [20,38].

The separation of the coding layer from the ontological (conceptual) layer allows surveillance to be resilient while new case definitions and clinical codes are added to general practice CMR systems. This approach ensures transparency in case definitions used for reporting and facilitates clear communication by allowing clinicians, database developers (involved in extracting data from practices’ data sources), and practice liaison officers (who advise practices about data recording best practices) to maintain consistency within an organization.

This application ontology could easily and rapidly be adapted for COVID-19 surveillance and clinical research in various other countries and health care networks. As the COVID-19 pandemic continues, there is enormous global pressure on health care systems to understand trends in incidence rates and conduct high-quality research; this ontology is open-source and can be mapped onto local clinical coding systems to permit consistency in analyses.

Comparison With Previous Literature

To our knowledge, this is the first time that a systematic ontological approach has been developed in-pandemic for extended disease surveillance, using structured routine clinical data. This application ontology aligns with previous clinical informatics literature on application ontology engineering and validation through the testable use-case approach [43,44].

There are other pandemic surveillance systems that look at open-source, unstructured data, such as media reports and clusters of symptom-related internet searches, extracting information of epidemiological relevance [45]. Examples of such systems include BioCaster [46], the Global Public Health Intelligence Network [47], ProMed [48], and HealthMap [49]. The latter three systems are working under the WHO collaborative, the Epidemic Intelligence from Open Sources initiative, which played a role in the identification of the COVID-19 outbreak from early media reports from China in December 2019 [50]. Some of the event-based pandemic surveillance systems have published ontological foundations in the public health and surveillance domains [46,51,52]. While useful for providing supplementary information to epidemiologists on the emergence of an outbreak in real time, these knowledge representations do not specifically address the types of information described in clinical data, such as presenting complaint, comorbidities, virology, or health outcomes.

There are very limited studies of data platforms’ performance within integrated clinical surveillance systems [45]. The lack of accurate and available data to underpin epidemic forecasting in emerging outbreaks has been highlighted [53].

We found no literature using an ontological approach for COVID-19 surveillance. There are domain ontologies related to the coronavirus published on BioPortal. The first focuses on the wider Coronaviridae family and their biochemical and microbiological properties [54], while the second was developed to provide semantic assistance for clinical research form completion [55]. None were designed to integrate the various clinical data streams necessary to carry out COVID-19 surveillance.

Strengths and Limitations

The 3-step iterative ontological process that we have implemented has proven to be suitably flexible to cope with the changes in COVID-19 terminology and CMR system codes. A further strength was that the ontology was implemented and deployed with the FAIR guiding principles in mind [29]. The ontology is discoverable and accessible on the BioPortal ontology repository. This application ontology, built using best practices around defining and testing a use case, is inherently interoperable and reusable [29]. In the absence of a gold-standard infectious disease surveillance ontology, we regard our efforts to achieve a degree of consensus and external validity, drawing on a range of international experts in clinical informatics and primary care, as a major strength of the current study. Although the relatively small size of the Delphi panel is a limitation, we purposefully selected panel members from a range of countries with varied clinical coding systems.

We focused on case finding and results; we now need to turn our attention to presenting symptoms, particularly those that may be of prognostic value, and to emerging treatments, including vaccination. Our ontology as currently run will classify false positive lab results incorrectly, and we recognize this is a limitation that should be noted by users. Additional limitations were its development in a single sentinel system and that it was not developed ready to integrate into a common data model [56].
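The false positive limitation can be illustrated with a minimal sketch. It is not the published ontology or its extraction logic: the case categories, concept names, and placeholder codes below are all assumptions made for illustration, and the codes are not real SNOMED CT or CMR system codes. The sketch only shows how a rule that operates on coding system–independent concepts takes a positive result at face value.

```python
from enum import Enum


class CaseStatus(Enum):
    # Hypothetical categories, chosen for illustration only.
    DEFINITE = "definite"
    POSSIBLE = "possible"
    NOT_A_CASE = "not a case"


# Hypothetical code-mapping layer: placeholder codes mapped to concepts.
# Keeping this table separate from the rules means a change of codes
# only touches the mapping, not the classification logic.
CODE_TO_CONCEPT = {
    "LAB-POS": "positive_test",
    "LAB-NEG": "negative_test",
    "DX-SUSPECTED": "suspected_diagnosis",
}


def classify(codes: list[str]) -> CaseStatus:
    """Classify one record from its coded entries using concept-level rules."""
    concepts = {CODE_TO_CONCEPT[c] for c in codes if c in CODE_TO_CONCEPT}
    if "positive_test" in concepts:
        # A positive result is trusted as recorded. Nothing here reconciles
        # it with other evidence, so a false positive is also returned as
        # a definite case.
        return CaseStatus.DEFINITE
    if "suspected_diagnosis" in concepts:
        return CaseStatus.POSSIBLE
    return CaseStatus.NOT_A_CASE
```

In this sketch, a record holding both the positive and the negative placeholder code is still returned as a definite case, which is the kind of misclassification that users of a rule-based case definition would need to handle downstream.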


Conclusions

We have created a COVID-19 application ontology, with strengths that include its speed of development, being openly shared via BioPortal, and its adaptability. The limitations are its development in a single sentinel network and its current limited focus. The ontology should make conclusions based on primary care sentinel data more transparent and facilitate pooled analyses in COVID-19 surveillance and research. We welcome any requests for information on applying our COVID-19 surveillance application ontology to other health care settings, both domestically and internationally.


Acknowledgments

We thank the practices and patients of RCGP RSC, who allowed their pseudonymized clinical medical records to be used for this work. We would also like to acknowledge the members of the Primary Health Care Informatics Working Group of the International Medical Informatics Association for their contribution in validating the ontology. Funding for this research was provided by Public Health England / Wellcome.

Authors' Contributions

SdL conceived the need for this ontology with important input from HL, JW, DE, and DM. DM and HL wrote an initial draft of the manuscript, and SdL produced the first complete manuscript. All authors contributed to the scope of the ontology, contributed comments, and read and approved the final version.

Conflicts of Interest

None declared.

Multimedia Appendix 1

Supplementary tables and figures.

DOCX File, 473 KB

  1. Lipsitz LA. Understanding health care as a complex system: the foundation for unintended consequences. JAMA 2012 Jul 18;308(3):243-244.
  2. Sturmberg J, Lanham HJ. Understanding health care delivery as a complex system: achieving best possible health outcomes for individuals and communities by focusing on interdependencies. J Eval Clin Pract 2014 Dec 05;20(6):1005-1009.
  3. Wang C, Horby PW, Hayden FG, Gao GF. A novel coronavirus outbreak of global health concern. The Lancet 2020 Feb 15;395(10223):470-473.
  4. Novel Coronavirus(2019-nCoV) Situation Report - 10. World Health Organization. 2020 Jan 30. URL: https://www.docs/default-source/coronaviruse/situation-reports/20200130-sitrep-10-ncov.pdf?sfvrsn=d0b2e480_2 [accessed 2020-05-11]
  5. Gorbalenya A. Severe acute respiratory syndrome-related coronavirus: The species and its viruses – a statement of the Coronavirus Study Group. bioRxiv Preprint posted online February 11, 2020.
  6. Novel Coronavirus (2019-nCoV) Situation Report - 22. World Health Organization. 2020 Feb 11. [accessed 2020-05-11]
  7. Huang C, Wang Y, Li X, Ren L, Zhao J, Hu Y, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet 2020 Feb 15;395(10223):497-506.
  8. Giacomelli A, Pezzati L, Conti F, Bernacchia D, Siano M, Oreni L, et al. Self-reported Olfactory and Taste Disorders in Patients With Severe Acute Respiratory Coronavirus 2 Infection: A Cross-sectional Study. Clin Infect Dis 2020 Jul 28;71(15):889-890.
  9. Baud D, Qi X, Nielsen-Saines K, Musso D, Pomar L, Favre G. Real estimates of mortality following COVID-19 infection. Lancet Infect Dis 2020 Jul;20(7):773.
  10. Verity R, Okell LC, Dorigatti I, Winskill P, Whittaker C, Imai N, et al. Estimates of the severity of coronavirus disease 2019: a model-based analysis. The Lancet Infectious Diseases 2020 Jun;20(6):669-677.
  11. de Lusignan S, Williams J. To monitor the COVID-19 pandemic we need better quality primary care data. BJGP Open 2020;4(2).
  12. de Lusignan S. In this issue: Ontologies a key concept in informatics and key for open definitions of cases, exposures, and outcome measures. J Innov Health Inform 2015 Jul 10;22(2):170.
  13. Musen MA. Domain Ontologies in Software Engineering: Use of Protégé with the EON Architecture. Methods Inf Med 2018 Feb 15;37(04/05):540-550.
  14. Malone J, Parkinson H. Reference and Application Ontologies. Ontogenesis. 2010. [accessed 2020-05-11]
  15. Kumarapeli P, De Lusignan S, Ellis T, Jones B. Using Unified Modelling Language (UML) as a process-modelling technique for clinical-research process improvement. Med Inform Internet Med 2007 Mar 12;32(1):51-64.
  16. Correa A, Hinton W, McGovern A, van Vlymen J, Yonova I, Jones S, et al. Royal College of General Practitioners Research and Surveillance Centre (RCGP RSC) sentinel network: a cohort profile. BMJ Open 2016 Apr 20;6(4):e011092.
  17. de Lusignan S, Correa A, Smith GE, Yonova I, Pebody R, Ferreira F, et al. RCGP Research and Surveillance Centre: 50 years’ surveillance of influenza, infections, and respiratory conditions. Br J Gen Pract 2017 Sep 29;67(663):440-441.
  18. Pebody R, Warburton F, Ellis J, Andrews N, Potts A, Cottrell S, et al. Effectiveness of seasonal influenza vaccine for adults and children in preventing laboratory-confirmed influenza in primary care in the United Kingdom: 2015/16 end-of-season results. Euro Surveill 2016 Sep 22;21(38).
  19. de Lusignan S, Lopez Bernal J, Zambon M, Akinyemi O, Amirthalingam G, Andrews N, et al. Emergence of a Novel Coronavirus (COVID-19): Protocol for Extending Surveillance Used by the Royal College of General Practitioners Research and Surveillance Centre and Public Health England. JMIR Public Health Surveill 2020 Apr 02;6(2):e18606.
  20. Butler C. PRINCIPLE: A trial evaluating treatments for suspected COVID-19 in people aged 50 years and above with pre-existing conditions and those aged 65 years and above. ISRCTN Registry 2020.
  21. Cole NI, Liyanage H, Suckling RJ, Swift PA, Gallagher H, Byford R, et al. An ontological approach to identifying cases of chronic kidney disease from routine primary care data: a cross-sectional study. BMC Nephrol 2018 Apr 10;19(1):85.
  22. Liyanage H, Williams J, Byford R, de Lusignan S. Ontology to identify pregnant women in electronic health records: primary care sentinel network database study. BMJ Health Care Inform 2019 Jul;26(1).
  23. Smith S, Morbey R, de Lusignan S, Pebody R, Smith G, Elliot A. Investigating regional variation of respiratory infections in a general practice syndromic surveillance system. J Public Health (Oxf) 2020 Feb 02.
  24. Smith N, Livina V, Byford R, Ferreira F, Yonova I, de Lusignan S. Automated differentiation of incident and prevalent cases in primary care computerised medical records (CMR). In: Studies in Health Technology and Informatics. Amsterdam: IOS Press; 2018:151-155.
  25. Cockburn A. Writing effective use cases. Boston: Addison-Wesley Professional; 2000.
  26. Liyanage H, de Lusignan S, Liaw S, Kuziemsky CE, Mold F, Krause P, et al. Big Data Usage Patterns in the Health Care Domain: A Use Case Driven Approach Applied to the Assessment of Vaccination Benefits and Risks. Contribution of the IMIA Primary Healthcare Working Group. Yearb Med Inform 2014 Aug 15;9:27-35.
  27. de Lusignan S, Khunti K, Belsey J, Hattersley A, van Vlymen J, Gallagher H, et al. A method of identifying and correcting miscoding, misclassification and misdiagnosis in diabetes: a pilot and validation study of routinely collected data. Diabet Med 2010 Feb;27(2):203-209.
  28. Rollason W, Khunti K, de Lusignan S. Variation in the recording of diabetes diagnostic data in primary care computer systems: implications for the quality of care. Inform Prim Care 2009 Jun 01;17(2):113-119.
  29. Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 2016 Mar 15;3(1):160018.
  30. Musen MA, The Protégé Project. The Protégé Project: A Look Back and a Look Forward. AI Matters 2015 Jun;1(4):4-12.
  31. McGuinness D, van Harmelen F. OWL Web Ontology Language Overview: W3C Recommendation. W3C. [accessed 2020-11-09]
  32. Liaw S, Liyanage H, Kuziemsky C, Terry AL, Schreiber R, Jonnagaddala J, et al. Ethical Use of Electronic Health Record Data and Artificial Intelligence: Recommendations of the Primary Care Informatics Working Group of the International Medical Informatics Association. Yearb Med Inform 2020 Aug 17;29(1):51-57.
  33. Liyanage H, Liaw S, Jonnagaddala J, Schreiber R, Kuziemsky C, Terry AL, et al. Artificial Intelligence in Primary Health Care: Perceptions, Issues, and Challenges. Yearb Med Inform 2019 Aug 25;28(1):41-46.
  34. Taylor MJ. Legal bases for disclosing confidential patient information for public health: Distinguishing between health protection and health improvement. Med Law Rev 2015 May 20;23(3):348-374.
  35. RCGP RSC COVID-19 Surveillance. RCGP RSC. 2020. [accessed 2020-06-13]
  36. RCGP RSC COVID-19 Practice-Level Dashboard. RCGP RSC. 2020. [accessed 2020-06-13]
  37. COVID-19 Observatory. RCGP RSC. 2020. [accessed 2020-03-31]
  38. de Lusignan S, Dorward J, Correa A, Jones N, Akinyemi O, Amirthalingam G, et al. Risk factors for SARS-CoV-2 among patients in the Oxford Royal College of General Practitioners Research and Surveillance Centre primary care network: a cross-sectional study. The Lancet Infectious Diseases 2020 Sep;20(9):1034-1042.
  39. Liyanage H, de Lusignan S, Williams J. COVID-19 Surveillance Ontology. BioPortal. 2020. [accessed 2020-05-04]
  40. Global surveillance for COVID-19 caused by human infection with COVID-19 virus: Interim guidance. World Health Organization. 2020 Mar 20. URL: https://apps.iris/bitstream/handle/10665/331506/WHO-2019-nCoV-SurveillanceGuidance-2020.6-eng.pdf [accessed 2020-03-30]
  41. Clinical guidance on COVID-19 SNOMED codes. NHS Digital. 2020. [accessed 2020-04-10]
  42. de Lusignan S, Jones N, Dorward J, Byford R, Liyanage H, Briggs J, et al. The Oxford Royal College of General Practitioners Clinical Informatics Digital Hub: Protocol to Develop Extended COVID-19 Surveillance and Trial Platforms. JMIR Public Health Surveill 2020 Jul 02;6(3):e19773.
  43. Malone J, Holloway E, Adamusiak T, Kapushesky M, Zheng J, Kolesnikov N, et al. Modeling sample variables with an Experimental Factor Ontology. Bioinformatics 2010 Apr 15;26(8):1112-1118.
  44. Doing-Harris KM, Zeng-Treitler Q. Computer-assisted update of a consumer health vocabulary through mining of social network data. J Med Internet Res 2011 May 17;13(2):e37.
  45. Velasco E, Agheneza T, Denecke K, Kirchner G, Eckmanns T. Social media and internet-based data in global systems for public health surveillance: a systematic review. Milbank Q 2014 Mar 06;92(1):7-33.
  46. Collier N, Doan S, Kawazoe A, Goodwin RM, Conway M, Tateno Y, et al. BioCaster: detecting public health rumors with a Web-based text mining system. Bioinformatics 2008 Dec 15;24(24):2940-2941.
  47. Dion M, AbdelMalik P, Mawudeku A. Big Data and the Global Public Health Intelligence Network (GPHIN). Can Commun Dis Rep 2015 Sep 03;41(9):209-214.
  48. Yu VL, Madoff LC. ProMED-mail: an early warning system for emerging diseases. Clin Infect Dis 2004 Jul 15;39(2):227-232.
  49. Freifeld CC, Mandl KD, Reis BY, Brownstein JS. HealthMap: Global Infectious Disease Monitoring through Automated Classification and Visualization of Internet Media Reports. Journal of the American Medical Informatics Association 2008 Mar 01;15(2):150-157.
  50. Epidemic Intelligence from Open Sources (EIOS). World Health Organization. 2020. [accessed 2020-04-03]
  51. Okhmatovskaia A, Chapman WW, Collier N, Espino J, Buckeridge DL. SSO: The Syndromic Surveillance Ontology. The National Center for Biomedical Ontology. 2009. [accessed 2020-04-03]
  52. Crubézy M, O'Connor M, Buckeridge D, Pincus Z, Musen M. Ontology-Centered Syndromic Surveillance for Bioterrorism. IEEE Intell Syst 2005 Sep;20(5):26-35.
  53. Buckee C. Improving epidemic surveillance and response: big data is dead, long live big data. The Lancet Digital Health 2020 May;2(5):e218-e220.
  54. He Y. Coronavirus Infectious Disease Ontology. BioPortal. 2020. [accessed 2020-04-14]
  55. Bonino L. WHO COVID-19 Rapid Version CRF semantic data model. BioPortal. 2020. [accessed 2020-04-23]
  56. Liyanage H, Liaw S, Jonnagaddala J, Hinton W, de Lusignan S. Common Data Models (CDMs) to Enhance International Big Data Analytics: A Diabetes Use Case to Compare Three CDMs. Stud Health Technol Inform 2018;255:60-64.

Abbreviations

CMR: computerized medical record
FAIR: findable, accessible, interoperable, and reusable
NHS: National Health Service
OWL: Web Ontology Language
PHE: Public Health England
PRINCIPLE: Platform Randomised Trial of Interventions Against COVID-19 in Older People
RCGP: Oxford Royal College of General Practitioners
RSC: Research and Surveillance Centre
SARS: severe acute respiratory syndrome
SNOMED CT: SNOMED Clinical Terms
ORCHID: Oxford RCGP Clinical Informatics Digital Hub

Edited by T Sanchez; submitted 01.07.20; peer-reviewed by PSS Lee, J Aarts; comments to author 22.09.20; revised version received 02.10.20; accepted 02.10.20; published 17.11.20


©Simon de Lusignan, Harshana Liyanage, Dylan McGagh, Bhautesh Dinesh Jani, Jorgen Bauwens, Rachel Byford, Dai Evans, Tom Fahey, Trisha Greenhalgh, Nicholas Jones, Frances S Mair, Cecilia Okusi, Vaishnavi Parimalanathan, Jill P Pell, Julian Sherlock, Oscar Tamburis, Manasa Tripathy, Filipa Ferreira, John Williams, F D Richard Hobbs. Originally published in JMIR Public Health and Surveillance, 17.11.2020.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication, as well as this copyright and license information must be included.