This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on http://publichealth.jmir.org, as well as this copyright and license information must be included.
According to the World Health Organization, malaria surveillance is weakest in countries and regions with the highest malaria burden. A core obstacle is that the data required to perform malaria surveillance are fragmented in multiple data silos distributed across geographic regions. Furthermore, consistent integrated malaria data sources are few, and a low degree of interoperability exists between them. As a result, it is difficult to identify disease trends and to plan for effective interventions.
We propose the Semantics, Interoperability, and Evolution for Malaria Analytics (SIEMA) platform for use in malaria surveillance based on semantic data federation. Using this approach, it is possible to access distributed data, extend and preserve interoperability between multiple dynamic distributed malaria sources, and facilitate detection of system changes that can interrupt mission-critical global surveillance activities.
We used Semantic Automated Discovery and Integration (SADI) Semantic Web Services to enable data access and improve interoperability, and the graphical user interface-enabled semantic query engine HYDRA to implement the target queries typical of malaria programs. We implemented a custom algorithm to detect changes to community-developed terminologies, data sources, and services that are core to SIEMA. This algorithm reports to a dashboard. Valet SADI is used to mitigate the impact of changes by rebuilding affected services.
We developed a prototype surveillance and change management platform from a combination of third-party tools, community-developed terminologies, and custom algorithms. We illustrated a methodology and core infrastructure to facilitate interoperable access to distributed data sources using SADI Semantic Web services. This degree of access makes it possible to implement complex queries needed by our user community with minimal technical skill. We implemented a dashboard that reports on terminology changes that can render the services inactive, jeopardizing system interoperability. Using this information, end users can control and reactively rebuild services to preserve interoperability and minimize service downtime.
We introduce a framework suitable for use in malaria surveillance that supports the creation of flexible surveillance queries across distributed data resources. The platform provides interoperable access to target data sources, is domain agnostic, and, with updates to core terminological resources, is readily transferable to other surveillance activities. A dashboard enables users to review changes to the infrastructure and invoke system updates. The platform significantly extends the range of functionalities offered by malaria information systems beyond the state of the art.
Malaria is an infectious disease with a significant impact on developing countries. In 2016 alone, it caused 445,000 deaths worldwide, and around 216 million cases were reported in 91 countries [
A comprehensive study [
Furthermore, 11 widely used Web platforms were studied to assess how internet and Web technologies are used in the fight against malaria [
Indeed, the malaria surveillance community is not alone in facing these challenges, and many other communities are investigating how to bring distributed datasets together in real time to support decision making. Researchers in other domains have sought to introduce guidelines for ensuring that source data are published in ways that ensure they are findable, accessible, interoperable, and reusable [
The specific challenge of interoperability has two dimensions, namely structural and semantic interoperability. Structural or syntactic interoperability can be achieved by defining common syntax and formats for data exchange. For example, if two systems such as the Malaria Atlas Project [
Overcoming the challenge of distributed data access has relied on established technologies such as Web services, but interoperability is still lacking in many implementations. In recent work, Web service-based data access and interoperability challenges have been tackled together using Semantic Web service infrastructures [
We introduce a prototype surveillance and change management platform, known as SIEMA, built from a combination of third-party tools, community-developed terminologies, and custom algorithms. We illustrate the methodology and core infrastructure used to facilitate interoperable access to distributed data sources using Semantic Automated Discovery and Integration (SADI) [
The SIEMA surveillance platform relies on the coordination and customization of several existing frameworks and software tools, together with custom-developed algorithms. The architecture diagram in
SADI is a representational state transfer (RESTful) Web service framework that provides a set of conventions for creating Semantic Web services. The framework uses resource description framework schema (RDF[S]) [
Our research focuses on middleware for enabling discovery of datasets and tools for agile query composition, but deployments of these methodologies beyond prototypes will require full access to malaria data. Existing data repositories such as the Scalable Data Integration for Disease Surveillance [
SHARE is a specialized open source query client that enables end users to discover, plan, and orchestrate SADI services in a registry and invoke them automatically from SPARQL [
Detecting changes in the source data schema, as well as the domain and service ontologies, is a prerequisite step for change management. Studies on the evolution of large domain ontologies [
Architecture of the Semantics, Interoperability, and Evolution for Malaria Analytics (SIEMA) surveillance framework. GUI: graphical user interface; I/O: input/output; RDF: resource description framework; SADI: Semantic Automated Discovery and Integration.
Underpinning the dashboard, software agents enable two key actions: (1) detecting changes and identifying their types, and (2) restoring the modified component to an operational state through repair and rebuilding. The types of changes we accommodate are addition (ie, extension), deletion (ie, obsoleting), and renaming (ie, refining) of components in domain ontologies and service ontologies [
Valet SADI [
The implementation presented in this paper is inspired by the objectives stated in the National Malaria Control Program by the Ugandan Ministry of Health [
We primarily selected a series of widely used malaria control interventions. The importance of indoor residual spraying has been well established by numerous studies throughout the world, especially in Africa [
To answer the questions, we created and deployed a list of SADI services in a registry. We focus below on how to create those services.
Vocabularies from one or more domain ontologies are used to define the input and the output of a service. The data schema of the source data is also necessary. The vocabularies and the data schema in
The names of SADI services are expressed in two different forms: (1) allX, which retrieves all information regarding X without expecting any input, and (2) getYByZ, which retrieves Y based on the input Z. The input and the output of every service are defined in a service ontology using the terminologies from the domain ontologies. One such service is getInsecticideIdByIndoorResidualSprayingId, which takes an instance of spraying as input, which is any element whose
Another service is getNameByInsecticideId, which takes an instance of an insecticide as input. The service returns a string as output, representing the name of the insecticide in the data, decorating the input by the relation
Q1. Which indoor residual sprayings used permethrin as an insecticide?
Q2. Which districts of Uganda that used permethrin-based long-lasting insecticide-treated nets in 2015 saw a decrease in
Q3. What are the future high-risk areas and at-risk time periods in Uganda?
Snapshot of source data schema and domain ontologies.
Input (left) and output (right) descriptions of the services getInsecticideIdByIndoorResidualSprayingId (top) and getNameByInsecticideId (bottom).
A fragment of the registry of Semantic Automated Discovery and Integration (SADI) services.
1 Forall ?insecticideID (identityForInsecticideToinsecticideID(
2 identityForInsecticide(?insecticideID)) = ?insecticideID)
3 Forall ?P (identityForInsecticide(identityForInsecticideToinsecticideID(?P))= ?P)
4 Forall ?id ?name ?mode.of.action (
5 Insecticide(identityForInsecticide(?id)) :-
6 db_insecticide(?id ?name))
7 Forall ?id ?name ?mode.of.action (
8 has_name(identityForInsecticide(?id) ?name) :-
9 db_insecticide(?id ?name))
Once the services have been generated, they are stored in a service registry.
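The allX and getYByZ naming convention lends itself to a simple lookup structure. The following is a minimal sketch, not the SADI registry itself: it assumes a registry can be approximated by a dictionary keyed by service name, and the names ServiceName, parse_service_name, build_registry, and services_accepting are hypothetical helpers introduced only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceName:
    kind: str             # "all" for allX, "get" for getYByZ
    output: str           # X in allX, Y in getYByZ
    input: Optional[str]  # Z in getYByZ, None for allX


def parse_service_name(name: str) -> ServiceName:
    """Split a service name according to the allX / getYByZ convention."""
    m = re.fullmatch(r"get(?P<out>[A-Z]\w*?)By(?P<inp>[A-Z]\w*)", name)
    if m:
        return ServiceName("get", m.group("out"), m.group("inp"))
    m = re.fullmatch(r"all(?P<out>[A-Z]\w*)", name)
    if m:
        return ServiceName("all", m.group("out"), None)
    raise ValueError(f"not an allX or getYByZ name: {name}")


def build_registry(names):
    """Hypothetical registry: service name -> parsed description."""
    return {name: parse_service_name(name) for name in names}


def services_accepting(registry, input_type):
    """Names of the services that expect the given input."""
    return sorted(n for n, s in registry.items() if s.input == input_type)


# The four service names used in this paper.
REGISTRY = build_registry([
    "allPublicHealthActivities",
    "getNameByPublicHealthActivityId",
    "getInsecticideIdByIndoorResidualSprayingId",
    "getNameByInsecticideId",
])
```

In a real deployment, the input and output of each service are defined in the service ontology rather than inferred from the name; the name parsing here only mirrors the convention for readability.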
Specification and building of SADI services can be cumbersome, error prone, and tedious for nontechnical end users. Full implementation details are outside the scope of this paper; however, we encourage readers to look at the details in Brenas et al [
To illustrate query building, consider Q1 in
Services in (3) and (4) are described above, while the service in (1) retrieves all identifiers of public health activities in Uganda, and the service in (2) retrieves the names of these activities. The branch on the right in
The query in
allPublicHealthActivities,
getNameByPublicHealthActivityId,
getInsecticideIdByIndoorResidualSprayingId, and
getNameByInsecticideId.
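To make the chaining of these four services concrete, the sketch below composes them as plain Python functions to answer Q1. It is an illustration under stated assumptions: the in-memory dictionaries stand in for the distributed data sources, all identifiers and records are hypothetical, and real SADI services exchange RDF rather than Python values. The function names mirror the service names, which is why they are not in snake_case.

```python
# Hypothetical records standing in for the distributed data sources.
ACTIVITIES = {
    "a1": "indoor residual spraying",
    "a2": "indoor residual spraying",
    "a3": "net distribution",
}
SPRAYING_INSECTICIDE = {"a1": "i1", "a2": "i2"}
INSECTICIDES = {"i1": "permethrin", "i2": "deltamethrin"}


def allPublicHealthActivities():
    """allX form: needs no input and returns every activity identifier."""
    return list(ACTIVITIES)


def getNameByPublicHealthActivityId(activity_id):
    return ACTIVITIES[activity_id]


def getInsecticideIdByIndoorResidualSprayingId(spraying_id):
    # Returns None when the activity has no recorded insecticide.
    return SPRAYING_INSECTICIDE.get(spraying_id)


def getNameByInsecticideId(insecticide_id):
    return INSECTICIDES[insecticide_id]


def sprayings_using(insecticide_name):
    """Q1: which indoor residual sprayings used the given insecticide?"""
    matches = []
    for activity_id in allPublicHealthActivities():
        if getNameByPublicHealthActivityId(activity_id) != "indoor residual spraying":
            continue
        insecticide_id = getInsecticideIdByIndoorResidualSprayingId(activity_id)
        if insecticide_id is None:
            continue
        if getNameByInsecticideId(insecticide_id) == insecticide_name:
            matches.append(activity_id)
    return matches
```

With the hypothetical records above, sprayings_using("permethrin") returns only the spraying whose insecticide resolves to permethrin. In SIEMA the user does not write this orchestration by hand; the query engine discovers and invokes the services.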
Graph representation of a query for the question “Which indoor residual sprayings used permethrin as an insecticide?” prepared on the HYDRA graphical user interface.
Graph representation of a query for the question “Which districts of Uganda that used permethrin-based long-lasting insecticide-treated nets in 2015 saw a decrease in
Graph representation of the query “Which indoor residual spraying used permethrin as an insecticide and which kind of mosquitoes will be affected by it?”.
The previous section outlined how, using Semantic Web services, it is possible to answer complex questions relevant to malaria surveillance. Special attention is needed before considering the introduction of a new methodology in a dynamic context where data and middleware are not static. Several possible changes that could occur in a malaria surveillance framework have been described and classified according to the degree to which they affect data access and their likelihood to affect interoperability of the system [
Whenever a definition of an existing service is modified, the associated service ontology is changed, but the code implementing the service remains unchanged. As a result, when a service is invoked during the execution of a query, it does not return the anticipated output because the terms used in the code are incompatible with the new definition in the service ontology. In the SIEMA framework, the change capture agent implemented within the dashboard detects the changes in the terms used in the service ontology by comparing the modified version of the ontology with the one it was modified from. The role of the change capture agent can be illustrated in the case of a term addition.
To illustrate a scenario involving the addition of new terms, consider the query in
Let us assume that the end user is interested in the query “Which indoor residual spraying used permethrin as an insecticide and which kind of mosquitoes will be affected by it?”
A change capture agent detects the changes to two terms, identifies them as additions, and displays them on the dashboard in tabular form.
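The comparison performed by a change capture agent can be approximated as a set difference between two versions of a service description. The sketch below is a simplification and not the algorithm implemented in SIEMA: it assumes each description is reduced to a flat set of term names, and the names Change, capture_changes, and service_status are hypothetical. A renamed term would surface here as one deletion plus one addition; recognizing it as a rename would need extra information that this sketch does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    kind: str     # "addition" or "deletion"
    entity: str   # the term that was added or deleted
    service: str  # the service whose description changed


def capture_changes(service, old_terms, new_terms):
    """Compare two versions of a service description, given as term sets."""
    added = sorted(set(new_terms) - set(old_terms))
    deleted = sorted(set(old_terms) - set(new_terms))
    return ([Change("addition", term, service) for term in added]
            + [Change("deletion", term, service) for term in deleted])


def service_status(all_services, changes):
    """Mark every service touched by a change as inactive until rebuilt."""
    affected = {change.service for change in changes}
    return {name: ("inactive" if name in affected else "active")
            for name in all_services}


# Example: has_mode_of_action is added to the output description.
changes = capture_changes(
    "getNameByInsecticideId",
    old_terms={"has_name"},
    new_terms={"has_name", "has_mode_of_action"},
)
status = service_status(
    ["getNameByInsecticideId", "getInsecticideIdByIndoorResidualSprayingId"],
    changes,
)
```

Each Change record corresponds loosely to one row of the dashboard table, and the status dictionary to the active or inactive state shown for each deployed service.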
At any time, the status of all deployed services in the registry is displayed on the dashboard. Services are either active, meaning they can be used in queries, or inactive, meaning they must be repaired before they can be used again.
The addition of terms to the definition of an active service renders the service description incompatible with the target functionality and existing service code, and renders associated queries dysfunctional. To resolve the inconsistency, it is necessary to repair and rebuild the services in line with the new requirements. Specifically, the end user now wants to access the data that were not previously available from a service, namely, in this example, the mode of action of an insecticide. It is thus necessary to reimplement the service corresponding to the altered service description ensuring that the domain ontologies, the data schemata, and the PSOA semantic mapping rules underpinning the service are accurate and will support the new target functionality.
Given that a data resource contains the information about the mode of action of insecticides, the key question is whether the semantic mapping rules already map those data to an existing concept or relation of the domain ontologies. If that is the case, then all components required to rebuild the services exist, and it is possible to proceed to the next step in the Valet SADI rebuild. Otherwise, it is necessary to identify missing rules and add them, or extend a local domain ontology with a missing concept or relation that exists in the service ontology. Once this is done, a rule must be created to define a new mapping and to make rebuilding the service possible.
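The decision described above can be written down as a short planning routine. This is a sketch under assumptions, not part of Valet SADI: the function plan_rebuild and the step labels are hypothetical, and ontologies and mapping rules are reduced to sets of term names.

```python
def plan_rebuild(required_terms, mapped_terms, domain_terms):
    """List the steps needed before a service can be regenerated.

    required_terms: terms used in the updated service description
    mapped_terms:   terms already populated by a semantic mapping rule
    domain_terms:   terms present in the local domain ontologies
    """
    steps = []
    for term in sorted(required_terms):
        if term in mapped_terms:
            continue  # an existing rule already maps source data to this term
        if term not in domain_terms:
            # The concept or relation exists only in the service ontology.
            steps.append(("extend_domain_ontology", term))
        # In either case a new mapping rule is needed for the term.
        steps.append(("add_mapping_rule", term))
    steps.append(("rebuild_service", None))
    return steps


# Example: the mode of action is requested but not yet mapped or defined.
plan = plan_rebuild(
    required_terms={"has_name", "has_mode_of_action"},
    mapped_terms={"has_name"},
    domain_terms={"has_name"},
)
```

When every required term is already mapped, the plan contains only the rebuild step, which corresponds to proceeding directly to the next step of the rebuild.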
By leveraging Valet SADI’s autogeneration capability, the damaged service can be quickly rebuilt and deployed once changes are detected and identified, and a rebuild is requested. To illustrate this, we refer to the query shown in
In the domain ontology, the data property
The PSOA rules are also modified to populate the newly added data property
Old (left) and new (right) output description of the service getNameByInsecticideId.
Detection and identification of changes in the service ontology.
Timestamp | Description of change | Entity added | Entity deleted | Entity renamed | Affected service | Affected query
2018-01-21T14:33:08 | An entity is added to the output definition | | N/A^a | N/A | getNameByInsecticideId | Which indoor residual sprayings used permethrin as an insecticide?
2018-01-21T14:33:08 | An entity is added to the output definition | xsd:string | N/A | N/A | getNameByInsecticideId | Which indoor residual sprayings used permethrin as an insecticide?
^a N/A: not applicable.
Status of Semantic Automated Discovery and Integration (SADI) services. Active services are shown in green and inactive services in red.
12' Group (
13' Forall ?insecticideID (identityForInsecticideToinsecticideID(
14' identityForInsecticide(?insecticideID)) = ?insecticideID)
15' Forall ?P (identityForInsecticide(identityForInsecticideToinsecticideID(?P)) = ?P)
16' Forall ?id ?name ?mode.of.action (
17' Insecticide(identityForInsecticide(?id)) :-
18' db_insecticide(?id ?name ?mode.of.action))
19' Forall ?id ?name ?mode.of.action (
20' has_name(identityForInsecticide(?id) ?name) :-
21' db_insecticide(?id ?name ?mode.of.action))
22' Forall ?id ?name ?mode.of.action (
23' has_mode_of_action(identityForInsecticide(?id) ?mode.of.action) :-
24' db_insecticide(?id ?name ?mode.of.action))
25' )
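The effect of the updated rules can be emulated in a few lines of Python. The sketch below is only an analogy for what the rules express: each row of db_insecticide yields a typed individual with a name and a mode of action. The identifier scheme, the sample row, and the function names are hypothetical, and the mode-of-action value is a placeholder rather than real data.

```python
PREFIX = "insecticide/"  # hypothetical identifier scheme


def identity_for_insecticide(insecticide_id):
    """Counterpart of identityForInsecticide: database id -> individual."""
    return PREFIX + insecticide_id


def identity_for_insecticide_to_id(individual):
    """Counterpart of identityForInsecticideToinsecticideID (the inverse)."""
    return individual[len(PREFIX):]


def apply_mapping(rows):
    """Turn db_insecticide(id, name, mode_of_action) rows into triples."""
    triples = []
    for insecticide_id, name, mode_of_action in rows:
        subject = identity_for_insecticide(insecticide_id)
        triples.append((subject, "rdf:type", "Insecticide"))
        triples.append((subject, "has_name", name))
        triples.append((subject, "has_mode_of_action", mode_of_action))
    return triples


# One hypothetical row; the third column is a placeholder value.
triples = apply_mapping([("i1", "permethrin", "example mode of action")])
```

The two identity functions mirror the first two rules, which state that the mapping between database identifiers and individuals is invertible in both directions.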
Status of services after being rebuilt by Valet SADI.
Surveillance remains a challenge for the malaria community, and many factors play a role in limiting access to relevant data resources for analysis and reuse [
The SIEMA framework comprises several technologies and standards and is further customized to address the needs and interests of surveillance practitioners. Its contribution is threefold. First, using SADI Web services allows for easy access to distributed data. This task is simplified further using Valet SADI, which enables a programmer to create services in an efficient and straightforward way. Second, owing to the user interface features of the HYDRA query engine, SIEMA offers end users a more appealing way to build surveillance queries. HYDRA’s ability to discover and call the services that are needed for a query permits the user to simply use the data as an abstract construct without having to look at its actual structure. Third, to make the system more robust and flexible, a dashboard has been introduced. The dashboard informs users when changes have occurred that render the services or queries inactive. This enables users to know which queries may no longer be reliable and to identify which parts of the service infrastructure must be rebuilt to restore it to its fully interoperable state. Deployed together, the technologies combined in the SIEMA framework provide key functionalities that are of great value to the community.
Our initial studies in malaria surveillance [
Whereas a systematic evaluation of the many components of this framework, individually and together, is beyond the scope of this initial study, we are aware that other malaria surveillance systems in sub-Saharan African countries have been reported and evaluated in part [
A brief assessment of SIEMA according to attributes recommended by Centers for Disease Control and Prevention [
Overall, we anticipate that the ongoing trials with the SIEMA framework will give the research and development team further insight into real-world requirements for interoperability and change management in malaria surveillance, leading to further improvements in adaptability and performance. Given the critical need for timely integration of distributed data from multiple heterogeneous sources in an efficient way, we hope to build cooperative partnerships between multiple disciplines, organizations, and sectors. In addition, insights gained from this research are likely transferable to a range of global surveillance projects.
We have demonstrated that authentic questions asked in malaria surveillance can be formalized as queries and mapped to a combination of Semantic Web services designed to deliver target data from distributed data sources. We have shown that using SIEMA and leveraging terminologies from community-developed ontologies offer flexibility both for integrating data and for easily composing queries. The developed infrastructure also offers a solution to the problem of change management, an important process for maintaining interoperability and integrity of an integrated surveillance system. Given that changes in the form of addition, renaming, and deletion of terminologies can frequently occur in the face of evolving system requirements, we introduced a change management dashboard. This makes it possible to identify important changes, report on the status of services as a consequence of changes, and offer users the option to rebuild inactive services. The dashboard and service reauthoring routines serve as an important vehicle to maintain system interoperability of mission-critical global surveillance programs. The infrastructure has been implemented and its relevance has been demonstrated with an authentic use case, with the goal of soliciting further requirements from the malaria analytics community. In future work, we will deploy SIEMA on live dynamic data sources.
GUI: graphical user interface
IDOMAL: Infectious Disease Ontology-Malaria
MIRO: Mosquito Insecticide Resistance Ontology
OWL: Web Ontology Language
PSOA: positional-slotted object-applicative
RDFS: resource description framework schema
SADI: Semantic Automated Discovery and Integration
SIEMA: Semantics, Interoperability, and Evolution for Malaria Analytics
SQL: Structured Query Language
VSMO: Vector Surveillance and Management Ontology
This work was funded by the Bill and Melinda Gates Foundation (OPP ID # 1162018). A license for the use of HYDRA was provided by IPSNP Computing Inc.