Published on 01.06.16 in Vol 2, No 1 (2016): Jan-Jun
Utilizing Nontraditional Data Sources for Near Real-Time Estimation of Transmission Dynamics During the 2015-2016 Colombian Zika Virus Disease Outbreak
Background: Approximately 40 countries in Central and South America have experienced local vector-borne transmission of Zika virus, resulting in nearly 300,000 total reported cases of Zika virus disease to date. Of the cases that have sought care thus far in the region, more than 70,000 have been reported out of Colombia.
Objective: In this paper, we use nontraditional digital disease surveillance data via HealthMap and Google Trends to develop near real-time estimates for the basic (R0) and observed (Robs) reproductive numbers associated with Zika virus disease in Colombia. We then validate our results against traditional health care-based disease surveillance data.
Methods: Cumulative reported case counts of Zika virus disease in Colombia were acquired via the HealthMap digital disease surveillance system. Linear smoothing was conducted to adjust the shape of the HealthMap cumulative case curve using Google search data. Traditional surveillance data on Zika virus disease were obtained from weekly Instituto Nacional de Salud (INS) epidemiological bulletin publications. The Incidence Decay and Exponential Adjustment (IDEA) model was used to estimate R0 and Robs for both data sources.
Results: Using the digital (smoothed HealthMap) data, we estimated a mean R0 of 2.56 (range 1.42-3.83) and a mean Robs of 1.80 (range 1.42-2.30). The traditional (INS) data yielded a mean R0 of 4.82 (range 2.34-8.32) and a mean Robs of 2.34 (range 1.60-3.31).
Conclusions: Although modeling using the traditional (INS) data yielded higher R0 estimates than the digital (smoothed HealthMap) data, modeled ranges for Robs were comparable across both data sources. As a result, the narrow range of possible case projections generated by the traditional (INS) data was largely encompassed by the wider range produced by the digital (smoothed HealthMap) data. Thus, in the absence of traditional surveillance data, digital surveillance data can yield similar estimates for key transmission parameters and should be utilized in other Zika virus-affected countries to assess outbreak dynamics in near real time.
JMIR Public Health Surveill 2016;2(1):e30
- Zika virus disease;
- digital disease surveillance;
- mathematical modeling;
- reproductive number;
- transmission dynamics
Recent infectious disease outbreaks—including severe acute respiratory syndrome (SARS), Middle East respiratory syndrome (MERS), Ebola, and influenza A (H1N1)—have presented great challenges to the global public health community, including lack of basic epidemiologic knowledge to support important preparedness and control decisions. To address this gap, innovative surveillance methods have been developed over the last several years to leverage the increasing availability of digital data related to outbreaks. To date, many studies have retrospectively examined nontraditional digital data sources and have demonstrated their utility in estimating epidemic curves or changes in important epidemiologic parameters over time [- ]. Such studies have provided a foundation for building near real-time prospective analytic techniques that can assess transmission dynamics in the absence of traditional data. These methodological developments fill a knowledge vacuum that may prove useful for public health decision making in the early stages of an outbreak.
The ongoing outbreak of Zika virus disease in Central and South America has attracted global attention due to its rapid geospatial growth as well as concerns over associated central nervous system complications [, ]. Although Zika virus is primarily transmitted via Aedes mosquitoes, evidence of vertical and sexual transmission exists [ - ]. Likely introduced to the Americas in mid- to late 2013, the virus has since been propagated by the density of competent vectors throughout the region [ ]. At present, approximately 40 countries in Central and South America have experienced local vector-borne transmission, resulting in nearly 300,000 total reported cases to date [ ]. Of the cases that have sought care thus far in the region, about 70,000 have been reported out of Colombia, of which 17% were pregnant at time of clinical or laboratory diagnosis [ , ]. However, given the generally mild nature of Zika virus disease and subsequent lack of care seeking, reported cases undoubtedly comprise a small fraction of total cases [ , ].
Current prevention efforts focus on vector suppression, while interest in and efforts toward vaccine development are mounting rapidly due to increasing rates of Guillain-Barré syndrome following Zika virus infection and microcephaly in newborn babies born to women infected with Zika virus while pregnant [ , ]. Quantitative analyses designed to inform vaccine policies, as well as other preparedness and control activities, depend on the transmission dynamics associated with the disease. Estimates for critical epidemiologic parameters are therefore urgently needed for such decision making within the context of Zika virus disease.
In this paper, we use nontraditional digital disease surveillance data via HealthMap and Google Trends to develop near real-time estimates for the basic and observed reproductive numbers associated with Zika virus disease in Colombia as well as expected final outbreak size and duration. We then validate our results against traditional health care-based disease surveillance data and discuss the implications of our work on outbreak mitigation strategies in Colombia and assessment of transmission dynamics elsewhere in the region.
Cumulative reported case counts of Zika virus disease in Colombia were acquired via the HealthMap digital disease surveillance system, consisting of 28 unique nongovernmental media alerts between October 16, 2015 and April 16, 2016 . The cumulative reported case curve obtained from these reports shows an unrealistic L-shape, presumably due to increased interest in reporting during recent weeks and lack of awareness during early weeks ( ). By assuming that the total number of cases obtained from HealthMap was a reasonable approximation of reality for the given time period, we used Google search data to distribute cumulative reported case counts more realistically over time.
Although many cases of dengue and influenza go undetected, previous studies have shown that relevant Google search trends demonstrate high linear correlation with reported disease incidence over time [, ]. Thus, we obtained weekly Google search fractions of the term “Zika” from Colombia via the Google Trends website (accessed on April 29, 2016). These search fractions are displayed weekly as normalized values that range from zero to 100, which reflect the level of nationwide search interest in the word “Zika” from January 4, 2004 (first available datum) to April 16, 2016.
We created a smoothed cumulative incidence curve (referred to as “smoothed HealthMap”) by scaling the Google search curve against the HealthMap-reported Zika cases. The scaling constant was obtained by dividing the most recent total number of HealthMap-reported Zika cases by the sum of the weekly Google search fractions from May 31, 2015 to April 16, 2016. Perhaps due to initial delays in reporting, the first relevant uptick of the term “Zika” in the Google Trends data occurred during the week of May 31, 2015, approximately 20 weeks before the first HealthMap alert of Zika virus disease in Colombia. Because of this, May 31, 2015 was selected as the start date for modeling efforts using smoothed HealthMap data; April 16, 2016 (last available datum at time of manuscript preparation) was selected as the cut-off date.
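The scaling step can be sketched in a few lines of Python. This is a minimal illustration, assuming that the smoothed curve is the running sum of the weekly search fractions multiplied by the scaling constant; the function name and the sample values are hypothetical and are not the study data.

```python
from itertools import accumulate


def smooth_cumulative(search_fractions, reported_total):
    """Spread a final cumulative case count over time in proportion
    to weekly search interest (assumed reading of the scaling step)."""
    total_interest = sum(search_fractions)
    if total_interest <= 0:
        raise ValueError("search fractions must sum to a positive value")
    # Scaling constant: reported total divided by the sum of search fractions.
    scale = reported_total / total_interest
    # Running sum of scaled weekly values gives a cumulative curve.
    return list(accumulate(f * scale for f in search_fractions))


# Hypothetical weekly search fractions (0-100) and a hypothetical final count.
weekly = [0, 1, 3, 10, 40, 100, 60, 36]
curve = smooth_cumulative(weekly, reported_total=5000)
```

By construction, the last point of the smoothed curve equals the reported total, so only the shape of the curve is borrowed from the search data.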
Due to successful applications in other data-scarce (ie, cumulative incidence only) settings, the Incidence Decay and Exponential Adjustment (IDEA) model was used to estimate the basic reproductive number (R0) and the discount factor (d) associated with the ongoing outbreak [, , ]. Both R0 and d were solved for using nonlinear optimization to minimize the sum of squared differences (SSD) between reported (user-inputted) and modeled cumulative incidence (I) curves across multiple serial intervals (ie, outbreak generations). The IDEA model formulates I in terms of R0 and d, where t is the number of outbreak generations (ie, serial intervals) that have passed thus far and is inversely proportional to the serial interval length (ie, number of days per serial interval [SI]). Given that the distribution for the SI associated with Zika virus disease had not yet been established, R0 and d were solved for iteratively over a range of 14 deterministic lengths (10-23 days) [ ].
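The fitting procedure can be illustrated with a short Python sketch. It assumes the IDEA formulation from the cited IDEA reference, in which incidence in generation t is (R0/(1+d)^t)^t, and it assumes the cumulative curve is the running sum of these values. A coarse grid search stands in for the nonlinear optimization used in the study, and all function names and the synthetic data are hypothetical.

```python
def idea_incidence(r0, d, t):
    """Assumed IDEA incidence in outbreak generation t."""
    return (r0 / (1.0 + d) ** t) ** t


def idea_cumulative(r0, d, generations):
    """Running sum of modeled incidence over generations 1..n."""
    total, curve = 0.0, []
    for t in range(1, generations + 1):
        total += idea_incidence(r0, d, t)
        curve.append(total)
    return curve


def ssd(observed, modeled):
    """Sum of squared differences between two equal-length curves."""
    return sum((o - m) ** 2 for o, m in zip(observed, modeled))


def fit_idea(observed, r0_grid, d_grid):
    """Grid search (not the authors' optimizer) for the SSD-minimizing pair."""
    best = None
    for r0 in r0_grid:
        for d in d_grid:
            err = ssd(observed, idea_cumulative(r0, d, len(observed)))
            if best is None or err < best[0]:
                best = (err, r0, d)
    return best


# Synthetic check: recover known parameters from a noise-free curve.
truth = idea_cumulative(3.0, 0.05, 12)
r0_grid = [round(1.0 + 0.1 * i, 1) for i in range(51)]  # 1.0 to 6.0
d_grid = [round(0.01 * i, 2) for i in range(21)]        # 0.00 to 0.20
err, r0_hat, d_hat = fit_idea(truth, r0_grid, d_grid)
```

In practice the observed cumulative counts would first be resampled at serial-interval spacing, and the fit repeated for each assumed SI length from 10 to 23 days.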
These values of R0 and d were then used to define maximum, minimum, and mean values for the observed reproductive number (Robs), final reported outbreak size (Imax), and final reported duration (tmax). The observed number of secondary infections per infected individual for a given value of t (Robs) was calculated using the following equation: $R_{obs} = R_0 / (1 + d)^t$.
When d is greater than zero, R0 does not equal Robs. In such circumstances, disease incidence is nonexponential due to either planned or unplanned reductions in disease duration, contact rate, or infectiousness of cases. Likewise, final reported outbreak duration (tmax) was calculated as follows [ ]: $t_{max} \geq \ln(R_0) / \ln(1 + d)$.
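The bound on tmax follows from the expression for Robs if one reads the end of the outbreak as the first generation at which Robs falls to 1 or below. That reading is an interpretation offered here for illustration, not a statement taken from the cited reference. For d > 0:

```latex
\begin{align*}
R_{obs}(t) = \frac{R_0}{(1+d)^{t}} \le 1
  &\iff (1+d)^{t} \ge R_0 \\
  &\iff t \,\ln(1+d) \ge \ln(R_0) \\
  &\iff t \ge \frac{\ln(R_0)}{\ln(1+d)}
\end{align*}
```

The last step divides by ln(1+d), which is positive only when d > 0; for d = 0 the observed reproductive number never declines and no finite tmax exists.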
Final reported outbreak duration can also be expressed in days by multiplying tmax by SI; however, in calculating Imax, original units (ie, outbreak generations) are used ().
In the event that a viable vaccine is developed before the ongoing outbreak in Colombia ends (tmax), the following equation was used to assess the percentage of the susceptible population that would need to be immunized against Zika virus (%Vax) to eliminate transmission, assuming 100% vaccine efficacy: $\%Vax = 1 - (1 / R_{obs})$.
After completion of the analyses on the digital surveillance data, we performed a validation study using traditional surveillance data obtained from weekly Instituto Nacional de Salud (INS) (National Institute of Health, Colombia) epidemiological bulletin publications. The INS first reported incidence of Zika virus disease in Colombia on October 16, 2015. However, subsequent publications indicated that the outbreak likely began during epidemiologic week 32 of 2015 or earlier [ ]. As a result, August 22, 2015 was selected as a start date for modeling efforts using INS data. April 16, 2016 (date of the most recent publication at time of manuscript preparation) was selected as the cut-off date [ ]. The analyses described previously for the smoothed HealthMap dataset were conducted on the INS dataset as well, resulting in R0, d, Robs, Imax, tmax, and %Vax estimates for both digital (smoothed HealthMap) and traditional (INS) cumulative reported case data.
Example model fits for both digital (smoothed HealthMap; SSD=$1.47 \times 10^{8}$) and traditional (INS; SSD=$1.55 \times 10^{7}$) cumulative case data are shown in and (SI=17 days). In general, the traditional data model fits (mean SSD=$1.76 \times 10^{7}$) were superior to those derived from digital data (mean SSD=$1.64 \times 10^{8}$).
Using the digital (smoothed HealthMap) cumulative case counts, we estimated a mean R0 of 3.26 (range 1.91-5.05) and a mean d of 0.04 (range 0.01-0.07) across 14 deterministic serial interval lengths (range 10-23 days) (). We then calculated a mean Robs of 1.63 (range 1.31-2.05), a mean Imax of 85,546 cases (range 80,028-93,885 cases), and a mean tmax of 530 days (range 522-538 days; November 2016). Cumulative reported case projections using these modeled parameters are shown in .
The traditional (INS) data yielded a mean R0 of 5.36 (range 2.52-9.63) and a mean d of 0.07 (range 0.02-0.14) across 14 deterministic serial interval lengths (range 10-23 days) (). Using these, we calculated a mean Robs of 1.96 (range 1.45-2.58), a mean Imax of 77,386 cases (range 76,587-78,619 cases), and a mean tmax of 387 days (range 382-392 days; September 2016). Cumulative reported case projections using these modeled parameters are shown in .
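The calendar months quoted for tmax can be checked by adding the mean durations to the modeling start dates given in the Methods (May 31, 2015 for smoothed HealthMap; August 22, 2015 for INS). The sketch below is only an arithmetic check, and the helper name is hypothetical.

```python
from datetime import date, timedelta


def projected_end(start, tmax_days):
    """Calendar date reached tmax_days after the modeling start date."""
    return start + timedelta(days=tmax_days)


# Mean tmax values reported for each data source.
digital_end = projected_end(date(2015, 5, 31), 530)  # smoothed HealthMap
ins_end = projected_end(date(2015, 8, 22), 387)      # INS
```

Both results fall in the months stated in the text: November 2016 for the digital data and September 2016 for the traditional data.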
Although R0 values calculated using the traditional (INS) data were generally higher than those calculated using digital (smoothed HealthMap) cumulative case counts (SSD=82.14), Robs values were quite similar across data sources (SSD=1.84). As a result, the digital (smoothed HealthMap) and traditional (INS) cumulative case data produced similar mean %Vax values of 0.39 (range 0.24-0.51) and 0.49 (range 0.31-0.61), respectively.
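As a worked check, the reported %Vax values can be reproduced by applying %Vax = 1 - (1/Robs) to the reported minimum, mean, and maximum Robs for each data source. The function name below is hypothetical. Applying the formula to a mean Robs need not equal the mean of per-SI %Vax values in general, although the two agree to two decimals here.

```python
def vax_fraction(r_obs):
    """Fraction of susceptibles to immunize, assuming 100% vaccine efficacy."""
    if r_obs <= 1.0:
        return 0.0  # transmission is not self-sustaining; no coverage needed
    return 1.0 - 1.0 / r_obs


# Reported (min, mean, max) Robs for each data source.
digital = [round(vax_fraction(r), 2) for r in (1.31, 1.63, 2.05)]
ins = [round(vax_fraction(r), 2) for r in (1.45, 1.96, 2.58)]
```

The rounded results match the reported ranges and means: 0.24, 0.39, and 0.51 for the digital data, and 0.31, 0.49, and 0.61 for the INS data.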
When depletion of susceptible individuals due to infection (ie, via death or immunity-conferred recovery) is small relative to the total population, basic reproductive numbers obtained using the IDEA model are comparable to simple SIR-type models. Although they are especially suitable for use in data-scarce settings, SIR-type models (and, by extension, the IDEA model) cannot easily incorporate global dynamics, such as the importation and exportation of infectious agents (ie, vectors and humans) or significant climate events (eg, El Niño and La Niña). Nevertheless, others have demonstrated that simple SIR-type models perform similarly to complex mechanistic models when describing the transmission dynamics of vector-borne and water-borne diseases in localized contexts [ , ]. As a result, the IDEA model is a reasonable method for analyzing nationwide transmission dynamics of Zika virus disease in Colombia.
As defined by the IDEA modeling method, R0 represents potential transmissibility of a given pathogen in a fully susceptible, naïve population; meanwhile, Robs represents observed transmission in the face of existing interventions, as captured by d [, , ]. In this sense, the Robs is similar to the effective reproductive number (Rt), which represents transmissibility in a population that is not fully susceptible. Mean modeled estimates for R0 across both data sources were consistent with R0 estimates for Zika virus disease in French Polynesia and with R0 estimates for chikungunya and dengue [ , , ]. Mean modeled estimates for Robs were also comparable to Rt estimates for chikungunya and dengue [ , ]. To take into account the effects of ongoing transmission control efforts, Robs was used instead of R0 to calculate %Vax.
In this study, we found that using the traditional (INS) data yielded higher R0 estimates than the digital (smoothed HealthMap) cumulative reported case counts. Nevertheless, because estimates for d were also higher, modeled ranges for Robs and %Vax were comparable across both data sources. Similarly, the narrow range of possible case projections generated by the traditional (INS) data was largely encompassed by the wider range produced by the digital (smoothed HealthMap) cumulative reported case counts. Therefore, in the absence of traditional health care-based surveillance data, important epidemiologic parameters may be estimated using smoothed digital surveillance data as described here.
The methods used in this study are not without limitations. For both data sources, estimates for country-level case projections and Imax apply only to those that seek care; true caseloads are likely to be as much as five times higher than those that are reported [, ]. Furthermore, because country-level data are utilized, in-country transmission heterogeneities are not captured. As geographic granularity of digital surveillance data improves, similar analyses should be conducted at smaller scales. Nevertheless, given that projection models are designed to serve as decision-support tools, estimating the number of cases that will report to hospitals and clinics over the next several months—even at the country level—is still valuable for the purposes of resource allocation. This may be especially pertinent with respect to diagnostic support for pregnant women presenting with clinical symptoms for Zika virus disease. To date, nearly 20% of all reported Zika virus disease cases in Colombia have been pregnant women; if the current rate holds, thousands more may be infected and seek care before the outbreak ends. However, the projections presented in this paper only apply in the event that circumstances remain unchanged (eg, no new interventions are put in place).
With improved compliance, vector suppression interventions (eg, elimination of standing water, exhaustive use of insect repellent) have the potential to bring this outbreak to a swift close, even in the absence of a vaccine. In the event that a viable vaccine can be developed before the outbreak ends, our estimates suggest that approximately half of the susceptible population would need to be immunized to confer herd immunity. Considering the growing body of evidence linking Zika virus infection during pregnancy to microcephaly in newborn babies, women of childbearing age should be given priority if the option becomes available [, ].
Regardless of whether a vaccine reaches the market before the outbreak in Colombia ends, the data acquisition and modeling approach presented in this paper may still benefit other Zika-affected countries with limited capacity for government-implemented health care-based data collection. Although traditional surveillance data should be used preferentially, in its absence digital surveillance data can yield comparable estimates for key transmission parameters. It has been shown that digital surveillance data can be used retrospectively to assess transmission dynamics of well-understood pathogens (eg, Vibrio cholerae); however, our findings suggest that similar analyses can also be conducted in near real time for emerging infectious diseases. Moreover, the epidemiologic parameter estimates from these analyses may be readily updated as new information emerges, enabling prospective tracking of transmission dynamics at the country level despite data scarcity.
Recent history has shown the need for rapid epidemiologic assessments to better inform intervention strategies in the face of a public health emergency. For effective evaluation of such interventions, baseline estimates for transmissibility—like those described in this study—must be established. Furthermore, changes in outbreak dynamics must be closely monitored in order to assess the impact of active interventions on disease transmission. Our approach offers an important alternative to guesswork based loosely on related diseases and previous outbreaks. Given the absence of traditional surveillance data and transmission heterogeneities across Central and South America, digital surveillance data can and should be used to conduct similar analyses for other Zika-affected countries in the months ahead.
The authors would like to thank Dr Edgar Diaz of University of California, San Diego, and Mr Flavio Enrique Garzon Romero of the INS for their guidance regarding data sources. This work was supported by the National Library of Medicine of the National Institutes of Health (R01LM010812) and the US Agency for International Development (USAID) Emerging Pandemic Threats 2 (EPT2) PREDICT2 program. The funders had no role in study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the paper for publication.
Conflicts of Interest
- Brownstein JS, Freifeld CC, Madoff LC. Digital disease detection--harnessing the Web for public health surveillance. N Engl J Med 2009 May 21;360(21):2153-2157
- Majumder MS, Kluberg S, Santillana M, Mekaru S, Brownstein JS. 2014 Ebola outbreak: media events track changes in observed reproductive number. PLoS Curr 2015;7:e1
- Chunara R, Andrews JR, Brownstein JS. Social and news media enable estimation of epidemiological patterns early in the 2010 Haitian cholera outbreak. Am J Trop Med Hyg 2012 Jan;86(1):39-45
- European Centre for Disease Prevention and Control. Rapid Risk Assessment: Zika Virus Epidemic in the Americas: Potential Association with Microcephaly and Guillain-Barré Syndrome. 2015 Dec 10. URL: http://ecdc.europa.eu/en/publications/Publications/zika-virus-americas-association-with-microcephaly-rapid-risk-assessment.pdf
- World Health Organization. WHO Director-General summarizes the outcome of the Emergency Committee regarding clusters of microcephaly and Guillain-Barré syndrome URL: http://www.who.int/mediacentre/news/statements/2016/emergency-committee-zika-microcephaly/en/
- Besnard M, Lastere S, Teissier A, Cao-Lormeau V, Musso D. Evidence of perinatal transmission of Zika virus, French Polynesia, December 2013 and February 2014. Euro Surveill 2014;19(13)
- Musso D, Roche C, Robin E, Nhan T, Teissier A, Cao-Lormeau V. Potential sexual transmission of Zika virus. Emerg Infect Dis 2015 Feb;21(2):359-361
- Faria NR, Azevedo RS, Kraemer MU, Souza R, Cunha MS, Hill SC, et al. Zika virus in the Americas: early epidemiological and genetic findings. Science 2016 Apr 15;352(6283):345-349
- Pan American Health Organization/World Health Organization. Suspected and confirmed Zika cases reported by countries and territories in the Americas, 2015-2016 URL: http://ais.paho.org/phip/viz/ed_zika_epicurve.asp
- Kaplan E. Time. 2016 Apr 29. Hard-hit Colombia could be the key to understanding the Zika virus URL: http://time.com/4312555/colombia-zika-virus-cdc/
- Duffy MR, Chen T, Hancock WT, Powers AM, Kool JL, Lanciotti RS, et al. Zika virus outbreak on Yap Island, Federated States of Micronesia. N Engl J Med 2009 Jun 11;360(24):2536-2543
- World Health Organization. 2016 Feb 17. Mosquito control: can it stop Zika at source? URL: http://www.who.int/emergencies/zika-virus/articles/mosquito-control/en/
- HealthMap. Zika virus URL: http://www.healthmap.org/zika/
- Chan EH, Sahai V, Conrad C, Brownstein JS. Using web search query data to monitor dengue epidemics: a new model for neglected tropical disease surveillance. PLoS Negl Trop Dis 2011 May;5(5):e1206
- Yang S, Santillana M, Kou SC. Accurate estimation of influenza epidemics using Google search data via ARGO. Proc Natl Acad Sci U S A 2015 Nov 24;112(47):14473-14478
- Google Trends. URL: https://www.google.com/trends/
- Fisman DN, Hauck TS, Tuite AR, Greer AL. An IDEA for short term outbreak projection: nearcasting using the basic reproduction number. PLoS One 2013;8(12):e83622
- Santillana M, Tuite A, Nasserie T, Fine P, Champredon D, Chindelevitch L, et al. Relatedness of the Incidence Decay with Exponential Adjustment (IDEA) Model, “Farr's Law” and Compartmental Difference Equation SIR Models. arXiv 2016;1603.01134
- Majumder MS, Cohn E, Fish D, Brownstein JS. Estimating a feasible serial interval range for Zika fever. Bull World Health Organ 2016;10.2471/BLT.16.171009
- Instituto Nacional de Salud. Boletín Epidemiológico URL: http://www.ins.gov.co/boletin-epidemiologico/Paginas/default.aspx
- Instituto Nacional de Salud. Semana epidemiológica número 07 de 2016 URL: http://www.ins.gov.co/boletin-epidemiologico/Boletn%20Epidemiolgico/2016%20Bolet%C3%ADn%20epidemiol%C3%B3gico%20semana%2014.pdf
- Johansson MA, Hombach J, Cummings DA. Models of the impact of dengue vaccines: a review of current research and potential approaches. Vaccine 2011 Aug 11;29(35):5860-5868
- Grad YH, Miller JC, Lipsitch M. Cholera modeling: challenges to quantitative analysis and predicting the impact of interventions. Epidemiology 2012 Jul;23(4):523-530
- Yakob L, Clements AC. A mathematical model of chikungunya dynamics and control: the major epidemic on Réunion Island. PLoS One 2013;8(3):e57448
- Chowell G, Diaz-Dueñas P, Miller JC, Alcazar-Velazco A, Hyman JM, Fenimore PW, et al. Estimation of the reproduction number of dengue fever from spatial epidemic data. Math Biosci 2007 Aug;208(2):571-589
- Poletti P, Messeri G, Ajelli M, Vallorani R, Rizzo C, Merler S. Transmission potential of chikungunya virus and control measures: the case of Italy. PLoS One 2011;6(5):e18860
- Pinho ST, Ferreira CP, Esteva L, Barreto FR, Morato e Silva VC, Teixeira MG. Modelling the dynamics of dengue real epidemics. Philos Trans A Math Phys Eng Sci 2010 Dec 28;368(1933):5679-5693
Abbreviations
- %Vax: percentage of the susceptible population that would need to be immunized to eliminate transmission
- d: discount factor
- IDEA: Incidence Decay and Exponential Adjustment
- I: cumulative incidence
- Imax: final reported outbreak size
- INS: Instituto Nacional de Salud
- MERS: Middle East respiratory syndrome
- R0: basic reproductive number
- Robs: observed reproductive number
- Rt: effective reproductive number
- SARS: severe acute respiratory syndrome
- SI: serial interval length
- SSD: sum of squared differences
- tmax: final reported outbreak duration
Edited by G Eysenbach; submitted 28.03.16; peer-reviewed by Z Zhang, D Paolotti; comments to author 27.04.16; revised version received 11.05.16; accepted 11.05.16; published 01.06.16
©Maimuna S Majumder, Mauricio Santillana, Sumiko R Mekaru, Denise P McGinnis, Kamran Khan, John S Brownstein. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 01.06.2016.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on http://publichealth.jmir.org, as well as this copyright and license information must be included.