This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on https://publichealth.jmir.org, as well as this copyright and license information must be included.
Monitoring disease incidence rates over time with population surveillance data is fundamental to public health research and practice. Bayesian disease monitoring methods provide advantages over conventional methods, including greater flexibility in model specification and the ability to conduct formal inference on model-derived quantities of interest. However, software platforms for Bayesian inference are often inaccessible to nonspecialists.
To increase the accessibility of Bayesian methods among health surveillance researchers, we introduce a Bayesian methodology and open source software package, surveil, for time-series modeling of disease incidence and mortality. Given case count and population-at-risk data, the software enables health researchers to draw inferences about underlying risk and derivative quantities including age-standardized rates, annual and cumulative percent change, and measures of inequality.
We specify a Poisson likelihood for case counts and model trends in log-risk using the first-difference (random-walk) prior. Models in the surveil R package were built using the Stan modeling language. We demonstrate the methodology and software by analyzing age-standardized colorectal cancer (CRC) incidence rates by race and ethnicity for non-Latino Black (Black), non-Latino White (White), and Hispanic/Latino (of any race) adults aged 50-79 years in Texas’s 4 largest metropolitan statistical areas between 1999 and 2018.
Our analysis revealed a cumulative decline of 31% (95% CI –37% to –25%) in CRC risk among Black adults, 17% (95% CI –23% to –11%) for Latino adults, and 35% (95% CI –38% to –31%) for White adults from 1999 to 2018. None of the 3 observed groups experienced significant incidence reduction in the final 4 years of the study (2015-2018). The Black-White rate difference (per 100,000) was 44 (95% CI 30-57) in 1999 and 35 (95% CI 28-43) in 2018. Cumulatively, the Black-White gap accounts for 3983 CRC cases (95% CI 3746-4219) or 31% (95% CI 29%-32%) of total CRC incidence among Black adults in this period.
Stalled progress on CRC prevention and excess CRC risk among Black residents warrant special attention as cancer prevention and control priorities in urban Texas. Our methodology and software can help the public and health agencies monitor health inequalities and evaluate progress toward disease prevention goals. Advantages of the methodology over current common practice include the following: (1) the absence of piecewise linearity constraints on the model space, and (2) the ability to conduct formal inference on any model-derived quantity of interest using Bayesian methods.
Monitoring disease incidence rates is fundamental to public health research and practice. Vital statistics systems, cancer registries, and other disease-specific monitoring programs provide critical data resources for public health research, and valid interpretation of these data requires formal modeling.
Joinpoint regression modeling (JRM) is a commonly employed, National Cancer Institute–endorsed method for monitoring incidence and mortality rates [
We present a Bayesian methodology and open source software package for routine disease surveillance. The models are appropriate for time-series count data aggregated across evenly spaced time periods. The models assign the Poisson likelihood to observed counts conditional on unknown risk; time trends in risk are modeled by assigning the first-difference (random-walk) prior distribution to the log-rates. Binomial models for nonrare events are also implemented. Strengths of the method include its parsimony, the absence of linearity constraints, and the use of Bayesian inference [
We demonstrate use of the surveil R package by analyzing urban colorectal cancer (CRC) incidence in Texas. “Eliminating cancer disparities” is purportedly a “cross-cutting aim” of the Cancer Prevention and Research Institute of Texas’s (CPRIT’s)
The surveil R package implements Poisson random-walk models. For time period
Alternatively, the binomial likelihood may be used:
where g^{-1}(x)=exp(x)/(1+exp(x)) is the inverse-logit function.
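To make the two likelihood choices concrete, the following is a minimal sketch in plain Python (not the package's Stan code). The function names and the example counts are hypothetical; the only element taken from the text is the inverse-logit definition. It evaluates the Poisson and binomial log-likelihoods for the same observation and shows that they nearly coincide when the event is rare.

```python
import math


def inv_logit(x: float) -> float:
    """Inverse-logit, exp(x) / (1 + exp(x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def poisson_logpmf(cases: int, rate: float, population: float) -> float:
    """log P(cases) when cases ~ Poisson(rate * population)."""
    mu = rate * population
    return cases * math.log(mu) - mu - math.lgamma(cases + 1)


def binomial_logpmf(cases: int, prob: float, population: int) -> float:
    """log P(cases) when cases ~ Binomial(population, prob)."""
    log_choose = (math.lgamma(population + 1) - math.lgamma(cases + 1)
                  - math.lgamma(population - cases + 1))
    return (log_choose + cases * math.log(prob)
            + (population - cases) * math.log1p(-prob))


# Hypothetical observation: 140 cases among 100,000 people at risk.
cases, at_risk = 140, 100_000
risk = 0.0014

log_rate = math.log(risk)                 # Poisson model: log scale
logit_risk = math.log(risk / (1 - risk))  # binomial model: logit scale

lp_pois = poisson_logpmf(cases, math.exp(log_rate), at_risk)
lp_binom = binomial_logpmf(cases, inv_logit(logit_risk), at_risk)
print(round(lp_pois, 3), round(lp_binom, 3))
```

For a risk this small the two log-likelihoods differ only in the third decimal place, which is why the Poisson form is a convenient default for rare outcomes.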
We assign the first-difference (random-walk) model to the log- or logit-transformed risk parameters, consistent with our knowledge that disease risk tends to vary smoothly over time:
This and related intrinsic Gaussian Markov random field specifications are extensively studied models for time trend analyses [
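The behavior of a first-difference prior can be illustrated with a short simulation. This is a sketch under stated assumptions, not the package's implementation: the function names, the starting value of -5 on the log scale, and the step scale of 0.05 are all hypothetical choices made for illustration. Each value equals the previous value plus a normally distributed step, so trajectories with small year-to-year changes receive higher prior density than erratic ones.

```python
import math
import random


def simulate_log_risk(n_periods, start, step_sd, rng):
    """Draw one trajectory with eta[t] = eta[t-1] + Normal(0, step_sd)."""
    eta = [start]
    for _ in range(n_periods - 1):
        eta.append(eta[-1] + rng.gauss(0.0, step_sd))
    return eta


def random_walk_logpdf(eta, step_sd):
    """Log-density of the first differences under Normal(0, step_sd)."""
    total = 0.0
    for prev, curr in zip(eta, eta[1:]):
        z = (curr - prev) / step_sd
        total += -0.5 * z * z - math.log(step_sd) - 0.5 * math.log(2 * math.pi)
    return total


rng = random.Random(42)  # fixed seed so the sketch is reproducible
eta = simulate_log_risk(n_periods=20, start=-5.0, step_sd=0.05, rng=rng)
rates_per_100k = [100_000 * math.exp(e) for e in eta]

# A smooth path scores higher under the prior than an erratic one
# with the same start and end points.
smooth = [-5.0, -5.02, -5.04, -5.06]
jumpy = [-5.0, -4.5, -5.5, -5.06]
print(random_walk_logpdf(smooth, 0.05) > random_walk_logpdf(jumpy, 0.05))  # True
```

Note that the prior penalizes large jumps but places no constraint on the shape of the trend, which is the sense in which the model is free of linearity constraints.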
By default, surveil prior distributions are diffuse for most applications, and users can adjust them to match their subject matter knowledge. The log- or logit-transformed risk for
It is centered on a rate of e^{-5}≈674 per 100,000 and spreads the prior probability across a wide range of values. Changes in log-rates are small, such that surveil’s following default prior is also diffuse:
This base model specification may be extended for multiple correlated time series, such as observations of multiple demographic groups. If
introduces a covariance structure through the multivariate normal distribution [
The models were built in Stan, a state-of-the-art platform for Bayesian inference with Markov chain Monte Carlo (MCMC) [
The surveil R package is freely available and archived on the Comprehensive R Archive Network. Basic use of the software requires only introductory-level R programming skills. Tables downloaded from the CDC WONDER database are automatically in the expected format. The model-fitting function, stan_rw, returns a summary of results (estimates with 95% CIs) and MCMC samples.
The package supports a streamlined workflow for analyzing disease incidence data. It produces publication-quality visualizations using ggplot2 [
Using MCMC, probability statements can be made about any quantity of interest that is derived from model parameters [
When working with age-standardized rates, excess cases (ECs) must be calculated separately for each age stratum and then summed across age groups (
Rate ratio (RR)=R_{d}/R_{a}, where R is the incidence rate, and subscripts “a” and “d” represent the advantaged and disadvantaged demographic groups, respectively.
Rate difference (RD)=R_{d}–R_{a}
Proportion attributable risk (PAR)=RD/R_{d}
Excess cases (EC)=RD×P_{d}, where P represents the populations at risk.
Cumulative EC=Σ_{t} EC_{t}, where the subscript “t” represents the time period.
Cumulative PAR=Σ_{t} EC_{t}/Σ_{t} (R_{dt}×P_{dt})
Rate ratio (RR)=SR_{d}/SR_{a}, where “SR” is the age-standardized incidence rate, and subscripts “a” and “d” represent the advantaged and disadvantaged demographic groups, respectively.
Rate difference (RD)=SR_{d}–SR_{a}
Excess cases (EC)=Σ_{i} (R_{di}–R_{ai})×P_{di}, where “P” represents the populations at risk, and subscript “i” represents the age groups.
Proportion attributable risk (PAR)=EC/Σ_{i} (R_{di}×P_{di})
Cumulative EC=Σ_{t} EC_{t}, where “t” represents the time periods.
Cumulative PAR=Σ_{t} EC_{t}/Σ_{t} Σ_{i} (R_{dit}×P_{dit})
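The pairwise inequality measures defined above can be sketched in a few lines of Python. The function names are hypothetical, and the age-specific rates and populations are made-up values for illustration; only the two age-standardized rates for 1999 (188 and 144 per 100,000) come from the results reported in this paper. In a full analysis, these functions would be applied to each MCMC draw rather than to point estimates.

```python
def rate_ratio(rate_d, rate_a):
    """RR: disadvantaged-group rate divided by advantaged-group rate."""
    return rate_d / rate_a


def rate_difference(rate_d, rate_a):
    """RD: disadvantaged-group rate minus advantaged-group rate."""
    return rate_d - rate_a


def excess_cases(rates_d, rates_a, pop_d):
    """EC: age-specific rate differences times population at risk, summed."""
    return sum((rd - ra) * p for rd, ra, p in zip(rates_d, rates_a, pop_d))


def attributable_risk(rates_d, rates_a, pop_d):
    """PAR: excess cases as a share of all cases in the disadvantaged group."""
    expected = sum(rd * p for rd, p in zip(rates_d, pop_d))
    return excess_cases(rates_d, rates_a, pop_d) / expected


def cumulative_measures(periods):
    """Cumulative EC and PAR over a list of (rates_d, rates_a, pop_d)."""
    total_ec = sum(excess_cases(*p) for p in periods)
    total_cases = sum(
        sum(rd * n for rd, n in zip(p[0], p[2])) for p in periods
    )
    return total_ec, total_ec / total_cases


# Age-standardized rates per 100,000 in 1999 (Black and White adults).
rr_1999 = rate_ratio(188, 144)
rd_1999 = rate_difference(188, 144)

# Hypothetical age-specific rates (per person) and populations at risk.
year_1 = ([0.0010, 0.0020, 0.0030], [0.0008, 0.0015, 0.0024],
          [50_000, 30_000, 20_000])
year_2 = ([0.0009, 0.0018, 0.0028], [0.0007, 0.0014, 0.0022],
          [52_000, 31_000, 21_000])

ec_1 = excess_cases(*year_1)
par_1 = attributable_risk(*year_1)
cum_ec, cum_par = cumulative_measures([year_1, year_2])
print(rd_1999, round(rr_1999, 2), round(ec_1, 1), round(par_1, 3))
```

Summing excess cases over age strata before dividing, rather than averaging stratum-level proportions, keeps the cumulative PAR interpretable as a share of total cases.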
We gathered publicly available age-specific (50-79 years) data on CRC incidence and population at risk, between 1999 and 2018, by race and ethnicity in the 4 largest metropolitan statistical areas (MSAs) in Texas (centered in Austin, Dallas, Houston, and San Antonio). Uncensored data for this age range are publicly available at the level of 5-year age groups for Hispanic/Latino (all racial groups combined), non-Latino Black or African American (Black), and non-Latino White (White) populations. CRC data for Asian Pacific Islanders are not available for 5-year age groups but are available for the aggregate 50- to 79-year-old population. Data for American Indians/Alaska Natives are not available [
We modeled CRC incidence by race-ethnicity and 5-year age group for the 4 MSAs combined using surveil’s Poisson first-difference models. We calculated age-standardized rates using direct age-standardization and the 2000 US standard million population [
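Direct age-standardization is a weighted average of age-specific rates, with weights proportional to the standard population in each age group. The sketch below illustrates the arithmetic only. The function name, the weights, and the rates are hypothetical placeholders, not the 2000 US standard million population values or the study data.

```python
def age_standardize(rates, standard_pop):
    """Weighted average of age-specific rates using standard-population weights."""
    if len(rates) != len(standard_pop):
        raise ValueError("rates and standard_pop must have the same length")
    total = sum(standard_pop)
    return sum(r * w for r, w in zip(rates, standard_pop)) / total


# Six 5-year age groups covering ages 50-79 (illustrative values only).
age_groups = ["50-54", "55-59", "60-64", "65-69", "70-74", "75-79"]
hypothetical_weights = [30, 25, 20, 12, 8, 5]          # standard population shares
hypothetical_rates = [60, 90, 130, 180, 230, 280]      # cases per 100,000

standardized = age_standardize(hypothetical_rates, hypothetical_weights)
print(round(standardized, 1))
```

Because every group is standardized to the same weights, differences between the resulting rates reflect differences in age-specific risk rather than differences in age structure.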
We examined rates of change by calculating the average annual percent change (AAPC) per 4-year period. The sole purpose of aggregating to 4-year periods is to stabilize the estimates. We measured Black-White inequality by the rate difference (RD), PAR, and ECs. Probability distributions for all quantities of interest were obtained using MCMC analysis. For each model, we drew 6000 samples from each of 4 MCMC chains, discarding the first 3000 samples of each chain as warmup. Before analyzing the results, we confirmed that MCMC samples converged on a single distribution using the split Rhat statistic and that MCMC SEs were sufficiently small [
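The general recipe for inference on a derived quantity is to compute that quantity once per MCMC draw and then summarize the resulting values. The sketch below illustrates this with simulated stand-ins for posterior draws; it does not use real model output. All function names are hypothetical, and the AAPC is taken here to be the arithmetic mean of the annual percent changes within a period, which is an assumption rather than a definition confirmed by the text.

```python
import random


def percent_change(start, end):
    """Percent change from start to end."""
    return 100.0 * (end - start) / start


def aapc(rates):
    """Mean of year-over-year percent changes (assumed AAPC definition)."""
    changes = [percent_change(a, b) for a, b in zip(rates, rates[1:])]
    return sum(changes) / len(changes)


def quantile(values, q):
    """Quantile by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def summarize(draws):
    """Posterior mean with a 95% credible interval."""
    return {
        "mean": sum(draws) / len(draws),
        "lower": quantile(draws, 0.025),
        "upper": quantile(draws, 0.975),
    }


# Simulated stand-in for MCMC output: 2000 draws, each a 5-year rate path.
rng = random.Random(7)
draws = []
for _ in range(2000):
    rate = rng.gauss(188.0, 6.0)
    path = [rate]
    for _ in range(4):
        rate *= 1.0 + rng.gauss(-0.02, 0.01)
        path.append(rate)
    draws.append(path)

cumulative = summarize([percent_change(p[0], p[-1]) for p in draws])
annual = summarize([aapc(p) for p in draws])
print(cumulative, annual)
```

The same pattern extends to rate differences, excess cases, or any other function of the modeled rates, since uncertainty propagates automatically through the per-draw calculation.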
CRC incidence declined substantially between 1999 and 2018 (
Age-standardized incidence rates of colorectal cancer (CRC) per 100,000 by race-ethnicity among adults aged 50-79 years between 1999 and 2018 in 4 Texas metropolitan statistical areas.
Levels and cumulative percent change of age-standardized risk of colorectal cancer (CRC) per 100,000 among adults aged 50-79 years, in Texas’s 4 largest metropolitan statistical areas between 1999 and 2018.

| Race and ethnicity | Age-standardized CRC risk in 1999, risk (95% CI) | Age-standardized CRC risk in 2018, risk (95% CI) | Percent (%) change (95% CI) |
| --- | --- | --- | --- |
| Black | 188 (176 to 201) | 129 (123 to 136) | –31 (–37 to –25) |
| Latino | 116 (109 to 123) | 96 (92 to 100) | –17 (–23 to –11) |
| White | 144 (140 to 150) | 94 (91 to 98) | –35 (–38 to –31) |
Levels and cumulative percent change of age-specific risk of colorectal cancer (CRC) per 100,000 among adults aged 50-79 years (not age-standardized), in Texas’s 4 largest metropolitan statistical areas between 1999 and 2018.

| Race and ethnicity | Non–age-standardized CRC risk in 1999, risk (95% CI) | Non–age-standardized CRC risk in 2018, risk (95% CI) | Percent (%) change (95% CI) |
| --- | --- | --- | --- |
| Asian Pacific Islander | 75 (66 to 88) | 67 (61 to 73) | –11 (–25 to 3) |
| Black | 170 (160 to 182) | 122 (115 to 128) | –28 (–34 to –22) |
| Latino | 103 (97 to 109) | 86 (83 to 90) | –16 (–22 to –9) |
| White | 135 (130 to 140) | 95 (91 to 98) | –30 (–34 to –26) |
AAPC by 4-year period shows that the most rapid progress on CRC prevention was achieved (roughly) between 2003 and 2014, and that progress appears to have stalled since then (
By multiple measures, aggregate Black-White inequality increased between 1999 and 2008 and then decreased or stabilized by 2018 (
Average annual percent change (AAPC) in age-standardized incidence rates of colorectal cancer (CRC) by 4-year period between 1999 and 2018.
Black-White inequality in the incidence rates of colorectal cancer between 1999 and 2018: rate difference per 100,000, proportion attributable risk, and excess cases.
Monitoring disease incidence is a crucial public health task. The ubiquitous JRM method has notable shortcomings, including linearity constraints and overconfident SEs. This paper presents a parsimonious methodology grounded in Bayesian time-series analysis and accessible through the surveil R package. The package also returns probability distributions for annual and cumulative percent change, measures of pairwise inequality, and the Theil inequality index. Using standard MCMC analysis techniques, users may also conduct inference on any user-defined quantity of interest that is a function of model parameters, such as the AAPC. This project aims to make Bayesian analysis accessible to a wider range of researchers while making robust analyses of health inequality integral to surveillance research. The Poisson models discussed here are appropriate for “rare” events (generally, rates of <0.04). Binomial models for nonrare events are also implemented in surveil. The models are designed for the analysis of data from high-quality surveillance or vital statistics systems that have been aggregated across evenly spaced time periods.
Between 1999 and 2013, robust CRC risk reduction occurred for White and Black residents, the highest-risk racial-ethnic groups for which data are publicly available, while more modest progress was achieved for Latino and Asian Pacific Islander populations. Excess CRC risk among Black adults is the most burdensome and urgent health inequality identified in this analysis. Black-White inequality increased in relative terms before falling toward its previous level, while annual excess cases increased by approximately 190%. From 2015 to 2018, none of the observed groups experienced any substantial progress in terms of CRC risk reduction.
CRC screening by colonoscopy can prevent CRC through the removal of precancerous polyps [
Given claims that racial segregation is a driver of Black-White cancer inequalities [
Major limitations of this analysis include the absence of data by social class or income, aggregation of data across distinct MSAs, exclusion of the El Paso metropolitan area, and exclusive focus on the highest-risk age groups.
Public accountability for public health goals requires routine monitoring of health outcomes and inequalities. surveil can help health agencies and the public in defining goals and monitoring outcomes. Our analysis of CRC incidence in 4 Texas MSAs finds that prevention progress has stalled and that little to no progress on Black-White CRC inequality was achieved from 1999 to 2018. Texans have voted twice (first in 2007 and again in 2019) to establish and fund CPRIT, making cancer prevention a public priority. CPRIT recently identified ending cancer disparities as a priority [
AAPC: average annual percent change
C5: Citywide Colon Cancer Control Coalition
CPRIT: Cancer Prevention and Research Institute of Texas
CRC: colorectal cancer
EC: excess case
JRM: Joinpoint regression modeling
MCMC: Markov chain Monte Carlo
MSA: metropolitan statistical area
PAR: proportion of attributable risk
RD: rate difference
RR: rate ratio
This research was supported by the Texas Health Resources Clinical Scholars Program and Cancer Prevention and Research Institute of Texas (CPRIT PP180018).
None declared.