This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on https://publichealth.jmir.org, as well as this copyright and license information must be included.
Surveillance data are essential public health resources for guiding policy and allocation of human and capital resources. These data often consist of large collections of information based on nonrandom sample designs. Population estimates based on such data may be impacted by the underlying sample distribution compared to the true population of interest. In this study, we simulate a population of interest and allow response rates to vary in nonrandom ways to illustrate and measure the effect this has on population-based estimates of an important public health policy outcome.
The aim of this study was to illustrate the effect of nonrandom missingness on population-based survey sample estimation.
We simulated a population of respondents answering a survey question about their satisfaction with their community’s policy regarding vaccination mandates for government personnel. We allowed response rates to differ between the generally satisfied and dissatisfied and considered the effect of common efforts to control for potential bias such as sampling weights, sample size inflation, and hypothesis tests for determining missingness at random. We compared these conditions via mean squared errors and sampling variability to characterize the bias in estimation arising under these different approaches.
Sample estimates present clear and quantifiable bias, even in the most favorable response profile. On a 5-point Likert scale, nonrandom missingness resulted in errors averaging almost a full point away from the truth. Efforts to mitigate bias through sample size inflation and sampling weights had negligible effects on the overall results. Additionally, hypothesis testing for departures from random missingness rarely detected the nonrandom missingness across the wide range of response profiles considered.
Our results suggest that assuming surveillance data are missing at random during analysis could provide estimates that are widely different from what we might see in the whole population. Policy decisions based on such potentially biased estimates could be devastating in terms of community disengagement and health disparities. Alternative approaches to analysis that move away from broad generalization of a mismeasured population at risk are necessary to identify the marginalized groups, where overall response may be very different from those observed in measured respondents.
The emergence of COVID-19 in 2019 has given rise to numerous challenges in global health. Many of those challenges have been easily observable and measurable. The intervening months produced countless publications on social distancing and vaccination measures and their resulting effects on the spread of the infection. Even now, epidemiological papers provide current updates on the disease’s differential impact in high-risk populations compared to susceptible people whose risk may not be as high. Most of these analyses were conducted quickly, using available but incomplete data to provide rapid assessments. A challenge that has not been explored in as much detail is how the analysis of incomplete data without proper adjustments may be producing biased results that can lead to detrimental effects as we try to measure knowledge, attitudes, and behaviors related to various aspects of COVID-19.
Public health surveillance data are useful for noninvasively monitoring community health [
Public health surveillance can be used to address a host of epidemiological questions at a micro level, drilling down to community clusters to identify the who, where, and when of disease concentration. A problem arises when the analyst tries to scale the analysis to the macro level when a nonrandom sample of individuals is used to try to draw inference to a population that the data cannot and do not accurately represent [
Many statistical methods for dealing with missing data require that the data be missing at random (MAR). Investigators turn to methods like those presented in Cohen and Cohen [
An additional approach favored by investigators interested in surveillance involves expanding the sample size through the addition of observations, widening eligibility criteria, or appending questions to an existing large-scale questionnaire [eg, 8,11]. In the case of public-use data sets and surveillance systems, there is often an abundance of observations available for analysis. Extremely large sample sizes are considered to be rich data sources and provide an excellent opportunity to “find something.” Nonprobability samples designed to maximize the number of respondents may present analysts with a wealth of data, but the impact of nonrandom missingness may limit the value of inference drawn from such studies. Although numerous examples of “spam the list” samples and imperfect censuses exist in the literature, we prefer to focus on the statistical impact of such methods rather than calling out our colleagues and peers in this paper for using such methods [eg, 9,11].
Applications of public health surveillance often focus on the data at hand rather than general principles of analytic performance in the presence of nonrandom missingness. In the sections below, we use simulation to explore and illustrate the impact of nonrandom missingness on a single survey item. Our approach allows us to investigate and quantify the error in the estimation of a mean as the randomness of the missingness varies from semi-complete to not complete at all. We also illustrate how increasing the sample size impacts an estimator when the data are not MAR. Finally, we present the results of Cohen and Cohen’s approach [
A more detailed description of our methods can be found in
We induce missingness in the data via a uniform random value for each respondent. In our simulation, we compare the effects of data that are missing completely at random (MCAR) with data that are not missing at random (NMAR), where the missingness depends on the response itself. We define the mechanism as the reason for the data’s missingness, as per the study by Little and Rubin [
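The simulation itself was written in SAS; as a language-agnostic sketch of the masking step (the function name and the 50% default dissatisfied response rate are illustrative assumptions, not the paper's code), the uniform-draw mechanism looks like:

```python
import random

def induce_missingness(responses, p_respond_satisfied=0.9, p_respond_dissatisfied=0.5):
    """Mask 5-point Likert responses using one uniform draw per respondent.

    MCAR would use a single response probability for everyone; making the
    probability depend on the (possibly unobserved) answer itself, as below,
    produces NMAR data.
    """
    observed = []
    for r in responses:
        # Satisfied respondents (3 or higher) respond at one rate,
        # dissatisfied respondents at another.
        p = p_respond_satisfied if r >= 3 else p_respond_dissatisfied
        observed.append(r if random.random() < p else None)  # None = missing
    return observed
```

At the extremes (response probability 1 for the satisfied and 0 for the dissatisfied), every dissatisfied answer is masked, which is the limiting case of the response profiles explored below.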
Our simulation replicates 1000 random samples of our overall population and assigns observed values in the sample. Sampling weights [
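As a sketch of how sampling weights enter the estimate (a generic design-weighted mean, not the paper's SAS macro):

```python
def weighted_mean(values, weights):
    """Design-weighted estimate of the population mean.

    Each weight is typically the inverse of the unit's probability of
    selection, so overrepresented groups are down-weighted and
    underrepresented groups are up-weighted.
    """
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)
```

For example, `weighted_mean([1, 5], [3, 1])` gives 2.0, pulling the estimate toward the response carrying the larger weight.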
We quantify the effect of missingness and weighting with the mean squared error (MSE) [
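The MSE of an estimator decomposes into sampling variance plus squared bias; a minimal sketch of that computation over simulation replicates (the helper name is ours):

```python
from statistics import mean

def mse_decomposition(estimates, truth):
    """Return (mse, sampling_variance, squared_bias) across replicate estimates.

    MSE = sampling variance + bias^2, so two estimators with equal MSE can
    trade variability against systematic error.
    """
    center = mean(estimates)
    variance = mean((e - center) ** 2 for e in estimates)
    squared_bias = (center - truth) ** 2
    return variance + squared_bias, variance, squared_bias
```

This decomposition is what lets the results below distinguish an estimator that is merely noisy from one that is systematically off-target.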
We present summary results for the following three population conditions:
Uniform response across categories (ie, no response is more likely than any other).
Generally satisfied respondents in the population (ie, two satisfied responses are more likely than unsatisfied responses).
Generally dissatisfied respondents in the population (ie, two dissatisfied responses are more likely than satisfied responses).
Under these conditions, we set a constant response rate of 90% for the generally satisfied respondents (response of three or higher on the question) and allowed the missingness to vary from 10% to 90% for the dissatisfied respondents to explore the impact of nonrandom missingness. We also compared results for two sample sizes (800 and 8000) to see how this affects the estimators’ behavior. A sample of 800 was chosen for a margin of error of approximately 3.5% for estimating the percentage of those satisfied with the community’s vaccine mandates for government employees and civil servants. The sample size of 8000 was arbitrarily chosen as an inflation by a factor of 10, without specific statistical justification. The simulation was written in SAS 9.4 (SAS Institute, Cary, NC). Refer to
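The stated margin of error for n = 800 follows from the usual normal approximation for a proportion at the most conservative value p = 0.5 (a back-of-the-envelope check, not part of the original simulation):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% normal-approximation margin of error for estimating a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# n = 800 yields roughly 0.035 (3.5%); inflating to n = 8000 only shrinks
# the margin of error, it does nothing about nonrandom missingness.
```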
No human subjects were involved in this simulation, so no institutional review board approval was necessary.
We used a uniform response pattern to describe a community without a particularly strong opinion about their government’s efforts toward a vaccine mandate. Our response rates are assigned using a hypothetical convention that people who are generally supportive of public health practices will be inclined to respond to the survey and share their positive opinions, whereas people who are unhappy with the current state of affairs will decline (at a range of levels) to talk about their concerns with a stranger. We hold the response rate constant at 90% for the satisfied members of the community, suggesting their willingness to participate in the survey. We consider scenarios wherein the response rate in the dissatisfied group becomes progressively worse to measure impacts of this differential response on sampling variability, MSE, and bias, in terms of points on a 5-point Likert scale. We also report results after calculating weighted means in an attempt to adjust for nonresponse for samples from this community.
The first row in
Interestingly, our results remained consistent when we expanded the sample size (
Mean squared error (MSE), sampling variance, and bias by sample size and response pattern.
Number of samples out of 1000, where Cohen and Cohen’s approach [
Dissatisfied         Uniform, n         Generally satisfied, n   Generally dissatisfied, n
nonresponse rate     Race  Sex  Both    Race  Sex  Both          Race  Sex  Both
10%                  50    43   3       41    54   0             49    55   3
20%                  46    52   4       52    53   5             59    49   2
30%                  62    46   3       55    55   4             52    57   4
40%                  45    48   3       54    69   3             57    61   6
50%                  42    48   0       52    41   3             37    47   1
60%                  51    37   1       43    40   1             46    59   1
70%                  55    42   5       46    59   2             56    52   3
80%                  53    47   1       50    63   4             51    61   2
90%                  49    38   3       70    53   2             57    57   3
Number of samples out of 1000, where Cohen and Cohen’s approach [
Dissatisfied         Uniform, n         Generally satisfied, n   Generally dissatisfied, n
nonresponse rate     Race  Sex  Both    Race  Sex  Both          Race  Sex  Both
10%                  34    38   1       43    44   1             43    52   2
20%                  36    50   2       34    37   3             32    39   1
30%                  35    43   2       39    37   3             36    43   2
40%                  42    40   1       46    43   2             37    52   6
50%                  34    49   0       45    58   3             42    51   1
60%                  46    43   4       40    57   3             50    37   1
70%                  38    50   2       53    44   5             48    36   0
80%                  42    29   1       51    43   2             49    50   2
90%                  29    31   2       60    60   3             46    51   1
When the simulated respondents were generally satisfied, we observed less overall missingness in the data, even as the nonresponse rate for dissatisfied respondents increased. The second row in
The third row of
Our simulations illustrate the impact nonrandom missing data can have on population-based estimates even when analyzing a fairly simple survey sample. What we present in our examples indicates that basic diagnostic tests of missingness at random and the use of sampling weights do not automatically control for such biases and are not simple guarantees or workarounds for improving the quality of estimates.
Statistical discussions of missingness tend to focus on reducing nonresponse in the survey implementation [
In surveillance data, particularly in a public health crisis, where data are needed quickly, existing surveys often are repurposed for additional data collection, or analysts include convenient data of unknown (if any) design. In this repurposed use (eg, through the addition of COVID-19 questions to ongoing surveys), we may well expect new (and unknown) patterns of missingness. Adjusting for design alone (via the design-based weights based on the designed probability of selection but not necessarily the probability of response) can adapt estimates for the anticipated design; however, as seen above, important impacts of new causes of missingness will be missed. Specifically, the examples in our study illustrate how dissonance between sampling weights (adjusting for the probability of “selection”) and patterns of missingness (which change the probability of “response”) can result in bias. Such impacts can be mitigated if the missingness occurs in subpopulations with low weights (as in our generally satisfied population example) but can be inflated if the missingness occurs in subpopulations receiving high sampling weights (as in our generally dissatisfied population example). Unless we know both the probability of selection and the probability of response, we cannot see the full picture and cannot adjust estimates appropriately with traditional reweighting methods.
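A hypothetical two-group calculation (all numbers are ours, chosen only for illustration) makes the dissonance concrete: with equal selection probabilities the design weights cancel, yet unequal response probabilities still tilt the weighted mean toward the satisfied group:

```python
# Two groups of equal population share with equal probability of selection
# but unequal probability of response (all values hypothetical).
groups = {
    "satisfied":    {"share": 0.5, "value": 4.0, "p_select": 0.10, "p_respond": 0.90},
    "dissatisfied": {"share": 0.5, "value": 1.0, "p_select": 0.10, "p_respond": 0.30},
}

true_mean = sum(g["share"] * g["value"] for g in groups.values())  # 2.5

# Expected design-weighted mean: weights 1/p_select adjust for selection,
# but each group's contribution is still scaled by its response probability.
num = sum(g["share"] * g["p_select"] * g["p_respond"] * g["value"] / g["p_select"]
          for g in groups.values())
den = sum(g["share"] * g["p_select"] * g["p_respond"] / g["p_select"]
          for g in groups.values())
expected_weighted_mean = num / den  # 3.25: biased 0.75 points upward
```

Here reweighting by the probability of selection alone leaves the estimate three-quarters of a Likert point above the true population mean, because the weights carry no information about who declined to respond.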
As our simple example illustrates, the application of design weights should not be considered a panacea for the challenges of extending survey designs for surveillance purposes. A closer look at
A larger problem comes from the intention to use surveillance data to make global statements about a community. Extrapolation is often mentioned as a concern in modeling but rarely translated to estimators drawn from a nonrandom sample inferring parameters of a larger population. Our simulator shows that as the response rate becomes increasingly worse in a population subgroup, the sample’s effectiveness at representing the larger community abates, in many cases rather drastically. Using data from a sample with unknown probabilities of observation, particularly survey data where the data may not be MAR, is a clear example of extrapolation. Ultimately, a failure to adequately represent a marginalized population may lead to political and social unrest. Policy decisions based on such data could result in creating or widening disparities already detrimental to social justice and health equity outcomes.
The simulations we showed in our study illustrate that estimates from survey samples have the potential to be heavily biased when extended beyond their design, especially in the presence of differential missingness due to imbalanced probability of response. Although the potential of such bias is known in theory, our simulations provide a basic but practical illustration of the potential magnitude of the problem. We note that these simulations represent a simplified (but perhaps not rare) illustration of the problem; the direction and magnitude of the bias can and likely will vary considerably as the relationship between the missingness and the survey responses changes. We contend that, in surveillance settings, the missingness in survey data will rarely (if ever) be entirely random, and we recommend considerable caution when applying survey weights based on sampling plans alone without consideration of potential differential missingness. In particular, we recommend that thoughtful summaries of potential biases accompany analyses and interpretation, especially those drawing from multiple available data sources. We recommend that, rather than using the survey data to look upward to the community, analysts should be encouraged to consider looking down to the observed population instead.
Although it makes analytical sense to assume the data are MAR, this decision may come with a sizable cost. If we erroneously assume missingness at random, we reach conclusions that are far from the truth and could lead to devastating social consequences. If we erroneously assume that the missingness is not random, we make more cautious conclusions and open avenues to better identify and understand potentially underserved sections of our population of interest. Erring on the side of nonrandom missingness leads to a more socially responsible analysis of all the available information.
One limitation of our study is that we applied simple random sampling to simulate the survey experience, whereas most surveillance data sets use multistage cluster designs. We note that a more complex design would likely lead to an inflation in the sampling variability but would not reduce the mean squared error or the bias inherent in the differential response. In our examples, we also considered only three response patterns and arbitrarily assigned the population responses based on our own characterization of stronger satisfaction and dissatisfaction propensity. Our simulator is available to readers (See
Surveillance is an essential component of public health practice. Surveillance data allow us to produce useful descriptive measures to characterize the movement of disease through a population at risk. The current pandemic has produced vast amounts of data, much of which come from nonrandom samples or are drawn from surveys where data missingness patterns can obscure original sampling plans to the point that traditional sampling weights alone cannot provide proper adjustments to estimates. Our examples suggest an opportunity to develop new methods that move away from classical design-only approaches and move toward methods that explore designs for data collection and adjustment for patterns in data completeness that let us use the information more effectively for making better public health decisions for the entire population.
Full description of methods.
SAS Simulation Macro.
MSE: mean squared error
MAR: missing at random
MCAR: missing completely at random
NMAR: not missing at random
None declared.