This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on https://publichealth.jmir.org, as well as this copyright and license information must be included.
Adding bicycle and pedestrian paths to an area can lead to improved health outcomes for residents over time. However, which areas benefit most from bicycle and pedestrian paths, how many miles of such paths are needed, and which health outcomes are most likely to improve remain open quantitative questions.
Our work provides and evaluates a methodology that offers actionable insight for city-level planners, public health officials, and decision makers tasked with the question “To what extent will adding specified bicycle and pedestrian path mileage to a census tract improve residents’ health outcomes over time?”
We conducted a factor analysis of data from the American Community Survey, the Centers for Disease Control and Prevention (CDC) 500 Cities project, Strava, and city-supplied bicycle and pedestrian path location and use data for 2 cities (Norfolk, Virginia, and San Francisco, California). We constructed 2 city-specific factor models and used an algorithm to predict the expected mean improvement that a specified number of bicycle and pedestrian path miles contributes to the identified health outcomes.
We show that given a factor model constructed from data from 2011 to 2015, the number of additional bicycle and pedestrian path miles in 2016, and a specific census tract, our models forecast health outcome improvements in 2020 more accurately than 2 alternative approaches for both Norfolk, Virginia, and San Francisco, California. Furthermore, for each city, we show that the additional accuracy is a statistically significant improvement (
We propose and evaluate a methodology that enables decision makers to (1) weigh the extent to which 2 bicycle and pedestrian paths of equal cost, proposed in different census tracts, would improve residents’ health outcomes; (2) identify areas where bicycle and pedestrian paths are unlikely to be effective interventions and other strategies should be used; and (3) quantify the minimum additional bicycle path mileage needed to maximize health outcome improvements. For paths added in 2016 in 2 large cities with different geographic locations and demographics, our methodology was significantly more accurate historically than alternative approaches.
The addition of bicycle and pedestrian paths to an area is a theoretically valuable resource for city-level planners, public health officials, and decision makers to increase physical activity and improve health outcomes. Most existing research has found a negative association between the prevalence of bicycle and pedestrian paths and poor health outcomes (ie, diabetes, stroke, obesity, heart disease, high blood pressure, and poor physical and mental health) [
Our objective is to provide and evaluate a methodology for officials addressing the question “To what extent will adding specified bicycle and pedestrian path mileage to a census tract improve residents’ health outcomes over time?” The methodology we propose uses factor analysis to filter and organize variables from publicly available data sets at the census tract level within a given city. The data sets included (1) the US Census [
The result of this analysis is a city-specific factor model describing the relationship among variables related to individuals, bicycling and walking behaviors, and health outcomes. Then, the factor model, built using past data, is used in an algorithm to predict the extent to which adding a future specified number of bicycle and pedestrian path miles to a certain location in the city quantitatively impacts certain health outcomes.
We are not aware of any other applications of factor analysis to develop predictive algorithms related to the placement and efficacy of bicycle and pedestrian paths with respect to health outcomes. However, some researchers approach bicycle and pedestrian path planning from a similar perspective. Smith and Haghani [
Although optimization techniques are not commonly used to determine where bicycle and pedestrian paths should be placed, they have been explored for choosing among existing routes rather than developing new ones. Allen-Munley et al [
Ospina et al [
It is important to note that there are arguments against defining the placement of bicycle and pedestrian paths as a systems engineering problem. Szimba and Rothengatter [
Furthermore, significant work has been conducted to estimate demand [
In summary, the algorithm used in this study differs from previous approaches for estimating demand, evaluating network efficacy, and optimizing the placement of bicycle and pedestrian paths. The problem examined here focuses on which health outcomes can be improved by adding bicycle and pedestrian paths, in which census tracts adding such paths will improve health outcomes the most, and how many miles of paths must be added within a given census tract to affect residents’ health outcomes.
The remainder of this paper is organized as follows. First, we review the data and methods used in our approach to construct city-specific models. Next, we apply the approach to 2 cities: Norfolk, Virginia, and San Francisco, California. We then evaluate our approach for both cities against 2 alternative approaches for predicting improvements in health outcomes from added bicycle and pedestrian paths. The evaluation shows that our approach offers more accurate predictions than both alternatives and that the difference in accuracy is statistically significant (
Our work uses publicly available data related to urban infrastructure, resident demographics, and health outcomes. The data sets reflect aggregate variables measured at the census tract level of a city and do not contain any personally identifiable information. Therefore, they do not involve human subjects as defined by federal regulations, and their use does not require ethics board review or approval [
Our approach to modeling the health effects of adding bicycle and pedestrian paths at the census tract level uses data from (1) census tract boundaries used in the US Census [
An overview of the data sets and other supplementary materials supplied in the multimedia appendices. ACS: American Community Survey; BPP: bicycle and pedestrian path; CDC: Centers for Disease Control and Prevention; NOR: Norfolk; SF: San Francisco; SME: subject matter expert.
Census tracts are small, contiguous, and relatively permanent statistical subdivisions of a county or an equivalent entity. Census tract populations generally range from 1200 to 8000 people. Census tracts provide a stable geographic unit for statistical analysis in the US Census and ACS [
The ACS is an ongoing national survey that samples a subset of individuals within the same geographic areas as the US Census. Data are collected every month throughout the year using the same questions. In contrast, the US Census provides a more comprehensive sample of individuals in the United States, collecting data from more individuals during a particular period (March to August), but it is administered only once every 10 years. A metaphor helps elucidate the differences between the 2 surveys: the US Census serves as a high-resolution photograph of the US population once every 10 years, whereas the ACS serves as many low-resolution, continually updated videos over the same period [
The census tract–level estimates and methodology for estimating health outcomes, health statuses, healthy behaviors, and disease prevention are provided by the CDC 500 Cities project. The 500 Cities project is a collaboration between the CDC and the Robert Wood Johnson Foundation. The small area estimates provided by the project allow policymakers and local health departments to better understand the burden and geographic distribution of health-related variables in their jurisdictions and assist them in planning public health interventions [
The bicycle and pedestrian path data for Norfolk, Virginia, and San Francisco, California, include the latitude and longitude of bicycle lanes, routes, and paths built and maintained in each city. Bicycle use data were taken from bicycle counters used in each city [
We used the Strava Metro rollup data set for Norfolk, Virginia, and San Francisco, California. This data set contains annual walking, running, and bicycling activity counts per road segment, which can then be aggregated at the census tract level. Strava refers to a road segment as an edge, and each edge is associated with a latitude and longitude bounding box through the Strava application programming interface [
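The tract-level aggregation of edge counts can be sketched as follows. This is a minimal illustration under assumptions not stated in the text: each edge is assigned to the tract containing the centroid of its bounding box, tract boundaries are simple (lat, lon) polygons, and the record layout (`bbox`, `count`) and helper names are hypothetical rather than the Strava Metro schema.

```python
from collections import defaultdict

# Hypothetical layout: a tract polygon is a list of (lat, lon) vertices,
# and an edge is {"bbox": (min_lat, min_lon, max_lat, max_lon), "count": int}.


def point_in_polygon(lat, lon, polygon):
    """Ray-casting test: True if (lat, lon) falls inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        y1, x1 = polygon[i]
        y2, x2 = polygon[(i + 1) % n]
        # Only polygon sides that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this side crosses the point's latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def aggregate_edges_to_tracts(edges, tracts):
    """Sum edge activity counts per tract, assigning each edge to the tract
    that contains the centroid of its bounding box (an assumption)."""
    totals = defaultdict(int)
    for edge in edges:
        min_lat, min_lon, max_lat, max_lon = edge["bbox"]
        c_lat, c_lon = (min_lat + max_lat) / 2, (min_lon + max_lon) / 2
        for tract_id, polygon in tracts.items():
            if point_in_polygon(c_lat, c_lon, polygon):
                totals[tract_id] += edge["count"]
                break  # tracts do not overlap, so the first match is the only one
    return dict(totals)  # edges outside every tract are dropped


# Toy example: two adjacent unit-square tracts.
tracts = {
    "A": [(0, 0), (0, 1), (1, 1), (1, 0)],
    "B": [(0, 1), (0, 2), (1, 2), (1, 1)],
}
edges = [
    {"bbox": (0.1, 0.1, 0.3, 0.3), "count": 5},
    {"bbox": (0.4, 0.2, 0.6, 0.4), "count": 7},
    {"bbox": (0.2, 1.2, 0.4, 1.6), "count": 3},
    {"bbox": (5.0, 5.0, 6.0, 6.0), "count": 100},
]
print(aggregate_edges_to_tracts(edges, tracts))  # {'A': 12, 'B': 3}
```

Centroid assignment is the simplest rule; an edge that crosses a tract boundary is credited entirely to one tract, which a production pipeline might instead split by segment length.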
Our data set included a wide range of variables collected from multiple sources. From this data set, we selected a subset of the variables that individuals with domain expertise identified as possibly contributing to the use of bicycle and pedestrian paths and the impact of bicycle and pedestrian paths on health outcomes when additional mileage was added to a geographic area (ie, census tract). The expertise of these individuals spanned social work, health science and nutrition, community health, public health, and transportation.
American Community Survey
Race
Educational attainment
Employment status
Income and benefits
Marital status
Sex and age
Commuting to work
Citizenship
Health insurance
Occupation
Household by type
Relationship
Centers for Disease Control and Prevention 500 Cities project
Health outcomes
Health risk behaviors
Prevention
Health status
City Bicycle and Pedestrian Path data
Bicycle and Pedestrian Path use data
Bicycle and Pedestrian Path mileage data
Strava Bicycle and Pedestrian Path data
Bicycle and Pedestrian Path use data
The approach to joining together the data sets at the census tract level. ACS: American Community Survey; BPP: bicycle and pedestrian path; CDC: Centers for Disease Control and Prevention; GIS: Geographical Information System.
Next, we applied factor analysis to reduce these observed variables to latent variables (ie, factors). Factor analysis explains the correlations among a large number of observed variables in terms of a handful of comprehensible underlying factors, and the resulting model also estimates how strongly those factors covary. The result is an interpretable and actionable model of concepts that are otherwise difficult to measure [
The 6-dimensional HEXACO model of human personality structure, consisting of Honesty-Humility (H), Emotionality (E), Extraversion (X), Agreeableness (A), Conscientiousness (C), and Openness to Experience (O), is a widely known result of applying factor analysis. The ability of factor analysis to reduce the many observed variables related to personality to 6 distinct factors has advanced the state of the art in psychological research [
We applied exploratory factor analysis (EFA) to filter the observed variables from the data described in
In our approach, EFA was used to fit a factor model. Before the EFA began, data corresponding to half of a given city’s census tracts were selected at random. In the application of our approach, data from 2011 to 2015 were used. Then, using these data, an EFA model was fitted.
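The random split of census tracts can be sketched as follows. This is a minimal illustration, assuming tract identifiers are available as a sequence and that a fixed seed is used for reproducibility (both assumptions, not details from the text); the second half returned is the held-out portion used later for CFA.

```python
import random


def split_tracts(tract_ids, seed=0):
    """Randomly split census tracts into an EFA half and a CFA half.

    With an odd number of tracts, the extra tract goes to the CFA half
    (an arbitrary choice for this sketch)."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(tract_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# 77 placeholder tract IDs (Norfolk has 77 census tracts).
efa_tracts, cfa_tracts = split_tracts(range(77))
print(len(efa_tracts), len(cfa_tracts))  # 38 39
```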
Finally, the model was assessed. The assessment tests whether all factors are composed of variables with high communality (>0.5) with respect to the factor they are associated with and low communality (<0.5) with respect to all other factors. If so, the process terminates. Otherwise, variables that do not meet the communality requirement are discarded, and the process is repeated for another iteration.
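The iterate-and-prune loop above can be sketched as follows. This is a hedged illustration, not the authors' code: it applies the 0.5 criterion to each variable's factor loadings (our reading of "communality with respect to a factor"), compares absolute values, and caps the number of iterations; `fit_efa` is a hypothetical callable that a real EFA library would supply, replaced here by fixed, made-up loadings.

```python
from typing import Callable, Dict, List, Tuple

# variable name -> loading on each factor, in a fixed factor order
Loadings = Dict[str, List[float]]


def failing_variables(loadings: Loadings, threshold: float = 0.5) -> List[str]:
    """Variables that do not load above `threshold` on their primary factor
    and below `threshold` on every other factor (absolute values)."""
    failing = []
    for var, row in loadings.items():
        magnitudes = [abs(v) for v in row]
        primary = max(range(len(row)), key=lambda k: magnitudes[k])
        others_low = all(m < threshold for k, m in enumerate(magnitudes) if k != primary)
        if not (magnitudes[primary] > threshold and others_low):
            failing.append(var)
    return failing


def iterate_efa(
    variables: List[str],
    fit_efa: Callable[[List[str]], Loadings],
    threshold: float = 0.5,
    max_iter: int = 20,
) -> Tuple[List[str], Loadings]:
    """Refit the EFA, dropping failing variables, until every variable passes."""
    current = list(variables)
    for _ in range(max_iter):
        loadings = fit_efa(current)
        drop = set(failing_variables(loadings, threshold))
        if not drop:
            return current, loadings
        current = [v for v in current if v not in drop]
        if not current:
            break
    raise RuntimeError("no factor model satisfied the loading criterion")


# Stand-in for a real EFA fit: fixed, made-up loadings on 2 factors.
FAKE_LOADINGS = {
    "median_income": [0.80, 0.10],
    "diabetes_pct": [0.10, 0.70],
    "weak_var": [0.45, 0.30],   # no loading above 0.5
    "cross_var": [0.60, 0.55],  # loads above 0.5 on two factors
}
kept, final = iterate_efa(list(FAKE_LOADINGS), lambda vs: {v: FAKE_LOADINGS[v] for v in vs})
print(kept)  # ['median_income', 'diabetes_pct']
```

With a real EFA, loadings change after each refit, which is why the check is repeated rather than applied once.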
The process of generating a factor model for a city and verifying that it meets our defined restrictions. BPH: bicycling and pedestrian habits; BPP: bicycle and pedestrian path; CFA: confirmatory factor analysis; DBC: demographics and background characteristics; EFA: exploratory factor analysis.
Next, we confirmed or rejected the fit of the hypothesized model by applying confirmatory factor analysis (CFA) to the other half of the 2011 to 2015 data. In the CFA, (1) only the observed variables retained by the EFA were included, (2) the variables were loaded onto the same factors as in the EFA, and (3) the communality of the variables in the model was assessed. The model was confirmed if it satisfied the same requirements as specified for EFA [
Relying on factor analysis imposes several restrictions on the city-specific factor model that our approach uses to estimate the health effects of adding bicycle and pedestrian paths. First, a model that meets our requirements must be generated using EFA and confirmed using CFA. Furthermore, to apply our algorithm, the model must consist of at least three factors reflecting residents’ (1) DBC, (2) health, and (3) BPH. Finally, the health factor must include at least one observed variable related to a health outcome, and the BPH factor must include an observed variable related to the amount of bicycle and pedestrian path mileage in the census tract. The process of generating a factor model and determining whether it meets these restrictions is illustrated in
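The structural restrictions above can be checked mechanically once the confirmed model's variables have been grouped by factor. In this minimal sketch, the factor labels (`DBC`, `health`, `BPH`) are assumed to have been assigned by the analyst, and all variable names are hypothetical placeholders.

```python
from typing import Dict, List, Set

REQUIRED_FACTORS = ("DBC", "health", "BPH")


def meets_restrictions(
    model: Dict[str, List[str]], health_outcomes: Set[str], mileage_var: str
) -> bool:
    """Check the structural restrictions a confirmed factor model must satisfy.

    `model` maps an analyst-assigned factor label to its observed variables."""
    if any(factor not in model for factor in REQUIRED_FACTORS):
        return False
    has_outcome = any(var in health_outcomes for var in model["health"])
    has_mileage = mileage_var in model["BPH"]
    return has_outcome and has_mileage


model = {
    "DBC": ["median_income", "pct_bachelors"],
    "health": ["diabetes_pct", "physical_health_not_good_pct"],
    "BPH": ["bpp_miles", "strava_bike_trips"],
}
print(meets_restrictions(model, {"diabetes_pct", "stroke_pct"}, "bpp_miles"))  # True
```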
We imposed these restrictions because our health outcome prediction algorithm computes the factor scores for each census tract in a city based on these factors. Factor scores are continuous numbers reflecting the extent to which each census tract manifests each factor. For each factor, the scores were distributed normally, with a mean of 0 and an SD of 1. Large positive values reflect census tracts where the factor is heavily present, and large negative values reflect census tracts where the factor is largely absent [
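Because factor scores are on a mean 0, SD 1 scale, a difference in scores reads directly in SD units. The sketch below illustrates that scale by rescaling a list of raw scores; it assumes the population SD and is not the factor-score estimation method itself, which this section does not specify. Standardizing fixes the mean and SD but does not by itself make the scores normally distributed.

```python
import statistics


def standardize(raw_scores):
    """Rescale raw factor scores to mean 0 and SD 1 (population SD)."""
    mean = statistics.fmean(raw_scores)
    sd = statistics.pstdev(raw_scores)
    return [(s - mean) / sd for s in raw_scores]


z = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Here the raw scores 4.0 and 5.0 become -0.5 and 0.0, so the corresponding tracts lie 0.5 SD apart, which is the size of the similarity threshold x = 0.50 used in the evaluation.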
Without these factors, the proposed algorithm cannot be applied because it lacks the data and structure needed to estimate the health effects of adding bicycle and pedestrian paths. This limitation of the proposed approach is discussed in more detail in the
The three stages of an EFA iteration—(A) observed variable identification, (B) organization of variables into factors, and (C) assessment of the communality of variables within and between each of the identified factors. EFA: exploratory factor analysis.
Given a factor model hypothesized by EFA and confirmed by CFA, we propose an algorithm to predict the health effects of adding bicycle and pedestrian paths at the census tract level. Its input is an observed variable identified from the factor model. This variable passes through a sequence of steps applied to each census tract, producing a predicted change in each identified health outcome. The steps of the algorithm are enumerated below, and its output is a list of hypothesized health outcome improvements.
In our problem statement, there was only one observed variable in the model that could be changed directly by a city-level planner, public health official, or decision maker. This variable represented the additional bicycle and pedestrian path mileage for a census tract within a city. This was the input to our algorithm, along with the factor model generated for the city.
The algorithm proceeded as follows, as conveyed visually in
Step 1: The algorithm adds the specified bicycle and pedestrian path mileage to the input census tract in the city’s data set.
Step 2: Factor scores are computed for three factors: DBC, health, and BPH.
Step 3: Given the DBC factor score for the input census tract, the algorithm identifies all other census tracts in the city whose DBC factor score is within a threshold value x of it. These census tracts are similar to the input census tract with respect to the DBC factor. Recall that the factor scores are normally distributed with an SD of 1; thus, a census tract whose factor score lies within x of the input tract’s score lies within x SDs of the input tract [
Step 4: Given the BPH factor score for the input census tract (which includes the newly added bicycle and pedestrian path mileage), the algorithm identifies all other census tracts in the city whose BPH factor score is within x of it. These census tracts are similar to the input census tract with respect to the BPH factor.
Step 5: For each observed health outcome in the health factor, the algorithm creates a list of the differences between the health outcome value of each census tract identified in steps 3 and 4 and that of the input census tract. This list is a distribution of hypothesized improvements in the health outcome from adding the specified bicycle and pedestrian path mileage to the census tract. Differences <0 are discarded because they indicate that adding bicycle and pedestrian path mileage to the census tract would degrade health outcomes.
Instantiation of the algorithm for predicting how much additional BPP mileage in a census tract will improve health outcomes. BPH: bicycling and pedestrian habits; BPP: bicycle and pedestrian path; DBC: demographics and background characteristics.
Step 6: For each list of hypothesized improvements generated in step 5, the algorithm outputs the minimum, mean, median, and maximum improvement. The algorithm can also report the entire distribution of possible improvements and its SD for each health outcome.
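Steps 3 to 6 can be sketched as follows, assuming steps 1 and 2 have already produced each tract's DBC and BPH factor scores (with the input tract's BPH score recomputed after the mileage is added). Three interpretive choices here are assumptions rather than details confirmed by the text: "identified in steps 3 and 4" is read as tracts passing both thresholds; because the outcomes are percentages of residents with a negative outcome, an improvement is taken as the input tract's value minus the similar tract's value; and the `Tract` record is a hypothetical container.

```python
import statistics
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Tract:
    tract_id: str
    dbc: float  # DBC factor score
    bph: float  # BPH factor score (for the input tract: after adding mileage)
    outcomes: Dict[str, float]  # % of residents with each negative health outcome


def predict_improvements(target: Tract, city_tracts: List[Tract], x: float = 0.5):
    """Compare the input tract with tracts similar on both the DBC and BPH
    factor scores and summarize the resulting improvements."""
    # Steps 3 and 4 (assumed here to be combined as an intersection).
    similar = [
        t for t in city_tracts
        if t.tract_id != target.tract_id
        and abs(t.dbc - target.dbc) <= x
        and abs(t.bph - target.bph) <= x
    ]
    summary = {}
    for outcome, target_value in target.outcomes.items():
        # Step 5: a lower prevalence in a similar tract counts as an improvement.
        diffs = [target_value - t.outcomes[outcome] for t in similar]
        improvements = [d for d in diffs if d >= 0]  # discard degradations (<0)
        if improvements:
            # Step 6: report summary statistics of the distribution.
            summary[outcome] = {
                "min": min(improvements),
                "mean": statistics.fmean(improvements),
                "median": statistics.median(improvements),
                "max": max(improvements),
            }
    return summary


target = Tract("T0", dbc=0.0, bph=1.0, outcomes={"diabetes": 12.0})
city_tracts = [
    Tract("T1", 0.2, 0.9, {"diabetes": 10.0}),   # similar, improvement 2.0
    Tract("T2", -0.4, 1.3, {"diabetes": 11.0}),  # similar, improvement 1.0
    Tract("T3", 0.1, 1.1, {"diabetes": 13.0}),   # similar, but worse: discarded
    Tract("T4", 2.0, 1.0, {"diabetes": 5.0}),    # DBC too different: ignored
]
print(predict_improvements(target, city_tracts))
# {'diabetes': {'min': 1.0, 'mean': 1.5, 'median': 1.5, 'max': 2.0}}
```

The default x = 0.5 matches the threshold used in the evaluation; a smaller x yields more closely matched comparison tracts but fewer of them.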
We evaluated the accuracy of our algorithm empirically against alternative approaches in 2 cities (Norfolk, Virginia, and San Francisco, California). Specifically, we computed how accurately each approach predicted the health outcome improvements from the bicycle and pedestrian paths added in each city in 2016: for each census tract that received additional bicycle and pedestrian path miles in 2016, we evaluated how accurately each approach estimated the improvement in health outcomes in 2020. We chose a 5-year time lapse because research has shown that this is the expected time for a change in health outcomes to be fully realized after outdoor exercise infrastructure interventions [
Applying the process described in the
Exploratory factor analysis and confirmatory factor analysis models for Norfolk, Virginia, using data sets from 2011 to 2015. Single-headed arrows reflect the communality of an observed variable with a factor. Double-headed arrows reflect the shared variance between factors. BPH: bicycling and pedestrian habits; BPP: bicycle and pedestrian path; DBC: demographics and background characteristics.
Exploratory factor analysis and confirmatory factor analysis models for San Francisco, California, using data sets from 2011 to 2015. Single-headed arrows reflect the communality of an observed variable with a factor. Double-headed arrows reflect the shared variance between factors. BPH: bicycling and pedestrian habits; BPP: bicycle and pedestrian path; DBC: demographics and background characteristics.
Recall that our algorithm takes as input (1) the factor model for a given city and (2) the census tract and the amount of bicycle and pedestrian path mileage to be added. It outputs the minimum, mean, median, and maximum estimated improvements from adding the mileage to the input census tract. In the evaluation, we used only the median improvement estimate.
In our evaluation, we used the factor models constructed from 2011 to 2015 data to estimate the accuracy of our approach and of 2 alternative approaches with respect to the health outcome improvements provided by bicycle and pedestrian paths installed in 2016. The evaluation included 31.58 miles (50.81 km) of bicycle and pedestrian paths added across 31 census tracts in Norfolk, Virginia, and 52.36 miles (84.25 km) added across 49 census tracts in San Francisco, California.
Evaluation setup metadata for Norfolk, Virginia, and San Francisco, California, in 2016.
Characteristic | Norfolk, Virginia | San Francisco, California |
BPPa miles (km) added | 31.58 (50.81) | 52.36 (84.25) |
Census tracts with paths added, n | 31 | 49 |
Census tracts in city, n | 77 | 195 |
Health outcomes evaluated | Diabetes %; poor physical health %; high blood pressure % | Diabetes %; stroke % |
aBPP: bicycle and pedestrian path.
We compared our algorithm with 2 alternative approaches. The first alternative assumed that each future health outcome within a census tract would be the same as the census tract’s average value for that health outcome from 2011 to 2015. This approach mirrors predicting that tomorrow’s temperature will equal the average temperature of the previous 5 days.
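The predict no change baseline reduces to a 5-year mean per tract and outcome. A minimal sketch, assuming the yearly observations for one tract and one outcome are stored in a year-keyed mapping (a hypothetical layout):

```python
import statistics


def predict_no_change(history, start=2011, end=2015):
    """Forecast a tract's future outcome as its mean over the baseline years.

    `history` maps year -> observed % of residents with the outcome."""
    return statistics.fmean(history[year] for year in range(start, end + 1))


history = {2011: 10.0, 2012: 11.0, 2013: 12.0, 2014: 11.0, 2015: 11.0}
print(predict_no_change(history))  # 11.0
```

Under this baseline, the predicted 2020 value ignores the 2016 path mileage entirely, which is what makes it a useful floor for comparison.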
The second alternative used linear regression modeling [
We evaluated our approach by using x=0.50. Recall that x is the threshold used to identify similar census tracts in terms of the (1) DBC factor and (2) BPH factor scores. In addition, our evaluation approach is an extension of the algorithm described in the
Each time the algorithm was executed, the median improvement from the algorithm was collected. The largest improvement over all the runs was reported. A version of our approach is shown in
The specific version of our algorithm included in the applied evaluation. BPP: bicycle and pedestrian path; BPH: bicycling and pedestrian habits; DBC: demographics and background characteristics.
For a given city and a given approach to estimating the improvement in a health outcome from bicycle and pedestrian paths added in 2016, we computed the following two measures of effectiveness (MOEs): (1) the root mean squared error (RMSE) and (2) the mean absolute error (MAE). These are 2 established metrics for measuring the accuracy of predictions of continuous variables. MAE measures the average magnitude of the errors in a set of predictions without considering their direction: it is the average of the absolute differences between predictions and actual observations, with all individual differences weighted equally. RMSE also measures the average magnitude of the error, but it is the square root of the average squared difference between predicted and actual observations. Because the errors are squared before they are averaged, RMSE gives relatively high weight to large errors [
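The two MOEs can be computed as follows. This sketch shows only the standard formulas; how per-tract errors were grouped to obtain the means and SDs reported in the tables is not specified here.

```python
import math


def _check(predicted, observed):
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")


def mae(predicted, observed):
    """Mean absolute error: every error gets equal weight."""
    _check(predicted, observed)
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def rmse(predicted, observed):
    """Root mean squared error: errors are squared before averaging."""
    _check(predicted, observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


# Same total absolute error, distributed differently across 4 tracts.
spread = ([0.0] * 4, [1.0, 1.0, 1.0, 1.0])
concentrated = ([0.0] * 4, [0.0, 0.0, 0.0, 4.0])
print(mae(*spread), rmse(*spread))              # 1.0 1.0
print(mae(*concentrated), rmse(*concentrated))  # 1.0 2.0
```

Both cases have the same MAE, but concentrating the error in a single tract doubles the RMSE, which is the extra weight on large errors described above.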
We deem our approach successful if, for each city included in our evaluation, our approach is more accurate across every MOE than the best alternative approach, and these differences are all statistically significant at
In our evaluation, we compared the accuracy of our factor model approach, a linear regression approach, and the predict no change approach. Each approach estimated the improvements in health outcomes provided by bicycle and pedestrian paths installed in 2016 in 31 census tracts in Norfolk, Virginia, and 49 census tracts in San Francisco, California. The results of the evaluation are shown in
We expected our approach to outperform the predict no change approach because the CDC 500 Cities project and bicycle and pedestrian path data for both cities show that, in most cases when a bicycle path of any length is added, the health outcomes identified by the factor analysis improve within 5 years. However, we did not know whether our approach would outperform the linear regression approach.
The evaluation results showed that our approach outperformed the linear regression models. We attribute this to our approach’s assumption that critical thresholds exist within the DBC and BPH factors (parameter x in steps 3 and 4 of the algorithm), an assumption the linear regression approach did not make [
By not accounting for such thresholds, the linear regression approach could overpredict the expected improvement in health outcomes within a census tract, because it implicitly assumed that some amount of bicycle and pedestrian path mileage in each census tract would yield a population without any negative health outcomes, which is unrealistic. Our evaluation results in
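The overprediction argument can be made concrete with a worked equation. Suppose, for illustration only (the regression specification used in the evaluation is not given in this section), that the predicted percentage of residents with a negative outcome, $\hat{y}$, is linear in a tract's bicycle and pedestrian path mileage $m$ with a negative slope:

```latex
\begin{aligned}
\hat{y}(m) &= \beta_0 + \beta_1 m, \qquad \beta_0 > 0,\ \beta_1 < 0,\\
\hat{y}(0) - \hat{y}(m) &= -\beta_1 m \quad \text{(predicted improvement grows linearly in } m\text{)},\\
\hat{y}(m^{*}) &= 0 \iff m^{*} = -\beta_0 / \beta_1 .
\end{aligned}
```

Under this form, adding $m^{*}$ miles predicts a tract in which no resident experiences the outcome, and adding more predicts a negative prevalence. By contrast, the threshold-based comparison can never predict an improvement larger than the differences actually observed among similar tracts.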
Evaluation of approaches for bicycle and pedestrian paths added in Norfolk, Virginia, in 2016.
Health outcome and MOEa (% of individuals who experience a negative health outcome) | Predict no change (census tract: n=31), mean (SD) | Linear regression (census tract: n=31), mean (SD) | Our approach (census tract: n=31), mean (SD) |
||||
MAEb | 2.33 (0.66) | 2.14 (0.67) | 1.63 (0.59) |
RMSEc | 2.41 (0.62) | 2.29 (0.61) | 1.67 (0.55) |
||||
MAE | 2.69 (0.72) | 2.21 (0.69) | 1.83 (0.57) |
RMSE | 2.64 (0.69) | 2.27 (0.66) | 1.94 (0.56) |
||||
MAE | 2.95 (1.17) | 2.27 (1.07) | 1.49 (0.85) |
RMSE | 3.18 (1.13) | 2.38 (0.92) | 1.55 (0.82) |
aMOE: measure of effectiveness.
bMAE: mean absolute error.
cRMSE: root mean squared error.
Evaluation of approaches for bicycle and pedestrian paths added in San Francisco, California, in 2016.
Health outcome and MOEa (% of individuals who experience a negative health outcome) | Predict no change (census tract: n=49), mean (SD) | Linear regression (census tract: n=49), mean (SD) | Our approach (census tract: n=49), mean (SD) |
||||
MAEb | 2.32 (1.19) | 2.18 (1.18) | 1.24 (0.91) |
RMSEc | 2.44 (1.11) | 2.41 (1.11) | 1.35 (0.90) |
||||
MAE | 2.68 (0.58) | 2.78 (0.68) | 1.81 (0.52) |
RMSE | 3.19 (0.52) | 2.97 (0.64) | 1.88 (0.49) |
aMOE: measure of effectiveness.
bMAE: mean absolute error.
cRMSE: root mean squared error.
Assessment of whether the improved accuracy of bicycle and pedestrian paths added in 2016 is statistically significant.
City, health outcome, and MOEa | Statistical significance of our approach MOE versus best alternative MOE, |
Norfolk, Virginia | |
||
MAEb | <.001 |
RMSEc | <.001 |
||
MAE | <.001 |
RMSE | <.001 |
||
MAE | <.001 |
RMSE | <.001 |
San Francisco, California | |
||
MAE | <.001 |
RMSE | <.001 |
||
MAE | <.001 |
RMSE | <.001 |
aMOE: measure of effectiveness.
bMAE: mean absolute error.
cRMSE: root mean squared error.
Our study builds on a significant amount of previous research. Numerous researchers have used statistical analyses to (1) explore the health effects of commuting via bicycle or by foot [
Predicting which bicycle and pedestrian paths residents will choose is also related to our work. Within this arena, researchers have found different results with respect to the extent to which bicycle and pedestrian path users prefer to take paths that minimize the total travel distance. For example, Broach et al [
There is also significant research focused on understanding the rate at which future use of bicycle and pedestrian paths will change, as commuters who currently do not use bicycle and pedestrian paths start to transition into commuting by foot or bicycle. Waldykowski et al [
In addition, researchers have attempted to better understand the impact of bicycle and pedestrian paths on health outcomes. This work includes (1) cost-benefit analysis of bicycle and pedestrian paths with respect to health improvements [
These studies demonstrate the need for granular analysis with actionable outcomes with respect to bicycle and pedestrian paths. Furthermore, although the studies have had a significant impact on the research community, none of them constructed a city-specific model to advise decision makers about the extent to which adding bicycle and pedestrian paths to a census tract would improve residents’ health outcomes. Our study addresses this problem within a larger bicycle and pedestrian path research area.
Strava has emerged as a tool of interest for collecting data on bicycling, running, and walking, for understanding the effects of new interventions on users, and for promoting rider safety. However, these crowdsourced data are biased toward recreational riders who frequently use GPS-enabled fitness apps. Thus, the inherent bias in crowdsourced data must be quantified and corrected to better represent residents across demographics. Strava users are more likely than the general population to be identified as male, to be older, and to have higher incomes [
Controlling for these biases in the Strava and municipal count data is beyond the scope of our work. However, it is important to note that there were biases in the data. Ultimately, these limitations mean that the Strava data sets that informed our study are nonuniform subsamples of the traffic of cyclists, walkers, and runners in Norfolk, Virginia, and San Francisco, California.
It is also important to note that the use of e-bikes has changed significantly during the period of our study [
Recall that our approach uses 5 years of past data to fit a factor model and requires the factor model to consist of at least three factors, with unique factors reflecting residents’ (1) DBC, (2) health, and (3) BPH. In addition, the health factor must include at least one observed variable related to a health outcome, and the BPH factor must include an observed variable related to the amount of bicycle and pedestrian path mileage in the census tract. Our approach cannot be applied to cities in which these requirements cannot be met, which limits its utility and geographic applicability. However, related research has shown that these factors are important to account for and are often present when studying who chooses to use bicycle and pedestrian paths and how effective these paths are in improving health outcomes [
Threats to internal and external validity affected our study. Threats to internal validity arose when factors affected the dependent variables without evaluators’ knowledge. It is possible that some flaws in the implementation of our model could have affected the evaluation results. However, our approach used established libraries to conduct factor analysis, and the source code passed internal reviews [
Threats to external validity occur when evaluation results cannot be generalized. Although the evaluation used more than 83 miles of added bicycle paths in 80 census tracts across the 2 cities, the factor models and accuracy results cannot necessarily be generalized to other areas. In addition, the factor analysis that generates our models assumes that each pair of variables follows a bivariate normal distribution. Although we verified that this assumption held for our data, it may not hold for other data sets and other cities where the approach is applied. However, it is important to note that our approach, which yielded the models producing these results, can be applied to other cities, provided that factor models meeting our requirements can be constructed [
Our work is directly actionable for policy makers, public health professionals, and urban planners in Norfolk, Virginia, and San Francisco, California, by providing concrete insight into the question “To what extent will adding specified bicycle and pedestrian path mileage to a census tract improve residents’ health outcomes over time?” Specifically, it enables them to (1) weigh the extent to which 2 bicycle and pedestrian paths of equal cost proposed in 2 different census tracts improve the health outcomes of the residents, (2) identify areas where bicycle and pedestrian paths are unlikely to be effective public health interventions and other strategies should be used to help residents, and (3) quantify the minimum amount of bicycle path miles that need to be added in a given census tract to maximize the improvement in health outcomes for residents. Our results demonstrate that for 2 different cities, our approach estimates improvements in health outcomes more accurately than alternate approaches, and these improvements are statistically significant.
A web application that implements our algorithm and summarizes its findings in an actionable manner is available [
American Community Survey data used for factor analysis.
The Centers for Disease Control and Prevention 500 Cities project data were used for factor analysis and evaluation.
City-supplied bicycle and pedestrian count and path length data used for factor analysis.
Strava-supplied bicycle and pedestrian trip count data used for factor analysis.
Prospective variables identified by subject matter experts for factor analysis.
Goodness-of-fit measure descriptions and statistics for generated factor models.
Data and source code for web application deployment of model for Norfolk, Virginia.
Data and source code for research artifacts produced from the model for Norfolk, Virginia, including recommended portfolios of bicycle and pedestrian paths and time series of expected health outcome improvements.
Data and source code for web application deployment of the model for San Francisco, California.
American Community Survey
bicycling and pedestrian habits
Centers for Disease Control and Prevention
confirmatory factor analysis
demographics and background characteristics
exploratory factor analysis
mean absolute error
measure of effectiveness
root mean squared error
This study was funded by the Hampton Roads Biomedical Research Consortium (300675-010 IRAD). The Hampton Roads Biomedical Research Consortium reviewed and approved the submission of the manuscript.
The contents of the multimedia appendices are specified below and are supplied in the paper. They are also available on the web as Mendeley Data [
RG, CAJ, RMR, AC, and CJL conceived and designed the experiments. RG performed the experiments. RG, GF, YK, and PK analyzed the data. RG, PA, PK, and YK contributed the reagents, materials, and analysis tools. RG, CJL, CAJ, GF, AC, and RMR wrote the paper.
None declared.