Background: Health research using commercial data is increasing. The evidence on public acceptability and sociodemographic characteristics of individuals willing to share commercial data for health research is scarce.
Objective: This survey study investigates the willingness to share commercial data for health research in the United Kingdom with 3 different organizations (government, private, and academic institutions), 5 different data types (internet, shopping, wearable devices, smartphones, and social media), and 10 different invitation methods to recruit participants for research studies with a focus on sociodemographic characteristics and psychological predictors.
Methods: We conducted a web-based survey using quota sampling based on age distribution in the United Kingdom in July 2020 (N=1534). Chi-square tests were used to assess differences by sociodemographic characteristics, and adjusted ordered logistic regressions were used to test associations with trust, perceived importance of privacy, worry about data misuse and perceived risks, and perceived benefits of data sharing. The results are shown as percentages, adjusted odds ratios, and 95% CIs.
Results: Overall, 61.1% (937/1534) of participants were willing to share their data with the government and 61% (936/1534) of participants were willing to share their data with academic research institutions compared with 43.1% (661/1534) who were willing to share their data with private organizations. The willingness to share varied between specific types of data—51.8% (794/1534) for loyalty cards, 35.2% (540/1534) for internet search history, 32% (491/1534) for smartphone data, 31.8% (488/1534) for wearable device data, and 30.4% (467/1534) for social media data. Increasing age was consistently and negatively associated with all the outcomes. Trust was positively associated with willingness to share commercial data, whereas worry about data misuse and the perceived importance of privacy were negatively associated with willingness to share commercial data. The perceived risk of sharing data was positively associated with willingness to share when the participants considered all the specific data types but not with the organizations. The participants favored postal research invitations over digital research invitations.
Conclusions: This UK-based survey study shows that willingness to share commercial data for health research varies; however, researchers should focus on effectively communicating their data practices to minimize concerns about data misuse and improve public trust in data science. The results of this study can be further used as a guide to consider methods to improve recruitment strategies in health-related research and to improve response rates and participant retention.
Health researchers are increasingly aiming to include accurate personal information collected outside of health care settings, including commercial data collected or processed for businesses relating to their customers (eg, internet searches, social media, loyalty cards, wearable devices, and mobile phone apps), to enhance our understanding of individuals’ health-related behaviors and health outcomes. With the rise of different data sources to track, monitor, and forecast disease and health outcomes, interest in carrying out research using individual commercial data has grown substantially. However, much of this valuable research is often criticized for its limited representativeness and the low participation rates associated with public attitudes toward data sharing.
Evidence on the public acceptability of sharing health-related data is vast and suggests that improving the transparency of data collection and processing practices across institutions, creating trustworthy data ecosystems, and providing agency and data stewardship for data participants can improve the willingness to take part in research and share data. The evidence for willingness to share data across different contexts varies depending on the type, purpose, and use of data. These studies often show that the public has some understanding of how data are being used and equally suggest that raising awareness about data practices does not increase willingness to share data. A recent report highlights that further research is needed to improve public trust in light of the Cambridge Analytica scandal and other reported incidents of data misuse.
Furthermore, with the implementation of the General Data Protection Regulation (GDPR) in 2018, all individuals were given the right to carry out a subject access request from any organization that holds any information about them, thus allowing researchers to start analyzing new types of data sets with individual consent to understand behaviors such as diet, self-medication, and cancer risk using purchase history recorded on loyalty cards; that is, an identity card issued by retailers to their customers to collect information on buyer behavior and generate reward schemes. However, a common limitation is the small sample size and biased population of individuals who are more willing to share their data. A number of qualitative studies investigated willingness to share commercial data, specifically loyalty cards for health research, echoing the principal evidence shared across disciplines, as discussed earlier. In contrast, for mobile and biosensor data sharing, there is a growing body of literature on the importance of understanding nonparticipation and willingness to share mobile phone app and biosensor data. A study that took place in England before the GDPR highlighted that, in the context of mobile data sharing, user behavior is also associated with willingness to share passively or actively collected mobile data. Further experimental studies have highlighted that sharing mobile data depends on users having the capabilities to fulfill the task, as well as on the characteristics of the individuals, the framing of the request, the emphasis on control over data, and assurances of privacy and confidentiality. The implications of the willingness to share smartphone and sensor data with researchers are further understood in studies where the response rate for data sharing is less than 15%, and the representativeness of the population that shares the data is less than optimal.
This highlights the importance of understanding the characteristics of the population who are willing to share commercial data sets before data collection so that strategies can be developed to improve response rates and minimize bias.
Therefore, this study aims to investigate GDPR awareness and sociodemographic and psychological factors associated with the willingness to share commercial data for health research purposes after the implementation of the GDPR in the United Kingdom. Furthermore, it aims to provide a summary of the public’s awareness of GDPR in 2020, 2 years after the GDPR and Data Protection Act 2018 were enacted in the United Kingdom. The GDPR has been kept in UK law as the UK GDPR. For epidemiological research to advance using commercial data sets and effectively recruit participants, it is important to investigate the factors associated with the willingness to share commercial data for health research.
Setting and Design
A 10-minute web-based survey was conducted in the United Kingdom in August 2020 via SurveyMonkey using Dynata International Limited. Nonprobability quota sampling was used for an adequate representation of different age groups in the United Kingdom, with the aim of recruiting 1500 participants to achieve a 1:10 participant-to-item ratio. The sample quotas, based on the age and sex distribution of the UK population, were 18 to 29 years (20%), 30 to 39 years (17%), 40 to 49 years (18.5%), 50 to 59 years (15.5%), 60 to 69 years (14%), ≥70 years (15%), male (49.4%), and female (50.6%).
The project was reviewed by the University College London Research Ethics Committee and received a favorable opinion (ref: 18095/001). The study is reported using the Strengthening the Reporting of Observational Studies in Epidemiology guidelines for cross-sectional research. Information regarding ethics approval can be obtained directly from the University College London Research Ethics Committee. The survey study only included anonymized data from the participants; therefore, the research team had no contact with the participants following their participation in the study. If individuals dropped out of the survey before its completion, this was considered a withdrawal from the study, and no data were included. Participants were paid a small monetary incentive through Dynata International Limited in line with their participant payment policies.
All measures with their item heritage are included in. There were 3 primary outcome measures. These were the willingness to share commercial data for health research with different institutions (government, private, and academic), the willingness to share different types of commercial data (internet searches, social media, shopping data on loyalty cards, wearable devices, and mobile phone apps) with academic institutions, and the willingness to participate in health research based on different invitation sources. The rationale for these outcomes is as follows. In comparison to government and private organizations, which are often the primary data controllers for health, administrative, and commercial data, researchers at academic institutions often need to request access to data collected and controlled by government and private organizations. The differences between institutions were considered to understand the potential baseline response rate for research projects that aim to use commercial data in health research at academic institutions. The second primary outcome then focused on willingness by data type and the extent to which this differs from the baseline willingness to share commercial data with academic institutions. The last outcome was included to consider how much willingness varied depending on the source of the invitation, to better understand the best ways to recruit participants who are more willing to share their data. All these outcomes were used to inform the communication strategies of a much larger academic project that aimed to recruit individual participants with informed consent requesting access to their commercial data, specifically loyalty card data from 2 UK-based high-street retailers, investigating self-care behaviors before ovarian cancer diagnosis.
The independent variables were included under four sections: (1) sociodemographic factors, including age, sex, marital status, education, ethnicity, and location in the United Kingdom; (2) the participants’ GDPR awareness; (3) psychological factors including trust in institutions, trust in data practices in academia, worry about data misuse, perceived risk in data sharing, perceived importance of privacy, and perceived benefit of data sharing; and (4) past experience taking part in health research and past experiences of data misuse.
We reported all measures and exclusions in this study and used complete case analysis without imputing missing data, as all questions were mandatory.
Factor analyses using principal component analysis (PCA) and reliability tests were carried out to ensure that the items included in various other studies measured the intended outcomes. Once the factors were identified, the scales were computed using total scores. A Cronbach alpha coefficient was calculated for each scale for internal consistency, and the interitem and interscale correlations were checked for internal consistency of items and scales. Each computed scale was reported using range, mean, SD, and Cronbach alpha coefficient.
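As an illustration of the internal consistency check described above, the following is a minimal sketch of how a Cronbach alpha coefficient and total scale scores can be computed. The response matrix is hypothetical and is not study data, and the function name `cronbach_alpha` is an assumption introduced for this example; the study itself used SPSS.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach alpha for a list of respondents, each a list of k item responses."""
    k = len(item_scores[0])
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of the total (summed) scale score
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses (5 respondents, 3 Likert-type items); not study data
responses = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [3, 4, 3],
    [1, 2, 2],
]

alpha = cronbach_alpha(responses)
scale_totals = [sum(row) for row in responses]  # total scores used as the scale
print(round(alpha, 2), scale_totals)
```

The coefficient approaches 1 when items vary together, so that the variance of the total score is much larger than the sum of the individual item variances.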
Participant characteristics, self-reported GDPR awareness, and people’s awareness of personal data and GDPR law were reported using descriptive statistics. Some categorical items were recoded for ease of presentation and understanding of the differences in each category. Responses to sociodemographic items including “prefer not to say” and “other” were coded as missing because of the low cell count (<5 observations in each category), which would not have been coded negatively, and were subsequently excluded from the main analyses (40/1594, 2.5%). Primary outcome variables were recoded into “definitely yes” or “probably yes”=1 and “probably no” or “definitely no”=0 to compare 2 distinct intentions to share data across the sociodemographic characteristics of the participants. Differences in the proportions of willingness to share commercial data were tested using chi-square statistics and reported in percentages. Ordered logistic regression was used to test for psychological factors associated with willingness to share commercial data for health research with different organizations and different types of data, adjusted for the sociodemographic characteristics of the participants, previous research participation, and GDPR awareness. The variance explained by each model is included in. Further ordered regression analyses were carried out for the different types of research invitations to identify whether there were sociodemographic factors associated with willingness to participate in health research. All results were reported as adjusted odds ratios (aORs) with 95% CIs, with statistical significance set at P<.05.
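To make the two outcome codings concrete, the following is a minimal sketch of the binary recode described above alongside the category probabilities implied by a proportional odds (ordered logit) model. The cutpoints and the coefficient `beta` are hypothetical values chosen for illustration and are not estimates from this study; the function names are likewise assumptions.

```python
import math

# Binary recode used for the chi-square comparisons (as described in the text)
BINARY_RECODE = {
    "definitely yes": 1,
    "probably yes": 1,
    "probably no": 0,
    "definitely no": 0,
}


def cumulative_probs(linear_predictor, cutpoints):
    """P(Y <= j) for each cutpoint under a proportional odds model."""
    return [1 / (1 + math.exp(-(c - linear_predictor))) for c in cutpoints]


def category_probs(linear_predictor, cutpoints):
    """Probability of each ordered category, lowest to highest."""
    cumulative = cumulative_probs(linear_predictor, cutpoints) + [1.0]
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs


# Hypothetical cutpoints separating the 4 ordered response categories
cutpoints = [-1.0, 0.0, 1.0]
beta = 0.5  # hypothetical log odds ratio per 1-point increase in a predictor

probs_at_0 = category_probs(0.0, cutpoints)
probs_at_1 = category_probs(beta * 1, cutpoints)


def odds_above(linear_predictor, cutpoint):
    """Odds of responding in a category above the given cutpoint."""
    p_le = 1 / (1 + math.exp(-(cutpoint - linear_predictor)))
    return (1 - p_le) / p_le


# Under proportional odds, the same ratio applies at every cutpoint
odds_ratios = [odds_above(beta, c) / odds_above(0.0, c) for c in cutpoints]
print(probs_at_0, probs_at_1, odds_ratios)
```

In this sketch, every entry of `odds_ratios` equals `exp(beta)`, which is what a single adjusted odds ratio per predictor summarizes in an ordered logistic regression.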
Factor Analysis Results
The 32 items measured in this study were subjected to PCA using SPSS (version 27; IBM Corp). Before performing PCA, the suitability of the data for PCA was assessed. Inspection of the correlation matrix revealed the presence of many coefficients >0.3. The Kaiser-Meyer-Olkin value was 0.93, above the recommended value of 0.6, and the Bartlett test of sphericity reached statistical significance, supporting the factorability of the correlation matrix. The PCA revealed the presence of 5 components with eigenvalues exceeding 1, explaining 28.6%, 19.7%, 7.5%, 6.9%, and 5.3% of the variance. A total of 8 items were recoded, and 2 items were deleted, as they measured trust in 2 different organizations. On the basis of these results, 5 scales were computed. These were perceived importance of privacy (mean 8.46, SD 1.38; range 2-10; Cronbach α=.65), worry about data misuse (6 items; mean 21.28, SD 5.69; range 5-30; Cronbach α=.95), trust in data practices in academic institutions (9 items; mean 31.96, SD 7.14; range 5-45; Cronbach α=.95), perceived risk of data sharing for health research (3 items; mean 9.40, SD 2.73; range 3-15; Cronbach α=.88), and perceived benefits of data sharing (5 items; mean 17.76, SD 4.31; range 5-25; Cronbach α=.93). The correlations between the scales were weak to moderate, indicating that they measure separate constructs.
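The component retention rule reported above can be checked with simple arithmetic. The sketch below assumes that the PCA was run on the correlation matrix of all 32 items, so that total variance equals the number of items; under that assumption, each eigenvalue is approximately the share of variance explained multiplied by 32. The eigenvalues it produces are approximations derived from the reported percentages, not values reported by the study.

```python
n_items = 32  # number of items entered into the PCA, as reported above
explained_pct = [28.6, 19.7, 7.5, 6.9, 5.3]  # reported variance explained (%)

# With standardized items, total variance equals the number of items,
# so eigenvalue is approximately (share of variance) x (number of items).
approx_eigenvalues = [round(p / 100 * n_items, 2) for p in explained_pct]

# Total variance captured by the 5 retained components
cumulative_pct = round(sum(explained_pct), 1)

# An eigenvalue of exactly 1 corresponds to this share of total variance
kaiser_threshold_pct = 100 / n_items

print(approx_eigenvalues, cumulative_pct, kaiser_threshold_pct)
```

Each retained component explains more than the threshold share, which is consistent with the reported eigenvalues exceeding 1.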
Out of the 1897 responses, 1534 participants gave their consent and completed the survey. Approximately 49.1% (753/1534) of participants were male, 50.7% (777/1534) were female, and 0.2% (4/1534) indicated other. The age distribution of participants was consistent with the quota sample for the distribution of age in England. Most respondents self-identified with a White ethnic background (1325/1534, 86.4%), compared with only 12.9% (198/1534) who identified themselves with other ethnicities. Approximately 53.1% (814/1534) of the participants were married or had a legal partnership. Approximately 45.6% (699/1534) of participants had higher education (degree and above) qualifications, almost half of them (765/1534, 49.9%) had less than higher education qualifications, and only 4.5% (69/1534) did not have any educational qualification.
|Characteristics||Values, n (%)||Population composition of England and Wales based on 2011 Census (excludes Scotland and Northern Ireland) , %|
|White British||1326 (87.0)||85.4|
|Married or legal partnership||814 (53.1)||50.8|
|Widowed, divorced, or separated||212 (13.8)||14.6|
|Higher education||697 (45.6)||27.1|
|Less than higher education with qualification||762 (49.9)||49.9|
|No qualification||68 (4.5)||23|
|Location in the United Kingdom|
|East of England||149 (9.8)||10.4|
|South East||205 (13.4)||15.4|
|South West||136 (8.9)||9.4|
|West and East Midlands||218 (14.3)||18.1|
|Yorkshire and the Humber and North East||197 (12.9)||14|
|North West||184 (12)||12.6|
|General Data Protection Regulation awareness|
|Not aware||179 (11.7)||—b|
|Yes, I have heard but do not know much about it||428 (27.9)||—|
|Yes, I have heard and know a little about it||658 (42.9)||—|
|Yes, I have heard and I know a lot about it||269 (17.5)||—|
|Previous health research participation|
aN/A: not applicable.
bData are not available for the distribution of the General Data Protection Regulation Awareness and previous health research participation in England.
GDPR and Personal Data Awareness
At the time of the survey, 11.7% (179/1534) of the participants indicated that they were not aware of GDPR, 27.9% (428/1534) had heard of GDPR but did not know much about it, 42.9% (658/1534) had heard and knew a little about GDPR, and 17.5% (269/1534) of respondents had heard and knew a lot about GDPR.
The results of participants’ expectations of what is considered personal data under GDPR showed that >80% (1227/1534) of the participants were able to correctly state common information that was classified as personal information, such as name, age, gender, marital status, and home address and email address. Less than 75% (1150/1534) of participants considered sensitive personal information, such as sexual orientation (1121/1534, 73.1%), religion (1067/1534, 69.6%), criminal records (1100/1534, 71.7%), and health or medical records (1136/1534, 74.1%) as personal data.
Less than two-thirds of the participants expected the various types of information collected on the internet to count as personal data. A quarter of the participants incorrectly stated that web-based purchases (383/1534, 25.0%), location data based on General Packet Radio Service recorded on mobile phones (353/1534, 23%), tracking information on websites (cookies; 383/1534, 25%), social media information (424/1534, 27.6%), and device IDs (353/1534, 23%) were not personal data.
More than 80% (1227/1534) of the participants correctly identified most of the rights that GDPR law covers to protect personal data. Most of the remaining participants stated that they did not know the right answer, with proportions ranging from 6.6% (101/1534) to 13.4% (205/1534). Approximately 13.4% (205/1534) did not know that they had the right to erase their data, and 12.3% (189/1534) did not know that they had the right to be informed about the use of their data. Additional details are provided in.
Willingness to Share Commercial Data With Different Institutions
shows that nearly two-thirds of the participants indicated that they would be willing to share their commercial data for health research if their data are shared with a government institution (937/1534, 61.1%) or an academic research institution (936/1534, 61.0%). In contrast, less than half were happy to share their commercial data with private organizations for health research (661/1534, 43.1%). Across all participants, only 4.8% (73/1534) stated “definitely yes” to sharing with all types of institutions. In comparison, 7.4% (114/1534) of the participants stated “definitely no” to sharing commercial data for health research with all institutions.
In, the analysis shows significant differences in the willingness to share commercial data with government institutions by age, sex, education, and GDPR awareness. Specifically, a smaller proportion of participants aged 40 to 49 years (137/250, 54.8%) and 50 to 59 years (151/266, 56.8%) were happy to share their data compared with the 18 to 29 (171/284, 60.2%), 30 to 39 (173/255, 67.8%), 60 to 69 (128/213, 60.1%), and ≥70 years (177/266, 66.5%) groups (χ²₅=14.6, P=.01). Male participants were more likely than female participants to share their commercial data for health research with the government (488/753, 64.8% vs 449/777, 57.8%; χ²₁=7.9, P=.005). There was a 14.4 percentage point difference in willingness to share between those who were unaware of GDPR (96/179, 53.6%) and those who stated that they knew a lot about GDPR (183/269, 68.0%; χ²₃=9.7, P=.02). No differences were found by marital status, ethnicity, previous research participation, or personal experience of data misuse in the past.
There were differences in the willingness to share commercial data with private organizations based on most factors, except for education and previous research participation. The largest differences were observed by age (χ²₅=74.1, P<.001), between those aged 30 to 39 years (147/255, 57.6%) and those aged 60 to 69 years (64/213, 30%) or ≥70 years (80/266, 30.1%), as well as by ethnicity, between those who identified as Black (28/45, 62.2%) and White (555/1326, 41.9%). Similarly, only one-third of those who reported not being aware of GDPR were willing to share their data (54/179, 30.2%) compared with those who were aware but did not know much (177/428, 41.4%), knew a little (292/658, 44.4%), or knew a lot about GDPR (138/269, 51.3%; χ²₃=20.5, P<.001). Having ever experienced a data misuse event was also positively associated with willingness to share commercial data for health research with private organizations compared with never having experienced such an event (287/596, 48.2% vs 374/938, 39.9%; χ²₁=10.1, P=.001).
There were no significant differences in the proportion of people who indicated “definitely yes” or “probably yes” to sharing with academic institutions by marital status, age, sex, ethnicity, or previous experience of data misuse. However, higher educational level (degree or above: 459/697, 65.9% vs below degree: 474/830, 57.1%; χ²₁=12.1, P<.001), previous participation in health research (yes: 266/402, 66.2% vs no: 670/1132, 59.2%; χ²₁=6.0, P=.01), and greater GDPR awareness (not aware: 84/179, 46.9% vs know a lot about it: 179/269, 66.5%; χ²₃=20.8, P<.001) were positively associated with sharing data with academic institutions.
|Willingness to share||Government organizations||Private organizations||Academic institutions|
|Marital status|
|Single, n (%)||284 (57.7)||230 (46.7)||306 (62.2)|
|Married or legal partnership, n (%)||512 (62.9)||354 (43.5)||489 (60.1)|
|Widowed, divorced, or separated, n (%)||134 (63.2)||73 (34.4)||134 (63.2)|
|Chi-square (df)||3.8 (2)||9.1 (2)||1.0 (2)|
|Age (years)|
|18-29, n (%)||171 (60.2)||153 (53.9)||177 (62.3)|
|30-39, n (%)||173 (67.8)||147 (57.6)||168 (65.9)|
|40-49, n (%)||137 (54.8)||118 (47.2)||144 (57.6)|
|50-59, n (%)||151 (56.8)||99 (37.2)||154 (57.9)|
|60-69, n (%)||128 (60.1)||64 (30.0)||129 (60.6)|
|≥70, n (%)||177 (66.5)||80 (30.1)||164 (61.7)|
|Chi-square (df)||14.6 (5)||74.1 (5)||5.1 (5)|
|Ethnicity|
|White, n (%)||808 (60.9)||555 (41.9)||821 (61.9)|
|Black, n (%)||33 (73.3)||28 (62.2)||26 (57.8)|
|Asian, n (%)||66 (59.5)||55 (49.5)||64 (57.7)|
|Mixed, n (%)||18 (56.3)||15 (46.9)||15 (46.9)|
|Other ethnicities, n (%)||6||<5||5|
|Chi-square (df)||3.2 (4)||9.6 (4)||4.3 (4)|
|Sex|
|Male, n (%)||488 (64.8)||359 (47.7)||473 (62.8)|
|Female, n (%)||449 (57.8)||300 (38.6)||462 (59.5)|
|Chi-square (df)||7.9 (1)||12.8 (1)||1.8 (1)|
|Education|
|<Degree and no formal education, n (%)||481 (58.0)||367 (44.2)||474 (57.1)|
|≥Degree, n (%)||451 (64.7)||290 (41.6)||459 (65.9)|
|Chi-square (df)||7.2 (1)||1.0 (1)||12.1 (1)|
|General Data Protection Regulation awareness|
|Not aware, n (%)||96 (53.6)||54 (30.2)||84 (46.9)|
|Yes, I have heard but do not know much about it, n (%)||258 (60.3)||177 (41.4)||254 (59.3)|
|Yes, I have heard and know a little about it, n (%)||400 (60.8)||292 (44.4)||419 (63.7)|
|Yes, I have heard and I know a lot about it, n (%)||183 (68.0)||138 (51.3)||179 (66.5)|
|Chi-square (df)||9.7 (3)||20.5 (3)||20.8 (3)|
|Previous research participation|
|Yes, n (%)||257 (63.9)||171 (42.5)||266 (66.2)|
|No, n (%)||680 (60.1)||490 (43.3)||670 (59.2)|
|Chi-square (df)||1.8 (1)||0.1 (1)||6.0 (1)|
|Personal experience of data misuse|
|Never, n (%)||559 (59.7)||374 (39.9)||559 (59.6)|
|Ever, n (%)||378 (63.4)||287 (48.2)||377 (63.3)|
|Chi-square (df)||2.2 (1)||10.1 (1)||2.0 (1)|
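The chi-square statistics in the table above can be reproduced from the reported counts. The following is a minimal sketch for the comparison of male and female participants on willingness to share with government organizations, assuming a Pearson chi-square test without continuity correction (an assumption, although it is consistent with the reported value of 7.9). The function name `pearson_chi_square` is introduced here for illustration.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic and df for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Counts from the table: [willing, not willing] to share with government
male = [488, 753 - 488]
female = [449, 777 - 449]

chi2, df = pearson_chi_square([male, female])

# For df = 1, the upper-tail P value has a closed form through the
# complementary error function: P = erfc(sqrt(chi2 / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))

print(round(chi2, 1), df, round(p_value, 3))
```

For tables with more than 1 degree of freedom, the P value needs the chi-square survival function, which is available in statistical libraries rather than the Python standard library.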
Adjusted ordered regression analyses for willingness to share data with each institution in show that greater trust is positively associated with sharing commercial data with government (aOR 2.499, 95% CI 2.228-2.802; P<.001), private (aOR 2.513, 95% CI 2.221-2.842; P<.001), and academic institutions (aOR 2.283, 95% CI 2.011-2.590; P<.001). Greater worry about data misuse was negatively associated with willingness to share with government (aOR 0.940, 95% CI 0.918-0.961; P<.001), private (aOR 0.951, 95% CI 0.930-0.973; P<.001), and academic institutions (aOR 0.947, 95% CI 0.926-0.969; P<.001).
Participants’ perceived importance of privacy was negatively associated with willingness to share with the government (aOR 0.909, 95% CI 0.833-0.992; P=.03), private institutions (aOR 0.833, 95% CI 0.763-0.909; P<.001), and academic institutions (aOR 0.869, 95% CI 0.797-0.948; P=.002). Participants’ perceived risk of data sharing was not associated with their willingness to share their data with any organization. The perceived benefits of sharing data were positively associated with willingness to share with government institutions (aOR 1.111, 95% CI 1.083-1.140; P<.001), private institutions (aOR 1.081, 95% CI 1.054-1.109; P<.001), and academic institutions (aOR 1.116, 95% CI 1.087-1.146; P<.001).
|Government institutes, aORb (95% CI)||Private institutes, aOR (95% CI)||Academic institutes, aOR (95% CI)|
|Trust in organizations||2.499 (2.228-2.802c)||2.513 (2.221-2.842c)||2.283 (2.011-2.590c)|
|Worry about data misuse||0.940 (0.918-0.961c)||0.951 (0.930-0.973c)||0.947 (0.926-0.969c)|
|Perceived risk of data sharing||1.041 (0.997-1.086)||1.042 (0.997-1.089)||1.016 (0.974-1.060)|
|Perceived importance of privacy||0.909 (0.833-0.992d)||0.833 (0.763-0.909c)||0.869 (0.797-0.948c)|
|Perceived benefits of sharing data and participation||1.111 (1.083-1.140c)||1.081 (1.054-1.109c)||1.116 (1.087-1.146c)|
aAdjusted for age, sex, location, ethnicity, education, General Data Protection Regulation awareness, and past health research participation. The full model with P values is reported in.
baOR: adjusted odds ratio.
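Because the scales in the table above are total scores, each aOR refers to a 1-point change on the scale. The following sketch shows how a per-point aOR compounds over a larger difference and how an approximate standard error on the log-odds scale can be recovered from a reported 95% CI. It assumes the model is linear on the log-odds scale and that the CI is a Wald interval (estimate ± 1.96 SE); both are assumptions, and the helper names are introduced for illustration.

```python
import math


def scale_odds_ratio(aor_per_point, points):
    """Odds ratio implied by a difference of several scale points."""
    return aor_per_point ** points


def log_odds_se(ci_lower, ci_upper, z=1.96):
    """Approximate SE of the log odds ratio from a Wald 95% CI."""
    return (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)


# Worry about data misuse, government institutions (values from the table)
worry_aor = 0.940
worry_ci = (0.918, 0.961)

five_point_or = scale_odds_ratio(worry_aor, 5)
worry_se = log_odds_se(*worry_ci)

print(round(five_point_or, 3), round(worry_se, 4))
```

Read this way, an aOR close to 1 per point can still correspond to a sizeable difference in odds between participants who are far apart on a scale with a wide range.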
Willingness to Share Different Types of Commercial Data
shows that the participants’ willingness to share commercial data varied across all data types. The willingness to share loyalty card data had the highest proportion of participants stating “definitely yes” or “probably yes,” at 51.8% (794/1534). In comparison, the proportion was much lower at 35.2% (540/1534) for internet search history, 32% (491/1534) for smartphone data, 31.8% (488/1534) for wearable device data, and 30.4% (467/1534) for social media data. Across all participants, only about 3.2% (49/1534) stated “definitely yes” to sharing all types of commercial data sets. In comparison, 13.3% (204/1534) of the participants stated “definitely no” to sharing all types of commercial data sets.
shows the proportion of people who stated “definitely yes” or “probably yes” for willingness to share different types of commercial data for health research with academic institutions based on the sociodemographic characteristics of the participants. There were significant differences across all types of commercial data according to marital status, age, and past experience of data misuse. In contrast, no associations were found with the participants’ educational level or previous participation in health research. Greater GDPR awareness was positively associated with willingness to share all types of data, except for internet searches.
Among these characteristics, notable differences were observed for marital status, where a larger proportion of people who were single reported willingness to share commercial data sets compared with those who were married or in a legal partnership, or widowed, divorced, or separated. Furthermore, an increase in the age of participants was negatively associated with their willingness to share. Less than a fifth of the participants aged 60 to 69 years and ≥70 years were willing to share smartphone, social media, and wearable device data. Those aged 18 to 29 years were among the most willing to share across all types of commercial data, at 65.1% (185/284) for loyalty card data, 48.6% (138/284) for smartphone data and wearable devices, 47.2% (134/284) for social media, and 46.8% (133/284) for internet data.
Female participants were less likely to share smartphone data (male: 282/753, 37.5% vs female: 207/777, 26.6%; χ²₁=20.5, P<.001), wearable device data (male: 280/753, 37.2% vs female: 206/777, 26.5%; χ²₁=20.0, P<.001), and social media data (male: 271/753, 36% vs female: 196/777, 25.2%; χ²₁=20.8, P<.001).
In comparison, the differences in proportions were smaller for internet searches (male: 286/753, 38% vs female: 253/777, 32.6%; χ²₁=4.9, P=.03) and loyalty card data (male: 399/753, 53% vs female: 393/777, 50.6%; χ²₁=0.8, P=.35). The proportion of people willing to share loyalty card data did not differ by ethnicity or sex. In contrast, the proportion was lower among those from White ethnic backgrounds for internet searches, social media, wearable devices, and smartphone data compared with those who identified as being from Black and Asian ethnic backgrounds.
Those who had ever experienced a data misuse event were more likely to share loyalty card data (ever: 344/596, 57.7% vs never: 450/938, 48%; χ²₁=13.8, P<.001), internet search data (ever: 263/596, 44.1% vs never: 277/938, 29.5%; χ²₁=34.0, P<.001), smartphone data (ever: 245/596, 41.1% vs never: 246/938, 26.2%; χ²₁=37.0, P<.001), social media data (ever: 240/596, 40.3% vs never: 227/938, 24.2%; χ²₁=44.4, P<.001), and wearable device data (ever: 255/596, 42.8% vs never: 233/938, 24.8%; χ²₁=54.1, P<.001) than those with no previous data misuse experience.
|Willingness to share||Internet searches||Loyalty card||Smartphone||Social media||Wearable devices|
|Marital status|
|Single, n (%)||205 (41.7)||297 (60.4)||190 (38.6)||180 (36.6)||194 (39.4)|
|Married or legal partnership, n (%)||273 (33.5)||406 (49.9)||263 (32.3)||235 (28.9)||250 (30.7)|
|Widowed, divorced, or separated, n (%)||57 (26.9)||87 (41.0)||34 (16.0)||48 (22.6)||41 (19.3)|
|Chi-square (df)||16.4 (2)||25.4 (2)||34.7 (2)||15.7 (2)||28.7 (2)|
|Age (years)|
|18-29, n (%)||133 (46.8)||185 (65.1)||138 (48.6)||134 (47.2)||138 (48.6)|
|30-39, n (%)||132 (51.8)||156 (61.2)||122 (47.8)||129 (50.6)||141 (55.3)|
|40-49, n (%)||98 (39.2)||134 (53.6)||94 (37.6)||81 (32.4)||87 (34.8)|
|50-59, n (%)||70 (26.3)||129 (48.5)||66 (24.8)||60 (22.6)||57 (21.4)|
|60-69, n (%)||45 (21.1)||91 (42.7)||34 (16.0)||29 (13.6)||29 (13.6)|
|≥70, n (%)||62 (23.3)||99 (37.2)||37 (13.9)||34 (12.8)||36 (13.5)|
|Chi-square (df)||93.4 (5)||60.3 (5)||140.4 (5)||162.3 (5)||189.4 (5)|
|Ethnicity|
|White, n (%)||447 (33.7)||674 (50.8)||400 (30.2)||377 (28.4)||397 (29.9)|
|Black, n (%)||25 (55.6)||26 (57.8)||22 (48.9)||25 (55.6)||24 (53.3)|
|Asian, n (%)||53 (47.7)||66 (59.5)||50 (45.0)||47 (42.3)||49 (44.1)|
|Mixed, n (%)||9 (28.1)||20 (62.5)||13 (40.6)||13 (40.6)||15 (46.9)|
|Other ethnicities, n (%)||5 (50.0)||6 (60.0)||<5||<5||<5|
|Chi-square (df)||18.7 (4)||5.4 (4)||17.9 (4)||24.8 (4)||23.4 (4)|
|Sex|
|Male, n (%)||286 (38.0)||399 (53.0)||282 (37.5)||271 (36.0)||280 (37.2)|
|Female, n (%)||253 (32.6)||393 (50.6)||207 (26.6)||196 (25.2)||206 (26.5)|
|Chi-square (df)||4.9 (1)||0.8 (1)||20.5 (1)||20.8 (1)||20.0 (1)|
|Education|
|<Degree and no formal education, n (%)||290 (34.9)||435 (52.4)||252 (30.4)||244 (29.4)||248 (29.9)|
|≥Degree, n (%)||247 (35.4)||355 (50.9)||235 (33.7)||219 (31.4)||238 (34.1)|
|Chi-square (df)||0.4 (1)||0.3 (1)||1.9 (1)||0.7 (1)||3.1 (1)|
|General Data Protection Regulation awareness|
|Not aware, n (%)||52 (29.1)||74 (41.3)||49 (27.4)||39 (21.8)||48 (26.8)|
|Yes, I have heard but do not know much about it, n (%)||159 (37.1)||216 (50.5)||129 (30.1)||136 (31.8)||129 (30.1)|
|Yes, I have heard but know little about it, n (%)||221 (33.6)||351 (53.3)||196 (29.8)||192 (29.2)||197 (29.9)|
|Yes, I have heard and I know a lot about it, n (%)||108 (40.1)||153 (56.9)||117 (43.5)||100 (37.2)||114 (42.4)|
|Chi-square (df)||7.3 (3)||11.5 (3)||20.2 (3)||12.9 (3)||17.5 (3)|
|Previous research participation|
|Yes, n (%)||131 (32.6)||195 (48.5)||122 (30.3)||115 (28.6)||114 (28.4)|
|No, n (%)||409 (36.1)||599 (52.9)||369 (32.6)||352 (31.1)||374 (33.0)|
|Chi-square (df)||1.6 (1)||2.3 (1)||0.6 (1)||0.8 (1)||2.9 (1)|
|Personal experience of data misuse|
|Never, n (%)||277 (29.5)||450 (48.0)||246 (26.2)||227 (24.2)||233 (24.8)|
|Ever, n (%)||263 (44.1)||344 (57.7)||245 (41.1)||240 (40.3)||255 (42.8)|
|Chi-square (df)||34.0 (1)||13.8 (1)||37.0 (1)||44.4 (1)||54.1 (1)|
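As a rough illustration of how the chi-square statistics in the table above can be checked, the following Python sketch recomputes the sex-by-smartphone comparison. The group totals (753 male and 777 female participants) are not reported in this section; they are back-calculated from the published counts and percentages and should be treated as assumptions, as should the choice of an uncorrected Pearson test on a binary willing or not willing outcome.

```python
def pearson_chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Participants willing to share smartphone data (counts taken from the table).
male_yes, female_yes = 282, 207
# ASSUMPTION: group totals inferred from the reported 37.5% and 26.6%;
# they are not stated in the text.
male_total, female_total = 753, 777

chi2 = pearson_chi_square_2x2(
    male_yes, male_total - male_yes,
    female_yes, female_total - female_yes,
)
print(f"chi-square (df=1) = {chi2:.1f}")
```

Under these assumptions the statistic rounds to 20.5, which agrees with the value reported for smartphone data by sex; a continuity-corrected test would give a slightly smaller value.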
The results from the ordered logistic regression analyses, adjusted for the sociodemographic characteristics of the participants, show that each point increase in trust in data practices in academia, perceived benefits of participation, and perceived risks of data sharing was positively associated with willingness to share all types of commercial data. In contrast, each point increase in the perceived importance of privacy and worry about data misuse was negatively associated with the willingness to share all types of commercial data.
|Predictor||Internet search data, aORb (95% CI)||Loyalty card data, aOR (95% CI)||Smartphone data, aOR (95% CI)||Social media data, aOR (95% CI)||Wearable devices data, aOR (95% CI)|
|Trust in data practices in academic institutions||1.097 (1.078-1.117c)||1.103 (1.083-1.123c)||1.097 (1.077-1.117c)||1.087 (1.067-1.107c)||1.075 (1.056-1.095c)|
|Perceived importance of privacy||0.682 (0.625-0.744c)||0.707 (0.648-0.772c)||0.685 (0.628-0.748c)||0.716 (0.656-0.781c)||0.732 (0.670-0.798c)|
|Worry about data misuse||0.960 (0.938-0.982c)||0.975 (0.953-0.997d)||0.960 (0.938-0.982c)||0.963 (0.941-0.985c)||0.940 (0.919-0.962c)|
|Perceived benefit in data sharing and research participation||1.057 (1.029-1.086c)||1.102 (1.072-1.132c)||1.070 (1.040-1.100c)||1.049 (1.020-1.078c)||1.086 (1.056-1.116c)|
|Perceived risks of data sharing||1.222 (1.169-1.276c)||1.114 (1.067-1.163c)||1.224 (1.171-1.279c)||1.208 (1.156-1.263c)||1.232 (1.179-1.287c)|
aAdjusted for age, sex, location, ethnicity, education, General Data Protection Regulation awareness, and past health research participation. The full model with P values is reported in the supplementary material.
baOR: adjusted odds ratio.
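To make the "each point increase" interpretation concrete, the Python sketch below compounds a per-point adjusted odds ratio over several points and back-calculates an approximate standard error from a reported 95% CI. It assumes a Wald interval on the log-odds scale (estimate ± 1.96 SE) and a constant odds ratio per point, as in a proportional-odds model. The 5-point and 2-point changes are arbitrary illustrations, not quantities from the study.

```python
import math


def compound_odds_ratio(aor_per_point, points):
    """Odds ratio implied by a change of `points` units in the predictor."""
    return aor_per_point ** points


def se_from_ci(lower, upper, z=1.96):
    """Approximate SE of log(aOR), assuming a symmetric Wald CI on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Trust in academic data practices, internet search data: aOR 1.097 (1.078-1.117).
trust_5_points = compound_odds_ratio(1.097, 5)
trust_se = se_from_ci(1.078, 1.117)

# Perceived importance of privacy, internet search data: aOR 0.682 (0.625-0.744).
privacy_2_points = compound_odds_ratio(0.682, 2)

print(f"Trust, +5 points: odds multiplied by {trust_5_points:.2f}")
print(f"Privacy, +2 points: odds multiplied by {privacy_2_points:.2f}")
print(f"Approximate SE of log(aOR) for trust: {trust_se:.4f}")
```

Under these assumptions, a 5-point increase in trust corresponds to roughly 59% higher odds of being in a more willing response category, whereas a 2-point increase in the perceived importance of privacy corresponds to about 0.47 times the odds.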
Willingness to Take Part in Research Based on Invitation Sources
The most preferred ways of being invited to take part in health research using commercial data were receiving a letter (1069/1534, 69.6%) or an email invitation (1056/1534, 68.6%) from the health care provider, followed by a letter invitation from the government (1041/1534, 67.9%) and from universities or publicly funded research institutes (1009/1534, 65.8%). Digital invitations were much less preferred than letter-based invitations, with the exception of letter invitations from high-street retailers, which were also less preferred (683/1534, 44.5%). Research advertisements on social media or in newspapers were acceptable to less than half of the participants (723/1534, 47.1% and 703/1534, 45.8%, respectively). The ordered logistic regression analysis (see the supplementary material) shows that greater GDPR awareness was the only predictor of willingness to take part across all invitation sources, although there were some nuanced differences by sociodemographic characteristics.
To better understand the psychological factors and sociodemographic characteristics of individuals who would be willing to participate in health research using individual and commercially collected data sets, this study investigated the willingness to share commercial data for health research with different institutions and for different types of commercial data in an age-stratified, population-based sample in the United Kingdom. Our results showed that approximately three-fifths (61%) of the participants were willing to share their commercial data for health research with academic institutions. In contrast, when participants were asked specifically about sharing different types of commercial data for health research with a focus on academic institutions, only about half were willing to share their shopping data, and approximately one-third were willing to share internet search history, wearable device, social media, and smartphone data. Thus, for most of the specific data types, only a minority of the participants were willing to share their data, highlighting a potential barrier to participant recruitment that needs to be addressed in health research using complex data sets.
Comparison With Prior Work
A key outcome of this study is the validation of previous evidence that, irrespective of individuals’ GDPR awareness and sociodemographic characteristics, greater trust is consistently associated with greater willingness to share commercial data for health research. This study also adds to the evidence that greater perceived importance of data privacy and greater worry about data misuse are negatively associated with the willingness to share commercial data for health research. Interestingly, the perceived risk of sharing data was not associated with the willingness to share data with institutions, but it was associated with the willingness to share every specific type of data. This is in line with Nissenbaum’s contextual integrity framework for privacy, which suggests that individuals’ information-sharing principles are context dependent and socially constructed. A previous focus group study found similar results: participants were more concerned about the use of subjective data sets, such as social media posts, than about objective shopping data collected on loyalty cards, and they considered the type of data more important than the organization with which the data would be shared. Furthermore, we found a positive univariate association between ever having experienced a negative data-related event (for example, a data breach in which personal information was stolen on the internet and used for other people’s gain) and willingness to share data. Although we do not have sufficient information to assess why this may be the case, one interpretation is that those who have experienced a negative event are more likely to be risk-aware than risk-averse. This result could be studied further in light of “the privacy paradox,” which suggests that intentions to share data may not be directly associated with actual behavior and that other mediators should be better understood.
Future experimental studies with a factorial design can further explore how these experiences impact people’s perceptions and use of technologies.
We identified various sociodemographic factors associated with all outcome variables, with participants’ age being a particularly important factor to consider for participant recruitment. Our results showed that the participants’ willingness reduced with an increase in age, except for sharing data with government institutions where we observed an inverse association. This is an important outcome to consider when implementing pilot health interventions for the general population. The participants who identified themselves as Black consistently had a higher proportion of willingness not just across all institutions but also for all data types. Health researchers should identify resources to improve the visibility of health research opportunities to improve participation and diversity in research. Recruitment through social media advertisements has been shown to be effective in targeting minority populations. However, there is a lack of ethical and methodological guidance on how to carry out recruitment via paid social media advertisements effectively.
A key limitation of this survey was that participants were not provided with examples of how commercial data could be used for specific health studies, such as facilitating earlier cancer diagnosis or identifying mental health conditions, and that nonprofit organizations were not included among the organizations considered. A previous study showed that people were more willing to donate their data to Cancer Research UK compared with nonspecific health research organizations, which was found to be associated with individuals’ level of altruism and prosocial tendencies. Owing to the exploratory nature of this study, a priori hypotheses were not included in the statistical analysis plans, and the following caveats are warranted. Although we adjusted for past health research participation, participants from the recruitment panel were subject to social desirability bias. Social desirability bias in perceived privacy, intention to share data, and actual behaviors has previously been demonstrated in the privacy paradox. Future studies arising from this study could investigate mediators of the gap between intention and behavior. Similarly, the use of nonprobability sampling could also lead to greater bias in this study. Although accumulating evidence suggests that the willingness to share commercial data is a complex behavior and cannot be reduced to one-off intention measures, we believe that the outcomes of this study can be used as a guide for identifying the populations least likely to share data when recruitment methods for studies requesting data from participants are operationalized. However, it should be acknowledged that the measures included in this study, together with sociodemographic characteristics, explain less than 20% of the variance in the willingness to share commercial data.
This highlights the complexity of the evidence surrounding data sharing and the slow progress in health-related research toward a better understanding of the mechanisms that hinder and facilitate data sharing. Thus, future studies could benefit from the use of a theoretical model, such as the capability, opportunity, motivation, and behavior model, to understand how physical and psychological capabilities, such as participants’ engagement with existing technologies, could potentially moderate their willingness to share data. Similarly, social opportunities could be an important factor for the willingness to participate in research advertised through social media. Notwithstanding these limitations, this survey included a UK age-representative cohort based on the 2011 Census, had a larger proportion of individuals from non-White ethnic backgrounds in comparison with other panel-based survey studies with a UK population representative sample, and also adjusted for the participants’ geographic region in the United Kingdom to improve the external validity and generalizability of its outcomes for the wider UK population.
This survey study demonstrated the public acceptability of sharing commercial data for health research in the United Kingdom with an extensive exploration of people’s knowledge and understanding of what constitutes personal data and GDPR, their willingness to share with different organizations, their willingness to share various types of commercial data, and their willingness to consent and share data if invited through different methods. The outcomes of this study should be considered in guidelines and recommendations on the public acceptability of data sharing beyond electronic health records and will be useful for developing data stewardship frameworks and initiatives to improve the use of data in the United Kingdom. Where possible, these outcomes can also be used to develop recruitment strategies for research using stratified sampling techniques where low response rates are expected. Future studies using experimental methods are warranted to identify the effectiveness of behavioral science techniques and communication methods to improve the public acceptability of sharing commercial data for health research.
This study was funded by Cancer Research UK Early Detection and Diagnosis project grant (C38463/A26726), with support from the Peter Sowerby Foundation. YH and HRB were funded by Cancer Research UK (C38463/A26726). The authors would like to acknowledge Wenjia Wang, MSc, who worked on this project as part of her MSc Health Psychology dissertation project at University College London.
The anonymized data collected as part of this survey will be made available to researchers upon publication in the Open Science Framework.
YH developed the study concept. YH, LT, and MMR designed and developed the questionnaire. STS and YH performed the data analysis, and YH, LT, MMR, JMF, and HRB informed the interpretation. YH drafted the manuscript, and all authors provided critical revisions. All authors approved the final version of the manuscript for submission.
Conflicts of Interest
(1) The results of the factor analysis; (2) a detailed description of the survey measures including the original questionnaire with item heritage; (3) descriptive data tables for the General Data Protection Regulation awareness in the United Kingdom; (4) ordered logistic regression results on willingness to share commercial data based on invitation sources; and (5) full ordered regression model tables for the primary outcomes. DOCX File, 72 KB
- Mavragani A. Infodemiology and infoveillance: scoping review. J Med Internet Res 2020 Apr 28;22(4):e16206.
- Dunn J, Runge R, Snyder M. Wearables and the medical revolution. Per Med 2018 Sep;15(5):429-448.
- Ru B, Yao L. A literature review of social media-based data mining for health outcomes research. Social Web Health Res 2019:1-4.
- Aitken M, de St Jorre J, Pagliari C, Jepson R, Cunningham-Burley S. Public responses to the sharing and linkage of health data for research purposes: a systematic review and thematic synthesis of qualitative studies. BMC Med Ethics 2016 Nov 10;17(1):73.
- Davidson S, McLean C, Treanor S, Aitken M, Cunningham-Burley S, Laurie G, et al. Public Acceptability of Data Sharing Between the Public, Private and Third Sectors for Research Purposes. Edinburgh: Scottish Government Social Research; 2013.
- Keusch F, Struminskaya B, Antoun C, Couper MP, Kreuter F. Willingness to participate in passive mobile data collection. Public Opin Q 2019 Jul;83(Suppl 1):210-235.
- Paprica PA, McGrail KM, Schull MJ. Notches on the dial: a call to action to develop plain language communication with the public about users and uses of health data. Int J Popul Data Sci 2019 Aug 05;4(1):1106.
- Ghafur S, Van Dael J, Leis M, Darzi A, Sheikh A. Public perceptions on data sharing: comparing attitudes in the US and UK. SSRN J 2020.
- Stockdale J, Cassell J, Ford E. "Giving something back": a systematic review and ethical enquiry into public views on the use of patient data for research in the United Kingdom and the Republic of Ireland. Wellcome Open Res 2018 Jan 17;3:6.
- Aitken M, Tully M, Porteous C, Denegri S, Cunningham-Burley S, Banner N, et al. Consensus statement on public involvement and engagement with data intensive health research. Int J Popul Data Sci 2019 Feb 12;4(1):586.
- Participatory data stewardship: a framework for involving people in the use of data. Ada Lovelace Institute. URL: https://www.adalovelaceinstitute.org/report/participatory-data-stewardship/ [accessed 2023-01-17]
- Peppin A. Who cares what the public think? The Ada Lovelace Institute. 2022 May 5. URL: https://www.adalovelaceinstitute.org/evidence-review/public-attitudes-data-regulation/ [accessed 2023-01-17]
- Hoofnagle CJ, van der Sloot B, Borgesius FZ. The European Union general data protection regulation: what it is and what it means. Inform Commun Technol Law 2019 Feb 10;28(1):65-98.
- Green MA, Watson AW, Brunstrom JM, Corfe BM, Johnstone AM, Williams EA, et al. Comparing supermarket loyalty card data with traditional diet survey data for understanding how protein is purchased and consumed in older adults for the UK, 2014-16. Nutr J 2020 Aug 13;19(1):83.
- Flanagan JM, Skrobanski H, Shi X, Hirst Y. Self-care behaviors of ovarian cancer patients before their diagnosis: proof-of-concept study. JMIR Cancer 2019 Jan 17;5(1):e10447.
- Brewer HR, Hirst Y, Sundar S, Chadeau-Hyam M, Flanagan JM. Cancer Loyalty Card Study (CLOCS): protocol for an observational case-control study focusing on the patient interval in ovarian cancer diagnosis. BMJ Open 2020 Sep 08;10(9):e037459.
- Huang F, Blaschke S, Lucas H. Beyond pilotitis: taking digital health interventions to the national level in China and Uganda. Global Health 2017 Jul 31;13(1):49.
- Dolan EH, Shiells K, Goulding J, Skatova A. Public attitudes towards sharing loyalty card data for academic health research: a qualitative study. BMC Med Ethics 2022 Jun 07;23(1):58.
- Shiells K, Di Cara N, Skatova A, Davis O, Haworth C, Skinner A, et al. Participant acceptability of digital footprint data collection strategies: an exemplar approach to participant engagement and involvement in the ALSPAC birth cohort study. IJPDS Special Issue Public Involve Engage 2020;5(3).
- Wenz A, Jäckle A, Couper M. Willingness to use mobile technologies for data collection in a probability household panel. Survey Res Method 2019;13(1):1-22.
- Struminskaya B, Toepoel V, Lugtig P, Haan M, Luiten A, Schouten B. Understanding willingness to share smartphone-sensor data. Public Opin Q 2020;84(3):725-759.
- Ságvári B, Gulyás A, Koltai J. Attitudes towards participation in a passive data collection experiment. Sensors (Basel) 2021 Sep 10;21(18):6085.
- Keusch F, Bähr S, Haas G, Kreuter F, Trappmann M, Eckman S. Non‐participation in smartphone data collection using research apps. Royal Stats Soc Series A 2022 Apr 12;185(S2).
- Overview – Data Protection and the EU. Information Commissioner's Office. URL: https://ico.org.uk/for-organisations/dp-at-the-end-of-the-transition-period/overview-data-protection-and-the-eu/ [accessed 2023-01-17]
- Osborne JW, Costello AB. Sample size and subject to item ratio in principal components analysis. Practical Assess Res Eval 2019;9(11).
- Crofts S. Population statistics research update: June 2018. Office of National Statistics. 2018 Jun 22. URL: https://tinyurl.com/2pc6fdej [accessed 2023-01-17]
- von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP, STROBE Initiative. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet 2007 Oct 20;370(9596):1453-1457.
- Stoffel ST, Hirst Y, Ghanouni A, McGregor LM, Kerrison R, Verstraete W, et al. Testing active choice for screening practitioner's gender in endoscopy among disinclined women: an online experiment. J Med Screen 2019 Jun 14;26(2):98-103.
- Nissenbaum H. Privacy as contextual integrity. Wash Law Rev 2004;79(1).
- Norberg P, Horne DR, Horne. The privacy paradox: personal information disclosure intentions versus behaviors. J Consum Affair 2007;41(1):100-126.
- Myers KJ, Jaffe T, Kanda DA, Pankratz VS, Tawfik B, Wu E, et al. Reaching the "Hard-to-reach" sexual and gender diverse communities for population-based research in cancer prevention and control: methods for online survey data collection and management. Front Oncol 2022;12:841951.
- Russomanno J, Patterson JG, Jabson Tree JM. Social media recruitment of marginalized, hard-to-reach populations: development of recruitment and monitoring guidelines. JMIR Public Health Surveill 2019 Dec 02;5(4):e14886.
- Skatova A, Goulding J. Psychology of personal data donation. PLoS One 2019 Nov 20;14(11):e0224240.
- West R, Michie S. A brief introduction to the COM-B Model of behaviour and the PRIME Theory of motivation. Qeios 2020 Apr 07.
- Chorley AJ, Hirst Y, Vrinten C, von Wagner C, Wardle J, Waller J. Public understanding of the purpose of cancer screening: a population-based survey. J Med Screen 2018 Jun 22;25(2):64-69.
- Connor K, Hudson B, Power E. Awareness of the signs, symptoms, and risk factors of cancer and the barriers to seeking help in the UK: comparison of survey data collected online and face-to-face. JMIR Cancer 2020 Jan 17;6(1):e14539.
|aOR: adjusted odds ratio|
|GDPR: General Data Protection Regulation|
|PCA: principal component analysis|
Edited by A Mavragani, T Sanchez; submitted 07.07.22; peer-reviewed by H Huang, R Bach; comments to author 18.10.22; revised version received 16.01.23; accepted 19.01.23; published 23.03.23
Copyright
©Yasemin Hirst, Sandro T Stoffel, Hannah R Brewer, Lada Timotijevic, Monique M Raats, James M Flanagan. Originally published in JMIR Public Health and Surveillance (https://publichealth.jmir.org), 23.03.2023.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on https://publichealth.jmir.org, as well as this copyright and license information must be included.