This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on http://publichealth.jmir.org, as well as this copyright and license information must be included.
Determining patterns of physical activity throughout the day could assist in developing more personalized interventions or physical activity guidelines in general and, in particular, for women who are less likely to be physically active than men.
The aims of this report are to identify clusters of women based on accelerometer-measured baseline raw metabolic equivalent of task (MET) values and a normalized version of the METs ≥3 data, and to compare sociodemographic and cardiometabolic risks among these identified clusters.
A total of 215 women who were enrolled in the Mobile Phone Based Physical Activity Education (mPED) trial and wore an accelerometer for at least 8 hours per day for the 7 days prior to the randomization visit were analyzed. The k-means clustering method and the Lloyd algorithm were used on the data. We used the elbow method to choose the number of clusters, looking at the percentage of variance explained as a function of the number of clusters.
The results of the k-means cluster analyses of raw METs revealed three different clusters. The unengaged group (n=102) had the highest depressive symptoms score compared with the afternoon engaged (n=65) and morning engaged (n=48) groups (overall
Categorizing physically inactive individuals into more specific activity patterns could aid in tailoring the timing, frequency, duration, and intensity of physical activity interventions for women. Further research is needed to confirm these cluster groups using a large national dataset.
ClinicalTrials.gov NCT01280812; https://clinicaltrials.gov/ct2/show/NCT01280812 (Archived by WebCite at http://www.webcitation.org/6vVyLzwft)
Increasing physical activity is associated with a reduction in chronic illnesses and an increase in psychological well-being [
Recent investigations have shown that there is a large discrepancy between self-reported and objectively measured moderate-to-vigorous physical activity (MVPA) [
Cluster analysis is a useful statistical technique that can allocate observations/individuals into groups based on similar characteristics [
Our research team had a unique opportunity to analyze seven consecutive days of accelerometer data in women who were screened and completed the run-in period of the Mobile Phone Based Physical Activity Education (mPED) randomized controlled trial (RCT). To our knowledge, no study has used cluster analyses to explore daily patterns of physical activity using seven consecutive days of accelerometer data in female adults. The aims of this paper are (1) to identify clusters of women who enrolled in the mPED study based on overall accelerometer-measured baseline physical activity and MVPA and (2) to compare sociodemographic and cardiometabolic risks among these identified clusters.
The mPED study is an RCT of an app-based physical activity intervention in physically inactive women (trial registration: ClinicalTrials.gov NCT01280812). Detailed descriptions of the study design and study protocol have been published previously [
Initial inclusion criteria for the mPED trial were (1) physically inactive at work and/or during leisure time based on the Stanford Brief Activity Survey [
In total, 318 women came in for a screening/baseline visit. Of those, 57 did not start or complete the run-in period and 46 did not have sufficient accelerometer wear time of at least 8 hours per day for the last 7 days prior to the randomization visit. The remaining 215 participants were analyzed in this report.
A triaxial accelerometer (HJA-350IT, Active Style Pro, Omron Healthcare Co, Ltd) was used to assess objectively measured physical activity [
The Center for Epidemiological Studies Depression Scale (CES-D) is a 20-item questionnaire widely used for assessing symptoms of depression [
The k-means clustering method (hereafter referred to as k-means) [
To apply k-means, we used the Lloyd algorithm [
The k-means clustering was applied to the raw METs data from each enrolled participant to evaluate whether raw minute-level METs could classify participants by the amount and timing of their physical activity. All observations, day and night, were included because participants engaged in activity at various time points; naively removing nighttime data would therefore lose information. Specifically, the features for each individual consisted of a 10,080-dimensional vector composed of consecutive minute-level METs observations over 7 days (7 days × 24 hours × 60 minutes). Missing data occurred mainly during nighttime and were therefore replaced by 1, the METs reading for a stationary individual.
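The feature construction and clustering step described above can be sketched as follows. This is a minimal illustration, not the study's code: the function names, the `None` marker for missing minutes, and the synthetic participants are all assumptions made for the example, and the random initialization is one common choice that the text does not specify.

```python
import random

MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080 minute-level observations per participant
RESTING_MET = 1.0               # value used in the text to fill missing minutes


def build_feature_vector(readings):
    """Return one participant's 10,080-dimensional METs vector.

    `readings` is a hypothetical list with one entry per minute; None marks
    a missing observation and is replaced by the resting value of 1 MET.
    """
    if len(readings) != MINUTES_PER_WEEK:
        raise ValueError("expected exactly 7 days of minute-level data")
    return [RESTING_MET if r is None else float(r) for r in readings]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_kmeans(points, k, n_iter=100, seed=0):
    """Lloyd iterations: assign each point to its nearest centroid, then re-average."""
    rng = random.Random(seed)
    # Assumption: centroids start at k distinct randomly chosen participants.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def synthetic_week(active_hour):
    """Hypothetical participant: missing at night, 4 METs during one hour a day."""
    week = []
    for minute in range(MINUTES_PER_WEEK):
        hour = (minute // 60) % 24
        if hour < 5:
            week.append(None)   # unworn device overnight
        elif hour == active_hour:
            week.append(4.0)    # a daily bout of moderate activity
        else:
            week.append(1.2)    # light daytime activity
    return week


participants = [build_feature_vector(synthetic_week(h)) for h in (8, 8, 15, 15, None, None)]
labels, _ = lloyd_kmeans(participants, k=3)
print(labels)
```

Because each centroid is itself a 10,080-dimensional vector, it can be read as the average weekly activity profile of its cluster, which is what makes timing-based labels such as "morning engaged" possible.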
Equations 1-3.
We also explored how MVPA (METs ≥3) was associated with sociodemographic data and clinical outcomes. To do so, k-means clustering was applied to a normalized version of the METs data from each enrolled participant. The data were normalized as follows: suppose for each participant
Finally, we normalized this vector for each individual by time to have unit Euclidean norm (see Equation 3 in
Overall, the mean age of participants was 52.4 (SD 11.2) years, 54.4% (117/215) were white, 48.8% (105/215) were single or divorced, and 73.0% (157/215) were well educated, reporting college- or graduate-level educations. In addition, 49.3% (106/215) had used a pedometer and 57.2% (123/215) had participated in a diet/weight loss plan prior to study enrollment. The majority of the sample (80.5%, 173/215) drove a car at least once per week.
The k-means clustering separated the participants into three groups (
A k-means cluster analysis of raw METs (metabolic equivalent of tasks) data with the Lloyd algorithm (N=215) for physical activity frequency during the day for each cluster.
Comparison of sociodemographic and clinical characteristics among the three clustered groups based on raw metabolic equivalent of tasks (METs) data (N=215).
Sociodemographic and clinical characteristics | Afternoon engaged group | Morning engaged group | Unengaged group | Overall P value
Age (years), mean (SD) | 52.2 (11.3) | 53.7 (9.8) | 51.8 (11.8) | .64
Education, n (%) | | | | .009b
High school/some college | 21 (32.3) | 19 (39.6) | 18 (17.6) |
College/Graduate school | 44 (67.7) | 29 (60.4) | 84 (82.4) |
Race/ethnicity, n (%) | | | | .15
White | 32 (49.2) | 22 (45.8) | 63 (61.8) |
Asian and Pacific Islander | 19 (29.2) | 15 (31.3) | 16 (15.7) |
Nonwhite and Multiracial | 14 (21.5) | 11 (22.9) | 23 (22.5) |
Marital status (married/cohabitating), n (%) | 39 (60.0) | 25 (52.1) | 46 (45.1) | .17
Employment status, n (%) | | | | .19
Paid full time/part time | 42 (64.6) | 38 (79.2) | 76 (74.5) |
Homemaker/retired/disabled | 23 (35.4) | 10 (20.8) | 26 (25.5) |
Previous pedometer usage, n (%) | 34 (52.3) | 24 (50.0) | 48 (47.1) | .80
Drives a car at least once a week, n (%) | 57 (87.7) | 37 (77.1) | 79 (77.5) | .21
Has a dog, n (%) | 12 (18.5) | 12 (25.0) | 16 (15.7) | .39
Participated in diet plan prior to the study, n (%) | 38 (58.5) | 25 (52.1) | 60 (58.8) | .72
Has a gym membership, n (%) | 14 (21.5) | 11 (22.9) | 38 (37.3) | .05
Weekly total minutes of MVPAc by accelerometer with 1-minute criterion | 372.1 (137.5) | 401.5 (132.8) | 205.7 (92.6) | <.001d
Weekly total minutes of MVPA by accelerometer with 5-minute criterion | 93.2 (90.8) | 119.8 (113.1) | 63.3 (62.1) | .001e
Weekly total minutes of MVPA by accelerometer with 10-minute criterion (a 1- or 2-minute interruption allowed) | 57.3 (73.3) | 78.1 (105.1) | 37.8 (47.8) | .006f
Average daily steps | 6436.1 (2216.9) | 6722.9 (1718.9) | 4796.9 (1723.9) | <.001g
Weekly total hours of TV watching and computer usage time | 25.5 (17.6) | 23.3 (16.8) | 30.3 (18.3) | .048
Physical Component score | 52.0 (6.0) | 51.7 (6.1) | 50.9 (6.6) | .53
Mental Component score | 49.2 (8.6) | 50.5 (9.6) | 46.6 (10.3) | .04
Total CES-Di score | 7.5 (7.1) | 8.1 (6.5) | 11.9 (8.9) | <.001j
Total self-efficacy for physical activity score | 18.3 (4.4) | 19.6 (4.2) | 19.1 (5.0) | .30
Total family score | 31.1 (9.2) | 32 (7.7) | 30.9 (10.2) | .82
Total friends score | 30.8 (7.5) | 30.9 (8.1) | 32.3 (9.1) | .46
Total barriers to being active score | 24.0 (9.5) | 22.2 (10.0) | 23.6 (10.2) | .91
Body mass index (kg/m2) | 29.3 (6.5) | 28.1 (5.3) | 29.9 (6.1) | .23
Waist circumference (cm) | 96.4 (14.4) | 94.4 (13.0) | 99.0 (14.8) | .17
Hip circumference (cm) | 109.7 (13.5) | 107.2 (12.6) | 112.3 (13.9) | .09
Resting systolic blood pressure (mm Hg) | 121.1 (14.0) | 118.8 (13.4) | 121.5 (14.7) | .56
Resting diastolic blood pressure (mm Hg) | 76.1 (10.1) | 74.7 (8.9) | 78.5 (9.9) | .07
Cholesterol, total (mg/dL) | 205.5 (35.2) | 199.9 (41.4) | 203.4 (40.9) | .76
Triglycerides (mg/dL) | 117.6 (54.3) | 103.9 (46.9) | 117.0 (53.4) | .31
LDLk (mg/dL) | 119.5 (33.0) | 115.4 (35.7) | 118.8 (33.8) | .80
HDLl (mg/dL) | 62.6 (15.8) | 63.7 (18.8) | 61.2 (16.5) | .68
HbA1c (%) | 5.8 (0.56) | 5.7 (0.4) | 5.7 (0.5) | .34
Fasting plasma glucose (mg/dL) | 95.1 (17.1) | 95.5 (12.2) | 97.0 (16.0) | .71
aIf overall
bPairwise comparison between Morning engaged and Unengaged in those with High school/some college:
cMVPA: moderate-to-vigorous physical activity.
dPairwise comparison between Afternoon engaged and Unengaged:
ePairwise comparison between Morning engaged and Unengaged:
fPairwise comparison between Morning engaged and Unengaged:
gPairwise comparison between Afternoon engaged and Unengaged:
hSF-12: 12-item Short-Form Health Survey.
iCES-D: Center for Epidemiological Studies Depression Scale.
jPairwise comparison between Afternoon engaged and Morning engaged:
kLDL: low-density lipoprotein.
lHDL: high-density lipoprotein.
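As a worked check on the categorical comparisons in the table above, the education counts can be run through a Pearson chi-square test of independence. This excerpt does not name the test behind the reported overall value of .009, so the choice of test here is an assumption, and the function names are illustrative. The closed-form P value used below holds only for 2 degrees of freedom, which is the case for a 2×3 table.

```python
import math


def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def p_value_two_dof(statistic):
    """Upper-tail probability of a chi-square variable with exactly 2 df."""
    return math.exp(-statistic / 2.0)


# Columns: afternoon engaged, morning engaged, unengaged (counts from the table).
education = [
    [21, 19, 18],  # high school/some college
    [44, 29, 84],  # college/graduate school
]
statistic, dof = pearson_chi_square(education)
print(f"chi-square = {statistic:.3f}, df = {dof}, P = {p_value_two_dof(statistic):.4f}")
```

The column totals of the counts (65, 48, and 102) also match the cluster sizes given in the abstract, which is a quick way to confirm that the rows were transcribed correctly.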
In
The k-means clustering separated the participants into three groups (
A k-means cluster analysis of normalized METs (metabolic equivalent of tasks) ≥3 data (N=215). MVPA: moderate-to-vigorous physical activity.
Comparison of sociodemographic and clinical characteristics among three clustered groups based on normalized METs ≥3 data (N=215).
Sociodemographic and clinical characteristics | MVPAa morning and evening peak | MVPA noon peak | MVPA evening peak | Overall P value
Age (years), mean (SD) | 51.9 (11.7) | 52.4 (10.5) | 52.6 (11.5) | .94
Education, n (%) | | | | .81
High school/some college | 11 (23.9) | 18 (29.5) | 29 (26.9) |
College/Graduate school | 35 (76.1) | 43 (70.5) | 79 (73.1) |
Race/ethnicity, n (%) | | | | .20
White | 20 (43.5) | 31 (50.8) | 66 (61.1) |
Asian and Pacific Islander | 12 (26.1) | 18 (29.5) | 20 (18.5) |
Nonwhite and multiracial | 14 (30.4) | 12 (19.7) | 22 (20.4) |
Marital status (married/cohabitating), n (%) | 23 (50.0) | 36 (59.0) | 51 (47.2) | .33
Employment status, n (%) | | | | .11
Paid full time/part time | 39 (84.8) | 43 (70.5) | 74 (68.5) |
Homemaker/retired/disabled | 7 (15.2) | 18 (29.5) | 34 (31.5) |
Previous pedometer usage, n (%) | 25 (54.3) | 29 (47.5) | 52 (48.1) | .74
Drives a car at least once a week, n (%) | 38 (82.6) | 48 (78.7) | 87 (80.6) | .88
Has a dog, n (%) | 9 (19.6) | 13 (21.3) | 18 (16.7) | .74
Participated in diet plan prior to the study, n (%) | 26 (56.5) | 31 (50.8) | 66 (61.1) | .43
Has a gym membership, n (%) | 18 (39.1) | 13 (21.3) | 32 (29.6) | .13
Weekly total minutes of MVPA by accelerometer with 1-minute criterion | 287.7 (121.1) | 388.4 (168.9) | 254.7 (121.2) | <.001c
Weekly total minutes of MVPA by accelerometer with 5-minute criterion | 70.7 (65.0) | 120.1 (113.2) | 71.1 (72.6) | .001d
Weekly total minutes of MVPA by accelerometer with 10-minute criterion (a 1- or 2-minute interruption allowed) | 41.0 (48.8) | 81.3 (101.8) | 41.5 (57.2) | .001e
Average daily steps | 5754.0 (1560.4) | 6663.7 (2359.9) | 5211.7 (1936.8) | <.001f
Weekly total hours of TV watching and computer usage time | 28.7 (19.5) | 27.7 (19.5) | 26.5 (16.4) | .76
Physical Component score | 52.3 (5.9) | 50.9 (7.1) | 51.3 (6.0) | .51
Mental Component score | 49.3 (7.6) | 47.7 (11.4) | 48.1 (9.5) | .69
Total CES-Dh score | 9.5 (8.6) | 9.9 (8.3) | 9.8 (7.9) | .97
Total self-efficacy for physical activity score | 20.0 (4.5) | 18.5 (4.3) | 18.8 (4.8) | .23
Total family score | 31.1 (8.2) | 32.6 (8.8) | 30.5 (10.1) | .36
Total friends score | 29.8 (7.8) | 31.3 (7.7) | 32.5 (8.9) | .20
Total barriers to being active score | 21.3 (10.0) | 25.6 (9.1) | 23.5 (10.2) | .08
Body mass index (kg/m2) | 29.6 (6.1) | 27.6 (5.5) | 30.2 (6.2) | .03i
Waist circumference (cm) | 95.9 (14.2) | 93.6 (12.8) | 99.7 (14.9) | .02j
Hip circumference (cm) | 111.3 (13.3) | 106.5 (12.6) | 112.2 (13.9) | .03k
Resting systolic blood pressure (mm Hg) | 120.0 (14.1) | 119.4 (13.7) | 121.7 (14.6) | .57
Resting diastolic blood pressure (mm Hg) | 75.3 (10.7) | 75.2 (8.7) | 78.6 (9.9) | .04
Cholesterol, total (mg/dL) | 198.6 (41.0) | 203.9 (32.8) | 204.9 (41.8) | .66
Triglycerides (mg/dL) | 110.4 (54.9) | 117.8 (54.8) | 113.9 (50.2) | .77
LDLl (mg/dL) | 113.7 (36.1) | 118.4 (28.6) | 120.1 (35.7) | .57
HDLm (mg/dL) | 62.9 (18.1) | 62.0 (15.6) | 62.0 (17.0) | .95
HbA1c (%) | 5.8 (0.5) | 5.8 (0.6) | 5.7 (0.5) | .68
Fasting plasma glucose (mg/dL) | 95.3 (12.3) | 98.2 (17.2) | 95.2 (15.8) | .46
aMVPA: moderate-to-vigorous physical activity.
bIf overall
cPairwise comparison between Morning and evening peak and Noon peak:
dPairwise comparison between Morning and evening peak and Noon peak:
ePairwise comparison between Morning and evening peak and Noon peak:
fPairwise comparison between Noon peak and Evening peak:
gSF-12: 12-item Short-Form Health Survey.
hCES-D: Center for Epidemiological Studies Depression Scale.
iPairwise comparison between Noon peak and Evening peak:
jPairwise comparison between Noon peak and Evening peak:
kPairwise comparison between Noon peak and Evening peak:
lLDL: low-density lipoprotein.
mHDL: high-density lipoprotein.
This study is the first to identify clusters of women aged between 25 and 69 years based on seven consecutive days of accelerometer-measured METs and MVPA (≥3 METs). The first cluster analysis successfully identified three groups based on accelerometer-measured METs. It appears that the only difference between the afternoon engaged and the morning engaged groups is the timing of activity throughout the day. However, the unengaged group (representing 47.4% of the sample) had a much lower activity level than the other two groups.
We found that the unengaged group was more likely to have a college or graduate degree compared to the afternoon engaged and morning engaged groups. In the cluster analysis study of self-reported physical activity involving 3324 individuals in France, Omorou et al [
Furthermore, consistent with previous study findings [
The second cluster analysis, based on MVPA (normalized METs ≥3 data), also showed three distinct groups (MVPA morning and evening peak, MVPA noon peak, and MVPA evening peak). The two-peak pattern of MVPA (7-8 am and 5-6 pm) in the MVPA morning and evening peak group might be explained by active commuting. The MVPA noon peak group appeared to have the greatest duration of MVPA of the three groups. Moreover, the MVPA noon peak group had significantly lower metabolic risks (BMI and hip and waist circumferences) than the MVPA evening peak group. In a recent large epidemiologic study, the investigators also reported that bouts of 10 minutes or more of MVPA (as per current guidelines) and even bouts of less than 10 minutes were associated with lower levels of adiposity and a lower risk of metabolic syndrome in older adults [
The strengths of this study were that we were able to use seven consecutive days of accelerometer-measured physical activity data, rather than depending on participant recall, to capture most types of activity (active transportation and occupational and leisure activity) and to identify physical activity patterns that were specific to certain times of the day. In addition, the participants were not able to view their steps taken or the intensity of their physical activity during the data collection period; this blinding helped prevent participants from modifying their daily activity. Despite these strengths, some limitations need to be taken into account. First, the findings of this study might not be generalizable to men or children. Men tend to be more active than women across their life spans. Second, in general, individuals with high levels of depressive symptoms are less likely to be enrolled in clinical studies than those with low symptoms. The true proportion of unengaged women could therefore be larger than that observed in this sample. Lastly, the accelerometer used in the mPED trial was not able to capture activities such as swimming, bicycling, and weight lifting. However, the number of women in the mPED trial who engaged in these activities was relatively low in this sample [
Despite the use of objectively measured physical activity, the sample size was relatively small in this study. Thus, these identified cluster groups need to be cross-validated using a large national dataset such as the National Health and Nutrition Examination Survey.
Classifying physically inactive individuals into more precise activity patterns could assist in tailoring the timing, frequency, duration, and intensity of physical activity interventions for women. For example, recommending bouts of physical activity before noon to the unengaged group or MVPA evening peak group may lead to an increase in their activity levels. Future research should consider examining how different types of baseline physical activity cluster groups will respond to different types of physical activity interventions.
Result of the Elbow Method for the raw METs data.
Result of the Elbow Method for the normalized METs ≥3 data.
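The elbow curves named in the two captions above plot the percentage of variance explained against the number of clusters. A minimal sketch of how such a curve can be computed is given here, assuming the usual definition of explained variance as one minus the ratio of within-cluster to total sum of squares. The compact k-means routine, the restart count, and the toy data are illustrative choices, not taken from the study.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, seed, n_iter=100):
    """Compact Lloyd-style k-means returning (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean_point(members)
    return labels, centroids


def percent_variance_explained(points, labels, centroids):
    """100 * (1 - within-cluster SS / total SS)."""
    grand_mean = mean_point(points)
    total_ss = sum(sq_dist(p, grand_mean) for p in points)
    within_ss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return 100.0 * (1.0 - within_ss / total_ss)


def elbow_curve(points, max_k, n_restarts=5):
    """Best explained variance over a few random restarts for each k."""
    curve = {}
    for k in range(1, max_k + 1):
        curve[k] = max(
            percent_variance_explained(points, *kmeans(points, k, seed))
            for seed in range(n_restarts)
        )
    return curve


# Hypothetical 2-D data with three loose groups, for illustration only.
demo = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11], [20, 0], [21, 0], [20, 1]]
for k, pct in elbow_curve(demo, max_k=5).items():
    print(k, round(pct, 1))
```

The "elbow" is the value of k after which adding another cluster yields only a small further gain in explained variance; reading it off the curve remains a judgment call rather than a formal test.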
BMI: body mass index
CES-D: Center for Epidemiological Studies Depression Scale
HDL: high-density lipoprotein
LDL: low-density lipoprotein
MET: metabolic equivalent of task
mPED: Mobile Phone Based Physical Activity Education
MVPA: moderate-to-vigorous physical activity
RCT: randomized controlled trial
SF-12: 12-item Short-Form Health Survey
This project was supported by a grant (R01HL104147) from the National Heart, Lung, and Blood Institute; by the American Heart Association; and by a grant (K24NR015812) from the National Institute of Nursing Research. MZ and AA were supported in part by a grant (CMMI-1450963) from the National Science Foundation, and MZ and KG were supported in part by funding from Fujitsu Research Labs, the UC Center for Information Technology Research in the Interest of Society (CITRIS), and the Philippine-California Advanced Research Institutes (PCARI). The study sponsors had no role in the study design; collection, analysis, or interpretation of data; writing the report; or the decision to submit the report for publication.
None declared.