This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR mhealth and uhealth, is properly cited. The complete bibliographic information, a link to the original publication on http://mhealth.jmir.org/, as well as this copyright and license information must be included.
Only one in five Americans meets the physical activity recommendations of the Department of Health and Human Services. The proliferation of wearable devices and smartphones for physical activity tracking has led to an increasing number of interventions designed to facilitate regular physical activity, in particular to address the obesity epidemic, but also for cardiovascular disease patients, cancer survivors, and older adults. However, the inconsistent findings on the accuracy of wearable devices for step counting need to be addressed, as do factors known to affect gait (and thus potentially accuracy), such as age, body mass index (BMI), or leading arm.
We aim to assess the accuracy of recent mobile devices for counting steps, across three different age groups.
We recruited 60 participants in three age groups: 18-39 years, 40-64 years, and 65-84 years, who completed two separate 1000-step walks on a treadmill at a self-selected speed between 2 and 3 miles per hour. We tested two smartphones, one attached on each side of the waist, and five wrist-based devices worn on both wrists (2 devices on one wrist and 3 devices on the other), as well as the Actigraph wGT3X-BT. The smartphones and wrist-based devices were swapped to the opposite side between the two walks. The number of steps was recorded with a tally counter. Age, sex, height, weight, and dominant hand were self-reported by each participant.
Among the 60 participants, 36 were female (60%) and 54 were right-handed (90%). Median age was 53 years (min=19, max=83), and median BMI was 24.1 (min=18.4, max=39.6). There was no significant difference in left- and right-hand step counts by device. Our analyses show that the Fitbit Surge significantly undercounted steps across all age groups. The Samsung Gear S2 significantly undercounted steps only for participants in the 40-64 year age group. Finally, the Nexus 6P significantly undercounted steps for the 65-84 year age group.
Our analysis shows that, apart from the Fitbit Surge, most of the recent mobile devices we tested neither overcount nor undercount steps in the 18-39 year age group; however, some devices undercount steps in older age groups. This finding suggests that accuracy in step counting may be an issue with some popular wearable devices, and that age may be a factor in undercounting. These results are particularly important for clinical interventions using such devices and other activity trackers, in particular to balance energy requirements with energy expenditure in the context of a weight loss intervention program.
Obesity is a major health concern in the United States, with estimates of overweight or obese Americans >20 years old ranging between 68.5-75.3% [
There is mounting evidence that mobile health strategies and wearable devices could improve health behavior interventions, in particular for chronic conditions across the socioeconomic gradient [
The discrepancies between such studies suggest that it is useful to assess what potential variables affect step count. It is not known whether user characteristics such as weight, height, gender, or age affect the accuracy of step counting for such tools. Age is a particularly interesting variable, given the evidence on gait changes among older adults [
The purpose of this paper is to address this gap in the current literature for a representative set of five wrist-worn devices (Apple Watch, Samsung Gear S2, Garmin 735XT, Garmin Vivofit, Fitbit Surge), two smartphones (iPhone 6s Plus, Nexus 6P) and the research-grade ActiGraph wGT3X-BT. This selection was made to reflect the two most common mobile operating systems (OSs; namely Android and iOS), the range of price points, and the most commonly purchased device brands (Fitbit, Garmin) available on the market. To this effect, we model and assess the accuracy of recent smartphones and wearable devices across three age groups.
As of 2016, there are an estimated 394 wearable devices from 266 companies that are capable of activity tracking [
After receiving approval from the University of Florida Institutional Review Board (IRB201601145), we recruited participants using flyers that were disseminated across campus. Twenty participants were recruited in each of the following age groups: 18-39 years, 40-64 years, and 65-84 years, for a total of 60 participants. Subjects were recruited among people without a contraindication to exercise, and who were able to walk comfortably on a treadmill for 20 minutes at a speed between 2 and 3 miles per hour.
The purpose and the protocol of the study were explained to participants, who were then consented by the study team (AL, MDS). Each participant received a US $10 gift card for participating in the study, and was told that they would be asked to do two walks of 1000 steps on a treadmill, at a self-selected speed between 2 and 3 miles per hour. Participants were instructed that the treadmill would be started at 2 miles per hour, upon which they would start walking without holding onto the treadmill, and steps would be recorded. The speed was progressively increased to an acceptable level by the study team (AL, MDS), as instructed by the participant. After consent, participants self-reported sex, age, height, weight, and dominant hand. In the first 1000-step walk, the Fitbit Surge, Garmin Vivofit, and Apple Watch were attached to the right wrist of the participants, and the Samsung Gear S2 and Garmin 735XT were attached to the left wrist. This choice was dictated by the width of each device. The iPhone 6S Plus was attached to the right hip with a belt clip, and the Nexus 6P was affixed to the left hip. Devices were then swapped right to left and vice versa in the second 1000-step walk. The Actigraph wGT3X-BT was kept centered at the back of the waist during both walks. The number of steps was tallied with a manual tally counter by one of the team members (AL, MDS). The number of steps for each device was recorded at the end of each walk. Additionally, the Apple Watch and the Samsung Gear S2 were not synchronized to their respective smartphones (iPhone 6S Plus and the Nexus 6P) to ensure reliability of the data.
We computed summary statistics for the participants’ characteristics. To estimate the counted steps from each device while controlling for correlated observations and covariates, we fitted a repeated measures mixed-effects model, in which the participant was the independent sampling unit. The outcome of the model was steps counted by the devices (ie, the smartphones, the actigraph, or the wrist-based devices); the distribution of this outcome was not skewed. The predictor variables in the full model included age, sex, body mass index (BMI), dominant hand, device, age-by-variable interactions, and device-by-variable interactions. Age-by-variable interactions included age-by-sex, age-by-BMI, age-by-dominant hand, and age-by-device. Similarly, device-by-variable interactions included device-by-sex, device-by-BMI, and device-by-dominant hand. The order of the predictors was fixed in the order listed above. An unstructured covariance model was assumed, which accounted for unequal variance across devices. We used a backwards selection strategy [
In the model, age was categorized as: 18-39 years old, 40-64 years old, and 65-84 years old. Our preliminary analysis revealed that there was no significant difference in left- and right-hand step counts for each device. Therefore, we averaged the measurements obtained from the two walks for each participant-by-device for modeling. In addition, we set a cutoff of 250 steps as a likely point of device failure (less than 1 out of every 4 steps counted). All step outcomes less than 250 were excluded from the model. We chose to use BMI as a predictor in place of height and weight, as these two variables were highly correlated and would introduce collinearity to the model. We conducted all analyses using SAS 9.4 (SAS Institute, Cary, NC).
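The preprocessing described above can be illustrated with a minimal Python sketch. It is not the authors' SAS code: the record layout, the function names, and the sample values are hypothetical, and it assumes that the 250-step cutoff is applied to each walk before the two walks are averaged (the text does not state the order). BMI is computed with the standard formula, assuming weight in kilograms and height in meters.

```python
from statistics import mean

FAILURE_CUTOFF = 250  # counts below this are treated as device failure


def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2 (units are an assumption of this sketch)."""
    return weight_kg / height_m ** 2


def average_walks(records, cutoff=FAILURE_CUTOFF):
    """Average step counts per (participant, device) across walks.

    `records` is an iterable of (participant_id, device, walk, steps).
    Assumption: counts below `cutoff` are dropped before averaging, so a
    pair with one failed walk keeps the other walk's count, and a pair
    with two failed walks is left out entirely.
    """
    grouped = {}
    for participant_id, device, _walk, steps in records:
        if steps < cutoff:
            continue  # likely device failure, excluded
        grouped.setdefault((participant_id, device), []).append(steps)
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical records, for illustration only (not study data).
sample = [
    ("p01", "Fitbit Surge", 1, 960),
    ("p01", "Fitbit Surge", 2, 940),
    ("p01", "Nexus 6P", 1, 120),   # below cutoff, dropped
    ("p01", "Nexus 6P", 2, 980),
    ("p02", "Nexus 6P", 1, 100),   # both walks below cutoff
    ("p02", "Nexus 6P", 2, 90),
]
averaged = average_walks(sample)
```

With these sample records, the Fitbit Surge pair averages both walks, the first Nexus 6P pair keeps only its valid walk, and the second Nexus 6P pair is excluded.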
We summarized the characteristics of the study participants in
Participant characteristics. One BMI observation was missing.
Characteristics | Total (N=60) | Age 18-39 (n=21) | Age 40-64 (n=20) | Age 65-84 (n=19)
Age in years, mean (SD) | 49.5 (19.4) | 26.2 (5.1) | 53.7 (7.0) | 70.9 (4.3)
Sex: female, n (%) | 36 (60.0) | 11 (52.4) | 11 (55.0) | 14 (73.7)
Sex: male, n (%) | 24 (40.0) | 10 (47.6) | 9 (45.0) | 5 (26.3)
Dominant hand: right, n (%) | 54 (90.0) | 20 (95.2) | 19 (95.0) | 15 (79.0)
Dominant hand: left, n (%) | 6 (10.0) | 1 (4.8) | 1 (5.0) | 4 (21.0)
BMI, mean (SD) | 25.2 (4.6) | 23.0 (2.7) | 25.7 (4.6) | 27.0 (5.4)
Steps by device by age group (averaged across two measurements).
Device | Age 18-39 (n=21), mean | Age 18-39, SD | Age 40-64 (n=20), mean | Age 40-64, SD | Age 65-84 (n=19), mean | Age 65-84, SD
Actigraph | 1003.9 | 6.3 | 986.1 | 44.4 | 995.0 | 43.4
Apple Watch | 964.9 | 59.0 | 970.9 | 38.4 | 1015.9 | 118.8
Fitbit Surge | 959.7 | 57.6 | 943.9 | 81.5 | 945.6 | 85.5
Garmin 735XT | 987.8 | 37.9 | 994.0 | 29.4 | 978.7 | 38.2
Garmin Vivofit | 994.5 | 11.5 | 992.3 | 22.6 | 953.5 | 170.3
iPhone 6S Plus | 1021.2 | 186.4 | 1035.0 | 129.6 | 1018.0 | 56.9
Nexus 6P | 997.1 | 41.3 | 988.3 | 26.1 | 900.9 | 158.3
Samsung Gear S2 | 988.0 | 16.0 | 959.4 | 84.2 | 969.1 | 47.9
Final reduced model type 3 fixed effects.
Effect | Numerator degrees of freedom | Denominator degrees of freedom | F value | P value
Age group | 2 | 55.8 | 0.79 | 0.4599
Sex | 1 | 48 | 0.05 | 0.8164
BMI | 1 | 47 | 2.30 | 0.1358
Dominant hand | 1 | 54.4 | 2.10 | 0.1532
Device | 7 | 49.5 | 3.56 | 0.0036
Age group x device | 14 | 76.3 | 2.00 | 0.0287
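One way to write the final reduced model, read off the effects in the table above, is sketched below. The notation is ours, not the authors': $y_{id}$ is the averaged step count for participant $i$ on device $d$, $a(i)$ is the participant's age group, and $\boldsymbol{\Sigma}$ is the unstructured covariance across the eight devices described in the statistical methods.

```latex
\begin{align}
y_{id} &= \mu + \alpha_{a(i)} + \delta_{d} + (\alpha\delta)_{a(i)\,d}
        + \beta_{1}\,\mathrm{sex}_{i} + \beta_{2}\,\mathrm{BMI}_{i}
        + \beta_{3}\,\mathrm{hand}_{i} + \varepsilon_{id}, \\
\boldsymbol{\varepsilon}_{i} &= (\varepsilon_{i1}, \dots, \varepsilon_{i8})^{\top}
        \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}),
        \qquad \boldsymbol{\Sigma} \text{ unstructured } (8 \times 8).
\end{align}
```

Under this reading, the numerator degrees of freedom follow directly: 3 age groups give 2, 8 devices give 7, and the age group by device interaction gives 2 x 7 = 14.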
Predicted means of steps for each device by age (adjusted for BMI, dominant hand, and sex).
Device | Age 18-39 | Age 40-64 | Age 65-84 |
Actigraph, mean (CI) | 1008.7 (989.2, 1028.2) | 997.2 (976.8, 1017.6) | 1002.7 (984.4, 1021.1) |
Apple Watch, mean (CI) | 970.2 (934.4, 1006.0) | 980.1 (942.9, 1017.4) | 1023.6 (987.0, 1060.2) |
Fitbit Surge, mean (CI) | 965.0 (930.0, 999.9) | 950.8 (913.7, 988.0) | 953.3 (917.5, 989.0) |
Garmin 735XT, mean (CI) | 993.9 (973.5, 1014.4) | 1003.3 (983.5, 1023.2) | 986.4 (967.7, 1005.0) |
Garmin Vivofit, mean (CI) | 999.8 (956.3, 1043.4) | 1002.9 (957.4, 1048.3) | 961.2 (916.3, 1006.2) |
iPhone 6S Plus, mean (CI) | 1026.5 (964.7, 1088.2) | 1045.1 (980.4, 1109.8) | 1025.7 (961.4, 1090.1) |
Nexus 6P, mean (CI) | 1002.4 (956.2, 1048.6) | 981.8 (932.8, 1030.9) | 908.6 (860.7, 956.4) |
Samsung Gear S2, mean (CI) | 993.3 (966.6, 1020.1) | 966.7 (939.2, 994.3) | 976.8 (950.1, 1003.6) |
We summarized the step counting characteristics of the study devices in
We summarized the results from the mixed-effects models in
Based on the final model, we produced model-based estimates of the steps counted by each device stratified by age group (
The steps counted by the Fitbit Surge for the 18-39 age group were 965.0 (95% CI 930.0, 999.9), which is much closer to, but still significantly lower than, 1000. In addition, the Nexus 6P undercounted steps in the 65-84 year old group, with an estimated count of 908.6 steps (95% CI 860.7, 956.4). The Samsung Gear S2 undercounted steps in the 40-64 year old group, with an estimated count of 966.7 steps (95% CI 939.2, 994.3). However, the same device did not significantly undercount steps for the older age group, with an estimated count of 976.8 steps (95% CI 950.1, 1003.6).
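The undercounting statements above can be cross-checked against the predicted means table with a short Python sketch. It assumes the reported intervals are 95% CIs and treats a device as significantly undercounting when the upper bound falls below the 1000 steps actually walked. This is our reading of the table, not the authors' SAS procedure, and the names used are hypothetical.

```python
REFERENCE_STEPS = 1000  # steps actually walked in each trial

AGE_GROUPS = ("18-39", "40-64", "65-84")

# (lower, upper) CI bounds copied from the predicted means table,
# in the order of AGE_GROUPS.
CI_BOUNDS = {
    "Actigraph": [(989.2, 1028.2), (976.8, 1017.6), (984.4, 1021.1)],
    "Apple Watch": [(934.4, 1006.0), (942.9, 1017.4), (987.0, 1060.2)],
    "Fitbit Surge": [(930.0, 999.9), (913.7, 988.0), (917.5, 989.0)],
    "Garmin 735XT": [(973.5, 1014.4), (983.5, 1023.2), (967.7, 1005.0)],
    "Garmin Vivofit": [(956.3, 1043.4), (957.4, 1048.3), (916.3, 1006.2)],
    "iPhone 6S Plus": [(964.7, 1088.2), (980.4, 1109.8), (961.4, 1090.1)],
    "Nexus 6P": [(956.2, 1048.6), (932.8, 1030.9), (860.7, 956.4)],
    "Samsung Gear S2": [(966.6, 1020.1), (939.2, 994.3), (950.1, 1003.6)],
}


def flag_miscounts(ci_bounds, reference=REFERENCE_STEPS):
    """Return (device, age_group, direction) where the CI excludes `reference`."""
    flagged = []
    for device, bounds in ci_bounds.items():
        for age_group, (lower, upper) in zip(AGE_GROUPS, bounds):
            if upper < reference:
                flagged.append((device, age_group, "under"))
            elif lower > reference:
                flagged.append((device, age_group, "over"))
    return flagged


flagged = flag_miscounts(CI_BOUNDS)
```

Run on the table values, the check flags the Fitbit Surge in all three age groups, the Nexus 6P in the 65-84 group, and the Samsung Gear S2 in the 40-64 group, all as undercounts, and flags no device as overcounting.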
The ubiquity of smartphones and other wearable devices, and their various physical activity tracking functionalities, have led to an increasing reliance on such devices as tools for participation in exercise programs. Such functionalities include step tracking, global positioning system functions (eg, distance, pace, elevation, map), heart-rate monitoring (either wrist-based, or with a chest strap), or calorie expenditure. Although some evidence suggests that step-counting is accurate for some wrist-worn devices and smartphone apps [
Our study indicates that height, weight, BMI, and dominant hand do not seem to impact the accuracy of step-counting devices. Conversely, our results suggest that the Fitbit Surge undercounted steps for all age groups, the Nexus 6P underestimated step counts for the 65-84 year old group, and the Samsung Gear S2 underestimated step counts for the 40-64 year old age group, but not the older age group (
A major strength of this study is that, to the best of our knowledge, it is the first that evaluates the impact of age, BMI, and dominant hand on the accuracy of the newer generation of wearable devices and smartphones with respect to step counting. Although BMI and dominant hand do not appear to impact the ability of devices to estimate step counts, age does affect estimates of step counts for some devices. Therefore, additional work needs to be done to evaluate the impact of wrist patterns and gait on the accuracy of step counting, and explore what other potential factors influence the results. Nonetheless, from a physical activity program adherence and weight loss perspective, one could argue that since less accurate devices tend to underestimate step counts, they should still be recommended for tracking steps, and could lead to additional exercise.
A potential weakness of the study is that we tested step counting in idealized conditions: indoors, on a treadmill. In real-world conditions, especially on difficult terrain, we may see far more variation in step counts, given the changes in gait and wrist movements. Additionally, it is not uncommon for gait to differ between normal walking conditions and walking on a treadmill.
Over the past 5 years, wearable devices, smartphones, and apps have become increasingly ubiquitous, and have become widely recommended tools of behavioral change for weight loss by the general press, the health and fitness industry, and health care providers. In this study, we evaluated the accuracy of a selection of recently available wearable wrist-worn devices and smartphones with respect to step counting, as well as the impact of several variables of interest, most notably age. Our final reduced model after backward selection shows that BMI, height, weight, and dominant hand do not seem to impact the accuracy of step counts. However, age does affect accuracy, and some devices tend to underestimate the number of steps walked by older users of wearable devices. This finding may be a minor issue for people trying to lose weight by adhering to a 10,000-step walking program, as they may walk more than planned. However, older and/or slower participants focusing on increasing physical activity may be negatively affected, and may become discouraged if they fall short of 10,000 steps. What is not yet clear is whether current levels of physical fitness and activity impact the accuracy of such devices; this warrants further investigation.
BMI: body mass index
CDC: Centers for Disease Control
MET: metabolic equivalent of task
OS: operating system
SD: standard deviation
The authors wish to thank Dr. Heather Vincent for access to the exercise testing laboratory in the UF Health Sports Performance Center.
FM was responsible for the conception of the study, data collection, and editing of the manuscript. YG was responsible for the study design, data collection and analysis, and writing and editing of the manuscript. JB was responsible for the writing and editing of the manuscript. MJG was responsible for the analysis, writing, and editing of the manuscript. AP was responsible for the analysis, writing, and editing of the manuscript. MDS and AML were responsible for data collection. TWB was responsible for the conception of the study, and writing and editing of the manuscript.
None declared.