This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR mhealth and uhealth, is properly cited. The complete bibliographic information, a link to the original publication on http://mhealth.jmir.org/, as well as this copyright and license information must be included.
Wrist-worn activity trackers are popular, and an increasing number of these devices are equipped with heart rate (HR) measurement capabilities. However, the validity of HR data obtained from such trackers has not been thoroughly assessed outside the laboratory setting.
This study aimed to investigate the validity of HR measures of a high-cost consumer-based tracker (Polar A370) and a low-cost tracker (Tempo HR) in the laboratory and free-living settings.
Participants underwent a laboratory-based cycling protocol while wearing the two trackers and the chest-strapped Polar H10, which acted as criterion. Participants also wore the devices throughout the waking hours of the following day during which they were required to conduct at least one 10-min bout of moderate-to-vigorous physical activity (MVPA) to ensure variability in the HR signal. We extracted 10-second values from all devices and time-matched HR data from the trackers with those from the Polar H10. We calculated intraclass correlation coefficients (ICCs), mean absolute errors, and mean absolute percentage errors (MAPEs) between the criterion and the trackers. We constructed decile plots that compared HR data from Tempo HR and Polar A370 with criterion measures across intensity deciles. We investigated how many HR data points within the MVPA zone (≥64% of maximum HR) were detected by the trackers.
Of the 57 people screened, 55 joined the study (mean age 30.5 [SD 9.8] years). Tempo HR showed moderate agreement and large errors (laboratory: ICC 0.51 and MAPE 13.00%; free-living: ICC 0.71 and MAPE 10.20%). Polar A370 showed moderate-to-strong agreement and small errors (laboratory: ICC 0.73 and MAPE 6.40%; free-living: ICC 0.83 and MAPE 7.10%). Decile plots indicated increasing differences between Tempo HR and the criterion as HRs increased. This trend was less pronounced for the Polar A370. Tempo HR identified 62.13% (1872/3013) and 54.27% (5717/10,535) of all MVPA time points in the laboratory phase and free-living phase, respectively. Polar A370 detected 81.09% (2273/2803) and 83.55% (9323/11,158) of all MVPA time points in the laboratory phase and free-living phase, respectively.
HR data from the examined wrist-worn trackers were reasonably accurate in both settings, with the Polar A370 showing stronger agreement with the Polar H10 and smaller errors. Inaccuracies increased with increasing HRs; this was particularly pronounced for the Tempo HR.
The scientific evidence on the health and well-being benefits of physical activity (PA) is overwhelming, and, as such, increasing activity levels is a core public health target [
The PA research landscape could broadly be divided into 2 core facets—PA surveillance and PA promotion [
To this end, the soaring availability and use of commercial wrist-worn activity tracking devices are increasingly being harnessed by PA researchers who are keen to use them for large-scale surveillance and intervention studies [
In addition to measurements of accelerometer-based metrics, many newer wrist-worn trackers are equipped with capabilities to collect data on physiological measures such as heart rate (HR) [
Validating HR data from wrist-worn trackers in healthy individuals is a recent endeavor [
What the above-mentioned studies have in common is that they were conducted in a controlled laboratory setting. However, there are differences between a controlled and less controlled environment, and collecting data in both environments is warranted to disentangle such differences and increase ecological relevance of findings. To our knowledge, there are only 2 free-living studies. One research team merely collected HR data during common daily activities over a few hours [
This study aimed to examine the validity of HR data from 2 wrist-worn HR trackers in laboratory and free-living settings: the Tempo HR, a low-cost device used for a national PA promotion campaign in Singapore, and the Polar A370, a consumer-based fitness and activity tracking device. Neither tracker has been assessed previously.
We conducted a 2-phase validation study with all participants, comprising a laboratory phase and a free-living phase. The study procedures were approved by the institutional review board of the National University of Singapore (NUS IRB: S-18-026), and written informed consent was obtained from all participants before study enrolment. Data collection took place between March and May 2018.
We applied multiple recruitment strategies to ensure a sample with varied characteristics. Students and staff were recruited through a post on the university’s Web-based learning system blackboard and word-of-mouth. Participants from the general public were recruited through emails sent to participants of the National Steps Challenge (NSC), a national PA promotion campaign rolled out by the Health Promotion Board (HPB), Singapore, yearly for 6 months (October to April).
Interested people were assessed for eligibility during an initial screening call and during the laboratory visit. The following inclusion criteria were applied: reasonably physically active English-literate men and women aged between 21 and 50 years with a body mass index (BMI) of at least 18.5 kg/m²; absence of physical disabilities or illness that would restrict moderate PA as assessed with the Physical Activity Readiness Questionnaire [
During the first visit, we collected sociodemographic information and measured height and weight with a SECA stadiometer (SECA GmbH). Following this, participants were fitted with 3 HR monitoring devices. We used the chest-strapped Polar H10 HR monitor (Polar Electro Oy) as our criterion device. Concurrent validity of similar Polar devices against electrocardiogram (ECG) is well established [
Participants were requested to go through an incremental cycling protocol of 20 min on a stationary exercise bicycle (Monark 894E). The protocol consisted of four 5-min stages, and participants were required to cycle at an intensity corresponding to their designated HR zones for each stage (45%, 55%, 65%, and 75% of maximum HR [HRmax]; ±10 beats per minute [bpm]) [
After completing the cycling protocol, participants were introduced to the procedures of the free-living phase. In addition to the devices used in the laboratory phase, we provided participants with an ActiGraph wGT3X+BT accelerometer (ActiGraph) to collect HR data from the Polar H10 chest strap via Bluetooth. The small tamper-proof device was attached with a belt to the right side of the hip. We also provided an instruction sheet detailing adequate wear.
Participants were instructed to wear the devices during waking hours of the following day (after getting up in the morning until bedtime at night) and only remove them during water-based activities. In addition, we requested that participants engage in at least one 10-min bout of MVPA during the day to capture a wide range of HR signals. Finally, participants were provided with a device-wear log to record their wear and nonwear as well as their MVPA session(s). Participants returned to the laboratory a few days later to return the study devices and transfer HR data of the Tempo HR to the Healthy365 app.
The sampling frequencies of the Tempo HR, Polar A370, and Polar H10 chest strap were 0.1 Hz, 1 Hz, and 1 Hz, respectively. As such, HR data were collected every second by the Polar devices and every 10 seconds by the Tempo HR (a sample of the raw data is provided in
We summarized participants’ characteristics descriptively using mean and SD for continuous variables and number and percentage for categorical variables.
We calculated the intraclass correlation coefficients (ICCs) using mixed effects models to assess the absolute agreement between the criterion (Polar H10) and the other trackers (Tempo HR and Polar A370) in the laboratory phase and free-living phase. The strength of the ICC was interpreted as weak (<0.50), moderate (≥0.50 to 0.74), strong (≥0.75 to 0.89), and very strong (≥0.90) [
We then calculated mean absolute errors (MAEs) and mean absolute percentage errors (MAPEs; absolute error/criterion×100) between the criterion (Polar H10) and each of the 2 trackers (Tempo HR and Polar A370) to gauge overall measurement error. As highlighted in a recent study, there is no clear cutoff for what level of error would indicate adequate validity between measures [
Moreover, we ranked the 10-second HR time points derived from the Polar H10 and divided them into deciles. As such, decile 1 contained the lowest 10% of all HR values and decile 10 contained the highest 10%. We then time matched these HR deciles with HR data from the Tempo HR and Polar A370. We constructed box plots to compare the HR data from the Tempo HR and the Polar A370 with the Polar H10 measures across the deciles.
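The time-matching and decile steps can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names are hypothetical, the data are synthetic, and averaging 1 Hz samples within each 10-second window is an assumption, as the text does not state how the 10-second values were extracted.

```python
from statistics import mean


def to_10s_values(samples):
    """Average (second, hr) samples into 10-second values keyed by window start."""
    windows = {}
    for second, hr in samples:
        windows.setdefault(second - second % 10, []).append(hr)
    return {start: mean(values) for start, values in windows.items()}


def decile_of_each_point(criterion):
    """Rank criterion time points by HR and label them with deciles 1 to 10."""
    ranked = sorted(criterion, key=lambda start: criterion[start])
    n = len(ranked)
    return {start: rank * 10 // n + 1 for rank, start in enumerate(ranked)}


def match_by_decile(criterion, tracker):
    """Group time-matched (criterion HR, tracker HR) pairs by criterion decile."""
    groups = {decile: [] for decile in range(1, 11)}
    for start, decile in decile_of_each_point(criterion).items():
        if start in tracker:  # time matching: keep points both devices recorded
            groups[decile].append((criterion[start], tracker[start]))
    return groups


# Synthetic data (not study data): 200 s of 1 Hz criterion HR rising from 60 bpm,
# and a 0.1 Hz tracker that reads 5% low and misses the sample at second 50.
criterion_1hz = [(s, 60 + 0.5 * s) for s in range(200)]
criterion_10s = to_10s_values(criterion_1hz)
tracker_10s = {s: 0.95 * criterion_10s[s] for s in range(0, 200, 10) if s != 50}

groups = match_by_decile(criterion_10s, tracker_10s)
for decile, pairs in groups.items():
    # Print decile, mean criterion HR, and mean tracker HR for the matched pairs.
    print(decile, round(mean(c for c, _ in pairs), 1), round(mean(t for _, t in pairs), 1))
```

Deciles are defined on the criterion alone, so a tracker that misses samples reduces the number of matched pairs in a decile without shifting the decile boundaries.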
Finally, we constructed 2×2 tables to estimate the sensitivity and specificity of the 2 trackers for identifying the different HR zones based on the Polar H10 (<64% HRmax and ≥64% HRmax). The cutoff of 64% HRmax was chosen because it is the updated cutoff [
Of the 57 people screened, 55 were eligible and joined the study (mean age 30.5 [SD 9.8] years), with 26 being female (47%), 36 with normal weight (65%; BMI <23 kg/m²), and 39 with Chinese ethnicity (71%). Due to the unavailability of some HR data, a few participants were excluded from some analyses.
Data analysis flow showing participants in analysis (n) and number of matched heart rate time points. BMI: body mass index; HR: heart rate.
In the laboratory phase, the HR data from the Tempo HR showed a moderate ICC (0.51; 95% CI 0.38 to 0.60) with the data from the Polar H10. With an MAE of 15.1 bpm (95% CI 14.6 to 15.5 bpm) and an MAPE of 13.0%, the measurement error was somewhat large. Polar A370 data had a higher, though still moderate, ICC with the Polar H10 (0.73; 95% CI 0.66 to 0.78). Measurement errors were small with an MAE of 7.3 bpm (95% CI 7.0 to 7.7 bpm) and an MAPE of 6.4%. On average, both devices underestimated HR: the mean difference was −9.7 bpm for the Tempo HR (95% CI −10.2 to −9.2 bpm) and −5.7 bpm for the Polar A370 (95% CI −6.1 to −5.3 bpm).
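As an illustration of how these error metrics relate to one another, the following sketch computes the MAE, MAPE, mean difference, and Bland-Altman limits of agreement for a handful of matched points. The values are synthetic, the function name is hypothetical, and two conventions are assumed rather than taken from the study: differences are defined as tracker minus criterion (so that underestimation is negative), and the limits of agreement are the mean difference ±1.96 SD. The sketch also treats all points as independent, which ignores the repeated measures within participants.

```python
from statistics import mean, stdev


def agreement_stats(criterion, tracker):
    """Summarize the error of tracker HR against criterion HR for matched points."""
    diffs = [t - c for c, t in zip(criterion, tracker)]  # assumed sign: tracker - criterion
    bias = mean(diffs)
    sd = stdev(diffs)
    return {
        "mae": mean(abs(d) for d in diffs),
        # MAPE: absolute error / criterion x 100, averaged over time points
        "mape": mean(abs(d) / c * 100 for d, c in zip(diffs, criterion)),
        "bias": bias,
        "loa": (bias - 1.96 * sd, bias + 1.96 * sd),
    }


# Synthetic matched values in bpm (not study data); the error grows with HR.
criterion_hr = [100, 120, 140, 160]
tracker_hr = [98, 115, 130, 144]

stats = agreement_stats(criterion_hr, tracker_hr)
print(stats)
```

Because every synthetic tracker value lies below the criterion, the mean difference equals the negative of the MAE here; with mixed over- and underestimation the two diverge, which is why a small mean difference can coexist with a sizeable MAE.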
Laboratory phase: Bland-Altman plot between the heart rate data from the Polar H10 and the Tempo HR. Light blue dotted lines show the limits of agreement, and the dark blue dotted line shows the mean of the difference. HR: heart rate.
Laboratory phase: box plot providing by-decile comparisons of mean Polar H10 (white) and Tempo HR HR data (gray). HR: heart rate; bpm: beats per minute.
As can be seen in
Laboratory phase: Bland-Altman plot between the heart rate data from the Polar H10 and the Polar A370. Light blue dotted lines show the limits of agreement, and the dark blue dotted line shows the mean of the difference. HR: heart rate.
Laboratory phase: box plot providing by-decile comparisons of mean Polar H10 (white) and Polar A370 HR data (gray). HR: heart rate; bpm: beats per minute.
The ICC between the Polar H10 and the Tempo HR data was moderate in the free-living phase (0.71; 95% CI 0.70 to 0.71). Errors were smaller than in the laboratory phase, with an MAE of 8.7 bpm (95% CI 8.7 to 8.8 bpm) and an MAPE of 10.2%. The ICC between the Polar H10 and the Polar A370 data was strong (0.83; 95% CI 0.79 to 0.87). Errors were similar to those in the laboratory phase, with an MAE of 5.9 bpm (95% CI 5.8 to 5.9 bpm) and an MAPE of 7.1%. In contrast to the results from the laboratory phase, both devices overestimated HR slightly (Tempo HR 0.4 bpm; 95% CI 0.3 to 0.5 bpm and Polar A370 3.4 bpm; 95% CI 3.3 to 3.4 bpm).
The BA plot in
Free-living phase: Bland-Altman plot between the heart rate data from the Polar H10 and the Tempo HR. Light blue dotted lines show the limits of agreement, and the dark blue dotted line shows the mean of the difference. HR: heart rate.
Free-living phase: box plot providing by-decile comparisons of mean Polar H10 (white) and Tempo HR HR data (gray). HR: heart rate; bpm: beats per minute.
Free-living phase: Bland-Altman plot between the heart rate data from the Polar H10 and the Polar A370. Light blue dotted lines show the limits of agreement, and the dark blue dotted line shows the mean of the difference. HR: heart rate.
Free-living phase: box plot providing by-decile comparisons of mean Polar H10 (white) and Polar A370 HR data (gray). HR: heart rate; bpm: beats per minute.
When analyzing how many MVPA time points were identified by the Tempo HR and the Polar A370, we set the MVPA cutoff at 64% HRmax. In the laboratory phase, of the total aggregate time points in the MVPA HR zone that were detected by the Polar H10, 62.13% (1872/3013) were also identified by the Tempo HR, whereas the Polar A370 identified 81.09% (2273/2803). The remaining time was spent below the MVPA HR zone, of which 91.53% (4267/4662) and 97.52% (4637/4755) were also registered by the Tempo HR and the Polar A370, respectively. Overall, the Tempo HR identified 79.99% (6139/7675) and the Polar A370 91.43% (6910/7558) of data points accurately.
In the free-living phase, we found that the Tempo HR identified 54.27% (5717/10,535) and the Polar A370 identified 83.55% (9323/11,158) of the MVPA time points that the Polar H10 registered. The Tempo HR picked up 97.22% (186,402/191,741) and the Polar A370 picked up 96.72% (183,625/189,861) of time points below the MVPA HR zone. Overall accuracy was above 90% for both trackers (Tempo HR: 94.98%, 192,119/202,276; Polar A370: 95.98%, 192,948/201,019). An overview of the results is provided in
Number of 10-second matched time points spent in heart rate zones as detected by the Polar A370 and the Tempo HR in the laboratory phase and free-living phase.
Tracker classification | According to Polar H10: ≥64% HRmaxa, n (%) | According to Polar H10: <64% HRmax, n (%)
Laboratory phase | |
According to Polar A370 | |
≥64% HRmax | 2273 (81.09) | 118 (2.48)
<64% HRmax | 530 (18.91) | 4637 (97.52)
Total | 2803 (37.09) | 4755 (62.91)
According to Tempo HR | |
≥64% HRmax | 1872 (62.13) | 395 (8.47)
<64% HRmax | 1141 (37.87) | 4267 (91.53)
Total | 3013 (39.26) | 4662 (60.74)
Free-living phase | |
According to Polar A370 | |
≥64% HRmax | 9323 (83.55) | 6236 (3.28)
<64% HRmax | 1835 (16.45) | 183,625 (96.72)
Total | 11,158 (5.55) | 189,861 (94.45)
According to Tempo HR | |
≥64% HRmax | 5717 (54.27) | 5339 (2.78)
<64% HRmax | 4818 (45.73) | 186,402 (97.22)
Total | 10,535 (5.21) | 191,741 (94.79)
aHRmax: maximum heart rate.
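The percentages in the table follow directly from the 2×2 counts, as the short sketch below shows for the laboratory phase. The function and variable names are hypothetical; the counts are those reported in the table, with the Polar H10 classification treated as the truth and the MVPA zone (≥64% HRmax) as the positive class.

```python
def classification_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and overall accuracy from 2x2 counts.

    tp: criterion and tracker both at or above 64% HRmax
    fp: tracker at or above the cutoff, criterion below it
    fn: tracker below the cutoff, criterion at or above it
    tn: criterion and tracker both below the cutoff
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Laboratory phase counts taken from the table above.
lab_polar_a370 = classification_metrics(tp=2273, fp=118, fn=530, tn=4637)
lab_tempo_hr = classification_metrics(tp=1872, fp=395, fn=1141, tn=4267)

for name, metrics in [("Polar A370", lab_polar_a370), ("Tempo HR", lab_tempo_hr)]:
    print(name, {key: round(value * 100, 2) for key, value in metrics.items()})
```

Read this way, the share of MVPA time points detected by a tracker is its sensitivity, and the share of time points below the MVPA zone that it registered is its specificity. In the free-living phase only about 5% of time points fell in the MVPA zone, so overall accuracy there is dominated by specificity.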
From the present 2-phased tracker validation study involving 55 participants with varying characteristics, a few key findings can be highlighted. First, HR data from the low-cost Tempo HR tracker showed moderate agreement with the data from the chest-strapped Polar H10 in both the laboratory phase and free-living phase. Although the measurement errors of the Tempo HR were above the 10% validity cutoff [
To establish the stability of the study results, we conducted sensitivity analyses. For this, we removed outliers and compared Polar H10 with the 2 other trackers using the remaining matched data points available. Outliers were defined as follows: a Pearson correlation coefficient of less than 0.3 between the Polar H10 and the test trackers in the laboratory setting. In secondary analyses, we only used data that were available from all 3 devices. Conducting these analyses did not change the results markedly (data not shown). As such, the reported results are not influenced by extreme cases or outliers.
When contextualizing our laboratory findings with those reported in the literature, the Polar A370 and the Tempo HR appear to have accuracy comparable with or better than that of the market leader Fitbit, which has been studied extensively [
Comparing our results from the free-living phase with the results reported in other studies is problematic as, to our knowledge, there are only 2 studies that had a free-living element [
The finding that the accuracy of wrist-worn trackers decreases as intensity increases has been observed in previous laboratory studies. For example, Boudreaux et al found that an increase in cycling intensity was associated with increasing HR underestimation in assessed activity trackers [
It is difficult to draw firm conclusions about such trends in the free-living phase, as there are no comparable studies available. We observed smaller differences across activity intensities, which might be related to the fact that the proportion of higher HR values was rather small compared with the laboratory study. This might also partially explain the generally higher accuracy in the free-living phase versus the laboratory phase. Another reason for the difference in accuracy between the free-living phase and laboratory phase might be related to the temperature difference between the laboratory and the free-living settings [
From the results of our study and the overall HR tracker validation literature, it is obvious that there are marked differences between devices in terms of accuracy that ought to be explained. A review by Tamura et al provides some insights into the factors that impact HR measurement through PPG in different devices [
A number of strengths of this study can be highlighted. First, to the best of our knowledge, this is the first study that thoroughly investigated the validity of HR measures of modern wrist-worn activity trackers in 2 settings, the laboratory and daily life. Research on the real-world performance of activity trackers can advance the PA and exercise measurement field substantially as these trackers are meant to be used as people go about their normal lives. Second, our study sample was relatively large and diverse, which is rare in validation studies. Third, we were able to collect temporally dense HR data from all devices (approximately 12 hours per device in the free-living phase), which allowed us to conduct in-depth analyses of tracker validity across varying HRs. The richness of the data we collected stands in stark contrast to most previous studies, which relied mainly on a few data points, for example, at the end or midpoint of a stage in a cycling protocol [
A recent review highlighted the strong increase in the availability and use of wrist-worn activity trackers and identified 432 different activity trackers that belonged to 123 unique brands [
Example of raw data retrieved from the Polar H10, Polar A370, and Tempo HR.
BA: Bland-Altman
BMI: body mass index
bpm: beats per minute
ECG: electrocardiogram
HPB: Health Promotion Board
HR: heart rate
HRmax: maximum heart rate
ICC: intraclass correlation coefficient
LoA: limits of agreement
MAE: mean absolute error
MAPE: mean absolute percentage error
MVPA: moderate-to-vigorous physical activity
NSC: National Steps Challenge
PA: physical activity
PPG: photoplethysmography
This research is supported by the Singapore Ministry of Health’s National Medical Research Council under the Fellowship Programme by Singapore Population Health Improvement Centre (NMRC/CG/C026/2017_NUHS). The authors would like to acknowledge all participants who took part in the study. Finally, AMM acknowledges his new-born daughter, Lia Yihan, who was so kind to only cry after a few reviewer comments were addressed.
AMM, NXW, and FMR conceived the study. Data collection was conducted by AMM and NXW. ICCL provided expertise for the laboratory study and supported the setup of the study. JY and CST analyzed the data with iterative feedback from AMM, NXW, and FMR. NL, JT, and AT supported data extraction and provided critical feedback throughout. AMM wrote the manuscript and received feedback from all coauthors. All authors read and approved the final version of the manuscript.
None declared.