Background: Wrist-worn tracking devices such as the Apple Watch are becoming more integrated in health care. However, validation studies of these consumer devices remain scarce.
Objectives: This study aimed to assess if mobile health technology can be used for monitoring home-based exercise in future cardiac rehabilitation programs. The purpose was to determine the accuracy of the Apple Watch in measuring heart rate (HR) and estimating energy expenditure (EE) during a cardiopulmonary exercise test (CPET) in patients with cardiovascular disease.
Methods: Forty patients (mean age 61.9 [SD 15.2] years, 80% male) with cardiovascular disease (70% ischemic, 22.5% valvular, 7.5% other) completed a graded maximal CPET on a cycle ergometer while wearing an Apple Watch. A 12-lead electrocardiogram (ECG) was used to measure HR; indirect calorimetry was used for EE. HR was analyzed at three levels of intensity (seated rest, HR1; moderate intensity, HR2; maximal performance, HR3) for 30 seconds. The EE of the entire test was used. Bias or mean difference (MD), standard deviation of difference (SDD), limits of agreement (LoA), mean absolute error (MAE), mean absolute percentage error (MAPE), and intraclass correlation coefficients (ICCs) were calculated. Bland-Altman plots and scatterplots were constructed.
Results: SDD for HR1, HR2, and HR3 was 12.4, 16.2, and 12.0 bpm, respectively. Bias and LoA (lower, upper LoA) were 3.61 (–20.74, 27.96) for HR1, 0.91 (–30.82, 32.63) for HR2, and –1.82 (–25.27, 21.63) for HR3. MAE was 6.34 for HR1, 7.55 for HR2, and 6.90 for HR3. MAPE was 10.69% for HR1, 9.20% for HR2, and 6.33% for HR3. ICC was 0.729 (P<.001) for HR1, 0.828 (P<.001) for HR2, and 0.958 (P<.001) for HR3. Bland-Altman plots and scatterplots showed good correlation without systematic error when comparing Apple Watch with ECG measurements. SDD for EE was 17.5 kcal. Bias and LoA were 30.47 (–3.80, 64.74). MAE was 30.77; MAPE was 114.72%. ICC for EE was 0.797 (P<.001). The Bland-Altman plot and a scatterplot directly comparing Apple Watch and indirect calorimetry showed systematic bias with an overestimation of EE by the Apple Watch.
Conclusions: In patients with cardiovascular disease, the Apple Watch measures HR with clinically acceptable accuracy during exercise. If confirmed, it might be considered safe to incorporate the Apple Watch in HR-guided training programs in the setting of cardiac rehabilitation. At this moment, however, it is too early to recommend the Apple Watch for cardiac rehabilitation. Also, the Apple Watch systematically overestimates EE in this group of patients. Caution might therefore be warranted when using the Apple Watch for measuring EE.
Mobile health has grown tremendously in the last decade, and the prospects for further growth and integration of mobile technology in health care are promising. One type of technology that is particularly interesting for mobile health is the wrist-worn device capable of monitoring a large variety of parameters including heart rate (HR), energy expenditure (EE), steps taken, distance traveled, and in the near future possibly even oxygen saturation, blood glucose, and cardiac arrhythmia. Demand among patients is also rising: recent studies show that up to one-third of patients with chronic heart disease use personal heart rate monitors, and over two-thirds of patients who do not already use a heart monitor report that they consider heart monitoring important for home-based exercise.
Wrist-worn devices have the ability to monitor vital parameters and provide the user with an overview and feedback on the collected data. Validation studies comparing assessments by these devices to clinically approved measurements are often lacking. The Apple Watch uses photoplethysmography (PPG) with optical sensors at the wrist to measure HR. EE is calculated with algorithms that are not openly disclosed.
Validation studies have evaluated the accuracy of HR, EE, and other measurements in healthy subjects for a variety of fitness trackers. Boudreaux et al tested eight devices for accuracy of HR and EE measurements in healthy subjects and found that HR accuracy from wearable devices differed across exercise intensities, with an increasing underestimation of HR at higher exercise intensities. EE estimates were also found to be inaccurate. The authors concluded that wearable devices are not medical devices and that users should be cautious when interpreting results of activity monitoring. Shcherbina et al tested seven devices in healthy subjects and found that HR measurements were within an acceptable error range (5%). However, none of the tested devices had EE estimates within an acceptable range.
Modern health care is shifting its focus to home-centered health care with the aid of mobile technology. This study aimed to assess if commercially available mobile health technology such as the Apple Watch could be used for monitoring home-based exercise in future cardiac rehabilitation programs. The purpose of this study was to evaluate the accuracy of the Apple Watch with regard to HR and EE measurements during exercise in patients with cardiovascular diseases.
This study was conducted in accordance with the declaration of Helsinki and approved by the local institutional review board (registration number S58592). A written informed consent was obtained from every patient before inclusion in the study.
Patients were recruited at the cardiovascular rehabilitation consultation of the University Hospitals Leuven (Leuven, Belgium). All patients scheduled for a cardiopulmonary exercise test (CPET) as part of their cardiovascular rehabilitation program were consecutively included; one patient was excluded due to inability to use the VO2 mask due to recent laryngeal surgery. Patients were equipped with the Apple Watch during their CPET.
The sample size of 40 patients was determined based on the results of Wallen et al, considering a power of 0.5 and a probability of type I error of 5%. This sample size is in line with comparable studies of wrist-worn health-tracking devices, in which participant numbers ranged from 20 to 60.
Device and Data Collection
The Apple Watch (Apple Inc) is a wrist-worn commercially available device that uses PPG for HR assessment. For this study, the Apple Watch Sport 42 mm (first generation) was used. The device was bought commercially and handled according to the manufacturer’s instructions.
The device was attached to the patient’s left wrist. Weight and height of the patient were recorded in the iPhone Health app before the test was started. On the Apple Watch Workout app, the option Indoor Cycling was chosen. On this app, the workout was started at the beginning of the resting phase of the CPET. Registrations were stopped at the same cutoff point as the stopping of the CPET because of patient exhaustion (cycling <60 rotations per minute).
Data were extracted using the iPhone Health app and the iPhone Health Export app. The Health app provided HR at 5-second intervals and EE at 2- to 3-second intervals. HR was converted to mean HR per 30 seconds; EE was analyzed as cumulative EE over the duration of the CPET.
Other information collected included demographic data (gender, age, and anthropometrics: weight, height, body mass index [BMI]), peak oxygen uptake (peak VO2), oxygen uptake (VO2), and carbon dioxide output (VCO2). The heart rate reserve (HRR) of each patient was calculated as the difference between the maximum and minimum HR as measured by electrocardiogram (ECG).
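The conversion of HR samples to 30-second means can be sketched as follows. This is an illustration only, not the authors' processing pipeline: the function name, the `(time_s, hr_bpm)` sample layout, and the example values are assumptions introduced here.

```python
from statistics import mean


def mean_hr_per_window(samples, window_s=30):
    """Average (time_s, hr_bpm) samples into consecutive fixed-length windows.

    Returns a list of (window_start_s, mean_hr_bpm) in time order.
    Windows that contain no samples are skipped.
    """
    buckets = {}
    for time_s, hr_bpm in samples:
        # Assign each sample to the window in which its timestamp falls.
        start = int(time_s // window_s) * window_s
        buckets.setdefault(start, []).append(hr_bpm)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical samples at 5-second spacing (values are made up for illustration).
samples = [(t, 70 + t // 10) for t in range(0, 60, 5)]
windows = mean_hr_per_window(samples)
for start, hr in windows:
    print(f"{start:>3} s: {hr:.1f} bpm")
```

Bucketing by window start keeps the sketch robust to slightly irregular sampling intervals, which is a design choice made here rather than something stated in the study.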
Patients performed a CPET in normal conditions, having eaten and taken their routine medication, often including a beta-blocker. During this exercise test, participants wore the Apple Watch on their left wrist and wore a metabolic system (Jaeger Oxycon, Vyaire Medical Inc) for breath oxygen uptake and carbon dioxide output measurements and a 12-lead ECG (Cardiosoft, General Electric Company) for recording HR and heart rhythm. During the CPET, the ECG was constantly monitored by one of the researchers for cardiac arrhythmia. All tests were performed in a laboratory setting at a controlled room temperature of 21°C to 23°C.
The CPET started with 1 minute of seated rest. The exercise then started at 20 watts, and the load was increased by 20 W/min. This protocol was adjusted to a faster or slower increase in cycling resistance depending on physical fitness and based on previous CPET records.
Descriptive data are reported as mean and standard deviation or as median and range. Gas analysis data from indirect calorimetry (VO2 and VCO2) served as criterion measurement for calculations of EE (kilocalories per minute). For conversion of VO2 and VCO2 to caloric expenditure (kcal), the Weir equation was used: kcal/min = ([1.1 × RQ] + 3.9) × VO2, where RQ is the respiratory quotient (VCO2/VO2).
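As a worked illustration of the Weir equation quoted above, the sketch below computes EE per minute and sums it over a test. It assumes VO2 and VCO2 are expressed in L/min and that gas-exchange values are available at a fixed interval; the function names and example values are hypothetical and are not taken from the study.

```python
def weir_kcal_per_min(vo2_l_min, vco2_l_min):
    """Energy expenditure (kcal/min) from the abbreviated Weir equation.

    kcal/min = (1.1 * RQ + 3.9) * VO2, with RQ = VCO2 / VO2.
    Assumes both gas volumes are in L/min.
    """
    if vo2_l_min <= 0:
        raise ValueError("VO2 must be positive")
    rq = vco2_l_min / vo2_l_min  # respiratory quotient
    return (1.1 * rq + 3.9) * vo2_l_min


def total_ee_kcal(gas_samples, interval_s):
    """Cumulative EE (kcal) for (VO2, VCO2) pairs averaged over interval_s seconds."""
    return sum(
        weir_kcal_per_min(vo2, vco2) * interval_s / 60.0
        for vo2, vco2 in gas_samples
    )


# Hypothetical example: two 30-second averages at VO2 = 1.0 and VCO2 = 0.9 L/min.
print(round(weir_kcal_per_min(1.0, 0.9), 2))
print(round(total_ee_kcal([(1.0, 0.9), (1.0, 0.9)], interval_s=30), 2))
```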
Twelve-lead ECG was used as criterion measurement for HR (beats per minute).
For analysis purposes, HR was analyzed for three 30-second intervals: one interval at the initial 30 seconds of the test (seated rest, HR1), one in the middle of the CPET time (moderate intensity based on test duration, HR2), and one interval prior to and including maximal performance level (HR3). EE was compared for each patient for the entire duration of the test.
Mean difference (MD) and standard deviation of the differences (SDD) were calculated. The differences were tested for normality using the Shapiro-Wilk test. Bland-Altman plots were constructed. Bias (MD) and limits of agreement (LoA, MD ± 1.96 × SDD) were plotted on the Bland-Altman plots. Mean absolute error (MAE) and mean absolute percentage error (MAPE) were calculated for HR and EE. Intraclass correlation coefficient (ICC) estimates were calculated for each set of data based on an average measures, absolute agreement, 2-way mixed-effects model.
Visual examination of the Bland-Altman plots was used to rule out systematic error; bias and LoA were used to assess clinical applicability. ICC was calculated to determine the correlation between Apple Watch measurements and gold standard measurements. Limits for ICC were used as suggested by Fokkema et al: an ICC >0.90 was considered excellent, 0.75 to 0.90 was good, 0.60 to 0.75 was moderate, and <0.60 was low.
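The selection of the three analysis intervals could be sketched as below. The text does not specify how the middle interval was located, so taking the window at the midpoint index of the 30-second series is an assumption made here, as are the function name and the example values.

```python
def pick_hr_intervals(hr_30s):
    """Select HR1, HR2, and HR3 from a time-ordered list of 30-second mean HR values.

    HR1: first window (seated rest).
    HR2: window at the midpoint of the test (assumed rule, see lead-in).
    HR3: last window (up to and including maximal performance).
    """
    if len(hr_30s) < 3:
        raise ValueError("need at least three 30-second windows")
    return {
        "HR1": hr_30s[0],
        "HR2": hr_30s[len(hr_30s) // 2],
        "HR3": hr_30s[-1],
    }


# Hypothetical 30-second means for one short test.
print(pick_hr_intervals([68, 75, 90, 110, 128]))
```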
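The agreement statistics described above can be computed as in the following minimal sketch. It assumes paired measurements per patient, differences taken as Apple Watch minus gold standard, and the sample standard deviation (n − 1 denominator); the function name and the example data are hypothetical, and this is not the SPSS procedure used in the study.

```python
from statistics import mean, stdev


def agreement_stats(device, reference):
    """Bias (MD), SDD, 95% limits of agreement, MAE, and MAPE for paired data."""
    if len(device) != len(reference) or len(device) < 2:
        raise ValueError("need two equally long series with at least two pairs")
    diffs = [d - r for d, r in zip(device, reference)]
    md = mean(diffs)    # bias
    sdd = stdev(diffs)  # standard deviation of the differences
    return {
        "MD": md,
        "SDD": sdd,
        "LoA": (md - 1.96 * sdd, md + 1.96 * sdd),
        "MAE": mean([abs(x) for x in diffs]),
        # MAPE is relative to the reference value, which must be nonzero.
        "MAPE": 100.0 * mean([abs(d - r) / r for d, r in zip(device, reference)]),
    }


# Hypothetical paired HR values (bpm), for illustration only.
stats = agreement_stats([72, 95, 130, 80], [70, 96, 128, 78])
for key, value in stats.items():
    print(key, value)
```

Because MAPE averages per-patient relative errors, it can exceed 100% when the reference values are small relative to the absolute error.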
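For readers who want to reproduce the ICC outside SPSS, the sketch below computes an average-measures, absolute-agreement ICC from two-way ANOVA mean squares and applies the interpretation limits listed above. The formula used is the standard ICC(A,k) expression; the function names are hypothetical, treating the interval boundaries as inclusive at the lower end is an assumption, and results may differ in detail from SPSS output.

```python
def icc_absolute_agreement_average(ratings):
    """ICC(A,k): two-way model, absolute agreement, average of k measurements.

    ratings: list of rows, one per subject, each with k measurements
    (for example [apple_watch_value, gold_standard_value]).
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 subjects and 2 measurements per subject")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


def classify_icc(icc):
    """Interpretation limits as listed in the text (boundary handling assumed)."""
    if icc > 0.90:
        return "excellent"
    if icc >= 0.75:
        return "good"
    if icc >= 0.60:
        return "moderate"
    return "low"


# ICC values reported in this study, classified with the limits above.
for label, value in [("HR1", 0.729), ("HR2", 0.828), ("HR3", 0.958), ("EE", 0.797)]:
    print(label, classify_icc(value))
```

A constant offset between the two methods lowers this ICC even when the measurements are perfectly correlated, which is why absolute agreement is the stricter choice for method comparison.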
For all statistical tests, the alpha level adopted for significance (2-tailed) was set at P<.05. All statistical analyses were performed using SPSS Statistics version 25 (IBM Corp).
Patient Characteristics and Exercise Capacity
A total of 40 patients (32 male, 8 female) were included in this study. All patients had established cardiovascular disease: ischemic heart disease (28/40), valvular heart disease (9/40), and other type of heart disease (3/40). All participants performed the exercise test until exhaustion. Further patient characteristics and numeric test results are summarized in the table below.
|Age in years, mean (SD)||61.9 (15.2)|
|Male gender, n (%)||32 (80)|
|Weight (kg), mean (SD)||79.0 (16.2)|
|Height (cm), mean (SD)||171.1 (9.3)|
|Body mass index (kg/m2), mean (SD)||27.0 (5.0)|
|Cardiac disease type, n (%)|
|Ischemic heart disease||28 (70)|
|Valvular heart disease||9 (23)|
|Cardiovascular risk factors, n (%)|
|Family history of cardiovascular disease||20 (50)|
|Overweight (body mass index ≥25)||27 (68)|
|Obesity (body mass index ≥30)||9 (23)|
|Diabetes mellitus (total)||8 (20)|
|Diabetes mellitus (type 1)||1 (3)|
|Diabetes mellitus (type 2)||7 (18)|
|Smoking (total)||27 (68)|
|Current smoker||1 (3)|
|Atrial fibrillation||5 (13)|
|CPETa time (sec), mean (SD)||512 (194)|
|VO2 peakb (L/min), mean (SD)||1.72 (0.89)|
|VO2 peak (mL/kg/min), mean (SD)||21.8 (11.6)|
|Heart rate reserve (bpm), mean (SD)||56 (29)|
aCPET: cardiopulmonary exercise test.
bVO2 peak: peak oxygen uptake.
SDD for HR1, HR2, and HR3 was 12.4, 16.2, and 12.0, respectively. Bias (ie, mean difference) and LoA were 3.61 (–20.74, 27.96) for HR1, 0.91 (–30.82, 32.63) for HR2, and –1.82 (–25.27, 21.63) for HR3. MAE was 6.34 for HR1, 7.55 for HR2, and 6.90 for HR3. MAPE was 10.69% for HR1, 9.20% for HR2, and 6.33% for HR3. The ICC was 0.729 (P<.001) for HR1, 0.828 (P<.001) for HR2, and 0.958 (P<.001) for HR3. Following the previously mentioned limits, this can be interpreted as a moderate correlation for HR1, a good correlation for HR2, and an excellent correlation for HR3. Bland-Altman plots and scatterplots comparing Apple Watch and ECG registration are depicted in.
The Bland-Altman plots are depicted in A, B, and C and compare mean values on the x-axis ([Apple Watch + gold standard]/2) with the difference of the values on the y-axis (Apple Watch – gold standard). Bias and limits of agreement are depicted as horizontal lines. The plots depicted in D, E, and F directly compare values measured by the Apple Watch (x-axis) versus ECG measurements (y-axis). All plots show a good correlation of measurements without a systematic error.
|Characteristics||HR1a (bpm)||HR2b (bpm)||HR3c (bpm)||Energy expenditure (kcal)|
|Gold standard measurement, mean (SD)||69.9 (14.5)||94.6 (20.6)||126.5 (30.9)||40.6 (32.4)|
|Gold standard measurement, standard error||2.30||3.26||4.88||6.49|
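|Bias (SDDd)||3.61 (12.4)||0.91 (16.2)||–1.82 (12.0)||30.47 (17.5)|
|LoAe (lower, upper)||–20.74, 27.96||–30.82, 32.63||–25.27, 21.63||–3.80, 64.74|
|MAEf||6.34||7.55||6.90||30.77|
|MAPEg (%)||10.69||9.20||6.33||114.72|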
|ICCh (P value)||0.729 (<.001)||0.828 (<.001)||0.958 (<.001)||0.797 (<.001)|
aHR1: heart rate, seated rest.
bHR2: heart rate, moderate intensity.
cHR3: heart rate, maximal performance level.
dSDD: standard deviation of difference.
eLoA: limits of agreement.
fMAE: mean absolute error.
gMAPE: mean absolute percentage error.
hICC: intraclass correlation coefficient.
SDD for EE was 17.5. Bias and LoA were 30.47 (–3.80, 64.74). MAE was 30.77; MAPE was 114.72%. The ICC for EE was 0.797 (P<.001), which can be interpreted as a good correlation. Bland-Altman plot and a scatterplot directly comparing Apple Watch and indirect calorimetry are depicted in. A systematic error is seen with an overestimation of EE by the Apple Watch.
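As a consistency check, the reported limits of agreement can be recomputed from the reported bias and SDD using LoA = MD ± 1.96 × SDD. The sketch below uses only values stated in the text; the variable and function names are introduced here for illustration. Small discrepancies are to be expected because the SDD is reported to one decimal place.

```python
# name: (bias, SDD, reported lower LoA, reported upper LoA), as stated in the text.
REPORTED = {
    "HR1": (3.61, 12.4, -20.74, 27.96),
    "HR2": (0.91, 16.2, -30.82, 32.63),
    "HR3": (-1.82, 12.0, -25.27, 21.63),
    "EE": (30.47, 17.5, -3.80, 64.74),
}


def limits_of_agreement(bias, sdd):
    """95% limits of agreement from bias and standard deviation of differences."""
    return bias - 1.96 * sdd, bias + 1.96 * sdd


recomputed = {
    name: limits_of_agreement(bias, sdd)
    for name, (bias, sdd, _, _) in REPORTED.items()
}

for name, (bias, sdd, lower, upper) in REPORTED.items():
    calc_lower, calc_upper = recomputed[name]
    print(
        f"{name}: recomputed ({calc_lower:.2f}, {calc_upper:.2f}) "
        f"vs reported ({lower:.2f}, {upper:.2f})"
    )
```

For EE, the lower limit sits only slightly below zero, which matches the observation that almost all differences fall in the positive range.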
For HR, agreement as evaluated by the SDD was best at peak exercise intensity (12.0 bpm) and poorest at moderate exercise intensity (16.2 bpm). ICC was highest at peak exercise intensity and lowest for resting HR. On the other hand, bias was largest for resting HR and smallest at moderate intensity. Bland-Altman plots and scatterplots show a good correlation of measurements without a systematic error. MAPE was highest at seated rest (10.69%) and lowest at maximal intensity (6.33%).
When relating these numbers to clinical practice and thus to actual HR measurement, the bias can be considered low (ie, no systematic error is made when measuring HR with the Apple Watch). The SDDs are within a clinically acceptable range. MAPE values for HR are low compared with the MAPE for EE and with values reported in earlier studies.
Our results thus show good accuracy of HR measurements by the Apple Watch when compared to the gold standard ECG measurements when tested in patients with known heart disease.
For EE, SDD was 17.5 kcal, and bias was 30.47 kcal. The ICC is 0.797, which is considered good correlation. MAPE is 114.72%, which is high when compared to the MAPE range of HR measurements. The SDD is within an acceptable range for clinical practice. The bias, however, is quite large, meaning that a systematic error averaging 30.47 kcal per CPET is made when using the Apple Watch for measuring calories compared to indirect calorimetry.
This systematic error is also visible in the scatterplot directly comparing the Apple Watch with indirect calorimetry: for a given value measured by indirect calorimetry, the Apple Watch value tends to be higher. On the Bland-Altman plot, values are clustered around a positive bias of 30.47 kcal, with almost all differences in the positive range.
It can thus be concluded that during CPET the Apple Watch systematically measures a higher value for EE than indirect calorimetry when measured in patients with known heart disease.
Studies comparing wrist-worn devices, and in particular the Apple Watch, with gold standard methods have already shown a good accuracy of HR measurement and a generally poor accuracy of EE measurement. Similar ranges for MAPE for HR and EE were found in earlier studies. Accuracy of EE measurement was found to vary depending on type of exercise and exercise intensity, with a lower device error for running versus walking but a higher device error at higher levels of intensity for both running and walking. In other studies, it was already shown that in healthy subjects the Apple Watch overestimated EE during cycling and resistance exercise.
Multiple studies aimed to validate commercially available devices for clinical practice, and Shcherbina et al state that there is an ongoing need to do so. To our knowledge, this is the first study that evaluates accuracy of HR and EE monitoring by a wrist-worn device such as the Apple Watch in patients with proven cardiovascular disease.
In our study, it was shown that in patients with cardiovascular disease, the Apple Watch measures HR during exercise with clinically acceptable accuracy: there was no systematic error and bias was small compared to ranges of HR recommended in rehabilitation programs. If further studies confirm these results, it might be considered safe to incorporate the Apple Watch in HR-guided training programs in the setting of cardiac rehabilitation. At this moment, however, data remain uncertain, and although the wearable can be used to track activities and motivate patients, it is too early to recommend the Apple Watch for clinical usage in a cardiac rehabilitation setting.
EE measurements were not accurate, with a tendency of the Apple Watch to systematically overestimate EE during CPET testing. Caution should therefore be taken when using the Apple Watch in rehabilitation programs in which caloric balance is important (eg, weight loss programs in the setting of cardiac rehabilitation).
This study has limitations. HR was assessed in patients with known cardiac disease; this group was, however, a heterogeneous group with the majority of patients having ischemic or valvular heart disease. No subgroup with known arrhythmia was included. We therefore cannot state that accuracy of HR monitoring is good in all types of patients with known heart disease. Further studies are needed in patient groups with different types of cardiovascular disease to fully assess validity of the Apple Watch in these subgroups.
This study was nonrandomized. Due to the high proportion of included patients who suffered from ischemic heart disease, there is a male predominance of study participants (80%). Subgroup analysis showed no significant difference between male and female groups for mean difference. However, this analysis is prone to error due to the small sample size. Shcherbina et al showed that the error rate for measurement in males was significantly higher than the error rate in females. Further studies are needed to assess if there is indeed a difference in registration.
Further, exercise intensity was evaluated based on cycling resistance (test duration) only, by using a proportion of the maximally achieved resistance. Assessing ratings of perceived exertion would have added useful information.
EE was only assessed with data available through Apple general software. As mentioned in other studies, algorithms used to determine EE are not disclosed by the manufacturers. An independent study with transparent cooperation of manufacturers would be an interesting next step.
This study cannot distinguish between subgroups in which limitations inherent to PPG measurement are evident (eg, patients with darker skin tone, larger wrist circumference, higher BMI). During the CPET, the wrist was kept still while cycling, so no error should be expected from arm movement.
To increase comparability between standard measurements and Apple Watch measurements, it was decided to stop measurement at the exact moment the patient stopped the exercise. No measurements were thus performed in the resting phase after the CPET.
Our results show that in patients with cardiovascular disease, the Apple Watch measures HR with clinically acceptable accuracy for 30 second averages of indoor cycling with the wrist kept stable. If confirmed, it might be considered safe to incorporate the Apple Watch in HR-guided training programs in the setting of cardiac rehabilitation. At this moment, however, it is too early to recommend the Apple Watch for cardiac rehabilitation. Also, the Apple Watch systematically overestimates EE in this group. Caution should therefore be taken when using the Apple Watch for measuring EE.
Conflicts of Interest
- Osborn CY, van Ginkel JR, Marrero DG, Rodbard D, Huddleston B, Dachis J. One Drop mobile on iPhone and Apple Watch: an evaluation of HbA1c improvement associated with tracking self-care. JMIR Mhealth Uhealth 2017 Nov 29;5(11):e179.
- Appelboom G, Camacho E, Abraham M, Bruce S, Dumont E, Zacharia B, et al. Smart wearable body sensors for patient self-assessment and monitoring. Arch Public Health 2014;72(1):1-9.
- Buys R, Claes J, Walsh D, Cornelis N, Moran K, Budts W, et al. Cardiac patients show high interest in technology enabled cardiovascular rehabilitation. BMC Med Inform Decis Mak 2016 Dec 19;16:1-9.
- Wallen MP, Gomersall SR, Keating SE, Wisløff U, Coombes JS. Accuracy of heart rate watches: implications for weight management. PLoS One 2016;11(5):e0154420.
- Bai Y, Hibbing P, Mantis C, Welk GJ. Comparative evaluation of heart rate-based monitors: Apple Watch vs Fitbit Charge HR. J Sports Sci 2018;36(15):1734-1741.
- Boudreaux B, Hebert E, Hollander D, Williams B, Cormier C, Naquin M, et al. Validity of wearable activity monitors during cycling and resistance exercise. Med Sci Sports Exerc 2018 Dec;50(3):624-633.
- Chowdhury E, Western M, Nightingale T, Peacock O, Thompson D. Assessment of laboratory and daily energy expenditure estimates from consumer multi-sensor physical activity monitors. PLoS One 2017 Feb;12(2):e0171720.
- Claes J, Buys R, Avila A, Finlay D, Kennedy A, Guldenring D, et al. Validity of heart rate measurements by the Garmin Forerunner 225 at different walking intensities. J Med Eng Technol 2017 Aug;41(6):480-485.
- Dooley EE, Golaszewski NM, Bartholomew JB. Estimating accuracy at exercise intensities: a comparative study of self-monitoring heart rate and physical activity wearable devices. JMIR Mhealth Uhealth 2017 Mar 16;5(3):e34.
- Kooiman TJM, Dontje ML, Sprenger SR, Krijnen WP, van der Schans CP, de Groot M. Reliability and validity of ten consumer activity trackers. BMC Sports Sci Med Rehabil 2015;49(4):793-800.
- Gillinov S, Etiwy M, Wang R, Blackburn G, Phelan D, Gillinov AM, et al. Variable accuracy of wearable heart rate monitors during aerobic exercise. Med Sci Sports Exerc 2017 Aug;49(8):1697-1703.
- Shcherbina A, Mattsson CM, Waggott D, Salisbury H, Christle JW, Hastie T, et al. Accuracy in wrist-worn, sensor-based measurements of heart rate and energy expenditure in a diverse cohort. J Pers Med 2017 May 24;7(2):1-12.
- Wang R, Blackburn G, Desai M, Phelan D, Gillinov L, Houghtaling P, et al. Accuracy of wrist-worn heart rate monitors. JAMA Cardiol 2016 Oct 12;2(1):104-106.
- Delgado-Gonzalo R, Parak J, Tarniceriu A, Renevey P, Bertschi M, Korhonen I. Evaluation of accuracy and reliability of PulseOn optical heart rate monitoring device. Conf Proc IEEE Eng Med Biol Soc 2015 Aug:430-433.
- Parak J, Korhonen I. Evaluation of wearable consumer heart rate monitors based on photopletysmography. Conf Proc IEEE Eng Med Biol Soc 2014;2014:3670-3673.
- Jo E, Lewis K, Directo D, Kim MJ, Dolezal BA. Validation of biofeedback wearables for photoplethysmographic heart rate tracking. J Sports Sci Med 2016 Sep;15(3):540-547.
- Buys R, Coeckelberghs E, Vanhees L, Cornelissen V. The oxygen uptake efficiency slope in 1411 Caucasian healthy men and women aged 20-60 years: reference values. Eur J Prev Cardiol 2015 Mar;22(3):356-363.
- McArdle W, Katch F, Katch V. Measurement of human energy expenditure. In: McArdle W, Katch F, Katch V, editors. Nutrition, Energy, and Human Performance. 7th edition. Philadelphia: Lippincott Williams & Wilkins; 2010:178-192.
|bpm: beats per minute|
|BMI: body mass index|
|CPET: cardiopulmonary exercise test|
|EE: energy expenditure|
|HR: heart rate|
|HRR: heart rate reserve|
|ICC: intraclass correlation coefficient|
|LoA: limits of agreement|
|MAE: mean absolute error|
|MAPE: mean absolute percentage error|
|MD: mean difference|
|SDD: standard deviation of difference|
|VCO2: carbon dioxide output|
|VO2: oxygen uptake|
Edited by G Eysenbach; submitted 09.08.18; peer-reviewed by A Shcherbina, K Goessler, B Boudreaux, J Goris; comments to author 13.09.18; revised version received 05.11.18; accepted 09.12.18; published 19.03.19
Copyright
©Maarten Falter, Werner Budts, Kaatje Goetschalckx, Véronique Cornelissen, Roselien Buys. Originally published in JMIR Mhealth and Uhealth (http://mhealth.jmir.org), 19.03.2019.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR mhealth and uhealth, is properly cited. The complete bibliographic information, a link to the original publication on http://mhealth.jmir.org/, as well as this copyright and license information must be included.