Accuracy of Optical Heart Rate Sensing Technology in Wearable Fitness Trackers for Young and Older Adults: Validation and Comparison Study

Background: Wearable fitness trackers are devices that can record and enhance physical activity among users. Recently, photoplethysmography (PPG) devices that use optical heart rate sensors to detect heart rate in real time have become popular and help in monitoring and controlling exercise intensity. Although the benefits of using optical heart rate monitors have been highlighted through studies, the accuracy of the readouts these commercial devices generate has not been widely assessed for different age groups, especially for the East Asian population with Fitzpatrick skin type III or IV.
Objective: This study aimed to examine the accuracy of 2 wearable fitness trackers with PPG to monitor heart rate in real time during moderate exercise in young and older adults.
Methods: A total of 20 young adults and 20 older adults were recruited for this study. All participants were asked to undergo a series of sedentary and moderate physical activities using indoor aerobic exercise equipment. In this study, the Polar H7 chest-strapped heart rate monitor was used as the criterion measure against which 2 fitness trackers, namely Xiaomi Mi Band 2 and Garmin Vivosmart HR+, were compared. The real-time, second-by-second heart rate data obtained from both devices were recorded using the broadcast heart rate mode. To critically analyze the results, multiple statistical measures, including the mean absolute percentage error (MAPE), Lin concordance correlation coefficient (CCC), intraclass correlation coefficient, the Pearson product moment correlation coefficient, and the Bland-Altman analysis, were used to examine the performances of the devices.
Results: Both test devices exhibited acceptable overall accuracy as heart rate sensors based on several statistical tests. Notably, the MAPE values were below 10% (the designated threshold) in both devices (GarminYoung=3.77%; GarminSenior=4.73%; XiaomiYoung=7.69%; and XiaomiSenior=6.04%). The CCC reliability scores for Garmin were 0.92 (Young) and 0.80 (Senior), whereas those for Xiaomi were 0.76 (Young) and 0.73 (Senior). However, the results obtained using the Bland-Altman analysis indicated that both test optical devices underestimated the average heart rate. More importantly, the study documented some unexpected outlier readings reported by these devices when used on certain participants.
Conclusions: The study reveals that commonly used optical heart rate sensors, such as the ones used herein, generally produce accurate heart rate readings irrespective of the age of the user. However, users should avoid relying entirely on these readings to indicate exercise intensities, as these devices have a tendency to produce erroneous, extreme readings, which might misrepresent the real-time exercise intensity. Future studies should therefore focus on the occurrence rate of such errors, as this will likely benefit the development of improved models of heart rate sensors.
(JMIR Mhealth Uhealth 2020;8(4):e14707) doi: 10.2196/14707


Growing Popularity and Functions of Wearable Fitness Trackers
Wearable fitness trackers have gained popularity worldwide, and their annual sales continue to grow [1,2]. These trackers were listed as the No. 1 fitness trend in the years 2016, 2017, 2019, and 2020 in a worldwide survey conducted by the American College of Sports Medicine [3][4][5][6]. The advantages of these wearable devices are that they are convenient to use and measure various parameters noninvasively. In addition, they allow the users to monitor their daily physical activities in a free-living environment instead of controlled laboratory settings.
Earlier versions of fitness trackers, equipped with triaxial accelerometers and a gyroscope, could sense motions made by the users, monitor their activity metrics, and provide estimated information such as walking and running in terms of steps or distance, energy expenditure, sedentary time, sleep patterns, and activity routes (with GPS function). Most of these fitness trackers were placed on the wrist. The users obtained the real-time information from the display on the trackers or received feedback through connected mobile phone apps.
The recent application of photoplethysmography (PPG) in wrist-based wearable fitness trackers has enabled newer versions of fitness trackers to detect heart rates. This breakthrough provides several benefits. First, heart rate is a vital component in cardiovascular fitness assessments and an important parameter in exercise training programs [7]. Second, resting heart rate is also a widely used parameter for general health assessments to detect cardiovascular diseases [8]. Thus, the development of fitness trackers that have heart rate detection technologies has brought about several additional benefits that were absent in older models.
PPG measures heart rates based on the changes in vascular blood flow during the cardiac cycle [9]. It has previously been applied in medical devices such as oximeters [10]. This technology has since been integrated and commercialized as optical heart rate monitors by companies such as Mio and Omron. The number of commercial companies producing such devices has gradually grown in the last 5 years (eg, Apple, Fitbit, and Garmin), along with the design and development of such products and research [1,2,10-16].

Validation of Fitness Trackers
Despite the growing popularity and functions of these fitness trackers and substantial investments in commercial advertisements, many users have expressed concerns regarding the data accuracy of these trackers [17]. Inaccurate and inconsistent readings are major reasons for negative user experiences, which discourage the continued use of these devices [17][18][19][20]. The concerns regarding the data accuracy of these trackers influence the users in terms of their perceptions of personal health and program interventions or research evaluations that adopt these devices.
Most commercially available fitness trackers use step counts as a parameter to indicate the level of physical activity. The step-count function of these devices has been widely scrutinized in studies examining their accuracy [21][22][23]. Importantly, while generally producing accurate results, these devices did not report reliable step-count readings in certain conditions, such as slow walking or while performing unnatural hand movements [21][22][23][24]. A systematic review investigated the validity and reliability of Fitbit and Jawbone trackers. The results revealed that most studies validated the tracker accuracy and indicated that it had a higher accuracy for step counts, followed by that for distance and physical activity and finally for energy consumption and sleep [23]. Nevertheless, most studies recommend caution when deriving energy expenditure estimations directly using these readings [11,13,25,26]. In addition, studies have started to examine the validity and reliability of the fitness trackers among older adults instead of young adults because they might present different movements such as gait patterns or speeds [27,28].

Accuracy of Optical Heart Rate Monitoring
The accuracy of heart rate displayed on the fitness trackers with optical heart rate monitors has also been investigated [11,14,16,29-31]. Common research methods for the development of these optical heart rate monitors involve fitness assessments using basic indoor training equipment such as treadmills, stationary cycles, and sometimes elliptical machines. This type of study allows researchers to evaluate the feasibility of implementing optical heart rate monitors in aerobic training for the general population [1,2,10-16].
Previous studies have reported that, generally, optical sensing fitness trackers have acceptable accuracy. However, the accuracy might vary across brands [16,31] and with activity patterns or speed, exercise intensities [10,14,31], skin tone [10], room temperature [32], placement of sensors [29], or compression-induced and motion-induced artifacts [13,32-34]. For example, in a study conducted by Boudreaux et al [13], participants wore 8 different fitness trackers, and an increase in exercise intensity reduced the accuracy of heart rate measurement. In another validation study, the measured heart rate showed a minor deviation compared with the actual heart rate in participants with a dark skin tone [30].
The adoption of fitness trackers with optical heart rate sensors in the medical field is still debated [12,35], and there have been several lawsuits regarding the accuracy of the heart rate information they provide [36,37]. Assessing the reliability and validity of the heart rate readings provided by these trackers is essential because heart rate is a vital measure in clinical settings, and these trackers have been increasingly accepted by consumers as a tool for self-monitoring and adopted in many intervention programs for health management [11,14].

Research Gaps
Owing to the limitations on raw data acquisition in commercial fitness trackers, previous studies have only used average heart rate data [14] or manually recorded the heart rate at certain intervals [11]. However, averaging the heart rate or recording it at a certain time point is problematic because both fail to represent any change or variability [38]. Studies that have compared continuous heart rate in more detail revealed that evaluating the accuracy of these test devices at a second-by-second level is difficult [2]. One study used video recording to manually determine the second-by-second heart rate, which was a labor-intensive and time-consuming method [12]. Moreover, potential variables such as age, ethnicity, and gender were not considered in earlier studies [2,14]. For example, a majority of the participants in several studies conducted in the United States and Europe were white (Fitzpatrick skin type I or II) [2,12,16]. PPG technology uses an optical sensor that emits light and measures the change in light absorption by the skin, which varies with change in blood volume; thus, the accuracy of heart rate monitoring using PPG is subject to skin structures [39]. Typically, the skin changes with age, that is, "fine wrinkles, roughness, mottled hyperpigmentation, dilated blood vessels, and loss of skin tone" are observed [40]. In addition, age-related changes such as arterial stiffness can influence the pulse shape in PPG [32]. Therefore, appropriate validation of these devices for different age groups among non-white participants is imperative.

Aim of the Study
This study evaluated the heart rate reading performances of 2 commercially available fitness trackers in various settings using a second-by-second data acquisition approach. Moreover, to determine whether age would generate discrepancies in the readouts, young and senior participants were characterized separately. This study was conducted in Taiwan to validate 2 trackers used by the yellow skin tone population (Fitzpatrick skin type III or IV) [41,42].

Participants
To determine a credible sample size for achieving statistical power in the intraclass correlation coefficient (ICC) test, this study used the R package ICC.Sample.Size (GPL-3; 2015, R Core Team, R Foundation for Statistical Computing). Based on the formula proposed by Zou [43], the number of participants (n) required for achieving a target power of 0.90 was 8. This study exceeded that minimum by involving 20 adults aged 65 years and above (Senior) and 20 adults aged between 20 years and 26 years (Young). No participant had a clinical history of cardiovascular diseases, neurological disorders, lower limb injuries, or any other factors that would render them unfit to perform the exercise. To ensure consistency, individuals with tattoos or birthmarks on the position where the device was to be worn were not included in the study. To minimize possible sex-driven discrepancies, the sex ratio in both the Senior and Young groups was kept identical (20:20).

Research Device
This study used the Polar heart rate strap (H7, Polar Electro Oy), which is widely used as the criterion for measuring heart rate in sports science studies [2,44]. The optical fitness trackers selected for this study were Xiaomi Mi Band 2 (Xiaomi Corporation) and Garmin Vivosmart HR+ (Garmin International Inc) because these 2 fitness trackers hold a significant market share in the Asia Pacific region, which is expected to grow. Mi Band 2 was equipped with a PPG module (with 2 LED lights) and an accelerometer to detect heart rate and sense motion. Vivosmart HR+ was also equipped with a PPG module (with 3 LED lights) and an accelerometer. In addition, a GPS chip is embedded in the Vivosmart HR+ for measuring the travel distance during outdoor exercises.
Both devices provided information regarding step counts, energy expenditure, notification for breaking up the prolonged sedentary time, and smart notifications, and both claimed accurate heart rate detection. In addition, the 2 devices had the broadcast heart rate mode, a feature that enables the transmission of second-by-second heart rate data through Bluetooth or ANT+ to the paired receiving device, and thus served a function similar to that of the conventional heart rate strap. Moreover, wrist-based fitness trackers were easy to wear and remove and, thus, eased the discomfort of wearing chest straps for monitoring the real-time heart rate during traditional exercise and fitness training programs or interventions [10,45]. Specifically, PPG fitness trackers provide pulse rate data, which reflect the rise and fall of blood pressure in the arteries caused by the contraction and relaxation of the heart, which in turn produce a noticeable pulse. Although the signals of pulse waveforms are different from those of heartbeat waveforms, the pulse rate can be analyzed to represent the heart rate [32]. The term heart rate has been used in this study in line with many studies on heart rate fitness trackers [2,10-12,15,29,30,38,46]. Hence, in this study, the term heart rate is used in its broadest sense to refer to the readings from the optical fitness trackers.
The second-by-second heart rate data-receiving app Cardio Training (Angelfmarcos) used in this study was acquired from the Android platform. The equipment adopted in this study included 3 types of indoor aerobic fitness equipment: treadmill, upright stationary bike, and elliptical machine (Figure 1). These types of equipment have been widely used in previous exercise protocols and have proved to be ideal and safe for aerobic training [2,10,47-49].

Before the Trial
The study was approved by the Institutional Review Board of the National Cheng Kung University Hospital (IRB number: B-ER-106-134). All participants gave written consent to participate in the trial and were provided a detailed explanation of the complete research protocol before the commencement of the study. All participants were given the option to voluntarily withdraw from the trial at any time during the study.
Polar H7 chest-strapped heart rate monitors and wrist-strapped optical fitness trackers were fixed onto the participants by the researcher according to the manufacturer instructions. Next, the broadcast heart rate mode of the optical fitness trackers was activated by the researcher simultaneously. Data transmission to the tablets or mobile phones was then checked.

Exercise Protocol
Initially, participants were asked to be seated quietly for 15 min to record their resting heart rates (HRrest) using the Polar H7 heart rate monitors. The general formula (220−age in years) was used for calculating the maximal heart rate (HRmax) of each individual. Based on the HRrest and HRmax, a personalized moderate exercise intensity was determined for each participant. This was defined by 40% to 60% of heart rate reserve, which is the difference between HRmax and HRrest [50]. Finally, participants were led to the exercise area and shown the proper usage and adjustment of the specific fitness equipment.
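As an illustration of the intensity calculation described above, the following minimal sketch derives a target heart rate range from age and resting heart rate. It assumes the common convention of adding the chosen fraction of heart rate reserve back to the resting heart rate; the function name, parameter names, and example participant are hypothetical and are not taken from the study.

```python
def moderate_intensity_zone(age_years, hr_rest, low=0.40, high=0.60):
    """Return (lower, upper) target heart rates in bpm for moderate intensity."""
    hr_max = 220 - age_years        # age-predicted maximal heart rate
    hr_reserve = hr_max - hr_rest   # heart rate reserve = HRmax - HRrest
    # Assumption: target = resting HR + fraction of reserve
    lower = hr_rest + low * hr_reserve
    upper = hr_rest + high * hr_reserve
    return lower, upper


# Hypothetical participant: 70 years old with a resting heart rate of 70 bpm
lower, upper = moderate_intensity_zone(age_years=70, hr_rest=70)
print(f"Moderate intensity zone: {lower:.0f}-{upper:.0f} bpm")
```

For this hypothetical participant, HRmax is 150 bpm and the reserve is 80 bpm, so the zone spans roughly 102 to 118 bpm.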
To evaluate the heart rate detection accuracy of the test devices during different activities, participants were instructed to perform a sequence of sedentary and aerobic exercises [2,10]. The sequence was divided into phases, and heart rates were recorded using the Cardio Training app at each phase. The participants were initially guided to adjust the workout level of equipment accordingly to prevent exhaustion before the end of the trial. Specifically, the measurement began with the participants seated (rest sitting), which represented a typical sedentary behavior. Next, participants were asked to walk on the treadmill for 6 min (the warm-up phase) before engaging in more vigorous exercises. Every period of the exercise phase lasted for 6 min. The step-by-step protocol is presented in Figure  1. Rest sitting time was given to the participants between each phase, during which the heart rate measurement would continue.
During the exercise phases, participants were encouraged to maintain moderate exercise intensity. Real-time feedback and instructions were given by the researcher verbally as guided by the heart rate data acquired from the Polar H7 heart rate monitor. Except in circumstances where the participant deviated from moderate exercise intensity, in which the resistance level was adjusted accordingly, no further intervention by the researcher was made during the entire trial.

Statistical Analyses
Using the Cardio Training app, the second-by-second heart rate data generated from the trials were exported as CSV files. A total of 2161 readings, corresponding to 2161 seconds (including the first reading at the beginning of the protocol), were obtained and recorded for each participant. Compared with previous studies, in which heart rate measurements were less frequent (ie, every 15 seconds/every minute or only at the end of each exercise phase) [11,15,16], the statistical results produced from the current dataset are likely to be more representative because they enabled the researchers to discern some potential outlier readings. To compare the accuracy of test devices, various statistical methods were chosen based on recommendations from relevant studies [2,10,26,38,51]. All statistical tests were performed using SPSS 18.0 (IBM) and MedCalc statistical software (MedCalc).

Reliability
To compare the reliability between the criterion measurement device (Polar H7) and the 2 test optical fitness trackers, 3 reliability tests were used, namely the Lin concordance correlation coefficient (CCC), Pearson product moment correlation coefficient (PPMCC), and ICC tests (two-way mixed, single measures, and absolute agreement). Previous studies have applied differing standards when interpreting the results of the reliability correlation tests. For instance, Gillinov et al [2] set the CCC value greater than 0.80 to represent acceptable reliability, whereas Boudreaux et al [13] set ICC values from 0.60 to 0.75 to represent moderate reliability and from 0.75 to 0.90 to indicate superior reliability. Moreover, other studies on applied sports science have proposed a slightly different interpretation of ICC values: values between 0.50 and 0.75 indicated moderate reliability, whereas other thresholds were the same [52]. This study used all 3 of the aforementioned reliability tests.
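The 3 coefficients named above can be sketched as follows. This is an illustrative NumPy implementation under stated assumptions, not the SPSS or MedCalc procedure used in the study: the CCC uses population (1/n) moments, and the ICC follows the standard two-way, absolute-agreement, single-measures mean-square formula. All function names and the example readings are hypothetical.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson product moment correlation coefficient (PPMCC)."""
    return float(np.corrcoef(x, y)[0, 1])


def lin_ccc(x, y):
    """Lin concordance correlation coefficient using population moments."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return float(2 * cov / (x.var() + y.var() + (mx - my) ** 2))


def icc_a1(x, y):
    """Two-way, absolute-agreement, single-measures ICC for 2 devices."""
    data = np.column_stack([x, y]).astype(float)
    n, k = data.shape                      # n paired readings, k = 2 devices
    grand = data.mean()
    row_means = data.mean(axis=1)
    col_means = data.mean(axis=0)
    msr = k * ((row_means - grand) ** 2).sum() / (n - 1)   # between readings
    msc = n * ((col_means - grand) ** 2).sum() / (k - 1)   # between devices
    resid = data - row_means[:, None] - col_means[None, :] + grand
    mse = (resid ** 2).sum() / ((n - 1) * (k - 1))         # residual error
    return float((msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n))


# Hypothetical readings: the tracker reads a constant 5 bpm below the criterion
polar = np.array([60, 75, 90, 110, 125])
tracker = polar - 5

r = pearson_r(polar, tracker)
ccc = lin_ccc(polar, tracker)
icc = icc_a1(polar, tracker)
print(f"PPMCC={r:.3f}  CCC={ccc:.3f}  ICC={icc:.3f}")
```

In this example, a constant offset leaves the PPMCC at 1 while the CCC and ICC fall below 1, because the latter two penalize systematic bias as well as scatter.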

Analysis of Paired Difference
Mean absolute error (MAE) and mean absolute percentage error (MAPE) were calculated from the paired absolute differences to reveal the differences between the criterion measurement and measurements generated by the test devices among respective age groups and during different phases of the exercise. MAPE was calculated by taking the absolute difference between each HR reading from the Mi Band 2 or Garmin and the corresponding Polar H7 reading, dividing it by the Polar H7 reading, and averaging the results as a percentage. Results with error values below 10% were considered reliable [13].
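The error measures described above can be sketched in a few lines. This is an illustrative NumPy version, assuming paired readings that are already aligned second by second; the function name and the example readings are hypothetical.

```python
import numpy as np


def mae_mape(criterion, device):
    """Return (MAE in bpm, MAPE in %) for paired heart rate readings."""
    criterion = np.asarray(criterion, dtype=float)
    device = np.asarray(device, dtype=float)
    abs_err = np.abs(device - criterion)         # paired absolute differences
    mae = abs_err.mean()
    mape = (abs_err / criterion).mean() * 100    # relative to the criterion
    return float(mae), float(mape)


# Hypothetical paired readings (bpm)
polar = [100, 120, 140, 160]
tracker = [95, 126, 133, 160]

mae, mape = mae_mape(polar, tracker)
print(f"MAE={mae:.2f} bpm, MAPE={mape:.2f}%")
print("Within 10% threshold:", mape < 10)
```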

Bland-Altman Analysis
To determine the agreement between the criterion measurement and measurements generated by the optical fitness trackers, Bland-Altman analysis was applied to explore the mean bias and 95% limits of agreement. The results from different age groups and during different phases of the exercise were analyzed and represented graphically.
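A minimal sketch of the Bland-Altman quantities is given below. It assumes the difference is taken as test device minus criterion (so a negative bias indicates underestimation) and that the 95% limits of agreement are the mean bias plus or minus 1.96 times the sample standard deviation of the differences; the function name and example readings are hypothetical.

```python
import numpy as np


def bland_altman(criterion, device):
    """Return (mean bias, lower limit, upper limit) in bpm."""
    criterion = np.asarray(criterion, dtype=float)
    device = np.asarray(device, dtype=float)
    diff = device - criterion      # assumption: negative = underestimation
    bias = diff.mean()
    sd = diff.std(ddof=1)          # sample standard deviation of differences
    return float(bias), float(bias - 1.96 * sd), float(bias + 1.96 * sd)


# Hypothetical paired readings (bpm)
polar = [100, 120, 140, 160]
tracker = [97, 118, 139, 154]

bias, lower, upper = bland_altman(polar, tracker)
print(f"bias={bias:.2f} bpm, limits of agreement=({lower:.2f}, {upper:.2f})")
```

A wider gap between the lower and upper limits indicates more variable disagreement with the criterion, which is how the phase-by-phase comparisons in this study can be read.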

Reliability of Examined Devices
The results of MAE, MAPE, and correlation tests from both the Young and Senior groups are shown in Tables 1 and 2. In the Young group, the Garmin device achieved MAPE values of less than 10% in all the conditions tested (Table 1), indicating that overall, the heart rate readings produced by the Garmin device were reliable [2,13]. By contrast, whereas the Xiaomi device generally achieved MAPE values of less than 10%, it did not do so during cycling and elliptical phases (Table 1), suggesting that the reliability of the Xiaomi device was likely influenced by the types of activities performed.
In the Senior group, the performances of both test devices during different activities were reliable (MAPE values below 10%, Table 1). Notably, the MAPE values achieved by the Xiaomi device were, on average, higher than those produced by the Garmin device, indicating that the Xiaomi product was overall less reliable than the Garmin one. However, the standard deviation of MAPE achieved by the Garmin device was higher in the Senior group (SDSenior=10.49%) than in the Young group (SDYoung=6.9%; Table 1), suggesting that the reliability of the Garmin device was likely affected by age differences and that it became less reliable in the older population.
The data revealed that the Garmin device achieved CCC values above the designated threshold (0.80) in both age groups (Table 2), suggesting that it was generally accurate. By contrast, the Xiaomi device failed to achieve overall CCC values above the designated threshold in both age groups (Table 2), indicating that its accuracy was also likely influenced by age differences. Taken together, these data suggest that the Garmin device, in general, produced more reliable and accurate heart rate readings than the Xiaomi one. To observe the overall trends and identify any apparent discrepancies in the correlation in different situations, each phase within the exercise sequence was plotted separately and color coded. The overlaid datasets of the different groups are represented in the scattergram in Figure 2. Notably, the readings for certain activities, such as cycling, were found to deviate from the criterion measurements much more frequently than those for activities such as walking. This was further confirmed using the Bland-Altman analysis (Table 1; see Bland-Altman Analysis).

Bland-Altman Analysis
Bland-Altman plots indicating the mean difference in heart rate detection between Garmin or Xiaomi and Polar H7 criterion measure and levels of agreement with 95% CIs for the Young and Senior groups are illustrated in Figure 3. The complete Bland-Altman analysis dataset is presented in Table 1 (the Bland-Altman plot for each activity phase is provided in Multimedia Appendix 1). The data indicated that both test devices achieved relatively higher variations during cycling phases compared with other activities (Table 1). These results suggest that both devices tended to underreport heart rates in certain situations, consistent with previous observations [16,24]. Notably, the Xiaomi device significantly underestimated heart rates during cycling and elliptical phases in the Young group (−13.4 bpm and −13.3 bpm, respectively). Moreover, the differences between the upper and lower limits during the recovery phase (rest sitting between active phases) were greater than those during the resting phase (rest sitting in the beginning; Table 1). This implies that the variation of differences was greater at the transitional phases in which participants changed their activities from dynamic exercise to recovery, and thus, the degree of errors might decrease gradually if the participants stay in the rest position.

Comparison of Correlation Tests
Various combinations of correlation tests are frequently adopted in evaluating the reliability or validity of examined devices [35]. As such, 3 independent statistical tests were employed in this study to compare whether the results from different correlation tests would deviate.
The obtained results (Table 2) revealed that the PPMCC test tended to yield a higher correlation coefficient than the CCC and ICC tests. For all phases combined, the results of the 3 tests were nearly identical; for example, the maximum difference was less than 0.01 (0.7258 and 0.7341 for Mi Band 2 in the Senior group). However, the difference between CCC or ICC and PPMCC was more pronounced for individual activities; for example, a larger deviation was noted for activities such as cycling and elliptical exercise.
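The ordering observed above follows from the standard definition of the Lin CCC, which can be written as the PPMCC multiplied by a bias correction factor that never exceeds 1. The notation below is introduced here for illustration only: x and y denote the criterion and test device readings, with means μ, standard deviations σ, and PPMCC ρ.

```latex
\rho_c
  = \frac{2\rho\,\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2}
  = \rho \, C_b,
\qquad
C_b = \frac{2\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2}.

% Because \sigma_x^2 + \sigma_y^2 \ge 2\sigma_x\sigma_y and (\mu_x - \mu_y)^2 \ge 0,
% it follows that 0 < C_b \le 1 and therefore |\rho_c| \le |\rho|.
```

Equality holds only when the two devices share the same mean and the same variance, so a systematic underestimation by the test device lowers the CCC even when the PPMCC remains high.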

Principal Findings
In line with previous studies [2,11,16,53], the combined results from this study indicated that both the Garmin and Xiaomi devices generally provided accurate heart rate readings. Both devices were also considered reliable in heart rate measurements with overall MAPE values below the 10% threshold. Notably, even though both devices achieved acceptable overall correlations in both age groups, they showed a tendency to modestly underestimate heart rates in many situations, as revealed by the Bland-Altman analysis. Similar findings were also reported in previous studies [11,12] and could represent a general characteristic of optical heart rate fitness trackers.
However, it is worth noting that significant discrepancies in device accuracy remained apparent between different physical activities. In general, these devices would be more accurate during sedentary behaviors such as sitting compared with active exercise [2]. Indeed, a previous study on a number of commercial wearable activity monitors found that most devices exhibited low ICC values (r<0.5) when the activity intensity exceeded 100 watts in graded cycling exercise [13]. Similarly, our data revealed that the test devices generally had lower correlation coefficients and higher degrees of deviation during cycling and elliptical exercises compared with other activities.
In addition to activity intensity, several other studies have identified that motion artifacts during exercise were negatively correlated with the accuracy of PPG heart rate-monitoring systems [32,38,46,54-56]. For example, in an experiment conducted by Gillinov et al [2], the optical devices were more accurate for exercise with fewer arm motion artifacts (cycling and elliptical exercise with no arm movement). It is somewhat surprising that the data collected in this study indicated the opposite (as cycling produced less motion artifact than running). Nevertheless, Benedetto et al [12] found that the Fitbit Charge 2 had poor ICC values (r=0.21) and underestimated the actual heart rate values when performing stationary cycling. Until this discrepancy is resolved, users should be cautious when relying on optical heart rate readouts during various physical activities. Taken together, this study provides supporting evidence for an association between activity type and the accuracy of optical heart rate sensors but not for a negative correlation between motion artifacts and the accuracy of optical heart rate sensors [2,13,24]. The precise mechanisms for such associations currently remain unclear.
The rapidly expanding aging population worldwide is creating challenges for all sectors of society. Promoting the health of the older adult population and motivating them to engage in regular physical activity have become essential [57]. The adoption of new technology, such as health-related informatics technology (eg, apps) or wearable fitness trackers, is increasing [20,58,59], and the benefits are also observed in the senior population [20,60]. The fitness trackers validated in this study appear to exhibit similar accuracy for heart rate detection among different age groups.
Given its more thorough data acquisition method, this study identified certain unexpected outliers. As shown in Figure 4, these extreme readings were unexpected, unpredictable, and transient. It is likely that these extreme readings did not represent the true heart rate values and that their displays were technical faults of the devices or the detection approach. Nonetheless, these random (or untrue) readings can skew the overall dataset and falsely represent the heart rate of an individual. Because these extreme heart rate readings were only observed for a short period, detecting these deviations while examining the heart rate readings every 15 seconds, every minute, or only at the end of the exercise, as in earlier studies, is difficult [11,15,16]. Given the transiency of such extreme readings, it is therefore recommended that future studies on optical heart rate sensors adopt the second-by-second approach demonstrated here and previously [12] to identify the outliers. Previous studies have proposed the use of different statistical methods to analyze the data correlation. These include the MAPE test, the Bland-Altman analysis, and the PPMCC, ICC, and CCC correlation tests [2,12,13,15,53]. To minimize the insufficiencies of individual statistical tests, this study examined the second-by-second heart rate readings using all of the mentioned tests. Our results showed that when given the same dataset, PPMCC tests would typically derive higher values than ICC or CCC tests. Although all of the correlation coefficients have previously been adopted in other studies on optical devices, future research should exercise caution when selecting correlation tests and interpreting test results. That said, ICC and CCC should nonetheless be the preferred tests, as they were initially used to assess the interrater reliability in related validation studies [61,62]. Sartor et al [38] also supported the use of the CCC test for validating wrist-based heart rate monitors.
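To illustrate why the transient extreme readings discussed above are easy to miss with sparse sampling, the following sketch uses entirely synthetic data: a steady criterion heart rate and a hypothetical 6-second erroneous spike in the test device. The 30 bpm flagging threshold, the spike position, and all names are assumptions chosen for illustration and are not values from the study.

```python
import numpy as np

n_readings = 2161                            # one reading per second, as in this study
criterion = np.full(n_readings, 110.0)       # hypothetical steady criterion HR (bpm)
device = criterion.copy()
device[1006:1012] = 190.0                    # hypothetical 6-second erroneous spike


def flag_outliers(criterion, device, threshold_bpm=30.0):
    """Flag readings that differ from the criterion by more than the threshold."""
    return np.abs(device - criterion) > threshold_bpm


flags = flag_outliers(criterion, device)

every_second = int(flags.sum())              # second-by-second review
every_15_s = int(flags[::15].sum())          # one reading every 15 seconds
every_minute = int(flags[::60].sum())        # one reading every minute

print("Flagged, every second:    ", every_second)
print("Flagged, every 15 seconds:", every_15_s)
print("Flagged, every minute:    ", every_minute)
```

In this synthetic case, the spike happens to fall entirely between the 15-second and 1-minute sampling points, so only the second-by-second review flags it. With a different spike position a sparse schedule could catch it by chance, but a spike shorter than the sampling interval can always be missed.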
Another study has proposed standardization of exercise protocols to ensure that the aggregate data were reproducible [51]. Thus, a standard set of examining methods and statistical analyses should be developed and adopted in future validation studies of optical heart rate sensors.
In conclusion, this study revealed that both the Garmin and Xiaomi optical heart rate sensors were capable of producing fairly accurate heart rate readings for both young and older adults. In particular, these devices achieved better accuracy during sedentary behaviors compared with physical activities. The heart rate reading accuracy of both devices was influenced by different types of physical activities. Consistently, the results echoed the previously reported tendency for heart rate underestimation during cycling and elliptical training in both of the devices. Notably, both devices exhibited the tendency to transiently display erroneous extreme readings. Thus, caution should be exercised when using wrist-strapped fitness trackers to monitor the real-time heart rate during aerobic exercises.

Limitations
This study was limited by several factors. First, the test devices were chosen because of their popularity in Asia and the availability of the broadcasting heart rate mode on these devices. However, different brands are usually integrated with different PPG modules or algorithms, which could lead to discrepancies among the different optical heart rate devices [2,11]. This makes it more difficult to extend the current results directly to other optical heart rate devices. Although this study strived to retrieve the second-by-second data, the heart rate signals derived from various devices were complex, and a time lag existed between the investigational and reference devices [38]; in addition, owing to the trade secrets pertaining to the PPG signal-processing algorithms and the receiving apps, we could only assume that each second-by-second heart rate reading was derived from the nearest previous beat-to-beat waveform signal. Nevertheless, the PPG sensor provided more satisfactory readings when it was worn on the wrist than on other body parts. Second, the exercise intensity in this study was set at a submaximal level because of the various physical conditions of the participants. Thus, the performance of the examined devices during more vigorous intensity exercises remains to be determined. In addition, this study only selected healthy participants, that is, participants without any cardiovascular diseases (eg, coronary artery disease or abnormal heart rhythms) or neurological disorders (eg, Parkinson disease or essential tremor) because an abnormal heart rate might interfere with the accuracy of comparison [63,64]. Hence, the results cannot be generalized to the overall older adult population. The validity of PPG fitness trackers for a population with major disorders, such as patients with cardiac disorders, requires further investigation.

Suggestions
First, future research on these topics would benefit from the standardization of the exercise protocol, selected statistical methods, and the threshold of acceptable accuracy. This will allow for better cross-study comparisons and more accurate interpretations [51]. Second, future studies can incorporate more participants with various health conditions to increase the representativeness of the cohort. Conducting multiple trials for the same cohort will help control variability. This will also help identify erroneous readings, especially when they fall within the physiological range. For similar reasons, the second-by-second data acquisition method presented in this study should be adopted in future studies. This will also help address the mechanisms of those conceivably erroneous displays. Third, future testing should include more contextual activities, such as outdoor walking, running, and cycling, to better mimic real-life events. This will allow for better comparisons of device performances under different settings.

Conclusions
Overall, the results of this study indicate that both the Garmin and Xiaomi optical heart rate sensors exhibit acceptable heart rate-sensing accuracy for the yellow skin tone population (Fitzpatrick skin type III or IV). Both devices perform similarly to the Polar H7 chest-strapped heart rate monitor. The results also indicate that the sensing reliability of both the Garmin and Xiaomi devices can be influenced by different types of physical activities and that the Garmin device generally outperformed the Xiaomi device. The accuracy of both devices was not significantly affected by the age of users, which implies that both devices are suitable for use in older adults. This has significant implications for the growing aging population because PPG fitness trackers are inexpensive and use a noninvasive technology to provide information regarding various parameters. They also have great potential for telemedicine, such as remote or home health monitoring, which can assist the older adult population in monitoring their health [32].
The accuracy levels of both devices were negatively correlated with the level of activity intensity. For both devices, the measurement accuracy deteriorated while individuals were cycling. This study also reports the occurrence of extreme errors in these heart rate-sensing devices, the causes of which remain unknown. These findings imply that users or exercise practitioners should be cautious when using wrist-strapped fitness trackers to monitor exercise performance.