This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR mhealth and uhealth, is properly cited. The complete bibliographic information, a link to the original publication on http://mhealth.jmir.org/, as well as this copyright and license information must be included.
It has become possible for the new generation of consumer wristbands to classify sleep stages based on multisensory data. Several studies have validated the accuracy of one of the latest models, that is, Fitbit Charge 2, in measuring polysomnographic parameters, including total sleep time, wake time, sleep efficiency (SE), and the ratio of each sleep stage. Nevertheless, its accuracy in measuring sleep stage transitions remains unknown.
This study aimed to examine the accuracy of Fitbit Charge 2 in measuring transition probabilities among wake, light sleep, deep sleep, and rapid eye movement (REM) sleep under free-living conditions. The secondary goal was to investigate the effect of user-specific factors, including demographic information and sleep pattern, on measurement accuracy.
A Fitbit Charge 2 and a medical device were used concurrently to measure a whole night’s sleep in participants’ homes. Sleep stage transition probabilities were derived from sleep hypnograms. Measurement errors were obtained by comparing the data obtained by Fitbit with those obtained by the medical device. Paired 2-tailed
Sleep data were collected from 23 participants. Sleep stage transition probabilities measured by Fitbit Charge 2 significantly deviated from those measured by the medical device, except for the transition probability from deep sleep to wake, from light sleep to REM sleep, and the probability of staying in REM sleep. Bland-Altman plots demonstrated that systematic bias ranged from 0% to 60%. Fitbit tended to overestimate the probability of staying in a sleep stage while underestimating the probability of transitioning to another stage. SE>90% (
Our analysis shows that Fitbit Charge 2 underestimated sleep stage transition dynamics compared with the medical device. Device accuracy may be significantly affected by perceived sleep quality (PSQI), WASO, and SE.
Having enough restorative sleep is essential for physical and mental health [
As consumer sleep-monitoring wristbands continue to gain popularity, their limited measurement accuracy has raised widespread concerns about the quality of data collected using these devices [
Although most validation studies have focused on polysomnographic metrics (eg, TST, WASO, sensitivity, and specificity) [
This study aimed to examine whether Fitbit Charge 2 could accurately measure sleep stage transitions (the transition probabilities among wake, light, deep, and REM sleep). Despite the abundance of validation studies, the accuracy of consumer wristbands in measuring sleep stage transitions has not been investigated. We also examined the factors associated with measurement errors in sleep stage transition probabilities. Previous validation studies on other types of wearable devices found that device accuracy could vary as a function of the underlying sleep patterns, the population studied, and even how the measurand was defined [
Sleep stage transition dynamics. The subscripts W, L, D, and R denote wake, light sleep, deep sleep, and rapid eye movement sleep, respectively.
We recruited participants by distributing posters around the campus of The University of Tokyo. In total, 38 people registered interest through a Web-based form, of whom 28 (74%) were eligible to participate in the study. The inclusion criteria required that the participants were adults (age>18 years), were free of diagnosed chronic conditions, and were able to attend a briefing before the data collection phase. This research was approved by the ethical committee of the University of Tokyo. All participants provided informed consent.
A face-to-face briefing was held with each participant individually before the data collection phase. In this meeting, we installed the Fitbit app on participants’ mobile phones and provided verbal instructions on how to use the devices and how to synchronize the Fitbit device with its mobile phone app. Participants were provided with the following items for data collection: a Fitbit Charge 2, a medical device named Sleep Scope, electrodes, chargers, and manuals. At the end of the briefing, participants were asked to fill in a PSQI questionnaire [
After the briefing, participants measured their sleep using both devices for 3 consecutive nights in their homes to ensure that Fitbit Charge 2 was evaluated in an ecologically valid setting. They were asked to wear the Fitbit on the nondominant wrist during data collection. All participants received a monetary reward when they returned the devices after data collection.
In this study, we collected sleep data concurrently using Fitbit Charge 2 and a medical device. Fitbit Charge 2 (Fitbit Inc) is a wearable activity wristband with an embedded triaxial accelerometer. It estimates sleep stages for each 30-second period by integrating a user’s movement and heart rate data. With advances in software and hardware, Fitbit Charge 2 has overcome some problems of previous models, and it is able to measure TST and SE with good accuracy [
In the data collection phase, participants tracked their sleep for 3 consecutive nights in their homes. Following the common practice in sleep science, we analyzed the second night for each participant to remove
Fitbit sleep data were retrieved through the application programming interface (API) of Fitbit. Fitbit Charge 2 provides sleep data at 2 levels through its public API. The
The data of the medical device were analyzed by the Sleep Well Company, using proprietary automatic scoring algorithms, followed by epoch-by-epoch visual inspection by specialists on the basis of established standards [
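Before the two devices can be compared epoch by epoch, each recording has to be expressed as one stage label per 30-second period. The sketch below illustrates this for the Fitbit side only. It assumes stage data arrive as a list of segments, each with a `level` label and a duration in `seconds`, similar to the `levels.data` entries of a Fitbit sleep log. The function name and the sample segments are hypothetical and are not taken from the study.

```python
EPOCH_SECONDS = 30  # epoch length stated for Fitbit Charge 2 in this article


def expand_to_epochs(segments, epoch_seconds=EPOCH_SECONDS):
    """Turn variable-length stage segments into one label per epoch.

    Assumption: each segment is a dict with a "level" string and a
    duration in "seconds" that is a multiple of the epoch length.
    """
    epochs = []
    for segment in segments:
        n_epochs = segment["seconds"] // epoch_seconds
        epochs.extend([segment["level"]] * n_epochs)
    return epochs


# Hypothetical segments, for illustration only.
sample_segments = [
    {"level": "wake", "seconds": 60},
    {"level": "light", "seconds": 90},
    {"level": "deep", "seconds": 30},
]
epochs = expand_to_epochs(sample_segments)
```

The resulting list is a hypnogram in the sense used in the analysis: an ordered sequence of stage labels at a fixed epoch length.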
To examine the effect of user-specific factors on measurement accuracy, we also collected data on the factors listed in
A full list of user-specific factors.
Factors | Data type | Data collection method | Cut-off threshold |
Age (years) | Ordinal | Self-reported | 25 |
Sex | Nominal | Self-reported | Female or male |
PSQIa | Ordinal | PSQI questionnaire | 5 |
TSTb (min) | Continuous | Sleep Scope (medical device) | 360 |
WASOc (min) | Continuous | Sleep Scope | 30 |
SOLd (min) | Continuous | Sleep Scope | 30 |
SEe (%) | Continuous | Sleep Scope | 90.0 |
Light sleep (%) | Continuous | Sleep Scope | 65.0 |
SWSf (%) | Continuous | Sleep Scope | 20.0 |
REMg (%) | Continuous | Sleep Scope | 20.0 |
Tavgh (min) | Continuous | Sleep Scope | 90 |
aPSQI: Pittsburgh Sleep Quality Index.
bTST: total sleep time.
cWASO: wake after sleep onset.
dSOL: sleep onset latency.
eSE: sleep efficiency.
fSWS: slow wave sleep.
gREM: rapid eye movement sleep.
hTavg: average sleep cycle.
The overall goal of the analysis was two-fold. We aimed to examine the accuracy of Fitbit Charge 2 in measuring sleep stage transitions compared with a medical device. We were also interested in the associations of user-specific factors with the measurement accuracy of Fitbit Charge 2. All statistical significance levels reported were 2 sided, and statistical analysis was performed using R statistical software version 3.5.3 (The R Foundation)[
First, descriptive statistics of sleep parameters were derived from the medical data. Paired 2-tailed
The calculation of sleep stage transition probabilities.
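The equation behind this caption did not survive extraction. As a sketch only, the code below uses the usual first-order estimate, in which the probability of moving from stage i to stage j is the number of observed i-to-j epoch pairs divided by the number of epoch pairs that start in stage i. This is an assumption about the calculation, not a reproduction of the authors' formula. The stage labels and the sample hypnogram are hypothetical.

```python
from collections import Counter

STAGES = ("wake", "light", "deep", "rem")


def transition_probabilities(hypnogram, stages=STAGES):
    """Estimate stage-to-stage transition probabilities in percent.

    Counts consecutive epoch pairs (src, dst) and normalizes each count
    by the number of pairs starting in src, so each row sums to 100.
    Rows for stages that never occur as a source are set to NaN.
    """
    counts = Counter(zip(hypnogram, hypnogram[1:]))
    probs = {}
    for src in stages:
        total = sum(counts[(src, dst)] for dst in stages)
        for dst in stages:
            if total:
                probs[(src, dst)] = 100.0 * counts[(src, dst)] / total
            else:
                probs[(src, dst)] = float("nan")
    return probs


# Hypothetical 10-epoch hypnogram, for illustration only.
hypnogram = ["wake", "wake", "light", "light", "light",
             "deep", "deep", "light", "rem", "rem"]
probs = transition_probabilities(hypnogram)
```

Under this estimate the diagonal entries, such as `probs[("light", "light")]`, are the probabilities of staying in a stage, and the off-diagonal entries are the probabilities of moving to another stage.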
The calculation of absolute percent error.
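The equation for this caption is also missing from the extracted text. A common definition, assumed here, takes the absolute difference between the Fitbit value and the medical device value and expresses it as a percentage of the medical device value. The function name, the handling of a zero reference value, and the sample numbers are illustrative assumptions.

```python
import math


def absolute_percent_error(fitbit_value, medical_value):
    """Absolute percent error of Fitbit relative to the medical device.

    Assumption: the medical device value is the reference (denominator).
    When the reference is 0 the ratio is undefined, so NaN is returned.
    """
    if medical_value == 0:
        return math.nan
    return abs(fitbit_value - medical_value) / abs(medical_value) * 100.0


# Hypothetical transition probabilities (%), for illustration only.
ape = absolute_percent_error(90.0, 80.0)
undefined = absolute_percent_error(1.2, 0.0)
```

Because the denominator is the reference probability, small reference values can produce errors well above 100%, which is consistent with the magnitude of the errors reported in the results.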
The absolute percent error
To examine the effect of user-specific factors on absolute percent error, the dataset was divided into 2 subsets according to the cut-off threshold values listed in
A total of 28 young adults without chronic diseases participated in the study. A total of 5 participants were excluded from analysis because of failure to obtain
Average sleep stage transition probabilities (%) and results of paired
Sleep stage | Source | Wake | Light | Deep | REMa |
Wake | Medical | 53.7 (44.0-63.3) | 43.6 (33.8-53.4) | 0.2 (0.0-0.4) | 2.6 (1.5-3.7) |
Wake | Fitbit | 89.8 (81.2-98.3) | 5.5 (4.3-6.7) | 0.2 (0.0-0.5) | 0.2 (0.0-0.5) |
Wake | P value | <.001 | <.001 | .83 | <.001 |
Light | Medical | 2.6 (2.0-3.3) | 92.6 (90.9-94.4) | 3.9 (2.1-5.8) | 0.8 (0.7-0.9) |
Light | Fitbit | 0.5 (0.3-0.6) | 97.8 (97.6-98.1) | 1.1 (0.9-1.3) | 0.5 (0.3-0.7) |
Light | P value | <.001 | <.001 | .005 | .02 |
Deep | Medical | 2.5 (0.7-4.3) | 57.7 (43.8-71.6) | 35.5 (22.6-48.4) | 0.0 (0.0-0.0) |
Deep | Fitbit | 0.2 (0-1.8) | 3.8 (2.9-4.6) | 94.9 (93.4-96.4) | 1.1 (0.4-1.8) |
Deep | P value | .02 | <.001 | <.001 | .002 |
REM | Medical | 2.0 (1.6-2.4) | 0.9 (0.7-1.2) | 0.0 (0.0-0.0) | 96.9 (96.5-97.5) |
REM | Fitbit | 0.1 (0.0-0.2) | 1.7 (0.7-2.6) | 1.2 (0.3-2.2) | 96.9 (96.0-98.0) |
REM | P value | <.001 | .14 | .01 | >.99 |
aREM: rapid eye movement.
In line with previous studies [
Bland-Altman plots assessing the level and limits of agreement between Fitbit Charge 2 and medical device on the transition probabilities from rapid eye movement (REM) sleep to light sleep, from light sleep to REM sleep, and the probability of staying in REM sleep. The dashed line in the middle represents the mean difference, whereas the upper and lower dashed lines represent the upper limit of agreement and the lower limit of agreement.
Bland-Altman plots assessing the level and limits of agreement between Fitbit Charge 2 and medical device on the probability of staying in light sleep, in deep sleep, and in wake.
Bland-Altman plots assessing the level and limits of agreement between Fitbit Charge 2 and medical device on the transition probabilities from wake to light sleep, from wake to rapid eye movement (REM) sleep, from wake to deep sleep, from light sleep to wake, from light sleep to deep sleep, from deep sleep to wake, from deep sleep to light sleep, from deep sleep to REM sleep, from REM sleep to wake, and from REM sleep to deep sleep.
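The quantities drawn as dashed lines in these plots can be computed directly from the paired measurements. The sketch below uses the conventional Bland-Altman definitions: the mean of the per-participant differences as the bias, and the bias plus or minus 1.96 standard deviations of the differences as the limits of agreement. Whether the authors used exactly this multiplier is an assumption, and the sample values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(fitbit, medical):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias   = mean of (Fitbit - medical) differences
    limits = bias -/+ 1.96 * sample SD of the differences
    """
    diffs = [f - m for f, m in zip(fitbit, medical)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical per-participant probabilities (%) of staying in one stage.
fitbit_values = [98.0, 97.5, 99.0, 96.5]
medical_values = [93.0, 91.5, 95.0, 92.5]
bias, lower, upper = bland_altman(fitbit_values, medical_values)
```

A positive bias here means Fitbit reports a higher probability than the medical device, which is the direction described in the results for the probabilities of staying in a stage.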
The results of the Wilcoxon signed–rank test showed that good subjective sleep quality, indicated by a PSQI score lower than 5, was associated with decreased errors in the probability of staying in the deep sleep stage (PSQI<5, 132.1±173.1%; PSQI≥5, 346.8±250.0%;
Wake time longer than 30 min was associated with increased errors in transition probability from light sleep to REM sleep (WASO≥30, 265.8±176.5%; WASO<30, 103.9±49.1%;
SE above 90% was associated with increased measurement errors in transition probability from REM sleep to light sleep (SE>90, 107.1±53.2%; SE≤90%, 55.9±40.4%;
In addition, age below 25 years (age<25, 7.9±5.4%; age≥25, 3.1±2.3%;
No significant associations were found between measurement errors of Fitbit and other factors, including sex, TST, SOL, light sleep ratio, REM sleep ratio, and Tavg.
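The subgroup summaries reported above take the form of a mean±SD of absolute percent error on either side of a cut-off. The sketch below shows one way to produce them. It assumes a split into values below the cut-off and values at or above it, as in the WASO comparison; some factors in the text use a different boundary, such as SE>90 versus SE≤90. The record layout and the numbers are hypothetical, and the significance test itself is not reproduced.

```python
from statistics import mean, stdev


def summarize_by_cutoff(records, factor, cutoff):
    """Split records at a cut-off and summarize absolute percent error.

    Returns {"below": (mean, sd), "at_or_above": (mean, sd)}.
    Assumption: each record is a dict holding the factor value and an
    "ape" entry, and each subgroup has at least 2 records.
    """
    below = [r["ape"] for r in records if r[factor] < cutoff]
    at_or_above = [r["ape"] for r in records if r[factor] >= cutoff]
    return {
        "below": (mean(below), stdev(below)),
        "at_or_above": (mean(at_or_above), stdev(at_or_above)),
    }


# Hypothetical participants: WASO in minutes and absolute percent error.
records = [
    {"waso": 12, "ape": 80.0},
    {"waso": 25, "ape": 120.0},
    {"waso": 41, "ape": 250.0},
    {"waso": 55, "ape": 310.0},
]
summary = summarize_by_cutoff(records, "waso", 30)
```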
We have presented a numerical comparison of sleep stage transition probabilities between Fitbit Charge 2 and the medical device. The level and limits of agreement between the 2 devices were illustrated using Bland-Altman plots. The results of the Wilcoxon signed–rank test were presented to demonstrate the associations between user-specific factors and measurement errors. This study generated 2 main findings. First, we found that Fitbit Charge 2 underestimated sleep stage transition dynamics compared with the medical device. Second, device accuracy was mainly associated with 3 user-specific factors: subjective sleep quality measured by PSQI, WASO, and SE.
Sleep stage transition analysis has been used to characterize sleep continuity and the temporal stability of non-REM and REM bouts in sleep science [
Sleep stage transition is the result of complex interactions among many brain regions. Not being able to detect markers in brainwaves, such as k-complexes [
A unique aspect of this study is that we also examined the effect of user-specific factors and found multiple associations. Our analysis showed that subjective sleep quality measured by PSQI, WASO, and SE were significant and strong predictors of measurement errors in sleep stage transition probabilities. Age, SOL, and deep sleep ratio were significant but weak predictors, whereas sex, TST, light sleep ratio, REM sleep ratio, and average sleep cycle were not associated with the measurement errors of Fitbit.
Despite the finding from previous validation studies that poor sleep quality is associated with deteriorated performance of sleep monitoring devices in measuring polysomnographic sleep metrics [
In addition, age was found to be a significant but weak predictor of measurement errors. Participants in the age range of 25 to 30 had decreased measurement errors in the probability of staying in light sleep stage compared with those younger than the age of 25. As age has been widely recognized as a significant factor that alters sleep patterns [
Our findings complement those of previous validation studies on consumer wristbands for sleep tracking in general. Fitbit Charge 2 has demonstrated satisfactory performance in measuring TST and SE, but it remains incapable of classifying sleep stages with good accuracy [
This study is subject to the following limitations. First, the participants represent a young healthy population that was free of sleep disorders or chronic diseases. Therefore, the results cannot be generalized to older or clinical populations. Second, the data collection phase was not longitudinal in nature, and only 1 night of sleep from each participant was analyzed. Thus, the results may fail to account for intrapersonal variation. Third, the list of potential affecting factors investigated in this study was not exhaustive, and it may be affected by restricted sampling. Further research should address these limitations by including a diverse population, extending data collection duration, and examining the effect of other potential predictors of device accuracy.
We have demonstrated that Fitbit Charge 2 significantly underestimated sleep stage transition dynamics compared with the medical device and that measurement accuracy could be mainly affected by perceived sleep quality, sleep continuity, and SE. Despite the positive trend of enhanced accuracy for the latest consumer wearable sleep trackers, the limitation of these devices in detecting sleep stage transition dynamics needs to be recognized. As an outcome measurement tool, Fitbit Charge 2 may not be suited for research studies related to sleep stage transitions or for health care decision making. Further research should focus on enhancing the accuracy of these consumer wristbands in measuring not only polysomnographic parameters but also sleep stage transition dynamics.
API: application programming interface
PSG: polysomnography
PSQI: Pittsburgh Sleep Quality Index
REM: rapid eye movement
SE: sleep efficiency
SOL: sleep onset latency
SWS: slow wave sleep
TST: total sleep time
WASO: wake after sleep onset
This study was sponsored by a JSPS KAKENHI Grant-in-Aid for Research Activity Start-up (Grant Number 16H07469) and a JSPS KAKENHI Grant-in-Aid for Early Career Scientists (Grant Number 19K20141).
None declared.