This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR mHealth and uHealth, is properly cited. The complete bibliographic information, a link to the original publication on http://mhealth.jmir.org/, as well as this copyright and license information must be included.
Data collected by an actigraphy device worn on the wrist or waist can provide objective measurements for studies related to physical activity; however, some data may contain intervals where values are missing. In previous studies, statistical methods have been applied to impute missing values on the basis of statistical assumptions. Deep learning algorithms, however, can learn features from the data without any such assumptions and may outperform previous approaches in imputation tasks.
The aim of this study was to impute missing values in data using a deep learning approach.
To develop an imputation model for missing values in accelerometer-based actigraphy data, a denoising convolutional autoencoder was adopted. We trained and tested our deep learning–based imputation model with the National Health and Nutrition Examination Survey data set and validated it with the external Korea National Health and Nutrition Examination Survey and the Korean Chronic Cerebrovascular Disease Oriented Biobank data sets, which consist of daily records measuring activity counts. The partial root mean square error and partial mean absolute error of the imputed intervals (partial RMSE and partial MAE, respectively) were calculated using our deep learning–based imputation model (zero-inflated denoising convolutional autoencoder) as well as using other approaches (mean imputation, zero-inflated Poisson regression, and Bayesian regression).
The zero-inflated denoising convolutional autoencoder exhibited a partial RMSE of 839.3 counts and partial MAE of 431.1 counts, whereas mean imputation achieved a partial RMSE of 1053.2 counts and partial MAE of 545.4 counts, the zero-inflated Poisson regression model achieved a partial RMSE of 1255.6 counts and partial MAE of 508.6 counts, and Bayesian regression achieved a partial RMSE of 924.5 counts and partial MAE of 605.8 counts.
Our deep learning–based imputation model performed better than the other methods when imputing missing values in actigraphy data.
An accelerometer-based actigraphy device can measure the movement of the person wearing it by capturing acceleration along a single axis or multiple axes of motion. By mounting an actigraphy device on the wrist, ankle, or waist, researchers or physicians can measure the amount of movement or movement patterns. One study [
Because data recorded over a few days to a few months are usually required to be able to analyze activity patterns, studies have frequently faced the issue of participant adherence to use [
To overcome this issue, statistical models have been suggested to impute missing data [
Recent studies [
This study was approved by the Ajou University Hospital institutional review board (AJIRBMEDEXP17470). Before collecting actigraphy data from patients at Ajou University Hospital, we received their informed consent (Korean Chronic Cerebrovascular Disease Oriented Biobank, KCCDB). The other two data sets used in this study, the National Health and Nutrition Examination Survey (NHANES) [
This study consisted of four phases. First, we collected activity data from three different actigraphy data sets. Second, complete data for each day were selected from the data sets and preprocessed to generate artificially corrupted data. Third, models were constructed from the data, including our deep learning–based imputation model. Finally, models were evaluated with performance indices. Details of the procedures are shown in
Overview of the study and data where n indicates number of records (days). IV: intradaily variability; KCCDB: Korean Chronic Cerebrovascular Disease Oriented Biobank; KNHANES: Korea National Health and Nutrition Examination Survey; MVPA: moderatetovigorous physical activity; NHANES: National Health and Nutrition Examination Survey; PRMSE: partial root mean square error; PMAE: partial mean absolute error; RMSE: root mean square error.
Three actigraphy data sets were used for this study—NHANES, accelerometerbased actigraphy data collected over four years (20032006) from 14,482 individuals living in the United States; KNHANES, accelerometerbased actigraphy data collected over two years (20142015) from 1768 people living in South Korea; and KCCDB, accelerometerbased actigraphy data were collected over two years (20142015) from 177 patients who had visited Ajou University Hospital for evaluation or treatment of cerebrovascular disease.
The NHANES data set was collected using a uniaxial accelerometer-based actigraphy device (ActiGraph AM7164, ActiGraph LLC), which gathered only z-axis data. The KNHANES data set, though collected using a triaxial accelerometer-based actigraphy device (ActiGraph GTX3+, ActiGraph LLC), also consisted of only z-axis data (only these data were made available to the public). In contrast, the KCCDB data set was collected using a triaxial accelerometer-based actigraphy device (Fitmeter, Fit.Life Inc). The triaxial data were aggregated into a single magnitude vector using the formula √(x² + y² + z²).
First, because the KCCDB data set has a higher sampling rate (0.1 Hz) than those of the other data sets (0.016 Hz), the KCCDB data were downsampled by averaging the values for each minute. As a result, each record consisted of 1440 values per day.
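The per-minute averaging step can be sketched as follows. This is an illustrative snippet, not the study's code; the function name and shapes are assumptions (at 0.1 Hz there are 6 samples per minute, 8640 per day).

```python
import numpy as np

# Hypothetical sketch: down-sample a 0.1 Hz KCCDB record (6 samples/min,
# 8640 samples/day) to 1-minute resolution by averaging within each minute.
def downsample_to_minutes(samples_per_min: int, record: np.ndarray) -> np.ndarray:
    """Average each consecutive group of `samples_per_min` values into one value."""
    assert record.size % samples_per_min == 0
    return record.reshape(-1, samples_per_min).mean(axis=1)

raw = np.arange(8640, dtype=float)          # one day at 0.1 Hz (10-second epochs)
per_minute = downsample_to_minutes(6, raw)
print(per_minute.shape)                     # (1440,) -> 1440 values per day
```

The reshape-then-mean pattern keeps the operation vectorized and makes the 1440-values-per-day invariant explicit.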
In previous studies, a missing interval has been defined as a period (of 20, 30, or 60 minutes, depending on the study) over which zero values are continuously repeated [
Finally, we extracted the data from 9 AM to 9 PM, which is the most active period for humans [
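Extracting the 9 AM to 9 PM window from a 1440-minute daily record can be sketched as below (an assumed indexing convention, with index 0 = midnight); the resulting 720-value window matches the model's 1×720 input length.

```python
import numpy as np

# Sketch: keep minutes 540..1259 (9 AM to 9 PM), i.e. 720 values per day.
def extract_daytime(record: np.ndarray, start_hour: int = 9, end_hour: int = 21) -> np.ndarray:
    return record[start_hour * 60 : end_hour * 60]

day = np.arange(1440)
window = extract_daytime(day)
print(window.size, window[0], window[-1])   # 720 540 1259
```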
(a) Frequency of missing data intervals found in the NHANES data set. The interval of approximately 30 minutes occurred most frequently. (b) Example of a complete data record and of a record with a missing data interval.
Mean imputation is a method of replacing a missing value with the mean value from the other instances of valid data at that time [
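A minimal sketch of mean imputation, assuming missing minutes are marked as NaN (the marker convention is an assumption for illustration): each missing value is filled with the mean of valid values at that minute across the other records.

```python
import numpy as np

# Sketch: rows = daily records, columns = minutes of the day.
def mean_impute(records: np.ndarray) -> np.ndarray:
    minute_means = np.nanmean(records, axis=0)   # per-minute mean over valid records
    out = records.copy()
    idx = np.where(np.isnan(out))                # (row, column) indices of missing values
    out[idx] = np.take(minute_means, idx[1])     # fill each gap with its column mean
    return out

data = np.array([[1.0, np.nan, 3.0],
                 [3.0, 4.0,    np.nan]])
print(mean_impute(data))   # NaNs replaced by column means: [[1. 4. 3.] [3. 4. 3.]]
```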
To compensate for the disadvantage of a single imputation method—where missing values are replaced with a single value—the multiple imputation method generates several data sets, and the results are combined into a single result to replace the missing values. In this study, the multiple imputation by chained equations approach (also called fully conditional specification) was used for multiple imputation. (1) All missing values of each variable were filled using the mean imputation method. (2) A regression equation was developed in which the dependent variable is the variable to be imputed and the independent variables are the other variables surrounding the dependent variable. Then, missing values were replaced by values estimated by the regression equation [
The process was the same as for the zero-inflated Poisson model, but Bayesian multiple imputation utilized Bayesian linear regression. Because the Bayesian model aims to find the parameters of the posterior distribution and take a sample from the estimated distribution, the imputed values of this approach can be negative, which cannot exist for the units defined for these devices. Hence, the negative values were replaced with zero values. Bayesian regression algorithms were written in Python.
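The sampling-then-clamping idea can be sketched as below. This is an illustrative Bayesian ridge regression in numpy, not the study's implementation; the prior strength `alpha` and noise variance `sigma2` are assumed placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch: draw regression weights from the Gaussian posterior of a linear
# model, predict the missing values, and clamp negatives to zero, since
# negative activity counts cannot exist.
def bayesian_impute(X, y, X_missing, alpha=1.0, sigma2=1.0):
    d = X.shape[1]
    precision = X.T @ X / sigma2 + alpha * np.eye(d)   # posterior precision of weights
    cov = np.linalg.inv(precision)
    mean = cov @ X.T @ y / sigma2                      # posterior mean of weights
    w = rng.multivariate_normal(mean, cov)             # one posterior draw
    pred = X_missing @ w
    return np.clip(pred, 0.0, None)                    # replace negative values with zero

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([2.0, -1.0, 1.0])
imputed = bayesian_impute(X, y, np.array([[1.0, 2.0]]))
print((imputed >= 0).all())   # True: negative draws are clamped
```

Repeating the posterior draw yields the multiple imputed data sets that the multiple imputation procedure then combines.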
An autoencoder is an unsupervised deep-learning method. Its aim is to make its output
Because
A denoising autoencoder is a type of autoencoder that restores noisy input
A convolutional neural network is a deep-learning method commonly adopted for analyzing images [
Our model, the zero-inflated denoising convolutional autoencoder, consisted of an autoencoder that encoded and decoded the data using a convolutional neural network and a unique activation function designed for the zero distribution at the last layer. The zero-inflated denoising convolutional autoencoder received corrupted data as input, then compressed and recovered these data using a convolutional autoencoder, as shown in
We used the PyTorch (version 1.4.1) framework to construct the deep learning model [
The hyperparameters filter size (m=20 or 30) and latent vector size (k=40, 60, or 80) were determined by a grid search. The other hyperparameters (number of layers, q, and number of filters, n) were gradually increased, and the best-performing values were selected. Stride was set adaptively according to the latent vector size.
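The grid search over (m, k) can be sketched as follows; `cv_rmse` is a hypothetical stand-in for the cross-validation routine, and the toy scores reproduce the reported 10-fold cross-validation table.

```python
from itertools import product

# Sketch of the grid search: filter size m in {20, 30}, latent size k in {40, 60, 80}.
def grid_search(cv_rmse):
    results = {(m, k): cv_rmse(m, k) for m, k in product((20, 30), (40, 60, 80))}
    best = min(results, key=results.get)   # hyperparameters with the lowest RMSE
    return best, results

# Toy scoring function filled with the mean RMSE values reported in the paper.
table = {(20, 40): 830.5, (30, 40): 838.0, (20, 60): 858.7,
         (30, 60): 788.4, (20, 80): 825.1, (30, 80): 831.0}
best, _ = grid_search(lambda m, k: table[(m, k)])
print(best)   # (30, 60): filter size 30, latent vector size 60
```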
For training and selecting the final hyperparameter settings, the training data set was used. The hyperparameters of the zero-inflated denoising convolutional autoencoder were tuned using 10-fold cross-validation. Hyperparameters exhibiting the lowest root mean square error (RMSE) were selected as the final hyperparameter settings.
When training the zero-inflated denoising convolutional autoencoder model, we used RMSE as a loss function to provide feedback to modify the weights. The loss between the original data and restored data was calculated for the entire record (imputed and original parts). Applying the loss function to all values allowed the zero-inflated denoising convolutional autoencoder to learn to impute the missing intervals and to reconstruct the other observed parts.
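The full-record RMSE loss described above can be sketched as follows (a numpy illustration of the objective, not the training code itself):

```python
import numpy as np

# Sketch: RMSE between the full original record and the full reconstruction,
# so the network is penalized on both the corrupted interval and the
# observed portions of the record.
def rmse_loss(original: np.ndarray, restored: np.ndarray) -> float:
    return float(np.sqrt(np.mean((original - restored) ** 2)))

orig = np.array([0.0, 3.0, 4.0, 0.0])
reco = np.array([0.0, 3.0, 0.0, 0.0])
print(rmse_loss(orig, reco))   # sqrt(16/4) = 2.0
```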
Model architecture for the zero-inflated denoising convolutional autoencoder, consisting of an encoder with 5 convolutional layers, in which the filter size and stride decrease while the number of feature maps increases, and a decoder with 5 transposed convolutional layers whose hyperparameters mirror those of the encoder.
Performance of the imputation methods was evaluated using two metrics computed on the imputed portion of the outputs:

partial RMSE = √( (1/|M|) Σ_{i∈M} (x_i − x̂_i)² )

partial MAE = (1/|M|) Σ_{i∈M} |x_i − x̂_i|

where M is the set of time points in the missing interval, x_i is the original value at time point i, and x̂_i is the imputed value.
We calculated the standard deviation

SD = √( (1/(N−1)) Σ_{i=1}^{N} (x_i − x̄)² )

and the intradaily variability index

IV = ( N Σ_{i=2}^{N} (x_i − x_{i−1})² ) / ( (N−1) Σ_{i=1}^{N} (x_i − x̄)² )

where N is the number of values in a record and x̄ is the mean value of the record.
The intradaily variability index is a nonparametric index of the circadian rhythm and represents the fragmentation of activity. Index values range from 0 to 2, and higher values indicate higher variability.
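The intradaily variability index can be computed as sketched below, using the commonly cited nonparametric definition (ratio of the mean squared first difference to the overall variance); smooth rhythms score near 0, while noise-like fragmentation scores near 2.

```python
import numpy as np

# Sketch of the intradaily variability (IV) index:
# IV = N * sum(x_i - x_{i-1})^2 / ((N - 1) * sum(x_i - mean)^2)
def intradaily_variability(x: np.ndarray) -> float:
    n = x.size
    num = n * np.sum(np.diff(x) ** 2)            # hour-to-hour (here minute-to-minute) change
    den = (n - 1) * np.sum((x - x.mean()) ** 2)  # overall variance of the record
    return float(num / den)

smooth = np.sin(np.linspace(0, 2 * np.pi, 720))   # slowly varying rhythm
print(intradaily_variability(smooth) < 0.01)      # True: low IV for smooth data
```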
To evaluate reconstruction of the original variability in the data, we calculated the RMSE of the standard deviation,

RMSE_SD = √( (1/n) Σ_{j=1}^{n} (SD_j − SD̂_j)² )

and the RMSE of the intradaily variability,

RMSE_IV = √( (1/n) Σ_{j=1}^{n} (IV_j − IV̂_j)² )

where n is the number of records, SD_j and IV_j are computed from the original record j, and SD̂_j and IV̂_j are computed from the corresponding imputed record.
We also evaluated the RMSE, in the missing intervals, of a moderate-to-vigorous physical activity measure derived from the data. Moderate-to-vigorous physical activity was used to represent activity intensity. We applied a cutoff of 1267 counts [
where, unlike the RMSE of intradaily variability or standard deviation, the duration of moderate-to-vigorous physical activity was calculated for only the missing intervals.
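The partial metrics and the moderate-to-vigorous physical activity duration can be sketched together as below; the boolean mask marking the missing interval is an assumed representation, and one value corresponds to one minute at this resolution.

```python
import numpy as np

# Sketch: errors are computed only over the missing interval (mask == True);
# MVPA minutes count values at or above the 1267-count cutoff.
def partial_rmse(orig, imp, mask):
    d = orig[mask] - imp[mask]
    return float(np.sqrt(np.mean(d ** 2)))

def partial_mae(orig, imp, mask):
    return float(np.mean(np.abs(orig[mask] - imp[mask])))

def mvpa_minutes(x, cutoff=1267):
    return int(np.sum(x >= cutoff))

orig = np.array([0.0, 2000.0, 500.0, 1500.0])
imp  = np.array([0.0, 1400.0, 500.0,  700.0])
mask = np.array([False, True, False, True])
print(partial_rmse(orig, imp, mask))   # sqrt((600^2 + 800^2)/2) ~ 707.1
print(partial_mae(orig, imp, mask))    # (600 + 800)/2 = 700.0
print(mvpa_minutes(orig))              # 2 minutes at or above the cutoff
```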
The mean value at each minute was determined from the NHANES training set. For external validation, a model was constructed and evaluated for each data set without dividing them into training and testing sets.
The multiple imputation-based models were constructed without dividing the data set into training, validation, and testing data sets. These models utilize the expectation–maximization algorithm, which requires the entire data set and fills in the missing values with values inferred from existing values in other data records. After imputation was performed, the performance was estimated using the NHANES test data set only. External validation was performed using the same process that was used to evaluate mean imputation.
The performance of the zero-inflated denoising convolutional autoencoder was evaluated by using the model generated from the NHANES training and validation sets to impute the test set. For external validation, the performance was evaluated by applying the model trained on the NHANES data set to the external validation data sets without retraining.
To visualize the data and results, both Python (matplotlib, version 3.0.0; seaborn, version 0.9.0) and R (ggplot2) were used [
After 10-fold cross-validation for hyperparameter tuning, we selected the hyperparameter conditions with the lowest mean RMSE (
We used the zero-inflated denoising convolutional autoencoder model with a filter length of 30 in the first convolutional layer and a latent vector size of 60 in the subsequent experiments in this study (q=10, n=8, m=30, k=60,
Baseline characteristics of the data sets.
Characteristics  NHANES^{a} (n=12,475)  KNHANES^{b} (n=1768)  KCCDB^{c} (n=177)
Age (years), mean (SD)  39.04 (22.27)  42.88 (13.04)  74.07 (7.05)
Sex, n (%)
  Male  6077 (48.71)  662 (37.44)  56 (31.63)
  Female  6398 (51.28)  1106 (62.55)  121 (68.36)
Weight (kg), mean (SD)  75.26 (21.73)  63.35 (11.97)  59.03 (10.04)
Height (cm), mean (SD)  166.01 (11.72)  163.45 (8.55)  156.96 (8.33)
BMI (kg/m^{2}), mean (SD)  27.03 (6.56)  23.62 (3.48)  22.66 (7.19)
Activity (count), mean (SD)  344 (694.23)  433 (586.78)  637 (1121.27)
^{a}NHANES: National Health and Nutrition Examination Survey data set (device: ActiGraph AM7164; type: uniaxial; sample rate: 0.016 Hz).
^{b}KNHANES: Korea National Health and Nutrition Examination Survey data set (device: ActiGraph GTX3+; type: triaxial; sample rate: 0.016 Hz).
^{c}KCCDB: Korean Chronic Cerebrovascular Disease Oriented Biobank data set (device: Fitmeter; type: triaxial; sample rate: 0.1 Hz).
Result of 10-fold cross-validation.

Experiment  Latent vector size, k  Filter size, m  RMSE^{a} (count), mean
1  40  20  830.5
2  40  30  838.0
3  60  20  858.7
4  60  30  788.4
5  80  20  825.1
6  80  30  831.0
^{a}RMSE: root mean square error.
Hyperparameter settings for the zero-inflated denoising convolutional autoencoder.
Component layer (order of layer)  Filter, n^{a} × size (stride)  Feature map output size, n × size
Input  —  1×720
Encoder
  Convolution (1)  8×30^{b} (2)  8×346
  Convolution (2)  16×20 (2)  16×164
  Convolution (3)  32×10 (2)  32×78
  Convolution (4)  64×10 (1)  64×69
  Convolution (5)  128×10 (1)  128×60^{c}
Decoder
  Transconvolution (6)  64×10 (1)  64×69
  Transconvolution (7)  32×10 (1)  32×78
  Transconvolution (8)  16×10 (2)  16×164
  Transconvolution (9)  8×20 (2)  8×346
  Transconvolution (10^{d})  1×30 (2)  1×720
^{a}Number of filters, n.
^{b}Filter size, m.
^{c}Latent vector size, k, extracted by the encoder.
^{d}Number of layers, q.
Examples for the zero-inflated denoising convolutional autoencoder, mean imputation, zero-inflated Poisson regression, and Bayesian regression methods on the NHANES test data set are shown in
In addition, we evaluated the RMSE of intradaily variability and of moderate-to-vigorous physical activity for each data set. The zero-inflated denoising convolutional autoencoder yielded the lowest RMSE of intradaily variability on both the NHANES and KNHANES data sets, with values of 0.047 and 0.037, respectively. On KCCDB, it had the second-lowest RMSE of intradaily variability, with a value of 0.02. Moreover, its RMSE of moderate-to-vigorous physical activity was the lowest (13.4 minutes) on KCCDB and the second lowest on the NHANES and KNHANES data sets (12.3 and 12.9 minutes, respectively).
Additional analyses were performed including confidence interval calculation (
Examples of (a) NHANES, (b) KNHANES, and (c) KCCDB data sets for zero-inflated denoising convolutional autoencoder (left), zero-inflated Poisson regression (center), and Bayesian regression (right) with imputed (red) and original (green) intervals within the record (blue). KCCDB: Korean Chronic Cerebrovascular Disease Oriented Biobank; KNHANES: Korea National Health and Nutrition Examination Survey; ZIDCAE: zero-inflated denoising convolutional autoencoder; ZIP: zero-inflated Poisson regression.
Imputation performance results for the comparison methods.
Dataset, measurement  ZIDCAE^{a}  Mean imputation  Zero-inflated Poisson regression  Bayesian regression

NHANES^{b}
  partial RMSE^{c} (counts)  839.3  1053.2  1255.6  924.5
  partial MAE^{d} (counts)  431.1  545.4  508.5  605.8
  RMSE of SD (counts)  35.1  65.2  69.2  34.2
  RMSE of intradaily variability index  0.047  0.067  0.060  0.071
  RMSE of moderate-to-vigorous physical activity (minutes)  12.3  16.2  16.2  11.0

KNHANES^{e}
  partial RMSE (counts)  672.1  660.0  778.8  824.1
  partial MAE (counts)  396.3  419.7  395.5  555.5
  RMSE of SD (counts)  24.4  26.5  26.0  24.7
  RMSE of intradaily variability index  0.037  0.039  0.040  0.050
  RMSE of moderate-to-vigorous physical activity (minutes)  12.9  14.7  14.6  12.2

KCCDB^{f}
  partial RMSE (counts)  1217.2  1313.2  1638.4  1139.4
  partial MAE (counts)  819.6  1045.8  1161.6  810.7
  RMSE of SD (counts)  27.1  30.8  29.6  27.7
  RMSE of intradaily variability index  0.02  0.036  0.041  0.018
  RMSE of moderate-to-vigorous physical activity (minutes)  13.4  14.9  14.9  13.6
^{a}ZIDCAE: zeroinflated denoising convolutional autoencoder.
^{b}NHANES: National Health and Nutrition Examination Survey data set.
^{c}RMSE: root mean square error.
^{d}MAE: mean absolute error.
^{e}KNHANES: Korea National Health and Nutrition Examination Survey data set.
^{f}KCCDB: Korean Chronic Cerebrovascular Disease Oriented Biobank data set.
Our study was the first to attempt to impute activity count data using a deep-learning approach. For activity data with intervals of zero values, statistical models such as the zero-inflated Poisson model assume that the zeros follow a specific probability distribution. In contrast, the zero-inflated denoising convolutional autoencoder modeled the distribution of zeros by using a clamped hyperbolic tangent activation function, which causes the model to transform negative outputs into zero values. The zero-inflated denoising convolutional autoencoder model can therefore learn the distribution of zeros from the data themselves. The results confirm that this approach performs better than previous approaches.
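The clamped hyperbolic tangent output activation can be sketched as follows; the exact scaling is an assumption for illustration, but the key property is that negative pre-activations are mapped to exact zeros, matching zero-inflated count data.

```python
import numpy as np

# Sketch (assumed form) of a "clamped" tanh output activation: tanh for
# positive pre-activations, hard zero for non-positive ones, so the network
# can emit exact zeros like real zero-inflated activity counts.
def zero_clamped_tanh(z: np.ndarray, scale: float = 1.0) -> np.ndarray:
    return np.where(z > 0, scale * np.tanh(z), 0.0)

z = np.array([-2.0, -0.1, 0.0, 0.5, 3.0])
out = zero_clamped_tanh(z)
print((out[:3] == 0).all(), (out[3:] > 0).all())   # True True
```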
By testing the performance with an external data set that was not related to the training data set, we found that the model could be generalized. On the KNHANES and KCCDB data sets, our model exhibited better performance than the other imputation algorithms. Although our model was trained with the NHANES data set, which was collected with a uniaxial device, we confirmed that the zero-inflated denoising convolutional autoencoder model also worked well on triaxial data (the KCCDB data set). This result indicates that our model did not overfit to the NHANES data set but instead learned important features of the activity data.
In addition to predicting the missing values, our model was also able to reproduce the variability of the activity data. The zero-inflated denoising convolutional autoencoder had the second-lowest RMSE of standard deviation on the NHANES data set and the lowest on the external validation data sets. This characteristic is desirable for subsequent analyses using the imputed data set because some activity indices, such as intradaily variability, use the variability in the activity data to evaluate the rhythms of human activity. We compared the RMSE of intradaily variability between the original and imputed data; the zero-inflated denoising convolutional autoencoder showed a lower RMSE of intradaily variability than the other imputation methods. In general, the zero-inflated denoising convolutional autoencoder restored the variability of the original data more accurately than the other methods. These results suggest that the model can impute missing data while reflecting the meaningful variability of activity data both in the general population (NHANES and KNHANES data sets) and in patients with cerebrovascular disease (KCCDB data set).
Moderate-to-vigorous physical activity is an index for evaluating the intensity of activity. We evaluated how well the model restored the duration of the original moderate-to-vigorous physical activity. With the lowest RMSE of moderate-to-vigorous physical activity on KCCDB and the second lowest on the other data sets, the zero-inflated denoising convolutional autoencoder also demonstrated the ability to restore measures of activity intensity.
Fine-tuning the approach could be considered to better reflect the unique characteristics of new data if enough complete cases are available after training. To test whether fine-tuning could improve imputation performance, we conducted the following ad hoc analysis. The KNHANES data set was split into training, validation, and testing data sets in a 9:1:1 ratio. The fine-tuning process was conducted with the KNHANES training data using the fully developed model used in this study. Training was stopped when the performance of the model was best on the KNHANES validation set. When evaluated on the KNHANES testing set, the fine-tuned model performed better, with a partial RMSE of 663.6 counts and partial MAE of 391.2 counts, compared with 672.1 and 396.3 counts, respectively, for the baseline model.
There are some limitations to be discussed. First, activity patterns can depend on demographic factors such as age, BMI, and other metrics. Although demographic information might help improve the performance of the imputation model, it was not used in this study because of a lack of data. If sufficient data sets with demographic information can be obtained in the future, including them in training should improve the performance of our imputation model; however, because the demographic information of the user is often unknown in actual studies, it may be more practical for the model to restore missing data using activity data alone. Second, some activities that cause participants to remove the device cannot be captured in the data set, and these data are difficult to impute correctly. This limitation must always be considered carefully before imputation methods are applied. Although all imputation methods share this limitation, our model performed better than the other methods. Third, although many imputation methods exist, we compared our model with only three of them. We conducted Gaussian process regression, but it predicted only zero values for the imputed data (
To our knowledge, this is the first study to develop a deep-learning model for imputing missing values in actigraphy data. The results of this study suggest that the deep learning approach is useful for imputing missing values in activity data. We expect that our model will contribute to studies of human activity by decreasing the amount of data discarded because of missing values.
Detailed information about the accelerometer devices.
The distribution of lengths of consecutive zeros in each data set.
Detailed results of 10-fold cross-validation.
Confidence intervals of evaluation measurements by the bootstrapping method.
Experimental results for 90- and 180-minute missing intervals for each imputation method.
Example imputed results with 90- and 180-minute missing intervals.
Comparison of model performance between the ZIDCAE model and a naïve convolutional autoencoder with sigmoid function.
Accuracy of restoring moderate-to-vigorous physical activity (MVPA).
The result of Gaussian process imputation.
BMI: body mass index
KCCDB: Korean Chronic Cerebrovascular Disease Oriented Biobank
KNHANES: Korea National Health and Nutrition Examination Survey
MAE: mean absolute error
NHANES: National Health and Nutrition Examination Survey
RMSE: root mean square error
This study was conducted with biospecimens and data from the biobank of the Chronic Cerebrovascular Disease Consortium. The consortium was supported and funded by the Korea Centers for Disease Control and Prevention (#4845303). This work was also supported by the faculty research fund of Ajou University School of Medicine and grants from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute, funded by the Ministry of Health & Welfare, Republic of Korea (Government-wide R&D Fund project for infectious disease research, HG18C0067).
None declared.