This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR mHealth and uHealth, is properly cited. The complete bibliographic information, a link to the original publication on http://mhealth.jmir.org/, as well as this copyright and license information must be included.

Expensive optoelectronic systems, considered the gold standard, require a laboratory environment and the attachment of markers, and they are therefore rarely used in everyday clinical practice. Two-dimensional (2D) human pose estimations for clinical purposes allow kinematic analyses to be carried out via a camera-based smartphone app. Because clinical specialists depend heavily on the validity of this information, there is a need to evaluate the accuracy of 2D pose estimation apps.

The aim of the study was to investigate the accuracy of the 2D pose estimation of a mobility analysis app (Lindera-v2), using the PanopticStudio Toolbox data set as a reference standard. Specifically, we assessed the differences between the joint angles derived from 2D video information by the Lindera-v2 algorithm and those of the reference standard. The results can provide an important assessment of the adequacy of the app for clinical use.

To evaluate the accuracy of the Lindera-v2 algorithm, 10 video sequences were analyzed. Accuracy was evaluated by assessing a total of 30,000 data pairs for each joint (10 joints in total), comparing the angle data obtained from the Lindera-v2 algorithm with those of the reference standard. The mean differences of the angles were calculated for each joint, and a comparison was made between the estimated values and the reference standard values. Furthermore, the mean absolute error (MAE), root mean square error, and symmetric mean absolute percentage error of the 2D angles were calculated. Agreement between the 2 measurement methods was calculated using the intraclass correlation coefficient (ICC[A,2]). A cross-correlation was calculated for the time series to verify whether there was a temporal shift in the data.

The mean difference between the Lindera-v2 data and the reference standard was smallest in the right hip (–0.05°, SD 6.06°) and largest in the neck (–3.07°, SD 6.43°). The smallest MAE was observed in the pelvis (1.40°, SD 1.48°), and the largest MAE was observed in the right shoulder (6.48°, SD 8.43°). The median differences from the reference standard ranged from 0.19° to 3.17° across all joints. The ICC values ranged from 0.951 (95% CI 0.914-0.969) in the neck to 0.997 (95% CI 0.997-0.997) in the left elbow joint. The cross-correlation showed that the Lindera-v2 algorithm had no temporal lag.

The results of the study indicate that 2D pose estimation by means of a smartphone app can show excellent agreement with a validated reference standard. An assessment of kinematic variables can be performed with the analyzed algorithm, which showed only minimal deviations compared with data from a massive multiview system.

Traditional movement assessments, even when carried out by experienced physicians, physiotherapists, and occupational therapists, can contain inaccuracies due to subjectivity. In contrast, quantitative motion measurements by motion capture systems are a valuable tool in scientific and clinical motion analysis and offer a highly accurate and reliable way of capturing kinematic data. Quantitative analyses can be used, for example, to monitor the progress of therapies and objectively evaluate the effectiveness of specific interventions. Motion capture systems are used in sports, biomechanics, and rehabilitation, and they focus on gait analysis, injury prevention, and performance improvement [

There are many optoelectronic motion capture systems based on markers (eg, Vicon [Vicon Motion Systems], Motion Analysis [Motion Analysis Corp], Optitrack [NaturalPoint Inc], and Qualisys [Qualisys AB]). These systems are often regarded in the literature as the gold standard for motion capture [

Inertial sensor measurement systems can be used as a low-cost alternative. However, an inertial sensor measurement system cannot determine global position when used as a stand-alone system, although as a fusion system, in combination with a rigid body model such as the Perception Neuron (Noitom Ltd), a position in space can still be identified [

A 2D skeleton detector enables calculation of specific joint angles for assessment and feedback in sport and rehabilitation settings. The use of 2D human pose estimations for clinical purposes, such as the Lindera-v2 app or the motion-tracking coach on the Kaia health app [

For the accuracy evaluation, 10 video sequences were generated from Panoptic Studio 3D PointCloud (data set 171204_pose1-6) [

The Lindera-v2 algorithm is a combination of a 2D and 3D skeleton-based pose estimation. For this study, we needed the output of the 2D skeleton estimator to calculate 2D joint angles [

Two-dimensional pose estimation by skeleton fitting, based on 25 body joint coordinates.

The Panoptic Studio data set from Carnegie Mellon University [

The 2D skeleton of the 2D Panoptic Studio pose detector has 15 anatomical landmarks. The 2D detector uses appearance information in the interpretation and includes connectivity information.

The value tables for the respective joint angles were clustered, and missing values were imputed using a simple moving average. The mean difference (bias) between the Lindera-v2 algorithm estimates and the reference standard values was calculated for each joint. Furthermore, the mean absolute error, the root mean square error, and the symmetric mean absolute percentage error of the 2D angles were calculated. The intraclass correlation coefficient (ICC[A,2]) was calculated using the 2-way mixed-effects model as a measure of agreement between the 2 measurement methods. An ICC close to 0 indicates that the ratings agree no better than chance, and a value of 1 indicates perfect agreement. We used the definition in which values greater than 0.7 are generally regarded as indicators of good agreement [

A cross-correlation was calculated for the time series to verify whether there was a temporal shift in the data. To verify the stationarity of the data, which is a prerequisite for cross-correlation testing, we used the augmented Dickey-Fuller test. The data were first evaluated in IBM SPSS Statistics (version 25.0; IBM Corp) and then in the programming language R in RStudio (version 3.5.1; RStudio Inc).
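The lag check described above can be sketched as follows. This is a minimal illustration on synthetic data, not the SPSS or R procedure used in the study; the function name `best_lag`, the `max_lag` window, and the simulated angle series are assumptions made for the example. The stationarity test (augmented Dickey-Fuller) is not reproduced here.

```python
import numpy as np

def best_lag(x, y, max_lag=30):
    """Return (lag, r): the shift in frames that maximizes the Pearson
    correlation between x[t] and y[t + lag], and that correlation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = (0, -np.inf)
    for lag in range(-max_lag, max_lag + 1):
        # Align the overlapping parts of the two series for this shift.
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        r = np.corrcoef(a, b)[0, 1]
        if r > best[1]:
            best = (lag, r)
    return best

# Hypothetical joint-angle series (degrees); one sample per video frame.
rng = np.random.default_rng(0)
t = np.arange(300)
reference = 90 + 30 * np.sin(t / 15.0)
estimate = reference + rng.normal(0.0, 2.0, size=t.size)
lag, r = best_lag(reference, estimate)
print(f"lag with maximum correlation: {lag}, r = {r:.2f}")
```

A maximum at lag 0 means the two series are best aligned without any shift, which is how "no temporal lag" is read in this kind of analysis.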

In order to evaluate the accuracy of the movement signals recorded, we analyzed a total of 30,000 data pairs for each joint, comparing the joint angles obtained using the Lindera-v2 algorithm with those of the PanopticStudio Toolbox data set (the reference standard).

Mean angle difference and ICC of Lindera-v2 and the Panoptic Studio data set for the joints analyzed.

Joint | 2D^{a} key points used | Difference in 2D angles (°), mean (SD); 95% CI | MAE^{b} of 2D angles (°) | MAD^{c} (°) | RMSE^{d} of 2D angles (°) | sMAPE^{e} (%) | ICC^{f} (95% CI) | SE of mean difference |

Right shoulder | Right hip, shoulder, and elbow | 2.71 (10.28); | 6.48 | 4.10 | 10.63 | 23.33 | 0.978 | 0.06 |

Left shoulder | Left hip, shoulder, and elbow | –0.07 (12.11); | 3.98 | 3.20 | 12.12 | 10.71 | 0.951 | 0.07 |

Right elbow | Right shoulder, elbow, and wrist | –1.01 (12.12); | 6.18 | 4.30 | 12.16 | 6.64 | 0.983 | 0.07 |

Left elbow | Left shoulder, elbow, and wrist | 0.24 (6.20); | 3.15 | 2.84 | 6.21 | 9.17 | 0.997 | 0.04 |

Right hip | Right shoulder, hip, and knee | –0.05 (6.06); | 4.45 | 4.68 | 6.06 | 3.01 | 0.983 | 0.04 |

Left hip | Left shoulder, hip, and knee | –0.61 (3.85); | 2.29 | 2.29 | 3.90 | 1.74 | 0.992 | 0.02 |

Right knee | Right hip, knee, and ankle | –1.37 (2.97); | 2.58 | 2.93 | 3.27 | 1.56 | 0.985 | 0.02 |

Left knee | Left hip, knee, and ankle | 0.84 (4.31); | 2.28 | 2.45 | 4.44 | 1.39 | 0.971 | 0.03 |

Neck | Pelvis, neck, and head | –3.07 (6.43); | 4.47 | 3.63 | 7.13 | 3.20 | 0.951 | 0.04 |

Pelvis | Left knee, pelvis, and right knee | 0.15 (2.03); | 1.40 | 1.64 | 2.04 | 5.42 | 0.996 | 0.01 |

^{a}2D: two-dimensional.

^{b}MAE: mean absolute error.

^{c}MAD: mean absolute deviation.

^{d}RMSE: root mean square error.

^{e}sMAPE: symmetric mean absolute percentage error.

^{f}ICC: intraclass correlation coefficient ICC(A,2).
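Each joint angle in the table is defined by 3 key points. One plausible reading, which is an assumption here because the computation itself is not spelled out in this section, is that the angle is measured at the middle key point between the segments pointing to the other 2. The sketch below uses a hypothetical helper `joint_angle_2d` and made-up pixel coordinates.

```python
import math

def joint_angle_2d(a, b, c):
    """Angle in degrees (0 to 180) at vertex b between segments b->a and b->c.

    a, b, c are (x, y) key points, e.g. hip, shoulder, and elbow for a
    shoulder angle. Assumed convention: unsigned interior angle.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    # atan2 of |cross| and dot stays numerically stable near 0 and 180 degrees.
    return math.degrees(math.atan2(abs(cross), dot))

# Hypothetical pixel coordinates for right hip, right shoulder, right elbow.
hip, shoulder, elbow = (320.0, 400.0), (318.0, 220.0), (380.0, 300.0)
print(round(joint_angle_2d(hip, shoulder, elbow), 1))
```

If two key points coincide, the angle is undefined; this sketch then returns 0, so a real pipeline would need to flag such frames as missing.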

The data indicated both negative and positive bias. The mean difference of the joint angles closest to 0 was found in the right hip (–0.05°, SD 6.06°), and the mean difference farthest from 0 was found in the neck (–3.07°, SD 6.43°). The mean absolute error was used to show the average magnitude of the errors. It was lowest in the pelvis (1.40°, SD 1.48°) and highest in the right shoulder (6.48°, SD 8.43°). The standard deviation was also lowest in the pelvis (SD 3.36°) and highest in the left shoulder (SD 11.45°). The root mean square error was also calculated, although it gives greater weight to large errors. The RMSE indicated the lowest accuracy in the right elbow (12.16°) and the highest accuracy in the pelvis (2.04°). Since the mean absolute percentage error cannot be used when values are 0 (as this would result in division by 0), we used the sMAPE, which was lowest in the left knee (1.39%) and highest in the right shoulder (23.33%).
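The error measures discussed here can be sketched for paired angle series as follows. The function name `agreement_errors` and the sample values are illustrative assumptions. The sMAPE variant shown (absolute error divided by the mean of the absolute values) is one common definition; the exact formula used in the study is not stated in this section.

```python
import numpy as np

def agreement_errors(estimate, reference):
    """Bias, SD of differences, MAE, RMSE, and sMAPE for paired angles (degrees)."""
    est = np.asarray(estimate, dtype=float)
    ref = np.asarray(reference, dtype=float)
    diff = est - ref
    # Assumed sMAPE denominator: mean of the absolute values of each pair.
    denom = (np.abs(est) + np.abs(ref)) / 2.0
    # Pairs in which both values are 0 contribute 0 instead of dividing by 0.
    ratio = np.divide(np.abs(diff), denom,
                      out=np.zeros_like(diff), where=denom > 0)
    return {
        "bias": diff.mean(),
        "sd": diff.std(ddof=1),
        "mae": np.abs(diff).mean(),
        "rmse": np.sqrt((diff ** 2).mean()),
        "smape": 100.0 * ratio.mean(),
    }

# Hypothetical paired joint angles (degrees), not study data.
estimate = [92.0, 101.5, 118.0, 87.0, 95.5]
reference = [90.0, 103.0, 115.0, 88.0, 95.0]
for name, value in agreement_errors(estimate, reference).items():
    print(f"{name}: {value:.2f}")
```

A useful consistency check is that the squared RMSE equals the squared bias plus the population variance of the differences. The tabulated values fit this relation: for the right shoulder, 2.71² + 10.28² ≈ 113.0, whose square root is about 10.63°.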

The intraclass correlation coefficient for the joint angles is also shown in

Interpretation of the measurement values based on mean values can lead to biased findings (eg, in the case of extreme outliers). Since the median is less affected by outliers, we used box plots for the differences in joint angle values measured with the Lindera-v2 and the reference standard.
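The effect of outliers on the mean versus the median can be illustrated with a short sketch. The helper `boxplot_summary` and the sample differences are hypothetical, and the 1.5 × IQR outlier rule is the usual box plot convention, assumed here because the plotting settings are not given in the text.

```python
import numpy as np

def boxplot_summary(differences):
    """Mean, median, quartiles, IQR, and outlier count for angle differences."""
    d = np.asarray(differences, dtype=float)
    q1, median, q3 = np.percentile(d, [25, 50, 75])
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    n_outliers = int(((d < lower) | (d > upper)).sum())
    return {"mean": d.mean(), "median": median, "q1": q1, "q3": q3,
            "iqr": iqr, "n_outliers": n_outliers}

# Hypothetical differences (degrees) with one extreme value.
summary = boxplot_summary([-1.0, 0.0, 0.0, 1.0, 50.0])
print(summary)
```

In this made-up sample, a single extreme difference pulls the mean far from the bulk of the data while the median stays with it, which is the reason given above for reporting medians alongside means.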

Box plot showing differences in the Lindera-v2 and reference standard values, measured across all joints tested.

Box plot with outliers showing differences in the Lindera-v2 and reference standard values, measured across all joints tested.

To examine the ICC more closely and analyze the potential influence of single videos on the ICC values of the joints, our next step entailed calculating an ICC for each video.
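A per-video ICC could be computed along the following lines. This sketch assumes that ICC(A,2) denotes the absolute-agreement, average-measures coefficient of a 2-way model in the McGraw and Wong notation, with the 2 measurement methods as columns; the function name `icc_a_k` and the sample data are hypothetical, and the study itself used SPSS and R rather than this code.

```python
import numpy as np

def icc_a_k(ratings):
    """ICC(A,k): 2-way model, absolute agreement, average of k measurements.

    ratings has shape (n, k): n paired observations (rows) measured by
    k methods (columns). Mean squares come from a 2-way ANOVA without
    replication.
    """
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    grand = y.mean()
    ss_rows = k * ((y.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((y.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((y - grand) ** 2).sum() - ss_rows - ss_cols
    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = ss_err / ((n - 1) * (k - 1))
    return (ms_r - ms_e) / (ms_r + (ms_c - ms_e) / n)

# Hypothetical angles (degrees) from one video: column 0 is the app,
# column 1 is the reference standard.
video = np.array([[88.0, 90.0], [101.0, 100.0], [120.0, 118.5],
                  [95.0, 96.0], [110.0, 111.0]])
print(round(icc_a_k(video), 3))
```

Because this is an absolute-agreement coefficient, a constant offset between the 2 methods lowers the value even when the series are perfectly correlated, which makes it stricter than a plain correlation coefficient.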

The cross-correlation function was applied to the selected time series in order to examine the temporal lag. The results in

Dot plot of the intraclass correlation coefficient comparing Lindera-v2 and reference standard for the 10 single videos used for accuracy measurement in each joint.

Cross-correlation graph of Lindera-v2 and Panoptic Studio data set values. One lag represents 1 sample (frame). ACF: autocorrelation function.

Maximum cross-correlations of Lindera-v2 and reference standard values.

Joint | Lag value with maximum correlation | Maximum correlation coefficient |

Right shoulder | 0 | 0.96 |

Left shoulder | 0 | 0.91 |

Right elbow | 0 | 0.97 |

Left elbow | 0 | 0.99 |

Right hip | 0 | 0.97 |

Left hip | 0 | 0.99 |

Right knee | 0 | 0.98 |

Left knee | 0 | 0.95 |

Neck | 0 | 0.92 |

Pelvis | 0 | 0.99 |

The goal of this study was to validate the accuracy of the 2D pose estimation of joint angles obtained from the Lindera-v2 algorithm, using the PanopticStudio Toolbox data set as the reference standard. To this end, we analyzed 30,000 data pairs for each joint angle during diverse whole-body movements. First, the mean difference and error measures were compared for each joint. Second, the ICC was compared for each joint. To verify agreement between the 2 measurement methods (the Lindera-v2 and the PanopticStudio Toolbox data set), we analyzed the ICC values for each of the 10 videos. Finally, we examined the potential temporal lag through cross-correlation. The results of the study indicate that the 2D pose estimation method used had excellent agreement with the reference standard. Furthermore, the Lindera-v2 algorithm had no temporal lag.

The mean angle generated for the right hip by the Lindera-v2 algorithm was the closest to the reference standard. Even the value with the greatest difference from 0 (found in the neck) was acceptable. However, these values should be treated with caution because mean values can lead to biased results. Therefore, we displayed the median values in box plots. The median differences from the reference standard ranged from 0.19° (pelvis joint) to 3.17° (right shoulder). In all joints, the IQR lay between –6° and 6°, which means that 50% of the values were within this range. These acquired values provide a promising starting point upon which to base mobility assessments and 3D pose estimation. Box plots were also used to identify outliers, because the RMSE gives greater weight to large errors.

The ICC agreement between the 2 measurement methods can be interpreted as excellent (according to the classification presented by Fleiss [

Early research and reviews published in 2016 reported that the Kinect skeleton-tracking algorithm indicated poor validity and large errors with respect to most kinematic variables [

Valid and reliable 2D joint angles are an important first step on the way to valid and reliable 3D joint angles. Therefore, in the next step, the 2D data from the evaluation will be transformed into 3D pose estimation angles using deep convolutional neural networks. A validation of the 3D joint angle accuracy of the resulting data will show whether the requirements for clinical practice are met.

Although this study showed excellent agreement with a reference standard, a validity study using a state-of-the-art marker-based motion capture system as the ground truth is necessary for a thorough validation. The comparison with the reference standard is an important step toward accuracy assurance but does not replace a proof of validity.

To determine whether the algorithm has a systematic error in the form of an offset, a static setup would be needed. From this, a Euclidean distance could be calculated to identify a precise source of error. The mean joint position error is the most frequently used measure for verifying the accuracy of a pose estimation. However, since the coordinates could not be determined in millimeters in space in these data sets, accuracy verification was carried out for the joint angles. Verification of the precision, which would show the repeatability of the data, was not planned in this project, since measurement using the Lindera-v2 was carried out once and the movements were not repeated in a standardized manner. However, the precision of the time stamps within the measurement of the evaluated movements can be seen from the standard errors of the mean difference. A validation of the precision will be the subject of further studies.

In geriatrics, orthopedics, and neurology in particular, accurate and validated mobility analyses such as the Lindera-v2 could help medical professionals confirm diagnoses and track the success of treatments. Mobility assessments have very high relevance for a multitude of clinical uses (eg, older adults and patients with more severe diseases who have a higher risk of falling) [

The results of the study indicate that 2D pose estimation by means of a camera-based smartphone app can have excellent agreement with a validated reference standard. Furthermore, the Lindera-v2 algorithm was found to have no temporal lag. An assessment of kinematic variables, such as specific joint angles, can be performed with the algorithm, and these data showed only minimal deviations compared with data from a massive multiview system. In future studies, it will be important to test the app in a clinical context with participants with physical limitations.

2D: two-dimensional

3D: three-dimensional

fps: frames per second

HD: high definition

ICC: intraclass correlation coefficient

MAE: mean absolute error

RMSE: root mean square error

sMAPE: symmetric mean absolute percentage error

VGA: video graphics array

We would like to thank Lindera GmbH for providing the raw data for the study. We acknowledge support from the German Research Foundation and the Open Access Publication Fund of Charité – Universitätsmedizin Berlin.

None declared.