

Currently submitted to: JMIR mHealth and uHealth

Date Submitted: Sep 6, 2020
Open Peer Review Period: Sep 7, 2020 - Nov 7, 2020
(currently open for review)

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Accelerometry for the prediction of Type-2 Diabetes: A machine learning-based study of the UK Biobank accelerometer cohort

  • Benjamin Lam
  • Paolo Missier
  • Michael Catt
  • Sophie Cassidy



Between 2013 and 2015, the UK Biobank (UKBB) collected accelerometer traces (AXT) using wrist-worn triaxial accelerometers for 103,712 volunteers aged between 40 and 69, for one week each. This dataset has been used in the past to verify that individuals with chronic diseases exhibit reduced activity levels compared to healthy populations 1. Yet, the dataset is likely to be noisy, as the devices were allocated to participants without a specific set of inclusion criteria, and the traces reflect uncontrolled free-living conditions.


To determine the extent to which AXT can distinguish individuals with Type-2 Diabetes (T2D) from normoglycaemic controls, and to quantify their limitations.


Physical activity features were first extracted from the raw AXT dataset for each participant, using an algorithm that extends the previously developed Biobank Accelerometry Analysis toolkit from Oxford University 1. These features were complemented by a selected collection of socio-demographic and lifestyle (SDL) features available from UKBB. Clustering was used to determine whether activity features would naturally partition participants, and the SDL features were projected onto the resulting clusters for a more meaningful interpretation. Supervised machine learning classifiers were then trained using the different sets of features, to separate T2D-positive individuals from normoglycaemic controls. Multiple criteria, based on a combination of self-assessment Biobank variables and primary care health records linked to the participants in Biobank, were used to identify 3,103 individuals in this population who have T2D. The remaining non-diabetic participants were further scored on their physical activity impairment severity levels based on other conditions found in their primary care data, and those likely to have been physically impaired at the time were excluded.
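The labelling and exclusion step described above can be illustrated with a minimal sketch. The field names, the "either source flags T2D" rule, and the integer impairment scale are all assumptions made for illustration; the abstract does not specify how the criteria were combined or how impairment was scored.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    pid: str
    self_reported_t2d: bool   # hypothetical flag from self-assessment variables
    primary_care_t2d: bool    # hypothetical flag from linked primary care records
    impairment_score: int     # hypothetical scale: 0 = none, higher = more impaired


def build_training_set(participants, max_control_impairment=0):
    """Return (pid, label) pairs: 1 for T2D positives, 0 for retained controls.

    Assumption: a participant is positive if either source flags T2D.
    Non-diabetic participants whose impairment score exceeds the threshold
    are dropped, so that other conditions do not add noise to the controls.
    """
    labelled = []
    for p in participants:
        if p.self_reported_t2d or p.primary_care_t2d:
            labelled.append((p.pid, 1))
        elif p.impairment_score <= max_control_impairment:
            labelled.append((p.pid, 0))
        # Controls above the impairment threshold are excluded.
    return labelled


cohort = [
    Participant("a", True, False, 2),   # positive via self-report
    Participant("b", False, False, 3),  # impaired control, excluded
    Participant("c", False, False, 0),  # unimpaired control, kept
    Participant("d", False, True, 0),   # positive via primary care record
]
training_set = build_training_set(cohort)
```

Raising `max_control_impairment` in this sketch would admit impaired controls, which is one way to reproduce the kind of comparison between unimpaired and impaired control groups.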


Three types of classifiers were tested, with AUROC close to .86 for all three, and F1 scores in the range [.80, .82] for T2D positives and [.73, .74] for controls. Results obtained using non-physically impaired controls were compared to those obtained using highly physically impaired controls, to test the hypothesis that non-diabetes conditions reduce classifier performance. Models built using a training set that includes controls with other conditions performed worse: AUROC in the range [.75, .77] and F1 in the range [.76, .77] (positives) and [.63, .65] (controls). Clusters generated using k-means and hierarchical methods showed limited quality (Silhouette scores: 0.105 and 0.207, respectively); however, a 2-dimensional visual rendering obtained using t-SNE reveals well-defined clusters. Importantly, one of the 3 hierarchical clusters contains almost exclusively (close to 100%) T2D participants.
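For readers unfamiliar with the two metrics reported above, the following sketch shows how a per-class F1 score and the AUROC can be computed from labels and classifier scores. It uses the standard definitions in plain Python. The toy labels, scores, and the 0.5 decision threshold are made up for illustration and are not data from the study.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auroc(y_true, scores):
    """AUROC as the probability that a random positive outscores a
    random negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = T2D positive, 0 = control.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

f1_positives = f1_for_class(y_true, y_pred, 1)
f1_controls = f1_for_class(y_true, y_pred, 0)
area = auroc(y_true, scores)
```

Reporting F1 separately for positives and controls, as the abstract does, shows whether a classifier favours one class. The AUROC summarises how well the scores rank positives above controls, independent of any single threshold.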


The study demonstrates the potential, and limitations, of AXT in the UKBB when these are used to discriminate between individuals with T2D and normoglycaemic controls. The use of primary care EHRs is essential both to correctly identify positives and to identify controls that should be excluded to reduce noise in the training set.


Please cite as:

Lam B, Missier P, Catt M, Cassidy S

Accelerometry for the prediction of Type-2 Diabetes: A machine learning-based study of the UK Biobank accelerometer cohort

JMIR Preprints. 06/09/2020:23364

DOI: 10.2196/preprints.23364



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.