This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR mHealth and uHealth, is properly cited. The complete bibliographic information, a link to the original publication on http://mhealth.jmir.org/, as well as this copyright and license information must be included.
Within the new digital health landscape, the rise of health apps creates novel prospects for health promotion. The market is saturated with apps that aim to increase physical activity (PA). Despite the wide distribution and popularity of PA apps, there are limited data on their effectiveness, user experience, and safety of personal data.
The purpose of this review and content analysis was to evaluate the quality of the most popular PA apps on the market using health care quality indicators.
The top-ranked 400 free and paid apps from iTunes and Google Play stores were screened. Apps were included if the primary behavior targeted was PA, targeted users were adults, and the apps had stand-alone functionality. The apps were downloaded on mobile phones and assessed by 2 reviewers against the following quality assessment criteria: (1) users’ data privacy and security, (2) presence of behavior change techniques (BCTs) and quality of the development and evaluation processes, and (3) user ratings and usability.
Out of 400 apps, 156 met the inclusion criteria, of which 65 apps were randomly selected to be downloaded and assessed. Almost 30% of the apps (19/65) did not have a privacy policy. Every app contained at least one BCT, with an average of 7 and a maximum of 13 BCTs. All but one app had commercial affiliation, 12 consulted an expert, and none reported involving users in the app development. Only 12 of 65 apps had a peer-reviewed study connected to the app. User ratings were high, with only a quarter of the ratings falling below 4 stars. The median usability score was excellent (86.3 out of 100).
Despite the popularity of PA apps available on the commercial market, there were substantial shortcomings in the areas of data safety and likelihood of effectiveness of the apps assessed. The limited quality of the apps may represent a missed opportunity for PA promotion.
Physical inactivity is an established independent risk factor for a range of serious health conditions including cardiovascular disease, diabetes mellitus, and cancer [
Within the new digital health care landscape, the rise of apps creates novel prospects for prevention opportunities and disease management [
The mHealth app industry has doubled in the last 2 years, with around 165,000 health apps available in the major app stores in 2016 [
However, quality is about more than effectiveness, although there has been considerable debate about how exactly
Health apps have the potential to be an important health care tool [
The dimensions of quality proposed by Maxwell and Donabedian were developed before the existence of mobile phones and apps and are perhaps more applicable to health care services provided at the point of need, that is, face-to-face. As potential new health care tools, apps need a more concise approach, one that
In this study, we focused on the most popular apps, which we defined as being in the top rankings of the two major app stores. What constitutes the algorithm that determines the app ranking is unknown. However, variables that indicate popularity such as user ratings, volume of ratings and reviews, download and install counts, usage, and uninstalls are likely to contribute to the ranking in the app stores [
The aim of this study was to assess the quality of publicly available PA apps. Specific objectives were to assess the safety, effectiveness, and provision of the most positive experience in the most popular PA apps.
This study is a review and a content analysis of the most popular, publicly available PA apps on the market.
Apps were included if
Their main goal was to increase physical activity
They were targeted at healthy adults
They had stand-alone functionality
Apps were excluded if
The app focused on multiple behaviors, as it would have been difficult to isolate the content pertaining to physical activity
The target population was patients with a specific health condition, as these users were likely to have different needs to healthy adults
They were sold as part of a pack (“bundle”), as it would not have been possible to assess the popularity of the individual apps in this bundle
A sample of the 400 top-ranked PA apps was obtained from the UK versions of the iTunes and Google Play stores on October 17, 2016. As previous research indicated an association between price and inclusion of BCTs [
From the apps identified, 65 were randomly selected for the assessment using the random number generator function in Excel (Microsoft). As the largest subset of health apps on the market (30%) [
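The random selection was performed with Excel's random number generator; for reproducibility, an equivalent step can be sketched in standard-library Python (the app identifiers and seed below are illustrative, not the study's actual data):

```python
import random

# Hypothetical identifiers for the 156 apps that met the inclusion criteria.
eligible_apps = [f"app_{i:03d}" for i in range(1, 157)]

# Draw 65 apps without replacement; the seed is illustrative only.
rng = random.Random(2016)
assessed_sample = rng.sample(eligible_apps, k=65)

print(len(assessed_sample))       # 65 apps selected
print(len(set(assessed_sample)))  # 65 distinct apps (no replacement)
```

Using `random.sample` guarantees each eligible app appears at most once in the assessed subset.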
The apps were downloaded onto an iPhone SE and an iPhone 6 (running iPhone operating system [iOS, Apple Inc] 10.2.1 and 9.3.4 software, respectively) and onto Android Samsung Galaxy S6 and J5 handsets (running 6.0.1 and 5.1.1 software, respectively) and assessed using a pro forma evaluation. Each app was left running in the background for 2 days so the assessors could explore any reminders or notifications. If two apps were identified as duplicates and the design and content appeared consistent across both operating systems, the apps were assessed on an iPhone only. Sample identification and assessment were conducted independently by two reviewers (PB and GA), and any discrepancies were resolved through discussion.
We extracted the following descriptive data from both app stores: app’s name, brief description, type of PA targeted (eg, running, walking, and whole body workout), platform on which the app was available, developer’s name, rank, number of ratings, cost, size, last update, and version.
The methods of operationalizing the three quality indicators of safety, effectiveness, and provision of the most positive experience possible for the selected apps are described below.
For the safety indicator of health apps, privacy and security of users’ data were considered. The privacy and security assessment was based on the recommendations of the Information Commissioners Office [
As research on PA app efficacy is lacking, the likelihood of effectiveness was assessed by quantifying the presence of BCTs. Furthermore, many quality assessment procedures include an evaluation of the intervention development processes [
The BCT taxonomy v1 [
The application of the health care quality indicators to physical activity apps.
Quality indicator of health care | Applying the indicator to health apps
Safety | Privacy and security of data
Effectiveness | Behavior change techniques (Michie et al [
 | Development and evaluation process: organizational affiliation; expert involvement; user involvement; and evidence of scientific evaluation
Positive experience | User ratings
 | Usability
The evaluation of the quality of development process was based on the information provided in the app stores, the app website (if existent), and within the app itself. The following characteristics of the app content development were extracted: organizational affiliation (university, medical, government, or other nonprofit institutions); expert involvement (eg, fitness expert, behavior change specialist, and medical professional); and evidence for user involvement in the development of an app. The evidence for app evaluation was assessed by searching the name of the app in the following scientific databases: PubMed, ACM Digital Library, IEEE Xplore, and Google Scholar.
The provision of the most positive experience was operationalized using (1) the user ratings in app stores and (2) through formal usability assessment conducted by the two reviewers using the System Usability Scale (SUS) [
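The SUS produces a 0 to 100 score from ten 5-point items: positively worded odd-numbered items contribute (response − 1), negatively worded even-numbered items contribute (5 − response), and the summed total is multiplied by 2.5. This standard scoring rule can be sketched as follows (the example responses are illustrative, not reviewer data from this study):

```python
def sus_score(responses):
    """Compute the System Usability Scale score (0-100).

    `responses` is a list of ten answers on a 1-5 scale,
    ordered item 1 through item 10.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

# Illustrative responses from one hypothetical reviewer for one app.
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0 (best possible)
print(sus_score([3] * 10))                         # 50.0 (neutral on every item)
```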
The application of health care quality indicators to apps is summarized in
Interrater reliability for the presence or absence of the BCTs was ascertained by calculating Cohen kappa statistic [
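For a binary code (BCT present or absent) applied by two reviewers, Cohen kappa corrects observed agreement for chance, while PABAK, which equals 2 × observed agreement − 1 in the binary case, additionally adjusts for prevalence and bias. Both statistics can be computed directly from the 2×2 agreement table; a minimal sketch (the counts below are illustrative, not the study's data):

```python
def cohen_kappa(a, b, c, d):
    """Cohen kappa from a 2x2 agreement table:
    a = both raters code 'present', b = rater 1 present / rater 2 absent,
    c = rater 1 absent / rater 2 present, d = both code 'absent'."""
    n = a + b + c + d
    p_o = (a + d) / n                       # observed agreement
    p_yes = ((a + b) / n) * ((a + c) / n)   # chance agreement on 'present'
    p_no = ((c + d) / n) * ((b + d) / n)    # chance agreement on 'absent'
    p_e = p_yes + p_no
    return (p_o - p_e) / (1 - p_e)

def pabak(a, b, c, d):
    """Prevalence-adjusted bias-adjusted kappa: 2 * p_o - 1 for binary codes."""
    n = a + b + c + d
    return 2 * (a + d) / n - 1

# Illustrative counts over all app-by-BCT coding decisions.
a, b, c, d = 400, 30, 30, 1500
print(round(pabak(a, b, c, d), 2))        # 0.94
print(round(cohen_kappa(a, b, c, d), 2))  # 0.91
```

Because absent codes vastly outnumber present codes in BCT coding, kappa is depressed by the skewed prevalence, which is why PABAK is reported alongside it.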
The number of BCTs in the apps was summarized using the mean, standard deviation, median, 25th and 75th percentiles, and the maximum and minimum. Similar statistics were used to summarize user ratings, cost, size, and SUS score. Proportions were used to summarize the variables: data privacy and security, organization affiliation, expert and user involvement, and the evidence of evaluation in peer-reviewed journals.
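The descriptive summaries named above can be sketched with Python's standard library alone (the BCT counts below are illustrative, not the study data):

```python
import statistics

def summarize(values):
    """Mean, SD, median, 25th/75th percentiles, min and max for a sample."""
    q = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),      # sample standard deviation
        "median": statistics.median(values),
        "p25": q[0],
        "p75": q[2],
        "min": min(values),
        "max": max(values),
    }

# Illustrative BCT counts for a handful of apps.
bct_counts = [1, 5, 7, 8, 8, 9, 12, 13]
print(summarize(bct_counts))
```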
The summary descriptive tables were presented for each store for free and paid apps separately and in total, as app stores have separate rankings based on cost. To assess if there was a difference in store characteristics between free and paid apps,
Out of 400 apps, 244 apps were excluded (209 apps did not target PA, 22 apps needed a peripheral device or paid membership to use the app, and 13 apps focused on multiple health behaviors), and 156 met the inclusion criteria (see
Descriptive data for the app sample are presented in
Flowchart of the apps included in the analysis. PA: physical activity.
Descriptive data for iTunes store.
Descriptive data for iTunes | Free—iTunes (N=21) | Paid—iTunes (N=24) | Total—iTunes (N=45) | P value
Number of ratings | | | |
Mean (SD) | 3408.4 (5848.4) | 773.7 (1187.0) | 2031.2 (4289.7) | .49
Median | 758 | 127 | 550 |
25-75 percentile | 438.0-3698.0 | 47.0-1247.0 | 85.5-1719.0 |
Min-max | 14-24530 | 11-3845 | 11-24530 |
Cost (GBPa) | | | |
Mean (SD) | N/Ab | 2.5 (1.5) | N/A |
Median | N/A | 2.3 | N/A |
25-75 percentile | N/A | 1.5-3.0 | N/A |
Min-max | N/A | 1-8 | N/A |
Size (MB) | | | |
Mean (SD) | 88.4 (49.8) | 94.9 (75.4) | 91.8 (64.1) | .74
Median | 74.3 | 83.3 | 82.2 |
25-75 percentile | 52.0-131.0 | 61.7-102.0 | 58.1-104.0 |
Min-max | 11-164 | 9-376 | 9-376 |
Last update | | | |
<3 months, n (%) | 13 (61.9) | 7 (29.2) | 20 (44.4) | .09
3-6 months, n (%) | 3 (14.3) | 7 (29.2) | 10 (22.2) |
aGBP: British pound.
bN/A: not applicable.
Descriptive data for Google Play store.
Descriptive data for Google Play | Free—Google Play (N=21) | Paid—Google Play (N=16) | Total—Google Play (N=37) | P value
Number of ratings | | | |
Mean (SD) | 119000.7 (165085.0) | 14457.9 (43700.8) | 73793.0 (136723.2) | >.99
Median | 44923 | 1720.5 | 5856 |
25-75 percentile | 5827.0-199596.0 | 384.5-6452.0 | 1475.0-78204.0 |
Min-max | 206-625077 | 7-177277 | 7-625077 |
Cost (GBPa) | | | |
Mean (SD) | N/Ab | 3.6 (2.3) | N/A |
Median | N/A | 2.7 | N/A |
25-75 percentile | N/A | 2.3-5.0 | N/A |
Min-max | N/A | 1-9 | N/A |
Size (MB) | | | |
Mean (SD) | 28.4 (21.2) | 43.4 (34.2) | 34.9 (28.2) | .11
Median | 26.8 | 31.5 | 29.6 |
25-75 percentile | 12.2-38.5 | 27.7-54.0 | 15.4-43.9 |
Min-max | 2-73 | 1-145 | 1-145 |
Last update | | | |
<3 months, n (%) | 16 (76) | 7 (44) | 23 (62) | .12
3-6 months, n (%) | 1 (5) | 3 (19) | 4 (11) |
>6 months, n (%) | 4 (19) | 6 (38) | 10 (27) |
aGBP: British pound.
bN/A: not applicable.
The apps were categorized into five groups according to their primary focus: workout apps that demonstrate various exercises (31/65, 47%), movement-tracking apps that map running, walking, or cycling routes (13/65, 20%), running programs with prespecified goals reached by incremental increases in the run-to-walk ratio (12/65, 18%), pedometer-based apps that count steps (6/65, 9%), and interval timers that enable users to time their work and rest periods (3/65, 4%).
A privacy policy was available for 46 (70%, 46/65) apps overall. In one case, a link to the privacy policy was provided but did not work, and the app was counted as not having a privacy policy. Of those that had a privacy policy, only 4 (8%, 4/46) apps had a short-form privacy and security notice that highlighted key data practices disclosed in detail in the full privacy policy (see
Most of the apps (80%) reported collecting personally identifiable information. In one instance, the developer did not discuss data gathering practices. In 34 instances (74%, 34/46), the developers stated that they share the data they gather with third parties. There were two instances where the developer did not discuss data sharing practices. In many cases, the policies stated that “data shall not be shared, except for,” followed by a list of exceptions that were vague and general. In these instances, the reviewers considered that the data were shared with third parties.
Only 41% (19/46) of the apps described how users’ data were protected. The remaining privacy policies stated that data safety was important to their practices but did not provide information on how data security was ensured.
There was “almost perfect” agreement between the reviewers for the coding of BCT presence or absence: PABAK=0.94, 95% CI 0.93-0.95, kappa=.78 (“substantial”), 95% CI 0.75-0.81.
Data gathering, sharing, and security as described in the privacy policy (within those that had the policy, N=46). Note: 29% (19/65) did not have a privacy policy available.
Data gathering, sharing, and security as described in the privacy policy | Free (N=24), n (%) | Paid (N=22), n (%) | Total (N=46), n (%) | |
Yes | 24 (100) | 22 (100) | 46 (100) | |
No | 13 (44) | 16 (55) | 29 (63) | |
Yes | 11 (64) | 6 (35) | 17 (36) | |
No | 17 (70) | 16 (72) | 33 (71) | |
Yes | 4 (16) | 0 (0) | 4 (8) | |
Not applicable | 3 (12) | 6 (27) | 9 (19) | |
No | 20 (83) | 21 (95) | 41 (89) | |
Yes | 4 (16) | 1 (4) | 5 (10) | |
No | 2 (8) | 6 (27) | 8 (17) | |
Yes | 21 (87) | 16 (72) | 37 (80) | |
Not specified | 1 (4) | 0 (0) | 1 (2) | |
No | 2 (8) | 8 (36) | 10 (22) | |
Yes | 21 (87) | 13 (59) | 34 (74) | |
Not specified | 1 (4) | 1 (4) | 2 (4) | |
No | 13 (54) | 14 (63) | 27 (58) | |
Yes | 11 (45) | 8 (36) | 19 (41) |
Descriptive statistics for the inclusion of the behavior change techniques (BCTs).
Inclusion of the BCTs | Free (N=32) | Paid (N=33) | Total (N=65) | P value
Mean (SD) | 6.6 (3.0) | 7.5 (2.9) | 7.0 (2.9) | .21 | |
Median | 7 | 8 | 8 | ||
25-75 percentile | 5.0-8.0 | 6.0-10.0 | 5.0-9.0 | ||
Min-max | 1-12 | 1-13 | 1-13 |
The total number of BCTs for free and paid apps sample was similar (see
Only 1 app had a noncommercial affiliation,
The median user rating was 4.4 in iTunes and 4.5 in Google Play and did not differ between free and paid apps in either store (see
In both stores, the 25th percentile was around 4 stars (4.0 in iTunes and 4.4 in Google Play), suggesting that the user ratings tended to be high, and only 25% of ratings were below 4 stars. The histograms of star ratings in both stores (
The average SUS score for the apps was similar for both free and paid apps, with median of 86.3 (see
Frequency of behavior change techniques (BCTs) incorporated by physical activity (PA) apps, presented by BCT groups.
Examples of the most common behavior change techniques (BCTs) from the most frequent BCT groups: (1) goals and planning: 1.1 Goal setting (behavior), (2) feedback and monitoring: 2.2 Feedback on behavior, and (3) comparison of behavior: 6.1 Demonstration of the behavior.
Descriptive data for the quality of app development and evaluation process: organizational affiliation, expert and user involvement, and evidence of evaluation in peer-reviewed journals.
The quality of app development and evaluation process | Free (N=32), n (%) | Paid (N=33), n (%) | Total (N=65), n (%) | P value
Organizational affiliation | | | |
Commercial | 31 (96) | 33 (100) | 64 (98) | .49
Government institution | 1 (3) | 0 (0) | 1 (1) |
Expert involvement | | | |
No | 28 (87) | 25 (75) | 53 (81) | .34
Yes | 4 (12) | 8 (24) | 12 (18) |
User involvement | | | |
No | 32 (100) | 33 (100) | 65 (100) |
Evidence of evaluation in peer-reviewed journals | | | |
No | 23 (71) | 30 (90) | 53 (81) | .06
Yes | 9 (28) | 3 (9) | 12 (18) |
Descriptive statistics for user ratings (1-5 stars) in iTunes and Google Play.
User ratings | Free | Paid | Total | P value
iTunes | (N=21) | (N=24) | (N=45) |
Mean (SD) | 4.1 (0.8) | 4.3 (0.6) | 4.2 (0.7) | .22 | |
Median | 4.4 | 4.6 | 4.4 | ||
25-75 percentile | 4.0-4.6 | 4.0-4.8 | 4.0-4.6 | ||
Min-max | 2-5 | 3-5 | 2-5 | ||
Google Play | (N=21) | (N=16) | (N=37) |
Mean (SD) | 4.4 (0.5) | 4.4 (0.3) | 4.4 (0.4) | .90 | |
Median | 4.5 | 4.5 | 4.5 | ||
25-75 percentile | 4.4-4.6 | 4.4-4.6 | 4.4-4.6 | ||
Min-max | 2-5 | 4-5 | 2-5 |
Distribution of user ratings in iTunes and Google Play.
Descriptive data for the System Usability Scale (SUS) assessment.
Usability assessment | Free (N=32) | Paid (N=33) | Total (N=65) | P value
Mean (SD) | 81.3 (12.6) | 85.5 (11.9) | 83.4 (12.4) | .17
Median | 85 | 87.5 | 86.3 | ||
25-75 percentile | 71.9-91.3 | 80.0-93.8 | 75.0-92.5 | ||
Min-max | 53-100 | 58-100 | 53-100 |
This study described the most popular PA apps on the market, focusing on the quality determinants of safety (data privacy and security), effectiveness (BCTs and development and evaluation quality), and provision of the most positive experience possible (user ratings and usability). Overall, our findings suggest that most of the apps in this sample were of reasonable quality in terms of the user experience, but there were substantial shortcomings in the areas of safety and effectiveness. The assessment of data privacy and security showed that a privacy policy was not available for 29.2% of the apps. Most apps collected personally identifiable information and shared users’ data with third parties, and more than half of the apps did not specify how they ensured data security. Every app contained at least one BCT, with an average of 7. The maximum number of BCTs was 13, and the most common BCTs related to provision of feedback on behavior. All but one app had commercial affiliation, 12 consulted an expert, and none reported involving users in the app development. Only 12 of 65 apps had a peer-reviewed study connected to the app, and only one was assessed for efficacy in a trial [
The assessment of privacy policy showed that privacy and security of users’ data could be substantially improved. Our results are consistent with previous studies assessing data safety. Huckvale et al [
The apps in the review contained, on average, 7 BCTs. The results of this study are similar to those found in previous reviews of PA apps: Middelweerd et al [
The most common BCTs were feedback and monitoring, goal setting, and action planning. These self-regulation strategies have been shown to be effective in increasing PA behavior [
The effect of the number of BCTs on the efficacy of interventions remains inconclusive. Although there is some evidence that a higher number of BCTs produces larger effect sizes in Web-based interventions [
The use of evidence and theoretical frameworks is vital in developing behavior change interventions [
The results suggest that the quality of the app development and evaluation process could be improved. We did not find any evidence of user involvement, and most apps were commercially developed with the rare involvement of experts. Similar results were found in previous reviews [
The usability of the apps reviewed was high. Likewise, user ratings of the PA apps were high, with only a quarter of the ratings receiving less than 4 stars. Similarly, Mendiola et al [
The strengths of this study include a systematic approach to sample identification and assessment. First, the sample of apps was identified by screening 400 apps in two major app distribution platforms, including both free and paid apps. Second, the sample was identified and assessed by 2 independent reviewers. Third, the assessment tools covered various aspects of quality, both inclusion of theory as well as user experience using subjective (user ratings) and objective (usability) measures.
First, it is unknown what variables are included in the ranking algorithm of the top apps from which the sample was selected. It is likely that usage data and user ratings comprise the ranking [
First, more studies are needed to assess what predicts higher user ratings. It is unknown which features or characteristics of apps users like and perceive to be effective in increasing their PA. It is possible that there is a discrepancy between what is liked and what is more likely to be effective. Second, research is needed to understand the use of PA apps in order to design effective digital tools. There is little knowledge of how users adopt these apps into their routines and what the facilitators and barriers to increasing PA with apps are. Third, the optimal number of BCTs in a PA app remains unknown. It is likely that different BCTs are more suitable for different modes of delivery (face-to-face, Web-based, and app). For example, social support might produce better results when delivered face-to-face rather than via an app. Alternatively, automatic monitoring of and feedback on PA in apps can facilitate self-regulation and may be more efficient than self-monitoring using diaries.
Although popularity of the apps is high, health care professionals and potential users need to be aware of the limitation in the safety of personal data, as well as the limitation in the quality of the apps to change behavior. Currently, it is not possible to recommend apps that are most effective, but attempts to create a database of high-quality apps are in progress. For example, the National Information Board is developing an app accreditation model that consists of a 4-stage assessment framework that aims to establish a database of high-quality health apps [
This study examined the quality of the most popular PA apps currently available on the market. Although usability and user ratings of the apps were high, there was a concerning lack of safety controls for users’ personal data in the majority of the apps, the apps included a limited number of BCTs that mostly related to feedback on behavior, and the quality of the content and development processes was suboptimal. Technological development and the potential for profit have far outpaced research on the ability of these apps to support PA behavior change. With 165,000 apps on the market, this represents a loss of opportunity for health promotion on a large scale.
Data privacy and security assessment based in the content of privacy policy.
Individual-level data for the sample of apps assessed.
Graph of the distribution of the BCTs in PA apps.
Frequency of individual BCTs within the groups BCTs (BCTs that occurred in at least five apps are shown).
Graph of the distribution of the SUS score averaged between the two reviewers.
BCT: behavior change technique
mHealth: mobile health
NHS: National Health Service
PA: physical activity
PABAK: prevalence-adjusted bias-adjusted kappa
RCT: randomized controlled trial
SUS: System Usability Scale
The authors would like to thank Lou Atkins, Senior Teaching Fellow, Department of Clinical, Educational and Health Psychology, University College London, London, UK, who was consulted on the BCT coding. PK is a PhD student at University College London, funded by the Medical Research Council.
None declared.