Quality of Physical Activity Apps: Systematic Search in App Stores and Content Analysis

Background: Physical inactivity is a major contributor to the development and persistence of chronic diseases. Mobile health apps that foster physical activity have the potential to assist in behavior change. However, end users and health care providers find it hard to assess the quality of the mobile health apps available in app stores and thus to make informed decisions.
Objective: This study aimed to systematically review and analyze the content and quality of physical activity apps available in the 2 major app stores (Google Play and App Store) by using the German version of the Mobile App Rating Scale (MARS-G). Moreover, the privacy and security measures were assessed.
Methods: A web crawler was used to systematically search for apps promoting physical activity in the Google Play store and App Store. Two independent raters used the MARS-G to assess app quality. Further, app characteristics, content and functions, and privacy and security measures were assessed. The correlation between user star ratings and MARS ratings was calculated. Exploratory regression analysis was conducted to determine relevant predictors for the overall quality of physical activity apps.
Results: Of the 2231 identified apps, 312 met the inclusion criteria. The overall quality was moderate (mean 3.60 [SD 0.59], range 1-4.75). The subscale scores were as follows: information (mean 3.24 [SD 0.56], range 1.17-4.4), engagement (mean 3.19 [SD 0.82], range 1.2-5), aesthetics (mean 3.65 [SD 0.79], range 1-5), and functionality (mean 4.35 [SD 0.58], range 1.88-5). An efficacy study could not be identified for any of the included apps. Data security and privacy features were mostly not implemented. Average user ratings showed a small but significant correlation with the MARS ratings (r=0.22, 95% CI 0.08-0.35; P<.001). The amount of content and number of functions were predictive of the overall quality of these physical activity apps, whereas app store and price were not.
Conclusions: Apps for physical activity showed a broad range of quality ratings, with moderate overall quality ratings. Given the present privacy, security, and evidence concerns inherent to most rated apps, their medical use is questionable. There is a need for open-source databases of expert quality ratings to foster informed health care decisions by users and health care providers.
Paganini et al. JMIR Mhealth Uhealth 2021;9(6):e22587. https://mhealth.jmir.org/2021/6/e22587


Introduction
Physical inactivity is a significant risk factor for noncommunicable diseases such as cancer, diabetes, cardiovascular diseases, or chronic respiratory diseases and is estimated to cause 6%-10% of these diseases worldwide [1,2]. Insufficient physical activity is also a leading risk factor for mortality and was reported to be associated with 9% of premature death cases in 2008 [2]. The World Health Organization recommends at least 150 minutes of moderate or 75 minutes of vigorous-intensity physical activity per week for adults [3]. However, about 30% of adults do not follow this recommendation and are physically inactive [4].
Evidence indicates that regular physical activity results in physical, social, and mental health benefits such as better quality of sleep, lower depressive symptomatology, higher well-being, and a reduced risk of a large number of noncommunicable diseases [5][6][7]. Mobile apps might be a cost-effective and scalable option to foster behavior change in daily life [8]. Apps can also be beneficial as a supplement to behavioral interventions [9]. Additionally, fitness apps are very popular in the general population; a survey conducted in the United States in 2015 showed that about 58% of mobile phone users had downloaded a health app [10]. Of these, the most common categories were fitness and nutrition apps, and most respondents were using them daily. Moreover, health app users were more likely to meet the World Health Organization recommendations concerning physical activity [3,11] and apps were found to be efficacious in promoting physical activity with moderate effect sizes [12]. In 2 recent meta-analyses of apps for increasing physical activity, there was an increase in objectively measured physical activity in the app groups compared to that in the control groups [13,14]. However, these differences were not significant. Regarding the content and quality of apps promoting physical activity, previous reviews focused mainly on the use of behavioral change techniques (BCTs) developed by Abraham and Michie [15][16][17][18][19][20][21]. The most often provided BCTs were feedback on performance, self-monitoring, and goal setting [15,16,18,19].
Regarding the quality of apps, Schoeppe and colleagues [22] used the standardized Mobile App Rating Scale (MARS) [23] to evaluate apps for improving diet, physical activity, and sedentary behavior. However, the mentioned reviews show some limitations as these reviews mostly evaluated apps with specific characteristics (eg, only apps that are connected to an electronic activity monitor) [18], apps especially developed for children and adolescents [20,22], a limited number of apps (eg, only the 20 top-ranked apps, random selection of apps, or apps with a star-rating of at least 4) [16,19,22], or apps with certain contents and features (eg, only apps with feedback and apps that follow the official World Health Organization recommendation for physical activity) [15]. Overall, most studies evaluating apps for health behavior change use self-developed evaluation checklists and do not assess privacy and security features [24]. The only validated evaluation tools that were used are the BCT taxonomy and, in a few studies, the MARS [15,16,18,22,24].
Besides the evaluation of theory-based content, applied techniques or functions, and effectiveness, it is important to consider the risks of mobile health app use, such as inadequate protection of data and privacy or lack of informed consent [24,25]. Many physical activity apps are available, particularly via the 2 largest app stores, the Google Play store and the App Store, but there is only limited information on the quality and data security of these apps [24,26]. Initial studies concerning data security of medicine-related, depression, and smoking cessation apps reveal worrying results, as sharing medical health data with third parties is routine and mostly not made transparent [27,28]. The only study evaluating the safety of personal data in physical activity apps also revealed substantial shortcomings [19]. In terms of quality, user ratings are a questionable indicator as they seem to be mostly influenced by usability and functionality [26,29]. However, a recent evaluation revealed a positive correlation between a broad range of app quality ratings and user star ratings [30].
The aim of this study was to conduct a systematic and objective investigation of the physical activity apps available in the 2 major app stores by using the German version of the MARS (MARS-G). The MARS-G is a multidimensional instrument specifically developed to assess app quality on the dimensions of engagement, functionality, aesthetics, and information quality [31]. Furthermore, privacy and security measures as well as the general characteristics and functions of physical activity apps were assessed. The following research questions were addressed:
1. What is the quality of the apps promoting physical activity regarding engagement, functionality, aesthetics, and information?
2. What are the general characteristics, content, functions, privacy, and security measures of the apps promoting physical activity?
3. Are the user ratings in agreement with the expert quality ratings?
4. Which app features can predict app quality?

Search Strategy and Eligibility Criteria
A web crawler (automated web search engine) was used to scan the European Google Play store and App Store to search for eligible apps. The search was carried out on February 20, 2018 by using the following search terms: (1) active, (2) endurance, (3) exercise, (4) fitness, (5) gymnastics, (6) muscle, (7) shape, (8) strength, (9) training, and (10) workout. The search string to identify apps targeting physical activity was developed in an expert discussion (EMM and HB). The web crawler searched for each term in each app store. Duplicates were automatically removed. After identification, a two-step procedure was applied by 2 independent researchers: (1) checking eligibility based on app title and description and (2) checking eligibility based on information in the downloaded app. In the first step, all identified apps were screened for whether their title, description, and images indicated that the app was developed for promoting physical activity (with at least 50% of the content focusing on physical activity); the app was available in German or English; the app was downloadable through the official Google Play store or App Store; the app could be used without further equipment, devices, or programs; and the app was primarily developed for adults. In the second step, all downloaded apps were assessed in detail to check whether they met the abovementioned eligibility criteria. Apps were excluded if they did not work after the download (checked with 2 different mobile phones) or were explicitly developed for children (as stated in the title, description, or aims of the app). The other exclusion criteria were (1) being an app bundle, (2) working only with an additional device (eg, Garmin Connect), or (3) targeting specific groups of people (eg, employees of a specific company).
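The search step described above (one query per search term and store, followed by automatic duplicate removal) can be sketched as follows. This is only an illustration: the crawler itself is not described in detail here, so the store labels, the function names, and the rule that a duplicate is the same app identifier returned for several terms within one store are all assumptions.

```python
from itertools import product

SEARCH_TERMS = [
    "active", "endurance", "exercise", "fitness", "gymnastics",
    "muscle", "shape", "strength", "training", "workout",
]
STORES = ["google_play", "app_store"]  # hypothetical store labels


def all_queries():
    """Return every (store, term) pair the crawler would have to run."""
    return list(product(STORES, SEARCH_TERMS))


def merge_hits(hits_by_query):
    """Merge per-query hit lists into one record per app.

    hits_by_query maps (store, term) to a list of app identifiers.
    Assumption: an app counts as a duplicate when the same identifier
    is returned for several terms within the same store.
    """
    merged = {}
    for (store, term), app_ids in hits_by_query.items():
        for app_id in app_ids:
            # Keep one entry per app and remember which terms found it.
            merged.setdefault((store, app_id), set()).add(term)
    return merged
```

Keeping the set of matching terms per app is a design choice of this sketch; it makes it possible to check later which search terms contributed most of the hits.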

Rating Procedure
Each app was rated by 2 independent raters between February and October 2018. All raters completed a free online training [32] (training module last updated on November 25, 2019). Raters were recruited from an interdisciplinary expert team (sports science, sport psychology, clinical psychology, information technology: EMM, YT, SP, LS, JL, SC, SB, LK, AK, DS, SS, KP, RP, and RW). Each app was tested and used for at least 15-20 minutes before the rating. Interrater reliability was computed for quality assurance.

Outcome Measures
The MARS includes a multidimensional quality rating consisting of 4 dimensions: engagement (5 items: fun, interest, individual adaptability, interactivity, and target group), functionality (4 items: performance, usability, navigation, and gestural design), aesthetics (3 items: layout, graphics, and visual appeal), and information quality (7 items: accuracy of app description, goals, quality of information, quantity of information, quality of visual information, credibility, and evidence base) [23,31]. The evidence base item (dimension: information quality) was rated based on the app description and the developer's or provider's websites. Items are rated from 1 (inadequate) to 5 (excellent). Besides these objective scales, subjective quality (recommendation, frequency of use, willingness to pay, overall star rating) and perceived impact on the user (awareness, knowledge, attitudes, intention to change, help-seeking, behavioral change) were assessed. Furthermore, the assessment includes a classification section to examine the app characteristics. The following variables were extracted: (1) app name, (2) store link, (3) platform (Google Play store and App Store), (4) content-related subcategory, (5) aims, (6) price, (7) user rating, (8) content, strategies, and functions (abbreviated as functions in the following, assessed with 22 items; eg, information/education, monitoring/tracking, goal setting, gamification, reminder), and (9) privacy and security features [23,31]. Privacy and security features were rated on a descriptive level (eg, presence of privacy policy, contact information or imprint, log-in with a password). For these features, only information that was displayed within the app was used for evaluation. In this study, the MARS-G was used [23,31]. The validation of the MARS-G yielded good internal consistency (ω=0.84, 95% CI 0.77-0.88) and high levels of interrater reliability (intraclass correlation coefficient [ICC] 0.83, 95% CI 0.82-0.85) [31].
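To make the scoring concrete, the following minimal sketch computes dimension scores and an overall score from the item counts listed above. It assumes that each dimension score is the mean of its items, that the overall score is the mean of the 4 dimension scores, and that the ratings of the 2 raters are averaged; the function names and the input layout are hypothetical, and handling of items rated as not applicable is left out.

```python
from statistics import mean

# Number of items per objective MARS dimension, as listed in the text.
DIMENSION_ITEMS = {
    "engagement": 5,
    "functionality": 4,
    "aesthetics": 3,
    "information": 7,
}


def score_rating(rating):
    """Score one rater's sheet: {dimension: [item ratings from 1 to 5]}."""
    scores = {}
    for dim, n_items in DIMENSION_ITEMS.items():
        items = rating[dim]
        if len(items) != n_items:
            raise ValueError(f"{dim}: expected {n_items} items, got {len(items)}")
        if any(not 1 <= x <= 5 for x in items):
            raise ValueError(f"{dim}: item ratings must lie between 1 and 5")
        scores[dim] = mean(items)
    # Assumption: overall quality is the mean of the 4 dimension scores.
    scores["overall"] = mean(scores[d] for d in DIMENSION_ITEMS)
    return scores


def score_app(rater_a, rater_b):
    """Average the scores of both raters, dimension by dimension."""
    a, b = score_rating(rater_a), score_rating(rater_b)
    return {key: (a[key] + b[key]) / 2 for key in a}
```

Averaging dimension scores rather than all 19 items gives each dimension equal weight regardless of how many items it contains.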

Statistical Analysis
For quality assurance, rater agreement was examined using the ICC based on a two-way mixed-effects model [33]. An ICC of <0.50 is considered poor, 0.51-0.75 moderate, 0.76-0.89 good, and >0.90 excellent [34]. A minimum ICC of 0.8 was predefined as sufficient in this study. For quality evaluation, means and standard deviations were calculated for each dimension of the MARS separately and overall. For all calculations, the mean of both raters was used. Furthermore, the correlation between the user ratings provided by the Google Play store/App Store and the MARS ratings was calculated. For the correlation analyses, an alpha level of 5% was defined. P values were adjusted using the procedure proposed by Holm [35]. To determine relevant predictors for overall quality, exploratory multiple linear regression analysis was conducted. Price, store, and the number of functions were used as predictors, as they were significant predictors in other systematic app reviews (eg, older adults, mindfulness, depression, posttraumatic stress disorder, rheumatoid arthritis) [36][37][38][39][40]. Dichotomous predictors were dummy coded. Regression estimates represent unstandardized regression coefficients.
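The Holm adjustment mentioned above can be illustrated with a short sketch. It implements the standard step-down procedure: the P values are sorted in ascending order, the smallest is multiplied by m, the next by m-1, and so on, and the adjusted values are forced to be non-decreasing and capped at 1. The function name is hypothetical, and the statistical software actually used for the analysis is not stated here.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted P values in the original order."""
    m = len(p_values)
    # Indices of the P values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest P value (rank k-1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted values.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

An adjusted P value below the alpha level of 5% then indicates significance after correction for multiple testing.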

Search Results
The search in the Google Play store and App Store yielded 2231 apps without duplicates. Screening resulted in the inclusion of 1817 apps. After downloading and assessing the eligibility criteria in detail, 1505 apps had to be excluded. The remaining 312 apps were included in the analyses (see Figure 1 for further information).

Data Security and Privacy Features
Of the 312 assessed apps, 67 (21.5%) had an imprint/contact information, 60 (19.2%) provided a visible privacy policy, 25 (8.0%) were only accessible with a personalized log-in, 20 (6.4%) utilized a passive informed consent (eg, by continuing you accept our privacy policy), 15 (4.8%) contained an active informed consent (eg, active opt-in to data collection/transfer), 15 (4.8%) had a password option, and 5 (1.6%) gave information about the financial background or made conflicts of interest transparent. No app had embedded emergency features (eg, in case of an accident).

App Quality
The ICC agreement between the raters was high. The MARS quality ratings are summarized in Figure 3. The overall subjective quality reached an average rating of 2.34 (SD 0.78), and the overall perceived impact on the user was rated as 2.32 (SD 0.60). The details can be found in Table 1. The 10 apps with the highest quality ratings are presented in Multimedia Appendix 1.

User and Expert Agreement Toward Quality
Small correlations were found between the user ratings in the stores and the MARS ratings. The correlations are summarized in Table 2.
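As an illustration of how such a correlation and its 95% CI can be obtained, the sketch below computes Pearson r and a confidence interval via the Fisher z-transformation. That the reported intervals were computed this way is an assumption, as is the choice of Pearson's coefficient; the function names are hypothetical, and n is the number of apps with an available user rating.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959963984540054):
    """Approximate CI for r via Fisher's z; z_crit defaults to the 95% level."""
    if n <= 3:
        raise ValueError("n must be larger than 3")
    z = math.atanh(r)             # Fisher z-transformation of r
    se = 1.0 / math.sqrt(n - 3)   # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)
```

Because the interval is built on the z scale and transformed back, it is asymmetric around r and always stays within -1 and 1.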

Exploratory Regression Analysis
Exploratory regression analysis indicated that app quality could be predicted by the number of functions integrated into the app. Price and store had no predictive value. The results of the regression analyses are summarized in Table 3.
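The structure of such an exploratory model, with dummy-coded dichotomous predictors and unstandardized coefficients, can be sketched as follows. The example uses made-up toy data rather than the study data, solves the least-squares problem with plain Gaussian elimination, and assumes that price enters the model as a free versus paid indicator; all names and values are hypothetical and do not reproduce Table 3.

```python
def design_row(n_functions, store, is_paid):
    """Intercept, number of functions, store dummy, and price dummy."""
    store_dummy = 0.0 if store == "google_play" else 1.0  # reference: Google Play
    paid_dummy = 1.0 if is_paid else 0.0                  # reference: free app
    return [1.0, float(n_functions), store_dummy, paid_dummy]


def ols(X, y):
    """Unstandardized OLS coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Toy data (invented): quality depends only on the number of functions.
toy_apps = [
    (1, "google_play", False), (2, "app_store", False), (3, "google_play", True),
    (4, "app_store", True), (5, "google_play", False), (2, "app_store", True),
]
X = [design_row(*app) for app in toy_apps]
y = [3.0 + 0.2 * n_functions for n_functions, _, _ in toy_apps]
coef = ols(X, y)  # order: intercept, functions, store dummy, price dummy
```

In the toy data, the coefficients for store and price come out at approximately zero because quality was constructed from the number of functions alone; with real ratings, each coefficient would be reported together with its standard error and P value.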

Principal Findings
In this study, the quality, general characteristics, privacy and security features, and content/functions of apps that promote physical activity in the commercial European app stores were systematically assessed. The included 312 apps showed a moderate overall quality (3.60 [SD 0.59], range 1-4.75). Moreover, several apps showed very high ratings, and there was a large range of quality ratings. The assessments of the 10 best-rated apps are described in detail in Multimedia Appendix 1. Functionality was the dimension with the highest rating, followed by aesthetics, information quality, and engagement. These results corroborate those of Schoeppe and colleagues [22] who evaluated diet and physical activity apps for children and adolescents (overall quality, mean 3.6).
The apps offered a variety of functions (15 out of 22 functions were used). On average, 3 functions were applied per app, and the most common ones were exercises, goal setting, and monitoring/tracking. This is partly in line with previous reviews of apps promoting physical activity [15,16,18] or weight management [29]. The functions differed from the most frequently used BCTs in another review that evaluated apps for diet, physical activity, and sedentary behavior developed for children and adolescents (the top 3 BCTs were providing instructions, general encouragement, and contingent rewards [22]). This discrepancy might be explained by the different target groups. Studies have already shown that BCTs incorporated in apps for health behavior change differ between adults and children/adolescents [20]. The average of 3 applied functions per app was lower than that in other reviews, which reported 5-8 comparable BCTs [15,16,19,22]. This could be due to the broader range of the included apps in this study.
No randomized controlled trial evaluating the effectiveness of one of the included apps could be identified. This lack of a solid evidence base for the use of health apps is in line with that reported in other systematic reviews of app quality (eg, older adults, mindfulness, depression, rheumatoid arthritis, and posttraumatic stress disorder) [36][37][38][39][40]. These systematic reviews showed that the proportion of scientifically evaluated apps ranges between 0% and 4.8%. Overall, this indicates a gap between research and health practice. Several randomized controlled trials have investigated the efficacy of sport app use to foster behavior change, but these apps are not available in the app stores [41,42], even though the vast majority of apps are downloaded from the Google Play store and App Store [43]. This unavailability might stem from the lack of sustainable structures at universities (eg, end of funding, frequent job changes). Furthermore, it poses a risk for safe sport app use, as the evidence base is the gold standard for assuring quality and efficacy. Moreover, data privacy and security features were also rated as low. Only 19.2% (60/312) of all the apps provided a privacy policy, and 21.5% (67/312) of the included apps provided any contact information or an imprint. All other privacy and security features were fulfilled by less than 10% (range 0-25 apps) of all the apps. In contrast, Bondaronek and colleagues [19] stated that almost 70% of the 65 included physical activity apps had a privacy policy. This might be because they searched for the best-ranked apps. Taken together, the ratings of information quality (including correctness, credibility, and scientific evidence) and the ratings of data protection (including privacy policy, imprint, log-in, informed consent, password, conflicts of interest) reveal potential risks such as misinformation, adverse effects of app use, data misuse, or potential nonefficacy.
Average user ratings in the stores showed a small but significant correlation with the MARS ratings, which is in line with previous research [30,36]. However, several studies (including apps for weight management and chronic pain) could not identify an association [29,44]. This indicates that although user star ratings of physical activity apps might be used as an indicator for app quality, such an association should be evaluated for each indication separately. The results of our study suggest that engagement in particular might play a key role in high user star ratings, which is contrary to previous results highlighting the impact of functionality [26,29]. Nevertheless, user star ratings should be interpreted with caution as end users lack the qualification to assess information quality. Furthermore, user star ratings lack credibility as they could originate from fictitious persons or refer to previous versions of the app [36].
The only relevant predictor for overall app quality was the number of functions, which is in line with previous results from app reviews addressing weight management, diet, physical activity, and sedentary behavior [16,22,29]. This association needs to be addressed in future studies, as it is highly likely that not only the number of functions but also their quality is crucial for overall quality. Furthermore, there might be an optimal number of functions; too many functions might be overwhelming, especially for inexperienced users. Owing to the lack of identified randomized controlled trials, no conclusions about the relationship between quality/functions and effectiveness or side effects can be drawn.

Limitations and Future Research
In this review, apps were only searched in the Google Play store and App Store and, thus, this review does not cover all the available apps promoting physical activity. However, 90% of all the apps are downloaded in these stores [45]. Owing to the broad search strategy and the absence of a focus on only the most popular apps or a cut-off concerning user ratings, this review, including 312 apps, can be seen as comprehensive. The search terms were selected after a discussion between psychologists rather than sport scientists. However, EMM is a state-certified coach (Federal Sports Academy Austria). Furthermore, EMM and HB are experts in the field of app ratings with the MARS. The stated search terms provided a more comprehensive quality analysis of apps for physical activity than previous reviews [15][16][17][18]. Since new apps are being developed rapidly and the content of existing apps might have changed, the presented results can only be seen as a snapshot of the current state of the offered apps. Meanwhile, several new apps may be available and some of the included apps may be unavailable by now or may have been updated. Furthermore, apps were not tested for several hours or days. Thus, some features may not have been discovered, and some obstacles may have remained hidden. Previous reviews evaluating apps for physical activity [15,16,29] assessed BCTs that are common to many health behavior theories [21]. Even though some of these BCTs were included in the content and functions of this review (eg, self-monitoring, feedback on performance, goal setting, or provision of information), the comparability with the BCT results of previous studies is limited. The comparability between systematic app reviews in different health domains should be enhanced by using the functions included in the MARS [38,39,44]. In this systematic review, privacy and security measures were assessed on a descriptive level. Data security and privacy information were only checked based on information within the app. An in-depth analysis of privacy and security features and more elaborate strategies (eg, evaluating whether data collection and transfer are conducted according to privacy policies) are needed [28]. Future studies should extend the findings of this study by using such procedures. Lastly, it should be highlighted that the regression analyses in this study were exploratory. Thus, the results should be interpreted carefully. Confirmatory studies with adequate study designs and power are needed to identify features that are crucial elements for high app quality. Looking at the number of functions, as specified in the exploratory analyses in this study, or investigating persuasive design might be a promising starting point [46,47].

Conclusions
There is a wide range of apps offered to foster physical activity, and they show moderate quality overall. High-quality apps are presented in Multimedia Appendix 1. However, users should be aware of the broad quality range, the lack of evidence, and the low ratings in privacy and security features. Thus, recommendations for the use of physical activity apps can only be given with major limitations. The contents and functions correlated positively with quality ratings. Furthermore, user ratings showed small correlations with the quality ratings and might be a limited indicator for end users. However, it seems necessary that developers use evidence-based content and that scientifically developed and evaluated apps find their way into the app stores. Since the field of mobile health is rapidly growing, there is a need for continuous up-to-date evaluations of apps to inform end users about data protection, privacy regulations, and the evidence base. Central databases such as digital apothecaries [48][49][50] could help users find high-quality apps and protect them against misinformation and abuse. However, there is also a need for novel methodological frameworks such as continuous evaluations [51] that allow for the assessment of multiple or evolving app versions.