This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR mhealth and uhealth, is properly cited. The complete bibliographic information, a link to the original publication on http://mhealth.jmir.org/, as well as this copyright and license information must be included.
The use of mobile apps to promote health and well-being has grown exponentially in recent years, yet there is currently no app-quality assessment tool beyond star ratings.
The objective of this study was to develop a reliable, multidimensional measure for trialling, classifying, and rating the quality of mobile health apps.
A literature search was conducted to identify articles containing explicit Web or app quality rating criteria published between January 2000 and January 2013. Existing criteria for the assessment of app quality were categorized by an expert panel to develop the new Mobile App Rating Scale (MARS) subscales, items, descriptors, and anchors. Sixty well-being apps were randomly selected via an iTunes search for MARS rating; ten were used to pilot the rating procedure, and the remaining 50 provided data on interrater reliability.
A total of 372 explicit criteria for assessing Web or app quality were extracted from 25 published papers, conference proceedings, and Internet resources. Five broad categories of criteria were identified: four objective quality scales (engagement, functionality, aesthetics, and information quality) and one subjective quality scale. These were refined into the 23-item MARS. The MARS demonstrated excellent internal consistency (alpha = .90) and interrater reliability (intraclass correlation coefficient [ICC] = .79).
The MARS is a simple, objective, and reliable tool for classifying and assessing the quality of mobile health apps. It can also serve as a checklist for the design and development of new high-quality health apps.
The use of mobile apps for health and well-being promotion has grown exponentially in recent years [
Given the rapid proliferation of smartphone apps, it is increasingly difficult for users, health professionals, and researchers to readily identify and assess high-quality apps [
Much of the published literature focuses on technical aspects of websites, presented mostly in the form of checklists, which do not assess the quality of these features [
Attempts to develop mobile health (mHealth) evaluation criteria are often too general, complex, or specific to a particular health domain. Handel [
Guidelines for evaluating the usability of mHealth apps were also compiled by the Health Care Information and Management Systems Society (HIMSS) [
A reliable and objective instrument is needed to rate the degree to which mHealth apps satisfy quality criteria. This scale should be easy to understand and use with minimal training. It will initially be used by researchers, but may later be made available to app developers and health professionals, pending further research.
The objective of this study is to develop a reliable, multidimensional scale for classifying and rating the quality of mobile health apps.
A comprehensive literature search was conducted to identify articles containing explicit Web- or app-related quality rating criteria. English-language papers from January 2000 through January 2013 were retrieved from PsycINFO, ProQuest, EBSCOhost, IEEE Xplore, Web of Science, and ScienceDirect. The search terms were “mobile” AND “app*” OR “web*”, combined with “quality” OR “criteria” OR “assess*” OR “evaluat*”.
Three key websites, including the EU’s Usability Sciences [
Website and app assessment criteria identified in previous research were extracted. Criteria irrelevant to mobile content, as well as duplicates, were removed. An advisory team of psychologists, interaction and interface designers and developers, and professionals involved in the development of mHealth apps worked together to classify the assessment criteria into categories and subcategories and to develop the scale items and descriptors. Additional items assessing the app’s description in the Internet store and its evidence base were added. Corrections were made until agreement was reached among all panel members.
A systematic search of the Apple iTunes store was conducted on September 19, 2013, following the PRISMA guidelines for systematic literature reviews [
App inclusion criteria were: (1) English language; (2) free of charge; (3) availability in the Australian iTunes store; and (4) from iTunes categories, “Health & Fitness”, “Lifestyle”, “Medical”, “Productivity”, “Music”, “Education”, and “Utilities”. The category inclusion criteria were based on careful scrutiny of the titles and types of apps present in those categories.
Sixty apps were randomly selected using a randomization website [
The search strategy yielded 25 publications, including peer-reviewed journal articles (n=14), conference proceedings (n=8), and Internet resources (n=3) containing explicit mobile or Web-related quality criteria. The complete list of utilized resources is available with this article (see
Number of criteria for evaluation of mHealth app quality identified in the literature search.
Criterion category | Frequency (N=349) | %
App classification (confidentiality, security, registration, community, affiliation) | 12 | 3.4
Aesthetics (graphics, layout, visual appeal) | 52 | 14.8
Engagement (entertainment, customization, interactivity, fit to target group, etc) | 66 | 18.9
Functionality (performance, navigation, gestural design, ease of use) | 90 | 25.8
Information (quality, quantity, visual information, credibility, goals, description) | 113 | 32.4
Subjective quality (worth recommending, stimulates repeat use, overall satisfaction rating) | 16 | 4.6
The
The app quality criteria were clustered within the
Calculating the mean scores of the e
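As a minimal illustration of this scoring approach, subscale scores can be computed as the mean of their items, and the app quality total score as the mean of the four objective subscales. The item groupings follow the subscales named in this article; the function name and data layout are assumptions for the example:

```python
# Illustrative sketch of MARS scoring (not the authors' code).
# Item numbers follow the 23-item scale; item 19 (evidence base) and any
# other "not applicable" items are simply omitted from the ratings dict.
SUBSCALES = {
    "engagement": [1, 2, 3, 4, 5],
    "functionality": [6, 7, 8, 9],
    "aesthetics": [10, 11, 12],
    "information": [13, 14, 15, 16, 17, 18, 19],
}

def score_mars(item_ratings):
    """item_ratings: dict mapping item number -> rating (1-5)."""
    scores = {}
    for name, items in SUBSCALES.items():
        rated = [item_ratings[i] for i in items if i in item_ratings]
        scores[name] = sum(rated) / len(rated)  # subscale mean
    # App quality total = mean of the four objective subscale scores
    scores["total"] = sum(scores[s] for s in SUBSCALES) / len(SUBSCALES)
    return scores
```

The subjective quality items (20-23) are scored separately from the app quality total, consistent with the distinction drawn between objective and subjective quality.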
A total of 1533 apps were retrieved from the iTunes search. All duplicate, non-English, and paid apps were removed. Apps from the categories “games”, “books”, “business”, “catalog”, “entertainment”, “finance”, “navigation”, “news”, “social networking”, and “travel” were also removed. The remaining apps were screened by title; the app store descriptions of apps with unclear titles were reviewed prior to exclusion. Apps whose titles contained the words “magazine”, “mother”, “mum”, “job”, “festival”, “massage”, “shop”, or “conference”, as well as company ads and Web apps, were also excluded, as they were linked to irrelevant content. Sixty of the remaining 405 apps were randomly selected for rating with the MARS (
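The screening and sampling steps above can be sketched as a simple filter followed by random selection. The excluded categories and title keywords are taken from the text; the app data structure and function names are assumptions for illustration:

```python
import random

# Exclusion lists as described in the screening procedure
EXCLUDED_CATEGORIES = {
    "games", "books", "business", "catalog", "entertainment",
    "finance", "navigation", "news", "social networking", "travel",
}
EXCLUDED_TITLE_WORDS = {
    "magazine", "mother", "mum", "job", "festival",
    "massage", "shop", "conference",
}

def screen_apps(apps):
    """apps: iterable of dicts with 'title' and 'category' keys (assumed layout)."""
    kept = []
    for app in apps:
        if app["category"].lower() in EXCLUDED_CATEGORIES:
            continue  # excluded iTunes category
        title_words = app["title"].lower().split()
        if any(w in EXCLUDED_TITLE_WORDS for w in title_words):
            continue  # title keyword linked to irrelevant content
        kept.append(app)
    return kept

def sample_for_rating(apps, n=60, seed=None):
    """Randomly select n screened apps for MARS rating."""
    rng = random.Random(seed)
    return rng.sample(apps, n)
```

In practice, duplicate, non-English, and paid apps would be removed first, and unclear titles reviewed manually, as described above.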
On attempting to rate the initial ten apps, one was found to be faulty and could not be rated. MARS ratings of the remaining nine apps indicated that the scale had a high level of internal consistency (Cronbach alpha = .78) and fair interrater reliability (2-way mixed ICC = .57, 95% CI 0.41-0.69). The Not applicable option was removed from items within the
Independent ratings on the overall MARS total score of the remaining 50 mental health and well-being apps demonstrated an excellent level of interrater reliability (2-way mixed ICC = .79, 95% CI 0.75-0.83). The MARS total score had excellent internal consistency (Cronbach alpha = .90) and was highly correlated with the MARS star rating item (#23),
Only 15 of the 50 mental health and well-being apps extracted from the iTunes App Store had received the five user ratings required for a star rating to be displayed. These apps showed a moderate correlation between the iTunes star rating and the total MARS score (
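The reliability coefficients reported here can be computed directly from a matrix of ratings. Below is a standard-library sketch of Cronbach's alpha and one common two-way, single-measure consistency formulation of the ICC; the function names and data layout are illustrative, not the authors' analysis code:

```python
from statistics import mean, variance

def cronbach_alpha(items):
    """items: list of rows (apps), each row a list of item ratings."""
    k = len(items[0])
    cols = list(zip(*items))
    item_var = sum(variance(c) for c in cols)          # sum of per-item sample variances
    total_var = variance([sum(row) for row in items])  # variance of total scores
    return k / (k - 1) * (1 - item_var / total_var)

def icc_consistency(ratings):
    """Two-way, single-measure consistency ICC (often written ICC(3,1)).
    ratings: list of rows (apps/targets), each row a list of rater scores."""
    n, k = len(ratings), len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(c) for c in zip(*ratings)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between-targets
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between-raters
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
```

Perfect agreement between raters yields an ICC of 1; rater disagreement that is not a consistent offset lowers the coefficient.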
Interrater reliability and internal consistency of the MARS items and subscale scores, and corrected item-total correlations and descriptive statistics of items, based on independent ratings of 50 mental health and well-being apps.
# | Subscale/item | Corrected item-total correlation | Mean | SD
Engagement
1 | Entertainment | .63 | 2.49 | 1.24
2 | Interest | .69 | 2.52 | 1.20
3 | Customization | .60 | 2.27 | 1.15
4 | Interactivity | .65 | 2.70 | 1.22
5 | Target group | .61 | 3.41 | 0.93
Functionality
6 | Performance | .42 | 4.00 | 0.93
7 | Ease of use | .29 | 3.93 | 0.87
8 | Navigation | .48 | 4.00 | 0.94
9 | Gestural design | .48 | 4.10 | 0.79
Aesthetics
10 | Layout | .56 | 3.91 | 0.87
11 | Graphics | .61 | 3.41 | 0.92
12 | Visual appeal: How good does the app look? | .60 | 3.14 | 0.91
Information
13 | Accuracy of app description | .67 | 3.66 | 1.03
14 | Goals | .70 | 3.43 | 1.10
15 | Quality of information | .47 | 3.18 | 1.46
16 | Quantity of information | .58 | 2.87 | 1.54
17 | Visual information | .39 | 1.35 | 1.89
18 | Credibility | .46 | 2.79 | 0.95
19 | Evidence base^a | - | - | -
Subjective quality
20 | Would you recommend this app? | .84 | 2.31 | 1.17
21 | How many times do you think you would use this app? | .82 | 2.46 | 1.12
22 | Would you pay for this app? | .63 | 1.31 | 0.60
23 | What is your overall star rating of the app? | .89 | 2.69 | 1.06
a Item 19 “Evidence base” was excluded from all calculations, as it currently contains no measurable data.
b The
Flow diagram of the process utilized to identify apps for piloting the Mobile App Rating Scale (MARS).
The MARS is the first mHealth app quality rating tool, to our knowledge, to provide a multidimensional measure of the app quality indicators of
The use of objective MARS item anchors and the high level of interrater reliability obtained in the current study should allow health practitioners and researchers to use the scale with confidence. Both the app quality total score and four app-quality subscales had high internal consistency, indicating that the MARS provides raters with a reliable indicator of overall app quality, as well as the quality of app
It is recommended that MARS raters complete a training exercise before commencing use. Training slides are available from the corresponding author. If multiple MARS raters are utilized, it is recommended that raters develop a shared understanding of the target group for the apps, clarify the meaning of any MARS items they find ambiguous, and determine if all MARS items and subscales are relevant to the specific health area of interest. App-quality ratings should be piloted and reviewed until an appropriate level of interrater reliability or consensus ratings are reached. The MARS also assumes that raters have undertaken a detailed exploration of the app’s content and functionalities.
Due to the generic nature of the mHealth app quality indicators included in the MARS, it is recommended that a number of “App-Specific” items be added to obtain information on the perceived impact of the app on the user’s knowledge, attitudes, and intentions related to the target health behavior (see
For convenience, the MARS was piloted on iPhone rather than Android apps. Since initial testing, however, the scale has been applied to multiple Android apps with no compatibility issues. Future research should nonetheless explore the reliability of the scale with Android apps.
While the original search strategy to identify app-quality rating criteria was conducted using guidelines for a systematic review, few peer-reviewed journal articles were identified. As a result, the search strategy was expanded to include conference proceedings and Internet resources, which may not have been as extensively peer reviewed. Suggested guidelines for scale development were followed [
Researchers are yet to test the impact of the mental health apps included in this study. As a result, the MARS item
Future research is required to determine the suitability and reliability of the MARS across multiple health and other app domains, as well as its applicability in the sphere of app development. The association of the app quality total and subscale scores with the concepts of user experience, quality of experience, and quality of service requires further investigation. Future refinements of MARS terminology and additional items are likely to be required, as the functionality of mobile apps progresses. It is hoped the current version of the MARS provides mHealth app-developers with a checklist of criteria for ensuring the design of high-quality apps.
The MARS could also be utilized to provide quantitative information on the quality of medical apps as part of recent medical app peer-review initiatives, such as that launched by JMIR mHealth and uHealth [
With some modification, the MARS may also inform the development and quality rating of health-related websites. While the MARS was designed to be utilized by experts in the mHealth field, a simpler version of the scale, “MARS-app user”, based on the original MARS, was developed in consultation with youth agencies and young people for the purposes of obtaining user feedback on app quality and satisfaction. The MARS-app user version is currently being piloted. It is available upon request from the corresponding author.
Future research is also required to determine how to best evaluate the safety of mHealth apps in terms of the quality of the health information contained in the apps and the privacy and security of user information [
The MARS provides a multidimensional, reliable, and flexible app-quality rating scale for researchers, developers, and health professionals. Current results suggest that the MARS is a reliable measure of health app quality, provided raters are sufficiently and appropriately trained.
Papers, publications, and materials used for MARS criteria selection.
Mobile App Rating Scale.
Mobile Apps Used for MARS Evaluation.
cognitive behavioral therapy
Health Care Information and Management Systems Society
intraclass correlation coefficient
Mobile App Rating Scale
mobile health
user experience
Young and Well Cooperative Research Centre
The Young and Well Cooperative Research Centre (Young and Well CRC) funded the project. The Young and Well CRC is an Australian-based, international research center that unites young people with researchers, practitioners, innovators, and policy-makers from over 70 partner organizations. Together, we explore the role of technology in young people’s lives, and how it can be used to improve the mental health and well-being of young people ages 12 to 25. The Young and Well CRC is established under the Australian Government’s Cooperative Research Centres Program.
We would like to acknowledge Associate Professor Susan Keys and Michael Gould for their assistance with the development of the original version of the MARS.
Our gratitude goes out to Dimitrios Vagenas for his statistical advice.
An Australian Research Council Future Fellowship supports LH.
None declared.