This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR mHealth and uHealth, is properly cited. The complete bibliographic information, a link to the original publication on http://mhealth.jmir.org/, as well as this copyright and license information must be included.
Despite the prevalence of mobile health (mHealth) technologies and observations of their impacts on patients’ health, there is still no consensus on how best to evaluate these tools for patient self-management of chronic conditions. Researchers currently have no guidelines on which qualitative or quantitative factors to measure or how to gather reliable data on them.
This study aimed to document the methods and both qualitative and quantitative measures used to assess mHealth apps and systems intended for use by patients for the self-management of chronic noncommunicable diseases.
A scoping review was performed, and PubMed, MEDLINE, Google Scholar, and ProQuest Research Library were searched for literature published in English between January 1, 2015, and January 18, 2019. Search terms included combinations of the description of the intention of the intervention (eg, self-efficacy and self-management) and description of the intervention platform (eg, mobile app and sensor). Articles were selected if they described an intervention in which a patient with a chronic noncommunicable disease was the primary user of a tool or system that was always available for self-management. The extracted data included study design, health conditions, participants, intervention type (app or system), methods used, and measured qualitative and quantitative data.
A total of 31 studies met the eligibility criteria. Studies were classified as either those that evaluated mHealth apps (ie, single devices; n=15) or mHealth systems (ie, more than one tool; n=17), and one study evaluated both apps and systems. App interventions mainly targeted mental health conditions (including post-traumatic stress disorder), followed by diabetes and cardiovascular and heart diseases; among the 17 studies that described mHealth systems, most involved patients diagnosed with cardiovascular and heart disease, followed by diabetes, respiratory disease, mental health conditions, cancer, and multiple illnesses. The most common evaluation method was collection of usage logs (n=21), followed by standardized questionnaires (n=18) and ad-hoc questionnaires (n=13). The most common measure was app interaction (n=19), followed by usability/feasibility (n=17) and patient-reported health data via the app (n=15).
This review demonstrates that health intervention studies are taking advantage of the additional resources that mHealth technologies provide. As mHealth technologies become more prevalent, the call for evidence includes the impacts on patients’ self-efficacy and engagement, in addition to traditional measures. However, considering the unstructured data forms, diverse use, and various platforms of mHealth, it can be challenging to select the right methods and measures to evaluate mHealth technologies. The inclusion of app usage logs, patient-involved methods, and other approaches to determine the impact of mHealth is an important step forward in health intervention research. We hope that this overview will become a catalogue of the possible ways in which mHealth has been and can be integrated into research practice.
Health research has yet to agree on a framework for evaluating mobile health (mHealth) interventions. This is especially true for tools, such as apps and wearables, that are intended primarily to aid patients in health self-management. Traditionally, the evaluation of mobile medical devices has been based on clinical evidence, and it can take years to bring these devices to market. The continuous glucose monitor first came onto the market in 1999, but it was not until 2006 that the next version was available [
Individuals are increasingly empowered to take responsibility for their health, and they enthusiastically seek out mHealth apps and other devices for self-management. For chronic conditions in particular, health challenges occur continuously, not just when it is convenient or at a doctor’s office. Technologies for self-management must allow individuals to register and review the measurements that they input into the app or system at any time. Connectivity to devices, such as medical or commercial sensors and wearables, adds to the utility of an app. A report by Research2Guidance [
The amount of assessment and testing that is necessary for health technology is directly related to its potential risks and benefits [
There are two main categories of mobile medical or mHealth devices, defined by the degree of oversight that health authorities exercise: those that are “actively regulated” and those that fall under “enforcement discretion.” These categories are described in the 2015 Guidance for Industry and Food and Drug Administration Staff [
Although there have been many strategies [
Although clinical evidence is essential for the evaluation of any health aid, the two major concepts of time and human behavior must also be addressed in mHealth evaluation. As “always available” technologies are being used continuously and uniquely by patients, it is uncertain how much time is needed to produce an effect and what changes in self-management behavior will occur. Traditionally, medical devices rely on established biological knowledge, have fewer alternatives in the market, and do not offer frequent updates. However, patient-operated mHealth approaches require the consideration of patients’ motivation, health beliefs, and resources for self-management. They must also compete with hundreds of other mHealth apps and devices that are continuously developed and updated. In recent years, clinical research has attempted to keep pace with mHealth by employing methods that aim to expedite the research process and produce more tailored knowledge for the field of mHealth [
Stakeholders associated with chronic health and care (researchers, individuals, health care providers, and health care authorities) have been calling for evidence related to the personal use of mHealth technologies for many years [
Recent scoping reviews of mHealth technologies for chronic conditions focused on evidence as it relates to a specific age group [
The research questions were as follows: (1) What methods are used to evaluate patient-operated mHealth apps and systems for self-management of chronic NCDs? (2) Which qualitative and quantitative measures are used to evaluate the impact of patient-operated mHealth apps and systems for self-management of chronic NCDs?
We performed a scoping review to document how researchers have evaluated mHealth interventions for self-management of chronic NCDs. Munn et al [
The scope of the search and definitions of mHealth were discussed among the coauthors (MB, EG, EÅ, and MJ). The databases searched for scientific literature were PubMed, MEDLINE, Google Scholar, and ProQuest Research Library. PubMed and MEDLINE were both included because PubMed includes citations that are not yet indexed in MEDLINE [
Medical Subject Headings (MeSH) terms were not considered because our search included articles published recently, which may contain terminology that has not yet been indexed within the MeSH database. The identified abstracts and titles were collected in EndNote [
We aimed to include research efforts that may have addressed new guidelines for mobile medical devices. Within our broad search criteria for low-risk mHealth apps and systems, articles were eligible for inclusion if they described low-risk technologies consistent with the FDA and CE Markings’ description of mobile medical devices under “enforcement discretion” [
A preliminary search was performed, and a random selection of 10 articles was reviewed for inclusion or exclusion by two authors (MB and EG). Refinements were made to the review criteria.
For this review, we included studies that evaluated interventions involving (1) mHealth technologies for chronic NCDs, including the primary NCDs listed by the WHO [
The details of the inclusion and exclusion criteria are described in
After removing duplicate articles, reviews, and protocol articles without evaluation results, two authors (MB and PJ) independently screened the titles and abstracts for eligibility according to the inclusion and exclusion criteria. In case of disagreement regarding eligibility, another author (EG) was called to join the discussion until an agreement was reached. Author MB reviewed the full-text articles and performed data extraction.
The identified studies were classified as either those that evaluated mHealth apps or mHealth systems. Interventions that included a single app were grouped as mHealth apps, whereas those that included services or devices connected to a central app were grouped as mHealth systems. In this way, we could more clearly assess the different approaches taken by researchers when addressing the various impacts of these two mHealth intervention types.
For both groups, one author (MB) assessed whether a study was able to produce the evidence that it aimed to obtain, using the selected methods. This was performed by comparing the objectives stated by the authors of the identified articles with the methods and reported results. The studies were judged according to their ability to produce the intended information, and the findings were reported as “yes,” “yes and more than expected,” “no,” or “cannot tell.” The results of these comparisons are detailed in
Among 3912 records identified by the search criteria, we reviewed 55 full-text articles and included 31 studies for data extraction and synthesis.
Flow diagram illustrating the selection of studies for inclusion in data synthesis. NCD: noncommunicable disease.
Among the 31 studies chosen for data extraction, 15 were categorized as those that evaluated mHealth apps and 17 were categorized as those that evaluated mHealth systems. One study evaluated both apps and systems [
Information about the studies that evaluated mHealth apps.
Reference | App name | Year | Country | Study design | Duration | Health condition | Patient participants | Health care provider and caregiver participants | Intended secondary users
[ | Diet and Activity Tracker (iDAT) | 2015 | Singapore | Prospective study | 8 weeks | Type 2 diabetes | Patients (n=84) | N/Aa | N/A
[ | Diabetes Notepad | 2015 | Korea | Cross-sectional study | Single evaluation | Diabetes | Patients (n=90) | N/A | N/A
[ | Personal Life-chart app | 2015 | Germany | Prospective study | 72 weeks | Bipolar disorder | Patients (n=54) | N/A | N/A
[ | HeartKeeper | 2015 | USA | Cross-sectional study | Single evaluation | Heart diseases | Patients (n=24) and researchers | N/A | N/A
[ | HeartKeeper | 2016 | Spain | Retrospective study | 36 weeks | Heart diseases | Patients (n=32) | N/A | N/A
[ | PTSD Coach | 2015 | USA | Retrospective study | Duration of availability of the app on app stores | Post-traumatic stress disorder | Current users (n=156) | N/A | N/A
[ | PTSD Coach | 2015 | USA | RCTb | 16 weeks | Post-traumatic stress disorder | Patients (n=10) | Health care providers (n=3) | Health care providers
[ | PTSD Coach | 2016 | USA | RCT | 4 weeks | Post-traumatic stress disorder | Patients (n=49) | N/A | N/A
[ | PTSD Coach | 2017 | USA | RCT | 24 weeks | Post-traumatic stress disorder | Patients (n=120) | N/A | N/A
[ | Hypertension management app (HMA) | 2016 | Korea | —c | Single event evaluation | Hypertension | Patients (n=38) | Nurses (n=3) and experts (n=5) | N/A
[ | Multiple commercial apps for heart failure | 2016 | USA | Cross-sectional study | Single evaluation | Heart failure | Apps (n=34) | N/A | Family, friends, and health care providers (not all apps)
[ | Multiple commercial apps (n=11) | 2016 | USA | Cross-sectional study | Single evaluation | Multiple | Patients (n=20) | Caregivers (n=9) | N/A
[ | I-IMR intervention | 2017 | USA | Cross-sectional study | Single evaluation | Serious mental health conditionse | Patients (n=10) | N/A | N/A
[ | Serenita | 2017 | Israel | Prospective study | 16 weeks | Type 2 diabetes | Patients (n=7) | Health care providers | N/A
[ | Sinasprite database | 2018 | USA | Retrospective study | 6 weeks | Depression and anxiety | Patients (n=34) | N/A | N/A
aN/A: not applicable.
bRCT: randomized controlled trial.
cNot available.
dStudy evaluated both apps and systems and therefore will appear in both categories.
eCombination of cardiovascular disease, obesity, diabetes, high blood pressure, high cholesterol, osteoporosis, gastroesophageal reflux disease, osteoarthritis, chronic obstructive pulmonary disease, congestive heart failure, coronary artery disease, and bipolar disorder, major depressive disorder, schizophrenia, or schizoaffective disorder [
Information about the studies that evaluated mHealth systems.
Reference | Intervention name | Year | Country | Study design | Duration | Health condition | Participants | Intended secondary users | Others involved in the intervention | Medical device included (Y/N) | Other devices included
[ | SUPPORT-HF Study | 2015 | UK | Cross-sectional study | 45 weeks | Heart failure | Patients (n=26) | Health care providers | Health care providers and informal care givers | Y | Blood pressure monitor, weight scales, and pulse oximeter
[ | —a | 2015 | USA | Cross-sectional study | Single evaluation | Diabetes | Patients (n=87) and health care providers (n=5) | Health care providers | Health care providers | Y | Glucose meter
[ | Multiple commercial technologies for activity tracking | 2015 | USA | Prospective study | 80-100 days (mean 12.5 weeks) | Serious mental health conditionb | Patients (n=10) | Health care providers and peers (optional) | N/Ac | N | Wearable activity monitoring devices
[ | Diabetes Diary app | 2015 | Norway | Prospective study | 2 weeks | Type 1 diabetes | Patients (n=6) | N/A | N/A | Y | Smart-watch app and glucose meter
[ | Diabetes Diary app | 2015 | Norway | RCTd | 23 weeks | Type 1 diabetes | Patients (n=30) | N/A | N/A | Y | Glucose meter
[ | Diabetes Diary app | 2016 | Norway | RCT | 48 weeks | Type 2 diabetes | Patients (n=151) | Health care providers | N/A | Y | Glucose meter
[ | SnuCare | 2016 | Korea | Prospective study | 8 weeks | Asthma | Patients (n=44) | N/A | Research team | Y | Peak flow meter
[ | HealthyCircles Platform | 2016 | USA | RCT | 24 weeks | Hypertension | Patients (n=52) | Health care providers | Health care providers | Y | Withings blood pressure monitor
[ | Multiple commercial technologies for activity tracking | 2016 | USA | Prospective study | 24 weeks | Serious mental health conditionb | Patients (n=11) | N/A | N/A | N | Fitbit Zip
[ | Multiple commercial apps for heart failure | 2016 | USA | Cross-sectional study | Single evaluation | Stroke | Apps (n=34) | Family, friends, and health care providers (not all apps) | N/A | N | Y
[ | Electronic Patient Reported Outcome tool (ePRO) | 2016 | Canada | Prospective study | 4 weeks | Multiple | Patients (n=8) and health care providers (n=6) | Health care providers | Health care providers | N | N
[ | STARFISH | 2016 | UK | Prospective study | 6 weeks | Stroke | Patients (n=23) | Peers (automatic) | N/A | N | ActivPAL™ activity monitor
[ | HeartMapp | 2016 | USA | Cross-sectional study | Single evaluation | Heart failure | Patients (n=25) and health care providers (n=12) | Health care providers | Health care providers | Y | Zephyr Bioharness or Biopatch
[ | EDGE digital health system | 2017 | UK | RCT | 48 weeks | Chronic obstructive pulmonary disease | Patients (n=110) and research nurses (n=2) | Health care providers (automatic) | Informal care givers | N | N
[ | IBGStar Diabetes Manager Application | 2017 | Germany | Prospective study | 12 weeks | Diabetes | Patients (n=51) | N/A | N/A | Y | iBGStar blood glucose meter
[ | MyHeart | 2017 | USA | Prospective study | 24 weeks | Heart failure | Patients (n=8) and nurses | Nurses (automatic) | Nurses | Y | Weight scale, blood pressure monitor, and glucose meter
[ | — | 2018 | UK | Cross-sectional study | 4 weeks | Cancer | Patients (n=23) | Peers and health care providers | N/A | N | N
aNot available.
bSchizophrenia spectrum disorder, bipolar disorder, or major depressive disorder [
cN/A: not applicable.
dRCT: randomized controlled trial.
eStudy evaluated both apps and systems and therefore will appear in both categories.
App interventions mainly targeted mental health conditions (n=7), followed by diabetes (n=3) and cardiovascular and heart diseases (n=4), with one study evaluating multiple apps that were used to self-manage multiple health conditions (
Patients were included in all but one of the studies, and the studies had between 3 and 156 participants (median 36, IQR 15-87). The exception was one study in which only researchers evaluated patient-operated apps according to Google recommendations and quality standards [
Six studies utilized single evaluations, either through a cross-sectional design [
Among the 17 studies that described mHealth systems, most involved patients diagnosed with cardiovascular and heart disease (n=6), followed by diabetes (n=5), respiratory disease (n=2), mental health conditions (n=2), cancer (n=1), and multiple illnesses (n=1;
As with mHealth app studies, all system studies, except one [
In 12 studies, patients were required to share data (n=6) [
Few studies (n=3) used single evaluations. RCTs (n=4) lasted longer (35.75 weeks on average) than cross-sectional studies (mean 24.5 weeks, n=2) and prospective studies (mean 12.93 weeks, n=7). Overall, system evaluations lasted a mean of 20.32 weeks, very close to the mean for app interventions, but with a higher median of 23 weeks.
Most studies included a combination of qualitative and quantitative methods of evaluation. Evaluation of usage logs was the most commonly adopted method (21 studies), followed by standardized questionnaires (17 studies;
Categories of methods used to evaluate mHealth interventions.
Methods (adopted approaches) | Studies that evaluated mHealth apps | Studies that evaluated mHealth systems
Evaluation of usage logs | [ | [
Standardized questionnaires | [ | [
Ad-hoc questionnaires | [ | [
Interviews | [ | [
Clinical outcomes | [ | [
Open feedback (ie, oral or written) | [ | [
Collection of additional device data (eg, medical device data) | N/Aa | [
Field study and observation | [ | [
Focus groups | N/A | [
Observational tests (in a lab setting) | [ | N/A
Quality guidelines | [ | [
Medical record entries | [ | [
Attendance (intervention assigned activities/meetings) | [ | N/A
Download count | [ | N/A
aN/A: not applicable.
Among the 14 ad-hoc questionnaires used, four were developed according to concepts or questions from standardized questionnaires [
Of note, some studies inferred more information from usage logs than the count and type of app interactions and patient-gathered data. For example, Triantafyllidis et al [
Categories of qualitative and quantitative data that were measured to evaluate mHealth interventions.
Types of data measured | Studies that evaluated mHealth apps | Studies that evaluated mHealth systems
Interactions (via app) | [ | [
Usability/feasibility | [ | [
Patient-gathered self-management data (via app) | [ | [
Efficacy/effectiveness | [ | [
Physical well-being | [ | [
Perceptions, opinions, and suggestions | [ | [
Intervention experiences | [ | [
Psychological well-being | [ | [
Patient-reported health | [ | [
Self-efficacy | [ | [
Engagement/motivation in self-management | [ | [
Health care utilization and impact | [ | [
Task performance | [ | [
Study engagement | [ | [
Patient-reported app use | [ | [
Patient-reported self-management | [ | [
Quality of life | [ | [
App features and quality | [ | [
Efficiency | N/Aa | [
Security | [ | [
Lifestyle | [ | N/A
aN/A: not applicable.
Although a single method can often provide information regarding more than one measure, over one-third of the studies in this review used more than one method to collect information on one type of measure [
Conversely, measures can be reported using more than one method. For example, usability/feasibility was the most common measure (22 times in 17 studies), followed by efficacy/effectiveness (20 times in 16 studies), interactions (via app; 19 times in 19 studies), physical well-being (18 times in 13 studies), and patient-gathered self-management data (via app; 15 times in 14 studies;
The study by Possemato et al [
More comprehensive mapping of methods and measures revealed that, as expected, the methods that produced the most diverse sets of data were interviews (n=9), standardized questionnaires (n=16), and ad-hoc (study-specific) questionnaires (n=13;
A comparison of the study objectives with the results demonstrated that 30 of the 31 studies reported the results that they intended. One study reported all but one of the intended results described in the original objectives (ie, whether the reviewed apps and systems had been previously validated) [
We identified 31 studies that described evaluations of mHealth apps or systems, with one describing evaluation of both intervention types [
Although clinical integration of mHealth technologies is on the rise, only two studies described app interventions that were meant to be used by secondary users (ie, health care providers and family and friends) [
Health evaluation studies are meant to produce evidence and understanding of how various interventions could affect patients and providers in real-world health care settings. Traditionally, studies have been classified within a hierarchy based on their designs, methods, and measures used to evaluate health interventions [
Health intervention researchers are not given instructions or guidance about how to evaluate these mHealth apps or which additional evidence is needed to determine their comprehensive impacts on patients and providers. The recent addition of connected technologies, such as wearables and sensors, has introduced even more factors to the evaluation context. Interventions now vary from recording exercise, to decision support for patient self-management, to providing evidence of a patient’s actions, drawn from a variety of data sources, for health care providers to review. Because of these new information sources, we cannot always anticipate all of the impacts of these diverse networks of mHealth self-management technologies. For example, 10 studies did not intend to obtain results related to certain factors, such as usage logs and patient-reported outcomes [
The assessment of a study’s success, validity, or quality presents another challenge to traditional research practice. mHealth interventions involve factors that make standard quality assessments inconclusive for intervention studies. For example, identifying patterns in patients’ self-management habits and progress can describe the impact of an mHealth intervention on a patient’s behavior. However, the analysis of usage logs, as a measure of intervention effectiveness, patient engagement, or self-management practices, has been minimally investigated as an appropriate method. As demonstrated by some of the reviewed articles, usage logs, download counts, and online ratings of apps were interpreted as indications of patient engagement, self-management behavior, intervention reach [
As opposed to completing a formal quality assessment, we chose to determine whether a study was able to produce the evidence that it aimed to provide, using selected methods. Some studies that performed usage log analysis were able to produce more information than they anticipated. Possemato et al [
Among the 31 studies identified, one did not obtain all of the intended information (missing one of the intended outcomes) [
mHealth must work for health care providers as well as patients. Patients are more engaged in their health, and they incorporate mHealth into their self-management. Thus, patients are aware of, and can even shape, how an mHealth intervention should or could be used to achieve the kind of impact that is relevant to them. Understanding the potential risks and benefits of patient-operated mHealth requires more continuous evidence of not only technical and clinical outcomes but also personal and psychological impacts. This review demonstrates, through the use of such measures as mHealth interactions and patient-gathered data via an app, that we as researchers have the resources at our disposal and are beginning to use them.
A 2016 study by Pham et al [
Several studies within the presented scoping review demonstrated an attempt to meet this call by including more flexibility in their intervention design. For example, the EDGE digital health system [
We believe our review covers most of the articles that were published during the established period and dealt with mHealth interventions for chronic conditions. This review reported on patient-operated mHealth self-management and did not include other potentially relevant interventions, such as SMS-based interventions.
We chose to focus on self-management of chronic NCDs, as defined by the WHO, in addition to severe mental health conditions, according to the demand for solutions from two fields (the medical system and public app development market) [
Because we did not collect data on reported results for this scoping review and did not perform a systematic methodological quality assessment, we cannot comment on the usefulness or effectiveness of the mHealth app and system interventions presented in these studies.
Researchers are now using several mHealth resources to evaluate mHealth interventions for patient self-management of select NCDs. This is evident as studies relied mostly on more continuous measures, including usage logs [
There is still no clear standard for the evaluation of mHealth interventions for patient self-management of chronic conditions. However, because mHealth presents additional challenges, needs, and resources to the field of health intervention research, we have the opportunity to expand and maintain our relevance to patients, providers, and health authorities. mHealth provides new types of information that we can and should gather to determine the impact of the interventions.
The presented results demonstrate that health studies have started to take advantage of additional mHealth resources, such as app usage logs and other patient-involved research methods, to determine the comprehensive impacts of mHealth on patients and other stakeholders. We can not only answer questions such as which tasks patients choose to perform during interventions and how these choices may affect their clinical outcomes, but also say more about the relevance of mHealth for various types of users. This is essential in health intervention research, as the call for evidence on mHealth continues to push for not only traditional clinical health measures but also impacts on patients’ self-efficacy and engagement. We believe that to achieve a compromise between the rigidity of traditional quality standards and the push for more patient-relevant outcomes, the definitions of quality and meaningful impact, as well as of available and appropriate evidence, should be reassessed.
PRISMA-ScR checklist.
Search strategy.
Scope of included technologies.
Inclusion and exclusion criteria by category.
Comparison of study objectives to reported results.
List of questionnaires and scales used in mHealth intervention studies.
Mapping of measures to methods.
MeSH: Medical Subject Headings
mHealth: mobile health
NCD: noncommunicable disease
RCT: randomized controlled trial
WHO: World Health Organization
As a PhD candidate, the primary author is grateful for the input and guidance of the coauthors, who include all of the supervisors as part of the multidisciplinary Full Flow Project. This work was conducted as part of the Full Flow Project, which is funded by the Research Council of Norway (number 247974/O70). The publication charges for this article have been funded by a grant from UiT-The Arctic University of Norway’s publication fund.
MB, EG, and EÅ developed the search and inclusion criteria. MB and PJ performed the literature search, article screening, and data collection. EG served as a third reviewer when disputes surrounding the inclusion of an article arose. MB performed data synthesis and drafting of the manuscript. PZ contributed to the planning and editing of the manuscript. EG and EÅ additionally contributed to the editing of the text. MJ and RJ provided quality assurance of the manuscript and the necessary details within the description of the literature search and article selection. LPH guided article content. All authors have read and approved the final version of this manuscript.
None declared.