Mobile Apps for Mental Health Issues: Meta-Review of Meta-Analyses

Background: Mental health apps have great potential to help people needing support to cope with distress or specific symptoms. In fact, there is an exponential increase in the number of mental health apps available on the internet, with less than 5% being actually studied.
Objective: This study aimed to assess the quality of the available evidence regarding the use of mental health apps and to summarize the results obtained so far.
Methods: Systematic reviews and meta-analyses were searched, specifically for mobile apps on mental health issues or symptoms, and rated using the Grading of Recommendations Assessment, Development and Evaluation system.
Results: A total of 7 meta-analyses were carefully reviewed and rated. Although some meta-analyses looked at any mental health issue and analyzed the data together, these studies were of poorer quality and did not offer strong empirical support for the apps. Studies focusing specifically on anxiety symptoms or depressive symptoms were of moderate to high quality and generally had small to medium effect sizes. Similarly, the effects of apps on stress and quality of life tended to offer small to medium effects and were of moderate to high quality. Studies looking at stand-alone apps had smaller effect sizes but better empirical quality than studies looking at apps with guidance. The studies that included follow-ups mostly found a sustained impact of the app at an 11-week follow-up.
Conclusions: This meta-review revealed that apps for anxiety and depression hold great promise with clear clinical advantages, either as stand-alone self-management or as adjunctive treatments. More meta-analyses and more quality studies are needed to recommend apps for other mental health issues or for specific populations.
(JMIR Mhealth Uhealth 2020;8(5):e17458) doi: 10.2196/17458


Mobile Health and Apps
Recent years have seen an exponential development of mobile technologies aimed at improving various mental health problems. Such technologies are considered part of a new field of medicine called mobile health (mHealth). This term refers to health (including mental health) supported by mobile technologies [1]. The mHealth field is booming, with a plethora of health-related apps, websites, and text messaging-support interventions being developed by the industry and being adopted by the public [2]. However, only a small proportion of these technologies have undergone any form of empirical assessment [3].
This lack of app validation is a concern, even more so when studies suggest that the mental health- and addiction-related apps currently available to the public, with few exceptions, offer insufficient content quality [4-8]. Fortunately, recent years have seen an increase in the gathering of empirical data on smartphone app-based interventions [9].

What Are Mobile Mental Health Interventions?
According to the World Health Organization's definition of mHealth, mobile mental health interventions could be considered as mental health services (medical and public health practices) "supported by mobile devices, such as mobile phones, patient monitoring devices, personal digital assistants (PDAs), and other wireless devices" [10]. They include smartphone apps, voice, video or text messaging interventions, real-time tracking, and Web-based interventions, to name a few. In this review, we were specifically interested in mental health apps.
Smartphone apps, because of their worldwide mobility, connectivity, 24-hour availability, and their ubiquitous characteristics, are strong vectors for mHealth interventions. Furthermore, they can convey a large range of technologies and functionalities, such as virtual reality, augmented reality (inserting computer elements into the real field), telemedicine, robotics, games, interfaces connected to sensors, social networks, real-time interactivity, geolocation, and more [11].
App technology has shown the greatest reach in the past few years. According to the United Nations, more than 90% of the population in developed countries use such apps daily [12]. Once available for the target audience, because of the wide dissemination of smartphone devices, such tools can attract many downloads from all over the world. This potential is illustrated by tens of thousands of downloads of such apps [13,14].
The ubiquitous, handy mobile format and 24-hour availability of smartphones offer an important advantage for using apps to target mental health problems. One may hypothesize that the treatment of mental disorders could be improved by effective support in the right place at the right time. As learning is context dependent [15], apps can support the process of empowerment and recovery of people with various mental health problems by allowing people to access tools or support when needed [16,17].
Mental health apps also make it possible to assess individuals in their natural environment, through Ecological Momentary Assessment, or to intervene there, through Ecological Momentary Interventions (EMIs), thereby enabling a better understanding of the factors triggering problems and addressing the problems when and where they arise [18-20]. These methods overcome memory biases by asking questions pertaining to the current moment or the current day and can also help determine if phenomena are stable or change from day to day [21]. Furthermore, such ecologically valid data may help to guide treatments or improve assessments in naturalistic settings [22].
Given the heterogeneity and speed of publication of app-related studies, aggregated results are needed to determine the overall (vs specific) efficacy of mobile apps for mental health. Multiple meta-analyses on apps focusing on a single or multiple mental health problems have been conducted [39,40], at times with very different results. This could be explained by the selection criteria for the meta-analyses, with some only focusing on stand-alone apps, others only looking at adjunctive apps (apps offered on top of another treatment) or apps offered with guidance (a person available for questions or to prompt the app's use), others considering both models together, and others still including everything and evaluating the models separately in different subanalyses. In fact, some authors suggest that only adjunctive apps or apps with guidance should be recommended at this point for mental health issues [41]. Given the speed of uptake of many of these apps, it is important to determine, based on the quality of the evidence available and the effect sizes, whether we should recommend such apps for mental health problems such as depression or anxiety. The purpose of this meta-review was to summarize these results and determine the empirical quality of the evidence reported using the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) system [42]. This system permits the quality of evidence produced by meta-analyses to be evaluated according to specific factors: the sample size, the stability of findings across studies, the appropriate control for known confounding factors, the absence of evidence of study bias, follow-up (if any), and results being closely linked to the outcomes targeted here (see Tables 1 and 2). The GRADE system has been successfully applied to meta-analyses of pre-post designs, randomized controlled trials (RCTs), correlational studies, experimental studies, and longitudinal studies [42].

Literature Search
We only included systematic reviews that reported quantitative pooled data (ie, meta-analyses), were published in full text in English or French, and mentioned the use of app technology for mental health issues.
When more than one meta-analysis was found for a mental health problem, we reviewed them all and used the following criteria to select the ones we kept: (1) if most of the same studies were reviewed, we kept the meta-analysis with the largest number of studies; and (2) between an older meta-analysis with many small uncontrolled studies and a more recent meta-analysis including only RCTs, we chose the latter. We excluded systematic reviews without quantifiable data (eg, qualitative) and treatment guidelines. The final decision to include or exclude reviews was made by consensus by 2 researchers (TL and SP).

Search Strategy
MEDLINE, EMBASE, Current Contents, PsycINFO, and Google Scholar were searched. Keywords included mental health, technology, app, mHealth, eHealth, mobile, with the added filters: review or meta. See Figure 1 for the selection of studies.

Grading of Recommendations, Assessment, Development, and Evaluation System
The GRADE system was used to assess evidence quality [42]. According to this assessment system, the quality of evidence of meta-analyses can be judged based on various factors, namely, the size of the sample (the larger the better, ideally over 1000), the precision of effects (ie, the CI is not too wide; we opted for within 25% higher or lower than the effect size as ideal), the directness of the outcomes (eg, impact on mental health symptoms [direct] vs impact on perceived stress [indirect]), homogeneity of effects across studies (ie, consistency of results from one study to the next), the study design (prospective studies or RCTs obtain higher scores than cross-sectional or retrospective studies), follow-up data (if any and the length of time), and publication bias (if analyzed and presented). We also added a specific section for the confounding factors considered, which can include controlling for biases, trial quality, and other variables that could influence the results. The magnitude of the impact of the app is determined based on the estimated effect size (the larger the value, the better) [43]. We chose to present effect sizes apart from the quality of evidence for each study. As such, a point is given for each element of the GRADE system measured, with meta-analyses being rated as either very poor, poor, poor to moderate, moderate, moderate to high, high, or very high-quality evidence. No points are given for the effect size. For each meta-analysis, 2 expert raters (TL and SP) rated the different components with the GRADE system. Both raters met and went over their ratings for a final consensus. Given the stringent criteria involved, consensus was easily reached (over 95% initial agreement). For each component of the model, we present both the quality of the evidence and the effect size. 
Given that meta-analyses also conducted subanalyses, we reported those that compared the apps with a control condition and indicated the results for stand-alone apps versus apps with guidance.
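The precision criterion and the one-point-per-element tally described above can be sketched in a few lines of Python. This is only one reading of the rule, assuming that "within 25% higher or lower than the effect size" means both CI bounds fall between 0.75 and 1.25 times a positive point estimate; the names `PooledEffect`, `is_precise`, and `grade_points`, as well as the example criteria dictionary, are hypothetical and not taken from the GRADE system or from the meta-analyses reviewed. The mapping from points to quality labels is not specified here, so the sketch stops at the point count.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PooledEffect:
    """A pooled effect size (eg, Hedges g) with its 95% CI bounds."""
    estimate: float
    ci_low: float
    ci_high: float


def is_precise(effect: PooledEffect, tolerance: float = 0.25) -> bool:
    """Return True when both CI bounds lie within +/- tolerance of the estimate.

    Assumption: the tolerance is relative to a positive point estimate.
    """
    if effect.estimate <= 0:
        raise ValueError("this sketch assumes a positive effect size")
    lower_limit = effect.estimate * (1 - tolerance)
    upper_limit = effect.estimate * (1 + tolerance)
    return lower_limit <= effect.ci_low and effect.ci_high <= upper_limit


def grade_points(criteria: Mapping[str, bool]) -> int:
    """Count one point per quality element that is met (effect size excluded)."""
    return sum(1 for met in criteria.values() if met)


if __name__ == "__main__":
    # Follow-up intervals reported in the text (7 to 11 weeks).
    anxiety_follow_up = PooledEffect(estimate=0.52, ci_low=0.41, ci_high=0.63)
    depression_follow_up = PooledEffect(estimate=0.46, ci_low=0.36, ci_high=0.55)
    print(is_precise(anxiety_follow_up), is_precise(depression_follow_up))

    # Hypothetical ratings for a single meta-analysis, for illustration only.
    example = {
        "sample_size_over_1000": True,
        "precise_effect": False,
        "direct_outcome": True,
        "consistent_effects": False,
        "rct_or_prospective_design": True,
        "follow_up_reported": False,
        "publication_bias_examined": True,
        "confounders_considered": False,
    }
    print(grade_points(example))
```

Under this reading, both follow-up intervals quoted in the Results (g=0.52, 95% CI 0.41 to 0.63, and g=0.46, 95% CI 0.36 to 0.55) satisfy the precision criterion, because each bound stays within a quarter of the point estimate.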

Included Studies
Overall, our search retrieved 2558 potential papers. After excluding irrelevant papers and articles that did not meet our inclusion criteria, we retained 24 meta-analyses for review, of which 7 were included in the meta-review. Please refer to Figure 1 for the flow diagram of the inclusion of meta-analyses in the meta-review.

Mental Health (Multiple Problems)
Two meta-analyses [44,45] included apps targeting multiple mental health problems, ranging from anxiety and depression to substance misuse, and even included some studies on physical health problems or stress. The study by Lindheim et al [44] specifically examined whether apps offered additional benefits to ongoing treatments or psychotherapy. As such, they only included studies that used apps in addition to a regular intervention delivered in person. Overall, the effect size was medium, suggesting that apps can add value to existing treatments. However, the quality of the evidence was rated as poor to moderate (see Tables 1 and 2), given that the effects were imprecise, the samples were very heterogeneous, and the effects were indirect (no subanalyses by diagnosis or problem, with all outcomes pooled together). On the other hand, the meta-analysis included only RCTs, had a total sample size slightly below the criterion of 1000, and examined publication bias. The meta-analysis by Versluis et al [45] examined EMI as a tool to increase self-management to cope with depression, anxiety, or stress. As can be seen in Tables 1 and 2, they calculated the effect size in general (all mental health problems together) as well as according to specific outcomes. For the results as a whole, the effect size was medium, the sample size was large (above 1000), and publication bias was examined, but the other elements did not support quality evidence, resulting in a rating of poor to moderate.
Another meta-analysis also targeted mental health as a broader construct, but within the workplace. This meta-analysis, by Stratton et al [46], included various electronic health strategies, however, with only 3 studies specifically offering an app. The results of these studies were aggregated with the other results. As a consequence, the results are indirect, highly heterogeneous, and imprecise, with a total quality score of moderate. The effect size was small and decreased when publication bias was taken into account. Nonetheless, the study included a large sample (more than 2000 participants) and only looked at RCTs.

Anxiety
Versluis et al [45], Firth et al [39], and Linardon et al [47] specifically measured the effect sizes of apps for anxiety symptoms. Although the meta-analysis by Linardon et al [47] is more recent and includes many more studies than the other two, it does not include all the studies found in the two previous meta-analyses, justifying the need to keep all 3 meta-analyses in this review. In the meta-analysis by Versluis et al [45], a medium effect size was found for EMI on self-management of anxiety symptoms, with poor-quality evidence (because of sample size, heterogeneity of samples, imprecise effect, and no follow-up). The meta-analysis did, however, look at publication biases and included both RCTs and prospective studies. Firth et al [39], on the other hand, included only RCTs, compared apps with waitlist or active controls, and found a small to medium effect size overall when compared with waitlist and a small effect size when compared with active controls. The meta-analysis included homogeneous samples and more than 1000 participants and, overall, was rated as being of moderate to high quality (but high-quality evidence for the comparison with active controls and waitlists). Finally, Linardon et al [47], focusing on generalized anxiety disorder symptoms, also included only RCTs, considered publication bias (which increased the effect size), included various controls (waitlist and different types of control conditions: information, placebo/attention, and active controls), and found a small to medium effect size (for all controls together). A closer examination revealed that the effect size decreased as the control condition became more stringent, with the effect no longer being significant when an active treatment control was used. They also looked at some follow-up data and found that the effect size remained small for follow-ups of 2 to 6 weeks. However, the studies (15 studies) that included follow-ups at 7 to 11 weeks found a medium effect size (g=0.52; 95% CI 0.41 to 0.63). This meta-analysis also considered various subgroup analyses (type of app, intervention model, and specific techniques), but these did not seem to modify the outcome. We rated this meta-analysis as being of high quality overall and of moderate quality for the comparison with active controls, reflecting the strengths mentioned and the fact that some results were either imprecise or inconsistent (or based on a small N for active controls).

Specific Anxiety Symptoms
Linardon et al [47] also looked at specific anxiety symptoms, namely, social anxiety, panic, and posttraumatic stress symptoms. Only apps focusing on social anxiety (6 studies) reported a significant medium effect size, with evidence of poor to moderate quality (see Tables 1 and 2). Panic and posttraumatic stress symptoms did not improve in the studies reviewed (3 and 4 studies, respectively), with the evidence rated as poor to moderate quality.

Depression
In total, 3 meta-analyses measured the impact of apps on symptoms of depression and 1 looked at apps for suicidal ideation and self-harm. As was mentioned for anxiety disorders, the meta-analysis by Linardon et al [47] is the most recent but does not include all the studies reviewed in the meta-analyses by either Versluis et al [45] or Firth et al [40], justifying the need to keep all 3 in this meta-review. Versluis et al [45], looking at EMI for self-management of depressive symptoms, found a small to medium effect size, but the quality of the evidence was judged as poor, given the heterogeneity of the samples, the imprecise effect, the study design (no RCTs), the sample size, and the absence of follow-up. The effect was direct, and publication biases were considered. Firth et al [40] compared smartphone interventions with active and inactive controls and only included RCTs. The overall quality of this study was rated as high, with a small to medium effect size overall, a medium effect size with inactive controls, and a small effect size with active controls. Apart from the inconsistency (heterogeneity) and absence of follow-ups, all other quality criteria were met. As for Linardon et al [47], the effect size was small to medium, the effect was precise, all studies included were RCTs, the sample was large, and there was no negative effect of publication bias (in fact, an increase was noted). Furthermore, follow-ups were reported for some studies, indicating that the effect size was small at posttreatment and at 2 to 6 weeks, but medium at the 7- to 11-week follow-up (g=0.46; 95% CI 0.36 to 0.55). The quality of the evidence was also rated as high (overall), given that heterogeneity was found.
Witt et al [48] conducted a meta-analysis on the use of apps for the self-management of suicidal ideation and self-harm. The apps included were solely stand-alone apps. They conducted analyses of suicidal ideation scores, suicidal behaviors, and self-harm behaviors. As can be seen in Tables 1 and 2, when only including RCTs for suicidal ideation, the effect size was small and imprecise, but the sample was homogeneous, a similar effect size was found at follow-up, and the effect was direct. The quality of the evidence was rated as moderate, given the small sample size and the lack of control for biases (publication or otherwise). When looking at noncontrolled studies for suicidal ideation, the quality of the evidence drops to very poor, with a small sample size, high heterogeneity, and an imprecise effect. As for self-harm, the analyses were mean differences in the frequency of behavior, with a nonsignificant effect and poor to moderate-quality evidence.

Other Mental Health Concepts
Versluis et al [45] also measured the impact of EMI apps on perceived stress, quality of life, acceptance, and relaxation. We chose to only consider perceived stress and quality of life, given that the latter two are theory or intervention specific. Both had small to medium effect sizes, with poor-quality evidence for perceived stress and poor to moderate-quality evidence for the quality of life (only precision and sample size offered a point). Linardon et al [47] also included indirect measures, namely, distress, stress, and quality of life. For distress, they reported a small to medium effect size but, overall, moderate-quality evidence. For stress, the effect size was small to medium but with moderate to high-quality evidence (thanks to various biases controlled for, follow-up data, large sample, and including only RCTs).
As for the quality of life, the effect was also small to medium, but the quality of the evidence was high (thanks to precise, consistent effect, large sample, follow-up, and biases controlled for).
Regarding stand-alone apps versus apps offered with guidance or adjunctive to therapy, only some meta-analyses actually compared these, whereas other meta-analyses looked at only one condition. As such, the meta-analysis by Lindheim et al [44] only included adjunctive apps and had a medium effect size, with poor to moderate-quality evidence. Witt et al [48] only included stand-alone apps and found a small effect size, with moderate-quality evidence. Versluis et al [45] found a medium to large effect size when guidance was offered, compared with a medium effect size for stand-alone apps, with stand-alone apps being supported by poor evidence compared with poor to moderate evidence for guidance. The meta-analysis by Linardon et al [47] broke down the guidance versus stand-alone apps according to the symptoms targeted (ie, anxiety or depression). For anxiety, the effect size was medium for apps with guidance, compared with small for stand-alone apps, with the quality of the evidence being moderate to high for stand-alone apps and moderate for apps with guidance. For depression, the effect size was medium for apps with guidance versus small for stand-alone apps, with the quality of the evidence being moderate to high for stand-alone apps and moderate for apps with guidance.

Principal Findings
This meta-review allowed us to closely examine the quality of the evidence reported by 7 meta-analyses (including various subanalyses) on the use of apps for mental health issues. The results are equivocal, with 14 results being linked to poor or poor to moderate, 15 to moderate or moderate to high, and 8 to high-quality evidence.
When examining studies that include various types of apps for mental health (general), we find that the conclusions are not solid, with poor to moderate or moderate-quality evidence, although medium effects (or small effects when looking at work) are reported. For higher quality evidence, samples need to be larger and more homogeneous, and biases and follow-ups need to be examined. Although it might be tempting to conduct these larger analyses by merging various apps focusing on different mental health issues, such analyses might not yield quality evidence that is useful.

Specific Findings
Apps for anxiety symptoms appear to bring a clear benefit of small to medium magnitude, supported by good-quality evidence. There are some discrepancies in the results reported: Firth et al [39] found a small effect size when apps were compared with active controls, whereas Linardon et al [47] did not observe a significant effect when active controls were used for comparison. This might be because of the inclusion criteria used in these meta-analyses (generalized anxiety symptoms vs anxiety symptoms) or because of the much larger sample included in the analysis by Firth et al. Although follow-ups have only been conducted in a limited number of studies, these report sustained benefits at 6 to 11 weeks. Given that we do not know how frequently people actually used the apps in the studies (daily, weekly, or less), these results are very promising. The results for specific anxiety problems are of lower quality evidence and did not show a significant clinical effect (for posttraumatic stress disorder or panic disorder), except for social anxiety disorder, which is supported by moderate-quality evidence and a medium effect size. Of note, very few studies focused on apps for specific anxiety disorders.
When looking at apps focusing on depressive symptoms, we obtained small to medium effect sizes compared with waitlist and small effect sizes compared with active controls, with overall good-quality evidence (especially for more recent meta-analyses). Furthermore, studies reporting follow-ups show maintenance of the effect at 7 to 11 weeks. These results support the use of apps for depression. The quality of the evidence at this time moderately supports apps for suicidal ideation, with a small effect size, but does not support apps for self-harm (no effect).
As for indirect mental health outcomes, namely, outcomes that were considered but were not the main focus of the app intervention (such as distress, stress, or quality of life), the effects are consistently small to medium, with greater quality evidence for the most recent meta-analysis [47].

Limitations
Our results are limited by their focus on mental health. As such, we did not consider apps that focused on a specific intervention or model (eg, mindfulness apps or CBT) and that did not include symptoms as an outcome. Our results also need to be interpreted in light of what we were not able to measure. Although we sought meta-analyses pertaining to apps in mental health, we did not find meta-analyses for multiple domains or mental health problems for which apps have been developed (eg, severe mental illness, addiction, eating disorders, and obsessive-compulsive disorder). Furthermore, few of the reviews considered confounding factors, such as the actual frequency or time of exposure to the app.
Although Weisel et al [41] do not recommend stand-alone apps for mental health problems, our results are more nuanced. Indeed, effect sizes tend to be higher for apps that are used with guidance or with an ongoing treatment (medium effect) compared with stand-alone apps (small effect), but the quality of the evidence is better for stand-alone apps. This suggests that stand-alone apps mostly offer a small improvement, but this improvement is consistent across quality studies. As such, apps could be used as a stand-alone treatment (for instance, while a person is on a waitlist for an active treatment) for a small effect on symptoms, or could be offered with some guidance or alongside an ongoing in-person treatment for a medium effect on symptoms.
Furthermore, the studies did not all use similar control conditions. The use of different types of control groups usually leads to variations in effect estimates. The effect sizes of interventions are typically lower when compared with active controls instead of inactive controls [49,50]. We cannot exclude a digital-placebo effect related to the use of the device itself or to expectation effects [51] rather than to possible active components [52]. Several recent protocols include a placebo intervention (a sham version of the app) [53]; unfortunately, only some of the studies assessed in the meta-analyses included in this meta-review involved such a placebo app control. Furthermore, several studies were conducted with nonclinical populations, who presented with symptoms but perhaps not a diagnosed disorder, limiting the generalizability of the results to clinical populations [54].
Nonetheless, the nature of smartphone interventions does appear to position them as a possible low-intensity intervention tool for those with less severe levels of symptoms or as a first step in a stepped-care approach to service delivery [55]. The follow-up data available to date also suggest that gains are sustainable over a few months. Additional follow-up data are warranted to confirm these results.
Attrition is another problem repeatedly described in smartphone app-related studies [56] and in naturalistic use [57]. Further studies should include a detailed description of the behavior change techniques involved in the design [58] as well as data on the actual utilization of the different app functions. This would increase our knowledge about effective behavior change strategies as well as about engagement with apps. It would also be useful to better understand the context in which the app is used: at home, at work, in the clinic waiting room, alone, or with a therapist or a family member.

Conclusions
We believe that future studies should focus on high users of apps, namely, youth and young adults. We currently do not have specific information on the efficacy and actual use of mental health apps with such subgroups of individuals. To date, most of the app studies on mental health have focused on feasibility and acceptability, with only a small portion actually pushing forward toward efficacy trials (and often with small numbers). The field of apps for mental health is burgeoning, with the speed of delivery of the app being a primary concern. Traditional study designs (such as RCTs) tend to take a long time to complete and can deter app developers who aim to commercialize their product. Other controlled research designs (eg, repeated single-case experimental designs) could be encouraged to produce quality studies more rapidly.
In conclusion, apps for anxiety and depression hold great promise with clear clinical advantages, modestly as stand-alone self-management, and more strongly with guidance or adjunctive treatments. More meta-analyses and more quality studies are needed to recommend apps for other mental health issues or for specific populations.