This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR mHealth and uHealth, is properly cited. The complete bibliographic information, a link to the original publication on http://mhealth.jmir.org/, as well as this copyright and license information must be included.
As the development of mobile health apps continues to accelerate, the need to implement a framework that can standardize the categorization of these apps to allow for efficient yet robust regulation is growing. However, regulators and researchers are faced with numerous challenges, as apps have a wide variety of features, constant updates, and fluid use cases for consumers. As past regulatory efforts have failed to match the rapid innovation of these apps, the US Food and Drug Administration (FDA) has proposed that the Software Precertification (Pre-Cert) Program and a new risk-based framework could be the solution.
This study aims to determine whether the risk-based framework proposed by the FDA’s Pre-Cert Program could standardize categorization of top health apps in the United States.
In this quality improvement study during summer 2019, the top 10 apps for 6 disease conditions (addiction, anxiety, depression, diabetes, high blood pressure, and schizophrenia) in Apple iTunes and Android Google Play Store in the United States were classified using the FDA’s risk-based framework. Data on the presence of well-defined app features, user engagement methods, popularity metrics, medical claims, and scientific backing were collected.
The FDA’s risk-based framework classifies an app’s risk by the disease condition it targets and what information that app provides. Of the 120 apps tested, 95 apps were categorized as targeting a nonserious health condition, whereas only 7 were categorized as targeting a serious condition and 18 were categorized as targeting a critical condition. As the majority of apps targeted a nonserious condition, their risk categorization was largely determined by the information they provided. The apps that were assessed as not requiring FDA review were more likely to be associated with the integration of external devices than those assessed as requiring FDA review (15/58, 26% vs 5/62, 8%;
The FDA’s risk-based framework has the potential to improve the efficiency of the regulatory review process for health apps. However, we were unable to identify a standard measure that differentiated apps requiring regulatory review from those that would not. Apps exempt from the review also carried concerns regarding privacy and data security. Before the framework is used to assess the need for a formal review of digital health tools, further research and regulatory guidance are needed to ensure that the Pre-Cert Program operates in the greatest interest of public health.
The development of mobile health apps has been increasing in recent years; recent estimates found that approximately 325,000 mobile health apps are available in the marketplace [
Despite a lack of evidence and in the absence of direct regulation, smartphone ownership and interest in health apps remain high among patients [
In the past, the US Food and Drug Administration (FDA) focused its regulatory efforts on a small subset of mobile medical apps: those that provided treatment or diagnosis to users and those that were an extension of or transformed into regulated medical devices [
As a result, in June 2018, the FDA published a working model for its Software Precertification (Pre-Cert) Pilot Program and released a
Under the Pre-Cert Program, FDA regulators plan to first evaluate digital health app developers and not the apps themselves [
Under Pre-Cert, FDA regulators would first examine companies through an
Precertification status determination process. This figure is an overview of how the FDA will determine the precertification status of different organizations. FDA: Food and Drug Administration.
After a developer’s Pre-Cert status is granted, the type of review (if any) that each new software product will undergo would be determined by its risk profile. In addition to the developer’s Pre-Cert status, each new software product must undergo a risk analysis, and together, these designations will determine whether a review is necessary. Using a risk-based framework developed by the International Medical Device Regulators Forum (IMDRF) Software as a Medical Device (SaMD) Working Group, software developers will perform this risk analysis and determine an SaMD’s risk by considering the severity of the medical condition it targets and the type of information the app offers [
The Pre-Cert Program hopes to streamline the FDA’s review process by incorporating FDA oversight during the development of precertified organizations’ apps and not just when the app is finalized. The FDA also hopes to minimize the burden on developers to prove their product’s efficacy and safety, but the list of reduced requirements has yet to be finalized [
The FDA’s effort to modernize its regulatory framework is not unique, as multiple guidelines attempting to clarify and streamline government regulations of digital health tools have been implemented in both the United States and Europe. In conjunction with the FDA and other departments, the Federal Trade Commission has developed a web-based survey helping app developers identify what federal regulations pertain to their app [
In addition, the clinical community has already begun the process of evaluating apps (including highlighting concerns around both patient privacy and app efficacy) [
We based our analysis on the methods outlined in a study by Wisniewski et al [
We translated both the disease condition and significance of information categories to a numerical scale, allowing for easier data analysis: apps deemed to target a nonserious condition were rated 0, whereas serious and critical conditions were rated 1 and 2, respectively. If an app targeted several diagnoses, it was categorized by the most severe disease condition described. Regarding the information provided, informing clinical care was rated “A,” whereas apps that drove clinical management or treated and diagnosed users were rated “B” and “C,” respectively. Using the FDA’s current guidelines, we coded apps as informing clinical care if they simply provided information. Any personal data entry used to monitor symptoms was coded as driving clinical management, whereas apps providing treatment and diagnosis were differentiated from other functionalities. An app’s review required status, the classification that decides whether an FDA review would be required under the Pre-Cert Program, was determined by the combination of both criteria and given the category that the IMDRF working group had previously attributed to each combination, ranging from I to IV. For example, a meditation app claiming to alleviate anxiety and stress would have been coded as targeting a nonserious condition (0) and providing treatment (C). Under the Pre-Cert Program, this app would be given a review level of II, which requires
Risk categorization rating system. This figure shows how the Pre-Cert Program uses the disease condition an app targets (0-2) and what information that app provides (A-C) to determine what review that app must undergo (I-IV).
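The coding scheme described above can be sketched as a small lookup table. This is an illustrative sketch, assuming the published IMDRF categorization matrix (condition severity crossed with significance of information); the function and variable names are our own, not part of the Pre-Cert Program:

```python
# Sketch of the IMDRF risk categorization used in our coding scheme.
# Severity: 0 = nonserious, 1 = serious, 2 = critical.
# Information: "A" = informs clinical care, "B" = drives clinical
# management, "C" = treats or diagnoses.
# Levels 1-4 correspond to IMDRF categories I-IV.
IMDRF_LEVEL = {
    (0, "A"): 1, (0, "B"): 1, (0, "C"): 2,
    (1, "A"): 1, (1, "B"): 2, (1, "C"): 3,
    (2, "A"): 2, (2, "B"): 3, (2, "C"): 4,
}

def review_level(severity: int, information: str) -> int:
    """Return the IMDRF category (1-4, i.e., I-IV) for a coded app."""
    return IMDRF_LEVEL[(severity, information)]

def review_required(severity: int, information: str) -> bool:
    """Category I apps are exempt; categories II-IV undergo some review."""
    return review_level(severity, information) > 1

# The meditation app example: nonserious condition (0), treatment (C)
# yields category II, so a review would be required.
assert review_level(0, "C") == 2
assert review_required(0, "C") is True
```

The lookup also encodes the dichotomization used in the analysis: any app mapping to a level above I is grouped with those requiring some form of review.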
Following coding and data reconciliation, apps were dichotomized into exemption from a review or requiring a review. Apps given an IMDRF categorization of “I” would be exempt from any regulatory review, whereas type II, III, and IV apps would undergo some form of review depending on the precertification status of the organization. As types II, III, and IV apps would undergo some form of review, we grouped these apps together. The data were further stratified by categorical measures, such as which disease condition they targeted. Two-sided
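The comparison of proportions between the two groups (eg, device integration in 15/58 exempt apps vs 5/62 apps requiring review) can be illustrated with a pooled two-proportion z-test. This is a minimal sketch of one common choice of two-sided test, not necessarily the exact procedure used in the analysis; the function name is illustrative:

```python
import math

def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Two-sided pooled z-test for the difference of two proportions.

    Returns (z, p_value) using the normal approximation, which is
    reasonable at the sample sizes in this study.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided P value from the standard normal distribution:
    # P = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Device integration: 15/58 exempt apps vs 5/62 review-required apps.
z, p = two_proportion_z_test(15, 58, 5, 62)
```

At these counts the difference is significant at the conventional .05 level.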
Of the 120 total Apple and Android apps examined in the simulation, 95 (79.2%) were categorized as targeting a nonserious health condition, whereas only 7 (5.8%) targeted a serious condition: 1 app targeted addiction, 5 targeted depression, and 1 targeted anxiety. The remaining 18 (15.0%) apps targeted a critical condition, all of which targeted schizophrenia.
Review required status—that is, the classification that determines if an FDA review would be required under the Pre-Cert Program and represented by code I, II, III, or IV (
When comparing the reliability between Apple and Android apps, no statistically significant differences in whether review was required were found between the platforms for any disease condition (addiction: 1.25 vs 1.22,
The number of apps and their features were stratified by whether a review was required and are shown in
The mean values and SDs of proxies for app popularity (star ratings, number of reviews, and days since the last update) are summarized in
App features by review required. The orange bars represent apps that would undergo a regulatory review in the Pre-Cert Program (review levels II, III, and IV), and the blue bars represent apps exempt from review (review level I).
Popularity metrics and update history by review required.
Metric | No review required (n=58) | Review required (n=62) |
User star ratings, mean (SD) | 4.48 (0.6) | 4.13 (1) |
Number of ratings, mean (SD) | 14,554 (42,409) | 14,018 (70,454) |
Days since last update, mean (SD) | 189 (335) | 264 (338) |
When the data were stratified by targeted disease, the number of apps requiring review within each condition varied dramatically. In total, 16 apps targeting addiction required no review (I), whereas only 4 would undergo a review (II, III, and IV). A similar trend was seen in apps targeting high blood pressure: 15 of these apps were determined to be exempt from a review (I), leaving only 5 to undergo a review (II, III, and IV). Notably, all apps targeting diabetes were deemed exempt from any review (I), whereas anxiety, depression, and schizophrenia apps comprised the majority of those definitely or possibly requiring review (II, III, and IV). This result was driven by the findings that most anxiety and depression apps offered treatment and that schizophrenia was classified as a critical disease condition, which resulted in a higher likelihood that review would be required.
When stratified by disease, the samples became underpowered because of the small number of apps in each subgroup. However, some of the differences described earlier persisted. For example, only 6% (1/16) of addiction apps exempt from review offered an intervention, whereas 100% (4/4) of addiction apps requiring a review did. The same trend held for apps targeting anxiety (0/3, 0% vs 16/17, 94%) and depression (1/6, 17% vs 11/14, 79%). In addition, none of the apps targeting depression that were exempt from review offered external information, whereas 79% (11/14) of those requiring reviews did. Notable results are summarized in
Apps’ features for addiction, anxiety, and depression by review required, stratified by targeted disease.
App features | Addiction apps exempt from review (n=16) | Addiction apps requiring reviews (n=4) | Anxiety apps exempt from review (n=3) | Anxiety apps requiring reviews (n=17) | Depression apps exempt from review (n=6) | Depression apps requiring reviews (n=14) |
Device integration, n (%) | 0 (0) | 0 (0) | 0 (0) | 2 (12) | 1 (17) | 0 (0) |
Steps or health information, n (%) | 0 (0) | 0 (0) | 1 (33) | 3 (18) | 1 (17) | 3 (21) |
Offer information, n (%) | 2 (13) | 4 (100) | 2 (67) | 12 (71) | 0 (0) | 11 (79) |
Connect to professional care, n (%) | 5 (31) | 1 (25) | 1 (33) | 6 (35) | 0 (0) | 6 (43) |
In-app interventions, n (%) | 1 (6) | 4 (100) | 0 (0) | 16 (94) | 1 (17) | 11 (79) |
User star ratings, mean (SD) | 4.7 (0.18) | 4.68 (0.17) | 4.5 (0.52) | 4.65 (0.19) | 4.55 (0.23) | 4.2 (0.56) |
Apps’ features for diabetes, high blood pressure, and schizophrenia.
App features | Diabetes apps exempt from review (n=20) | High blood pressure apps exempt from review (n=15) | High blood pressure apps requiring review (n=5) | Schizophrenia apps exempt from review (n=4) | Schizophrenia apps requiring review (n=16) |
Device integration, n (%) | 11 (55) | 3 (20) | 0 (0) | 0 (0) | 3 (19) |
Steps or health information, n (%) | 8 (40) | 1 (7) | 0 (0) | 0 (0) | 3 (19) |
Offer information, n (%) | 14 (70) | 5 (33) | 1 (20) | 2 (50) | 15 (94) |
Connect to professional care, n (%) | 1 (5) | 0 (0) | 0 (0) | 0 (0) | 1 (6) |
In-app interventions, n (%) | 6 (30) | 0 (0) | 1 (20) | 0 (0) | 1 (6) |
User star ratings, mean (SD) | 4.48 (0.47) | 4.13 (0.94) | 3.28 (1.19) | 4.18 (0.57) | 2.5 (1.9) |
After coding for the presence of observable features of top health apps, we found attributes that differentiated the apps that would likely undergo an FDA regulatory review under the Pre-Cert Program from those that would not. Apps offering interventions were most likely to require a review (II, III, and IV), whereas monitoring apps were more likely to be streamlined. In addition, apps requiring FDA review were more likely than streamlined apps to offer references and connect users to professional care. This distinction between formal medical advice and user-led data embedded in the FDA’s risk categorization demonstrates a promising foundation for the framework. Apps geared toward providing more formal care, such as interventions or references, have the potential to elicit greater harm than monitoring apps if these features are erroneous, because consumers use these apps for treatment or diagnosis and are directly exposed to the information they provide. Monitoring apps, by contrast, largely rely on data provided by the user rather than on the supply of novel information. The risk categorization’s ability to differentiate between apps relying on formal medical advice and those relying on user-led data, and to require a review of the former, indicates its potential to catch apps that pose a greater risk.
When the data were stratified by targeted disease, the sample became underpowered, and we were unable to perform significance testing. However, although the small size of each subgroup is a limitation of this study, the observed trends offer valuable and novel insight into how the Pre-Cert Program’s categorization should be refined before full implementation. For example, in the subgroup analysis, the percentage of apps offering an intervention differed dramatically between those exempt from review and those requiring a review for apps targeting addiction, anxiety, and depression. This finding suggests that the presence of an intervention is one of the strongest correlates of requiring a review. However, this feature is challenging to reliably differentiate from simple symptom monitoring. In particular, in the mental health setting, it has been established that individuals who monitor their symptoms feel better [
One specific criterion for which the FDA should set more explicit guidelines is the disclosure of apps’ data policies. At present, the framework does not reflect whether or how an app discloses how users can delete their information. For example, only 64% (37/58) of apps evaluated as likely to be exempt from review provided information about data deletion, although
The Pre-Cert Program and its risk categorization are still in their early developmental stages, as the FDA continues to test myriad aspects of the program. In an update summarizing testing performed through May 2019, the FDA described their refinement of this review determination process and admitted that further insight from patients and the digital health community is needed [
Our results must be interpreted in light of several limitations. First, we examined only 120 apps out of the thousands that are currently marketed. As we took a convenience sample, this sample may not reflect the top 10 apps presented to every consumer upon their search. In addition, it is at present unclear which apps will need to be regulated under Pre-Cert and which will voluntarily partake, although because many health-related apps currently make clinical claims, many would likely fall under the scope of regulation. Second, our ratings were obtained by 2 reviewers and checked by a third reviewer. No third-party standards currently exist for determining how to score apps, so to maintain validity, we used published evaluation standards from previous research. Our research team coded only those features that could be verified, meaning that more subjective aspects of software products, such as app usability, were not coded. Third, we recognize that apps targeting certain disease conditions will have some inherent features and classifications. For example, apps targeting diabetes are more likely to integrate external devices than apps targeting other disease conditions because of their connection to blood glucose monitors. We attempted to minimize this effect by sampling a large number of apps across a diverse range of conditions. The Pre-Cert risk categorization, which will be used to classify these mobile health apps, will also need to account for these inherent features. Finally, we acknowledge that this risk-based framework is still in its testing phase and that revisions and additions are likely to increase the clarity of the criteria. Indeed, we expect this and applaud ongoing FDA efforts to pilot the framework and invite feedback from user communities.
In the current state of digital health, the need for these more explicit guidelines is clear. There remains a lack of standards-based, reliable regulatory frameworks and evaluations that assess app quality, which, in turn, diminishes consumers’ confidence in digital health [
The Pre-Cert Program’s risk-based framework for assessing digital health apps and other SaMD products offers a promising foundation for enforcing appropriate digital health regulation while facilitating innovation and the use of technological advancements. However, the limited differences in our sample between apps likely requiring regulatory review and those that likely do not suggest that more detailed criteria are needed. We believe that additional exercises such as those done in this study, which can shed light on how the framework is likely to play out in the context of real-world digital health products, will be of high value. On the basis of such research, regulatory guidelines could be clarified and specified before the framework is deployed in the complex and dynamic landscape of digital health.
Apps reviewed.
FDA: Food and Drug Administration
IMDRF: International Medical Device Regulators Forum
Pre-Cert: Software Precertification
SaMD: Software as a Medical Device
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
JT conceived the study. NA and student volunteers collected the data, and NA conducted the analysis, which was reviewed by JT. NA drafted the manuscript, and all authors contributed significantly to editing and the final version.
JT receives unrelated research support from Otsuka. Other authors declare no conflicts of interest.