@Article{info:doi/10.2196/63105, author="Templeton, Michael John and Poellabauer, Christian and Schneider, Sandra and Rahimi, Morteza and Braimoh, Taofeek and Tadamarry, Fhaheem and Margolesky, Jason and Burke, Shanna and Al Masry, Zeina", title="Modernizing the Staging of Parkinson Disease Using Digital Health Technology", journal="J Med Internet Res", year="2025", month="Apr", day="4", volume="27", pages="e63105", keywords="digital health", keywords="Parkinson disease", keywords="disease classification", keywords="wearables", keywords="personalized medicine", keywords="neurocognition", keywords="artificial intelligence", keywords="AI", doi="10.2196/63105", url="https://www.jmir.org/2025/1/e63105" } @Article{info:doi/10.2196/59094, author="Dawadi, Research and Inoue, Mai and Tay, Ting Jie and Martin-Morales, Agustin and Vu, Thien and Araki, Michihiro", title="Disease Prediction Using Machine Learning on Smartphone-Based Eye, Skin, and Voice Data: Scoping Review", journal="JMIR AI", year="2025", month="Mar", day="25", volume="4", pages="e59094", keywords="literature review", keywords="machine learning", keywords="smartphone", keywords="health diagnosis", abstract="Background: The application of machine learning methods to data generated by ubiquitous devices like smartphones presents an opportunity to enhance the quality of health care and diagnostics. Smartphones are ideal for gathering data easily, providing quick feedback on diagnoses, and proposing interventions for health improvement. Objective: We reviewed the existing literature to gather studies that have used machine learning models with smartphone-derived data for the prediction and diagnosis of health anomalies. We divided the studies into those that used machine learning models by conducting experiments to retrieve data and predict diseases, and those that used machine learning models on publicly available databases. The details of databases, experiments, and machine learning models are intended to help researchers working in the fields of machine learning and artificial intelligence in the health care domain. Researchers can use the information to design their experiments or determine the databases they could analyze. Methods: A comprehensive search of the PubMed and IEEE Xplore databases was conducted, and an in-house keyword screening method was used to filter the articles based on the content of their titles and abstracts. Subsequently, studies related to the 3 areas of voice, skin, and eye were selected and analyzed based on how data for machine learning models were extracted (ie, the use of publicly available databases or through experiments). The machine learning methods used in each study were also noted. Results: A total of 49 studies were identified as being relevant to the topic of interest, and among these studies, there were 31 different databases and 24 different machine learning methods. Conclusions: The results provide a better understanding of how smartphone data are collected for predicting different diseases and what kinds of machine learning methods are used on these data. Similarly, publicly available databases having smartphone-based data that can be used for the diagnosis of various diseases have been presented. Our screening method could be used or improved in future studies, and our findings could be used as a reference to conduct similar studies, experiments, or statistical analyses. 
", doi="10.2196/59094", url="https://ai.jmir.org/2025/1/e59094" } @Article{info:doi/10.2196/62851, author="Fu, Yao and Huang, Zongyao and Deng, Xudong and Xu, Linna and Liu, Yang and Zhang, Mingxing and Liu, Jinyi and Huang, Bin", title="Artificial Intelligence in Lymphoma Histopathology: Systematic Review", journal="J Med Internet Res", year="2025", month="Feb", day="14", volume="27", pages="e62851", keywords="lymphoma", keywords="artificial intelligence", keywords="bias", keywords="histopathology", keywords="tumor", keywords="hematological", keywords="lymphatic disease", keywords="public health", keywords="pathologists", keywords="pathology", keywords="immunohistochemistry", keywords="diagnosis", keywords="prognosis", abstract="Background: Artificial intelligence (AI) shows considerable promise in the areas of lymphoma diagnosis, prognosis, and gene prediction. However, a comprehensive assessment of potential biases and the clinical utility of AI models is still needed. Objective: Our goal was to evaluate the biases of published studies using AI models for lymphoma histopathology and assess the clinical utility of comprehensive AI models for diagnosis or prognosis. Methods: This study adhered to the Systematic Review Reporting Standards. A comprehensive literature search was conducted across PubMed, Cochrane Library, and Web of Science from their inception until August 30, 2024. The search criteria included the use of AI for prognosis involving human lymphoma tissue pathology images, diagnosis, gene mutation prediction, etc. The risk of bias was evaluated using the Prediction Model Risk of Bias Assessment Tool (PROBAST). Information for each AI model was systematically tabulated, and summary statistics were reported. The study is registered with PROSPERO (CRD42024537394) and follows the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 reporting guidelines. Results: The search identified 3565 records, with 41 articles ultimately meeting the inclusion criteria. A total of 41 AI models were included in the analysis, comprising 17 diagnostic models, 10 prognostic models, 2 models for detecting ectopic gene expression, and 12 additional models related to diagnosis. All studies exhibited a high or unclear risk of bias, primarily due to limited analysis and incomplete reporting of participant recruitment. Most high-risk models (10/41) predominantly assigned high-risk classifications to participants. Almost all the articles presented an unclear risk of bias in at least one domain, with the most frequent being participant selection (16/41) and statistical analysis (37/41). The primary reasons for this were insufficient analysis of participant recruitment and a lack of interpretability in outcome analyses. In the diagnostic models, the most frequently studied lymphoma subtypes were diffuse large B-cell lymphoma, follicular lymphoma, chronic lymphocytic leukemia, and mantle cell lymphoma, while in the prognostic models, the most common subtypes were diffuse large B-cell lymphoma, follicular lymphoma, chronic lymphocytic leukemia, and Hodgkin lymphoma. In the internal validation results of all models, the area under the receiver operating characteristic curve (AUC) ranged from 0.75 to 0.99 and accuracy ranged from 68.3\% to 100\%. In models with external validation results, the AUC ranged from 0.93 to 0.99. Conclusions: From a methodological perspective, all models exhibited biases. 
The enhancement of the accuracy of AI models and the acceleration of their clinical translation hinge on several critical aspects. These include the comprehensive reporting of data sources, the diversity of datasets, the study design, the transparency and interpretability of AI models, the use of cross-validation and external validation, and adherence to regulatory guidance and standardized processes in the field of medical AI. ", doi="10.2196/62851", url="https://www.jmir.org/2025/1/e62851" } @Article{info:doi/10.2196/57298, author="Kim, Taehwan and Choi, Jung-Yeon and Ko, Jin Myung and Kim, Kwang-il", title="Development and Validation of a Machine Learning Method Using Vocal Biomarkers for Identifying Frailty in Community-Dwelling Older Adults: Cross-Sectional Study", journal="JMIR Med Inform", year="2025", month="Jan", day="16", volume="13", pages="e57298", keywords="frailty", keywords="cross-sectional study", keywords="vocal biomarkers", keywords="older adults", keywords="artificial intelligence", keywords="machine learning", keywords="classification model", keywords="self-supervised", abstract="Background: The two most commonly used methods to identify frailty are the frailty phenotype and the frailty index. However, both methods have limitations in clinical application. In addition, methods for measuring frailty have not yet been standardized. Objective: We aimed to develop and validate a classification model for predicting frailty status using vocal biomarkers in community-dwelling older adults, based on voice recordings obtained from the picture description task (PDT). Methods: We recruited 127 participants aged 50 years and older and collected clinical information through a short form of the Comprehensive Geriatric Assessment scale. Voice recordings were collected with a tablet device during the Korean version of the PDT, and we preprocessed audio data to remove background noise before feature extraction. Three artificial intelligence (AI) models were developed for identifying frailty status: SpeechAI (using speech data only), DemoAI (using demographic data only), and DemoSpeechAI (combining both data types). Results: Our models were trained and evaluated on the basis of 5-fold cross-validation for 127 participants and compared. The SpeechAI model, using deep learning--based acoustic features, outperformed in terms of accuracy and area under the receiver operating characteristic curve (AUC), 80.4\% (95\% CI 76.89\%-83.91\%) and 0.89 (95\% CI 0.86-0.92), respectively, while the model using only demographics showed an accuracy of 67.96\% (95\% CI 67.63\%-68.29\%) and an AUC of 0.74 (95\% CI 0.73-0.75). The SpeechAI model outperformed the model using only demographics significantly in AUC (t4=8.705 [2-sided]; P<.001). The DemoSpeechAI model, which combined demographics with deep learning--based acoustic features, showed superior performance (accuracy 85.6\%, 95\% CI 80.03\%-91.17\% and AUC 0.93, 95\% CI 0.89-0.97), but there was no significant difference in AUC between the SpeechAI and DemoSpeechAI models (t4=1.057 [2-sided]; P=.35). Compared with models using traditional acoustic features from the openSMILE toolkit, the SpeechAI model demonstrated superior performance (AUC 0.89) over traditional methods (logistic regression: AUC 0.62; decision tree: AUC 0.57; random forest: AUC 0.66). Conclusions: Our findings demonstrate that vocal biomarkers derived from deep learning--based acoustic features can be effectively used to predict frailty status in community-dwelling older adults.
The SpeechAI model showed promising accuracy and AUC, outperforming models based solely on demographic data or traditional acoustic features. Furthermore, while the combined DemoSpeechAI model showed slightly improved performance over the SpeechAI model, the difference was not statistically significant. These results suggest that speech-based AI models offer a noninvasive, scalable method for frailty detection, potentially streamlining assessments in clinical and community settings. ", doi="10.2196/57298", url="https://medinform.jmir.org/2025/1/e57298" } @Article{info:doi/10.2196/51602, author="Namatovu, Kasujja Hasifah and Magumba, Abraham Mark and Akena, Dickens", title="E-Screening for Prenatal Depression in Kampala, Uganda Using the Edinburgh Postnatal Depression Scale: Survey Results", journal="Online J Public Health Inform", year="2025", month="Jan", day="14", volume="17", pages="e51602", keywords="perinatal", keywords="prenatal", keywords="antenatal", keywords="antepartum", keywords="depression", keywords="Edinburgh Postnatal Depression Scale", abstract="Background: Perinatal depression remains a substantial public health challenge, often overlooked or incorrectly diagnosed in numerous low-income nations. Objective: The goal of this study was to establish statistical baselines for the prevalence of perinatal depression in Kampala and understand its relationship with key demographic variables. Methods: We employed an Android-based implementation of the Edinburgh Postnatal Depression Scale (EPDS) to survey 12,913 women recruited from 7 government health facilities located in Kampala, Uganda. We used the standard EPDS cutoff, which classifies women with total scores above 13 as possibly depressed and those below 13 as not depressed. The $\chi$2 test of independence was used to determine the most influential categorical variables. We further analyzed the most influential categorical variable using odds ratios. For continuous variables such as age and the weeks of gestation, we performed a simple correlation analysis. Results: We found that 21.5\% (2783/12,913, 95\% CI 20.8\%-22.3\%) were possibly depressed. Respondents' relationship category was found to be the most influential variable ($\chi^2_1$=806.9, P<.001; Cramer's V=0.25), indicating a small effect size. Among quantitative variables, we found a weak negative correlation between respondents' age and the total EPDS score (r=--0.11, P<.001). Similarly, a weak negative correlation was also observed between the total EPDS score and the number of previous children of the respondent (r=--0.07, P<.001). Moreover, a weak positive correlation was noted between weeks of gestation and the total EPDS score (r=0.02, P=.05). Conclusions: This study shows that demographic factors such as spousal employment category, age, and relationship status have an influence on the respondents' EPDS scores. These variables may serve as proxies for latent factors such as financial stability and emotional support. ", doi="10.2196/51602", url="https://ojphi.jmir.org/2025/1/e51602" } @Article{info:doi/10.2196/51615, author="Kuo, Nai-Yu and Tsai, Hsin-Jung and Tsai, Shih-Jen and Yang, C.
Albert", title="Efficient Screening in Obstructive Sleep Apnea Using Sequential Machine Learning Models, Questionnaires, and Pulse Oximetry Signals: Mixed Methods Study", journal="J Med Internet Res", year="2024", month="Dec", day="19", volume="26", pages="e51615", keywords="sleep apnea", keywords="machine learning", keywords="questionnaire", keywords="oxygen saturation", keywords="polysomnography", keywords="screening", keywords="sleep disorder", keywords="insomnia", keywords="utilization", keywords="dataset", keywords="training", keywords="diagnostic", abstract="Background: Obstructive sleep apnea (OSA) is a prevalent sleep disorder characterized by frequent pauses or shallow breathing during sleep. Polysomnography, the gold standard for OSA assessment, is time consuming and labor intensive, thus limiting diagnostic efficiency. Objective: This study aims to develop 2 sequential machine learning models to efficiently screen and differentiate OSA. Methods: We used 2 datasets comprising 8444 cases from the Sleep Heart Health Study (SHHS) and 1229 cases from Taipei Veterans General Hospital (TVGH). The Questionnaire Model (Model-Questionnaire) was designed to distinguish OSA from primary insomnia using demographic information and Pittsburgh Sleep Quality Index questionnaires, while the Saturation Model (Model-Saturation) categorized OSA severity based on multiple blood oxygen saturation parameters. The performance of the sequential machine learning models in screening and assessing the severity of OSA was evaluated using an independent test set derived from TVGH. Results: The Model-Questionnaire achieved an F1-score of 0.86, incorporating demographic data and the Pittsburgh Sleep Quality Index. Model-Saturation training by the SHHS dataset displayed an F1-score of 0.82 when using the power spectrum of blood oxygen saturation signals and reached the highest F1-score of 0.85 when considering all saturation-related parameters. Model-saturation training by the TVGH dataset displayed an F1-score of 0.82. The independent test set showed stable results for Model-Questionnaire and Model-Saturation training by the TVGH dataset, but with a slightly decreased F1-score (0.78) in Model-Saturation training by the SHHS dataset. Despite reduced model accuracy across different datasets, precision remained at 0.89 for screening moderate to severe OSA. Conclusions: Although a composite model using multiple saturation parameters exhibits higher accuracy, optimizing this model by identifying key factors is essential. Both models demonstrated adequate at-home screening capabilities for sleep disorders, particularly for patients unsuitable for in-laboratory sleep studies. 
", doi="10.2196/51615", url="https://www.jmir.org/2024/1/e51615", url="http://www.ncbi.nlm.nih.gov/pubmed/39699950" } @Article{info:doi/10.2196/55161, author="Wetzel, Anna-Jasmin and Preiser, Christine and M{\"u}ller, Regina and Joos, Stefanie and Koch, Roland and Henking, Tanja and Haumann, Hannah", title="Unveiling Usage Patterns and Explaining Usage of Symptom Checker Apps: Explorative Longitudinal Mixed Methods Study", journal="J Med Internet Res", year="2024", month="Dec", day="9", volume="26", pages="e55161", keywords="self-triage", keywords="eHealth", keywords="self-diagnosis", keywords="mHealth", keywords="mobile health", keywords="usage", keywords="patterns", keywords="predicts", keywords="prediction", keywords="symptoms checker", keywords="apps", keywords="applications", keywords="explorative longitudinal study", keywords="self care", keywords="self management", keywords="self-rated", keywords="mixed method", keywords="circumstances", keywords="General Linear Mixed Models", keywords="GLMM", keywords="qualitative data", keywords="content analysis", keywords="Kuckartz", keywords="survey", keywords="participants", keywords="users", abstract="Background: Symptom checker apps (SCA) aim to enable individuals without medical training to classify perceived symptoms and receive guidance on appropriate actions, such as self-care or seeking professional medical attention. However, there is a lack of detailed understanding regarding the contexts in which individuals use SCA and their opinions on these tools. Objective: This mixed methods study aims to explore the circumstances under which medical laypeople use SCA and to identify which aspects users find noteworthy after using SCA. Methods: A total of 48 SCA users documented their medical symptoms, provided open-ended responses, and recorded their SCA use along with other variables over 6 weeks in a longitudinal study. Generalized linear mixed models with and those without regularization were applied to consider the hierarchical structure of the data, and the models' outcomes were evaluated for comparison. Qualitative data were analyzed through Kuckartz qualitative content analysis. Results: Significant predictors of SCA use included the initial occurrence of symptoms, day of measurement (odds ratio [OR] 0.97), self-rated health (OR 0.80, P<.001), and the following International Classification in Primary Care-2--classified symptoms, that are general and unspecified (OR 3.33, P<.001), eye (OR 5.56, P=.001), cardiovascular (OR 8.33, P<.001), musculoskeletal (OR 5.26, P<.001), and skin (OR 4.76, P<.001). The day of measurement and self-rated health showed minor importance due to their small effect sizes. Qualitative analysis highlighted four main themes: (1) reasons for using SCA, (2) diverse affective responses, (3) a broad spectrum of behavioral reactions, and (4) unmet needs including a lack of personalization. Conclusions: The emergence of new and unfamiliar symptoms was a strong determinant for SCA use. Specific International Classification in Primary Care--rated symptom clusters, particularly those related to cardiovascular, eye, skin, general, and unspecified symptoms, were also highly predictive of SCA use. The varied applications of SCA fit into the concept of health literacy as bricolage, where SCA is leveraged as flexible tools by patients based on individual and situational requirements, functioning alongside other health care resources. 
", doi="10.2196/55161", url="https://www.jmir.org/2024/1/e55161" } @Article{info:doi/10.2196/54127, author="Nong, Thu Trang Thi and Nguyen, Hoang Giang and Lepe, Alexander and Tran, Bich Thuy and Nguyen, Phuong Lan Thi and Koot, R. Jaap A.", title="Challenges and Opportunities in Digital Screening for Hypertension and Diabetes Among Community Groups of Older Adults in Vietnam: Mixed Methods Study", journal="J Med Internet Res", year="2024", month="Dec", day="2", volume="26", pages="e54127", keywords="NCD screening", keywords="DHIS2 tracker", keywords="District Health Information Software, version 2 tracker", keywords="digital application", keywords="ISHC health volunteers", keywords="non-communicable diseases", keywords="prevention", keywords="Vietnam", keywords="mobile phone", abstract="Background: The project of scaling up noncommunicable disease (NCD) interventions in Southeast Asia aimed to strengthen the prevention and control of hypertension and diabetes, focusing on primary health care and community levels. In Vietnam, health volunteers who were members of the Intergenerational Self-Help Clubs (ISHCs) implemented community-based NCD screening and health promotion activities in communities. The ISHC health volunteers used an app based on District Health Information Software, version 2 (DHIS2) tracker (Society for Health Information Systems Programmes, India) to record details of participants during screening and other health activities. Objective: This study aimed to assess the strengths, barriers, and limitations of the NCD screening app used by the ISHC health volunteers on tablets and to provide recommendations for further scaling up. Methods: A mixed methods observational study with a convergent parallel design was performed. For the quantitative data analysis, 2 rounds of screening data collected from all 59 ISHCs were analyzed on completeness and quality. For the qualitative analysis, 2 rounds of evaluation of the screening app were completed. Focus group discussions with ISHC health volunteers and club management boards and in-depth interviews with members of the Association of the Elderly and Commune Health Station staff were performed. Results: In the quantitative analysis, data completeness of all 6704 screenings (n=3485 individuals) was very high. For anthropomorphic measurements, such as blood pressure, body weight, and abdominal circumference, less than 1\% errors were found. The data on NCD risk factors were not adequately recorded in 1908 (29.5\%) of the screenings. From the qualitative analysis, the NCD screening app was appreciated by ISHC health volunteers and supervisors, as an easier and more efficient way to report to higher levels, secure data, and strengthen relationships with relevant stakeholders, using tablets to connect to the internet and internet-based platforms to access information for self-learning and sharing to promote a healthy lifestyle as the strengths. The barriers and limitations reported by the respondents were a non--age-friendly app, incomplete translation of parts of the app into Vietnamese, some issues with the tablet's display, lack of sharing of responsibilities among the health volunteers, and suboptimal involvement of the health sector; limited digital literacy among ISHC health volunteers. Recommendations are continuous capacity building, improving app issues, improving tablet issues, and involving relevant stakeholders or younger members in technology adoption to support older people. 
Conclusions: The implementation of the NCD screening app by ISHC volunteers can be an effective way to improve community-led NCD screening and increase the uptake of NCD prevention and management services at the primary health care level. However, our study has shown that some barriers need to be addressed to maximize the efficient use of the app by ISHC health volunteers to record, report, and manage the screening data. ", doi="10.2196/54127", url="https://www.jmir.org/2024/1/e54127" } @Article{info:doi/10.2196/57641, author="Zhu, Jinpu and Yang, Fushuang and Wang, Yang and Wang, Zhongtian and Xiao, Yao and Wang, Lie and Sun, Liping", title="Accuracy of Machine Learning in Discriminating Kawasaki Disease and Other Febrile Illnesses: Systematic Review and Meta-Analysis", journal="J Med Internet Res", year="2024", month="Nov", day="18", volume="26", pages="e57641", keywords="machine learning", keywords="artificial intelligence", keywords="Kawasaki disease", keywords="febrile illness", keywords="coronary artery lesions", keywords="systematic review", keywords="meta-analysis", abstract="Background: Kawasaki disease (KD) is an acute pediatric vasculitis that can lead to coronary artery aneurysms and severe cardiovascular complications, often presenting with obvious fever in the early stages. In current clinical practice, distinguishing KD from other febrile illnesses remains a significant challenge. In recent years, some researchers have explored the potential of machine learning (ML) methods for the differential diagnosis of KD versus other febrile illnesses, as well as for predicting coronary artery lesions (CALs) in people with KD. However, there is still a lack of systematic evidence to validate their effectiveness. Therefore, we have conducted the first systematic review and meta-analysis to evaluate the accuracy of ML in differentiating KD from other febrile illnesses and in predicting CALs in people with KD, so as to provide evidence-based support for the application of ML in the diagnosis and treatment of KD. Objective: This study aimed to summarize the accuracy of ML in differentiating KD from other febrile illnesses and predicting CALs in people with KD. Methods: PubMed, Cochrane Library, Embase, and Web of Science were systematically searched until September 26, 2023. The risk of bias in the included original studies was appraised using the Prediction Model Risk of Bias Assessment Tool (PROBAST). Stata (version 15.0; StataCorp) was used for the statistical analysis. Results: A total of 29 studies were incorporated. Of them, 20 used ML to differentiate KD from other febrile illnesses. These studies involved a total of 103,882 participants, including 12,541 people with KD. In the validation set, the pooled concordance index, sensitivity, and specificity were 0.898 (95\% CI 0.874-0.922), 0.91 (95\% CI 0.83-0.95), and 0.86 (95\% CI 0.80-0.90), respectively. Meanwhile, 9 studies used ML for early prediction of the risk of CALs in children with KD. These studies involved a total of 6503 people with KD, of whom 986 had CALs. The pooled concordance index in the validation set was 0.787 (95\% CI 0.738-0.835). Conclusions: The diagnostic and predictive factors used in the studies we included were primarily derived from common clinical data. The ML models constructed based on these clinical data demonstrated promising effectiveness in differentiating KD from other febrile illnesses and in predicting coronary artery lesions. 
Therefore, in future research, we can explore the use of ML methods to identify more efficient predictors and develop tools that can be applied on a broader scale for the differentiation of KD and the prediction of CALs. ", doi="10.2196/57641", url="https://www.jmir.org/2024/1/e57641" } @Article{info:doi/10.2196/52301, author="Nebsbjerg, Amalie Mette and Bomholt, Bj{\o}rnshave Katrine and Vestergaard, H{\o}strup Claus and Christensen, Bondo Morten and Huibers, Linda", title="The Added Value of Using Video in Out-of-Hours Primary Care Telephone Triage Among General Practitioners: Cross-Sectional Survey Study", journal="JMIR Hum Factors", year="2024", month="Nov", day="15", volume="11", pages="e52301", keywords="primary health care", keywords="after-hours care", keywords="referral and consultation", keywords="general practitioners", keywords="triage", keywords="remote consultation", keywords="telemedicine", abstract="Background: Many countries have introduced video consultations in primary care both inside and outside of office hours. Despite some relational and technical limitations, general practitioners (GPs) have reported the benefits of video use in the daytime as it provides faster and more flexible access to health care. Studies have indicated that video may be specifically valuable in out-of-hours primary care (OOH-PC), but additional information on the added value of video use is needed. Objective: This study aimed to investigate triage GPs' perspectives on video use in GP-led telephone triage in OOH-PC by exploring their reasons for choosing video use and its effect on triage outcome, the decision-making process, communication, and invested time. Methods: We conducted a cross-sectional questionnaire study among GPs performing telephone triage in the OOH-PC service in the Central Denmark Region from September 5, 2022, until December 21, 2022. The questionnaire was integrated into the electronic patient registration system as a pop-up window appearing after every third video contact. This setup automatically linked background data on the contact, patient, and GP to the questionnaire data. We used descriptive analyses to describe reasons for and effects of video use and GP evaluation, stratified by patient age. Results: A total of 2456 questionnaires were completed. The most frequent reasons for video use were to assess the severity (n=1951, 79.4\%), to increase the probability of self-care (n=1279, 52.1\%), and to achieve greater certainty in decision-making (n=810, 33\%) (multiple answers were possible for reasons of video use). In 61.9\% (n=1516) of contacts, the triage GPs anticipated that the contact would have resulted in a different triage outcome if video had not been used. Use of video resulted in a downgrading of severity level in 88.3\% (n=1338) of cases. Triage GPs evaluated the use of video as positive in terms of their decision-making process (n=2358, 96\%), communication (n=2214, 90.1\%), and invested time (n=2391, 97.3\%). Conclusions: Triage GPs assessed that the use of video in telephone triage did affect their triage outcome, mostly by downgrading the level of care needed. The participating triage GPs found video in OOH-PC to be of added value, particularly in communication and the decision-making process. 
", doi="10.2196/52301", url="https://humanfactors.jmir.org/2024/1/e52301" } @Article{info:doi/10.2196/58504, author="Drummond, David and Gonsard, Apolline", title="Definitions and Characteristics of Patient Digital Twins Being Developed for Clinical Use: Scoping Review", journal="J Med Internet Res", year="2024", month="Nov", day="13", volume="26", pages="e58504", keywords="patient simulation", keywords="cyber-physical systems", keywords="telemonitoring", keywords="personalized medicine", keywords="precision medicine", keywords="digital twin", abstract="Background: The concept of digital twins, widely adopted in industry, is entering health care. However, there is a lack of consensus on what constitutes the digital twin of a patient. Objective: The objective of this scoping review was to analyze definitions and characteristics of patient digital twins being developed for clinical use, as reported in the scientific literature. Methods: We searched PubMed, Scopus, Embase, IEEE, and Google Scholar for studies claiming digital twin development or evaluation until August 2023. Data on definitions, characteristics, and development phase were extracted. Unsupervised classification of claimed digital twins was performed. Results: We identified 86 papers representing 80 unique claimed digital twins, with 98\% (78/80) in preclinical phases. Among the 55 papers defining ``digital twin,'' 76\% (42/55) described a digital replica, 42\% (23/55) mentioned real-time updates, 24\% (13/55) emphasized patient specificity, and 15\% (8/55) included 2-way communication. Among claimed digital twins, 60\% (48/80) represented specific organs (primarily heart: 15/48, 31\%; bones or joints: 10/48, 21\%; lung: 6/48, 12\%; and arteries: 5/48, 10\%); 14\% (11/80) embodied biological systems such as the immune system; and 26\% (21/80) corresponded to other products (prediction models, etc). The patient data used to develop and run the claimed digital twins encompassed medical imaging examinations (35/80, 44\% of publications), clinical notes (15/80, 19\% of publications), laboratory test results (13/80, 16\% of publications), wearable device data (12/80, 15\% of publications), and other modalities (32/80, 40\% of publications). Regarding data flow between patients and their virtual counterparts, 16\% (13/80) claimed that digital twins involved no flow from patient to digital twin, 73\% (58/80) used 1-way flow from patient to digital twin, and 11\% (9/80) enabled 2-way data flow between patient and digital twin. Based on these characteristics, unsupervised classification revealed 3 clusters: simulation patient digital twins in 54\% (43/80) of publications, monitoring patient digital twins in 28\% (22/80) of publications, and research-oriented models unlinked to specific patients in 19\% (15/80) of publications. Simulation patient digital twins used computational modeling for personalized predictions and therapy evaluations, mostly for one-time assessments, and monitoring digital twins harnessed aggregated patient data for continuous risk or outcome forecasting and care optimization. Conclusions: We propose defining a patient digital twin as ``a viewable digital replica of a patient, organ, or biological system that contains multidimensional, patient-specific information and informs decisions'' and to distinguish simulation and monitoring digital twins. 
These proposed definitions and subtypes offer a framework to guide research into realizing the potential of these personalized, integrative technologies to advance clinical care. ", doi="10.2196/58504", url="https://www.jmir.org/2024/1/e58504" } @Article{info:doi/10.2196/50631, author="Li, Haodong and Qian, Chuang and Yan, Weili and Fu, Dong and Zheng, Yiming and Zhang, Zhiqiang and Meng, Junrong and Wang, Dahui", title="Use of Artificial Intelligence in Cobb Angle Measurement for Scoliosis: Retrospective Reliability and Accuracy Study of a Mobile App", journal="J Med Internet Res", year="2024", month="Nov", day="1", volume="26", pages="e50631", keywords="scoliosis", keywords="photogrammetry", keywords="artificial intelligence", keywords="deep learning", abstract="Background: Scoliosis is a spinal deformity in which one or more spinal segments bend to the side or show vertebral rotation. Some artificial intelligence (AI) apps have already been developed for measuring the Cobb angle in patients with scoliosis. These apps still require doctors to perform certain measurements, which can lead to interobserver variability. The AI app (cobbAngle pro) in this study will eliminate the need for doctor measurements, achieving complete automation. Objective: We aimed to evaluate the reliability and accuracy of our new AI app that is based on deep learning to automatically measure the Cobb angle in patients with scoliosis. Methods: A retrospective analysis was conducted on the clinical data of children with scoliosis who were treated at the Pediatric Orthopedics Department of the Children's Hospital affiliated with Fudan University from July 2019 to July 2022. Three measurers used the Picture Archiving and Communication System (PACS) to measure the coronal main curve Cobb angle in 802 full-length anteroposterior and lateral spine X-rays of 601 children with scoliosis, and recorded the results of each measurement. After an interval of 2 weeks, the mobile AI app was used to remeasure the Cobb angle once. The Cobb angle measurements from the PACS were used as the reference standard, and the accuracy of the Cobb angle measurements by the app was analyzed through the Bland-Altman test. The intraclass correlation coefficient (ICC) was used to compare the repeatability within measurers and the consistency between measurers. Results: Among 601 children with scoliosis, 89 were male and 512 were female (age range: 10-17 years), and 802 full-length spinal X-rays were analyzed. Two functionalities of the app (photography and photo upload) were compared with the PACS for measuring the Cobb angle. The consistency was found to be excellent. The average absolute errors of the Cobb angle measured by the photography and upload methods were 2.00 and 2.08, respectively. Using a clinical allowance maximum error of 5{\textdegree}, the 95\% limits of agreement (LoAs) for Cobb angle measurements by the photography and upload methods were --4.7{\textdegree} to 4.9{\textdegree} and --4.9{\textdegree} to 4.9{\textdegree}, respectively. For the photography and upload methods, the 95\% LoAs for measuring Cobb angles were --4.3{\textdegree} to 4.6{\textdegree} and --4.4{\textdegree} to 4.7{\textdegree}, respectively, in mild scoliosis patients; --4.9{\textdegree} to 5.2{\textdegree} and --5.1{\textdegree} to 5.1{\textdegree}, respectively, in moderate scoliosis patients; and --5.2{\textdegree} to 5.0{\textdegree} and --6.0{\textdegree} to 4.8{\textdegree}, respectively, in severe scoliosis patients. 
The Cobb angle measured by the 3 observers twice before and after using the photography method had good repeatability (P<.001). The consistency between the observers was excellent (P<.001). Conclusions: The new AI platform is accurate and repeatable in the automatic measurement of the Cobb angle of the main curvature in patients with scoliosis. ", doi="10.2196/50631", url="https://www.jmir.org/2024/1/e50631" } @Article{info:doi/10.2196/58591, author="Zhu, Yan-Yan and Ye, Ze-Hao and Chu, Zhen-Xing and Liu, Yingjie and Wei, Jie and Jia, Le and Jiang, Yong-Jun and Shang, Hong and Hu, Qing-Hai", title="Effects of HIV Self-Testing on Testing Promotion and Risk Behavior Reduction Among Transgender Women in China: Randomized Controlled Trial", journal="J Med Internet Res", year="2024", month="Oct", day="29", volume="26", pages="e58591", keywords="HIV", keywords="HIV self-testing", keywords="testing behavior", keywords="sexual behaviours", keywords="transgender women", keywords="sexual health", keywords="mobile phone", abstract="Background: To date, no randomized controlled trials have specifically addressed behavior changes after HIV self-testing (HIVST) among transgender women. Objective: This study aims to evaluate the effects of HIVST on changes in HIV testing behavior, frequency of condomless sex, and partner numbers among transgender women in China. Methods: Participants were recruited from 2 Chinese cities using both online and offline methods. Transgender women were randomly assigned to receive an HIVST intervention. Data from the previous 3 months were collected at baseline, 3 months, and 6 months. The primary outcome was the mean change in the number of HIV tests among transgender women during the 6-month follow-up. An intention-to-treat analysis was conducted. The statistical analysis used analysis of covariance and linear mixed-effects models. Results: From February to June 2021, 255 transgender women were recruited, of whom only 36.5\% (93/255) had a steady job and 27.1\% (69/255) earned less than US \$414.9 in income per month. They were randomly assigned to the intervention (n=127) and control (n=128) groups. At 6 months, the mean number of HIV tests was 2.14 (95\% CI 1.80-2.48) in the intervention group and 1.19 (95\% CI 0.99-1.40) in the control group (P<.001), with increases of 0.84 (95\% CI 0.54-1.14) and 0.11 (95\% CI --0.19 to 0.41) over 6 months, respectively. The net increase was 0.73 (95\% CI 0.31-1.15; P<.001), with a similar adjusted result. No significant differences in the frequency of condomless sex or partner numbers were observed between the 2 groups. Conclusions: HIVST is an effective strategy for enhancing regular HIV testing behavior among transgender women in China. This strategy should be combined with measures to address the financial vulnerability of the transgender women community to reduce subsequent risk behaviors, including condomless sex. 
Trial Registration: Chinese Clinical Trial Registry ChiCTR2000039766; https://www.chictr.org.cn/showproj.html?proj=61402 ", doi="10.2196/58591", url="https://www.jmir.org/2024/1/e58591" } @Article{info:doi/10.2196/54617, author="Shin, Daun and Kim, Hyoseung and Lee, Seunghwan and Cho, Younhee and Jung, Whanbo", title="Using Large Language Models to Detect Depression From User-Generated Diary Text Data as a Novel Approach in Digital Mental Health Screening: Instrument Validation Study", journal="J Med Internet Res", year="2024", month="Sep", day="18", volume="26", pages="e54617", keywords="depression", keywords="screening", keywords="artificial intelligence", keywords="digital health technology", keywords="text data", abstract="Background: Depressive disorders have substantial global implications, leading to various social consequences, including decreased occupational productivity and a high disability burden. Early detection and intervention for clinically significant depression have gained attention; however, the existing depression screening tools, such as the Center for Epidemiologic Studies Depression Scale, have limitations in objectivity and accuracy. Therefore, researchers are identifying objective indicators of depression, including image analysis, blood biomarkers, and ecological momentary assessments (EMAs). Among EMAs, user-generated text data, particularly from diary writing, have emerged as a clinically significant and analyzable source for detecting or diagnosing depression, leveraging advancements in large language models such as ChatGPT. Objective: We aimed to detect depression based on user-generated diary text through an emotional diary writing app using a large language model (LLM). We aimed to validate the value of the semistructured diary text data as an EMA data source. Methods: Participants were assessed for depression using the Patient Health Questionnaire and suicide risk was evaluated using the Beck Scale for Suicide Ideation before starting and after completing the 2-week diary writing period. The text data from the daily diaries were also used in the analysis. The performance of leading LLMs, such as ChatGPT with GPT-3.5 and GPT-4, was assessed with and without GPT-3.5 fine-tuning on the training data set. The model performance comparison involved the use of chain-of-thought and zero-shot prompting to analyze the text structure and content. Results: We used 428 diaries from 91 participants; GPT-3.5 fine-tuning demonstrated superior performance in depression detection, achieving an accuracy of 0.902 and a specificity of 0.955. However, the balanced accuracy was the highest (0.844) for GPT-3.5 without fine-tuning and prompt techniques; it displayed a recall of 0.929. Conclusions: Both GPT-3.5 and GPT-4.0 demonstrated relatively reasonable performance in recognizing the risk of depression based on diaries. Our findings highlight the potential clinical usefulness of user-generated text data for detecting depression. In addition to measurable indicators, such as step count and physical activity, future research should increasingly emphasize qualitative digital expression. 
", doi="10.2196/54617", url="https://www.jmir.org/2024/1/e54617", url="http://www.ncbi.nlm.nih.gov/pubmed/39292502" } @Article{info:doi/10.2196/56972, author="Singh, Ben and Chastin, Sebastien and Miatke, Aaron and Curtis, Rachel and Dumuid, Dorothea and Brinsley, Jacinta and Ferguson, Ty and Szeto, Kimberley and Simpson, Catherine and Eglitis, Emily and Willems, Iris and Maher, Carol", title="Real-World Accuracy of Wearable Activity Trackers for Detecting Medical Conditions: Systematic Review and Meta-Analysis", journal="JMIR Mhealth Uhealth", year="2024", month="Aug", day="30", volume="12", pages="e56972", keywords="wearable activity trackers", keywords="disease detection", keywords="atrial fibrillation", keywords="COVID-19 diagnosis", keywords="meta-analysis", keywords="wearables", keywords="wearable tracker", keywords="tracker", keywords="detection", keywords="monitoring", keywords="physiological", keywords="diagnostic tool", keywords="tool", keywords="tools", keywords="Fitbit", keywords="atrial", keywords="COVID-19", keywords="wearable", abstract="Background: Wearable activity trackers, including fitness bands and smartwatches, offer the potential for disease detection by monitoring physiological parameters. However, their accuracy as specific disease diagnostic tools remains uncertain. Objective: This systematic review and meta-analysis aims to evaluate whether wearable activity trackers can be used to detect disease and medical events. Methods: Ten electronic databases were searched for studies published from inception to April 1, 2023. Studies were eligible if they used a wearable activity tracker to diagnose or detect a medical condition or event (eg, falls) in free-living conditions in adults. Meta-analyses were performed to assess the overall area under the curve (\%), accuracy (\%), sensitivity (\%), specificity (\%), and positive predictive value (\%). Subgroup analyses were performed to assess device type (Fitbit, Oura ring, and mixed). The risk of bias was assessed using the Joanna Briggs Institute Critical Appraisal Checklist for Diagnostic Test Accuracy Studies. Results: A total of 28 studies were included, involving a total of 1,226,801 participants (age range 28.6-78.3). In total, 16 (57\%) studies used wearables for diagnosis of COVID-19, 5 (18\%) studies for atrial fibrillation, 3 (11\%) studies for arrhythmia or abnormal pulse, 3 (11\%) studies for falls, and 1 (4\%) study for viral symptoms. The devices used were Fitbit (n=6), Apple watch (n=6), Oura ring (n=3), a combination of devices (n=7), Empatica E4 (n=1), Dynaport MoveMonitor (n=2), Samsung Galaxy Watch (n=1), and other or not specified (n=2). For COVID-19 detection, meta-analyses showed a pooled area under the curve of 80.2\% (95\% CI 71.0\%-89.3\%), an accuracy of 87.5\% (95\% CI 81.6\%-93.5\%), a sensitivity of 79.5\% (95\% CI 67.7\%-91.3\%), and specificity of 76.8\% (95\% CI 69.4\%-84.1\%). For atrial fibrillation detection, pooled positive predictive value was 87.4\% (95\% CI 75.7\%-99.1\%), sensitivity was 94.2\% (95\% CI 88.7\%-99.7\%), and specificity was 95.3\% (95\% CI 91.8\%-98.8\%). For fall detection, pooled sensitivity was 81.9\% (95\% CI 75.1\%-88.1\%) and specificity was 62.5\% (95\% CI 14.4\%-100\%). Conclusions: Wearable activity trackers show promise in disease detection, with notable accuracy in identifying atrial fibrillation and COVID-19. While these findings are encouraging, further research and improvements are required to enhance their diagnostic precision and applicability. 
Trial Registration: Prospero CRD42023407867; https://www.crd.york.ac.uk/prospero/display\_record.php?RecordID=407867 ", doi="10.2196/56972", url="https://mhealth.jmir.org/2024/1/e56972" } @Article{info:doi/10.2196/58886, author="Weile, Synne Kathrine and Mathiasen, Ren{\'e} and Winther, Falck Jeanette and Hasle, Henrik and Henriksen, Tram Louise", title="Hjernetegn.dk---The Danish Central Nervous System Tumor Awareness Initiative Digital Decision Support Tool: Design and Implementation Report", journal="JMIR Med Inform", year="2024", month="Jul", day="25", volume="12", pages="e58886", keywords="digital health initiative", keywords="digital health initiatives", keywords="clinical decision support", keywords="decision support", keywords="decision support system", keywords="decision support systems", keywords="decision support tool", keywords="decision support tools", keywords="diagnostic delay", keywords="awareness initiative", keywords="pediatric neurology", keywords="pediatric CNS tumors", keywords="CNS tumor", keywords="CNS tumour", keywords="CNS tumours", keywords="co-creation", keywords="health systems and services", keywords="communication", keywords="central nervous system", abstract="Background: Childhood tumors in the central nervous system (CNS) have longer diagnostic delays than other pediatric tumors. Vague presenting symptoms pose a challenge in the diagnostic process; it has been indicated that patients and parents may be hesitant to seek help, and health care professionals (HCPs) may lack awareness and knowledge about clinical presentation. To raise awareness among HCPs, the Danish CNS tumor awareness initiative hjernetegn.dk was launched. Objective: This study aims to present the learnings from designing and implementing a decision support tool for HCPs to reduce diagnostic delay in childhood CNS tumors. The aims also include decisions regarding strategies for dissemination and use of social media, and an evaluation of the digital impact 6 months after launch. Methods: The phases of developing and implementing the tool include participatory co-creation workshops, designing the website and digital platforms, and implementing a press and media strategy. The digital impact of hjernetegn.dk was evaluated through website analytics and social media engagement. Implementation (Results): hjernetegn.dk was launched in August 2023. The results after 6 months exceeded key performance indicators. The analysis showed a high number of website visitors and engagement, with a plateau reached 3 months after the initial launch. The LinkedIn campaign and Google Search strategy also generated a high number of impressions and clicks. Conclusions: The findings suggest that the initiative has been successfully integrated, raising awareness and providing a valuable tool for HCPs in diagnosing childhood CNS tumors. The study highlights the importance of interdisciplinary collaboration, co-creation, and ongoing community management, as well as broad dissemination strategies when introducing a digital support tool. ", doi="10.2196/58886", url="https://medinform.jmir.org/2024/1/e58886" } @Article{info:doi/10.2196/55542, author="Knitza, Johannes and Tascilar, Koray and Fuchs, Franziska and Mohn, Jacob and Kuhn, Sebastian and Bohr, Daniela and Muehlensiepen, Felix and Bergmann, Christina and Labinsky, Hannah and Morf, Harriet and Araujo, Elizabeth and Englbrecht, Matthias and Vorbr{\"u}ggen, Wolfgang and von der Decken, Cay-Benedict and Kleinert, Stefan and Ramming, Andreas and Distler, W. J{\"o}rg H. 
and Bartz-Bazzanella, Peter and Vuillerme, Nicolas and Schett, Georg and Welcker, Martin and Hueber, Axel", title="Diagnostic Accuracy of a Mobile AI-Based Symptom Checker and a Web-Based Self-Referral Tool in Rheumatology: Multicenter Randomized Controlled Trial", journal="J Med Internet Res", year="2024", month="Jul", day="23", volume="26", pages="e55542", keywords="symptom checker", keywords="artificial intelligence", keywords="eHealth", keywords="diagnostic decision support system", keywords="rheumatology", keywords="decision support", keywords="decision", keywords="diagnostic", keywords="tool", keywords="rheumatologists", keywords="symptom assessment", keywords="resources", keywords="randomized controlled trial", keywords="diagnosis", keywords="decision support system", keywords="support system", keywords="support", abstract="Background: The diagnosis of inflammatory rheumatic diseases (IRDs) is often delayed due to unspecific symptoms and a shortage of rheumatologists. Digital diagnostic decision support systems (DDSSs) have the potential to expedite diagnosis and help patients navigate the health care system more efficiently. Objective: The aim of this study was to assess the diagnostic accuracy of a mobile artificial intelligence (AI)--based symptom checker (Ada) and a web-based self-referral tool (Rheport) regarding IRDs. Methods: A prospective, multicenter, open-label, crossover randomized controlled trial was conducted with patients newly presenting to 3 rheumatology centers. Participants were randomly assigned to complete a symptom assessment using either Ada or Rheport. The primary outcome was the correct identification of IRDs by the DDSSs, defined as the presence of any IRD in the list of suggested diagnoses by Ada or achieving a prespecified threshold score with Rheport. The gold standard was the diagnosis made by rheumatologists. Results: A total of 600 patients were included, among whom 214 (35.7\%) were diagnosed with an IRD. The most frequent IRD was rheumatoid arthritis, diagnosed in 69 (11.5\%) patients. Rheport's disease suggestion and Ada's top 1 (D1) and top 5 (D5) disease suggestions demonstrated overall diagnostic accuracies of 52\%, 63\%, and 58\%, respectively, for IRDs. Rheport showed a sensitivity of 62\% and a specificity of 47\% for IRDs. Ada's D1 and D5 disease suggestions showed a sensitivity of 52\% and 66\%, respectively, and a specificity of 68\% and 54\%, respectively, concerning IRDs. Ada's diagnostic accuracy regarding individual diagnoses was heterogeneous, and Ada performed considerably better in identifying rheumatoid arthritis in comparison to other diagnoses (D1: 42\%; D5: 64\%). The Cohen $\kappa$ statistic of Rheport for agreement on any rheumatic disease diagnosis with Ada D1 was 0.15 (95\% CI 0.08-0.18) and with Ada D5 was 0.08 (95\% CI 0.00-0.16), indicating poor agreement for the presence of any rheumatic disease between the 2 DDSSs. Conclusions: To our knowledge, this is the largest comparative DDSS trial with actual use of DDSSs by patients. The diagnostic accuracies of both DDSSs for IRDs were not promising in this high-prevalence patient population. DDSSs may lead to a misuse of scarce health care resources. Our results underscore the need for stringent regulation and drastic improvements to ensure the safety and efficacy of DDSSs. 
Trial Registration: German Register of Clinical Trials DRKS00017642; https://drks.de/search/en/trial/DRKS00017642 ", doi="10.2196/55542", url="https://www.jmir.org/2024/1/e55542" } @Article{info:doi/10.2196/56226, author="Chen, Xiaolan and Zhang, Han and Li, Zhiwen and Liu, Shuang and Zhou, Yuqi", title="Continuous Monitoring of Heart Rate Variability and Respiration for the Remote Diagnosis of Chronic Obstructive Pulmonary Disease: Prospective Observational Study", journal="JMIR Mhealth Uhealth", year="2024", month="Jul", day="18", volume="12", pages="e56226", keywords="continuous monitoring", keywords="chronic obstructive pulmonary disease", keywords="COPD diagnosis", keywords="prospective study", keywords="ROC curve", keywords="heart rate variability", keywords="respiratory rate", keywords="heart rate", keywords="noncontact bed sensors", abstract="Background: Conventional daytime monitoring in a single day may be influenced by factors such as motion artifacts and emotions, and continuous monitoring of nighttime heart rate variability (HRV) and respiration to assist in chronic obstructive pulmonary disease (COPD) diagnosis has not been reported yet. Objective: The aim of this study was to explore and compare the effects of continuously monitored HRV, heart rate (HR), and respiration during night sleep on the remote diagnosis of COPD. Methods: We recruited patients with different severities of COPD and healthy controls between January 2021 and November 2022. Vital signs such as HRV, HR, and respiration were recorded using noncontact bed sensors from 10 PM to 8 AM of the following day, and the recordings of each patient lasted for at least 30 days. We obtained statistical means of HRV, HR, and respiration over time periods of 7, 14, and 30 days by continuous monitoring. Additionally, the effects that the statistical means of HRV, HR, and respiration had on COPD diagnosis were evaluated at different times of recordings. Results: In this study, 146 individuals were enrolled: 37 patients with COPD in the case group and 109 participants in the control group. The median number of continuous night-sleep monitoring days per person was 56.5 (IQR 32.0-113.0) days. Using the features regarding the statistical means of HRV, HR, and respiration over 1, 7, 14, and 30 days, binary logistic regression classification of COPD yielded an accuracy, Youden index, and area under the receiver operating characteristic curve of 0.958, 0.904, and 0.989, respectively. The classification performance for COPD diagnosis was directly proportional to the monitoring duration of vital signs at night. The importance of the features for diagnosis was determined by the statistical means of respiration, HRV, and HR, which followed the order of respiration > HRV > HR. Specifically, the statistical means of the duration of respiration rate faster than 21 times/min (RRF), high frequency band power of 0.15-0.40 Hz (HF), and respiration rate (RR) were identified as the top 3 most significant features for classification, corresponding to cutoff values of 0.1 minute, 1316.3 nU, and 16.3 times/min, respectively. Conclusions: Continuous monitoring of nocturnal vital signs has significant potential for the remote diagnosis of COPD. 
As the duration of night-sleep monitoring increased from 1 to 30 days, the statistical means of HRV, HR, and respiration reflected an individual's health condition better than vital signs monitored on a single day or night, and the classification performance for COPD diagnosis improved accordingly. Further, the statistical means of RRF, HF, and RR are crucial features for diagnosing COPD, demonstrating the importance of monitoring HRV and respiration during night sleep. ", doi="10.2196/56226", url="https://mhealth.jmir.org/2024/1/e56226" } @Article{info:doi/10.2196/56110, author="Hoppe, Michael John and Auer, K. Matthias and Str{\"u}ven, Anna and Massberg, Steffen and Stremmel, Christopher", title="ChatGPT With GPT-4 Outperforms Emergency Department Physicians in Diagnostic Accuracy: Retrospective Analysis", journal="J Med Internet Res", year="2024", month="Jul", day="8", volume="26", pages="e56110", keywords="emergency department", keywords="diagnosis", keywords="accuracy", keywords="artificial intelligence", keywords="ChatGPT", keywords="internal medicine", keywords="AI", keywords="natural language processing", keywords="NLP", keywords="emergency medicine triage", keywords="triage", keywords="physicians", keywords="physician", keywords="diagnostic accuracy", keywords="OpenAI", abstract="Background: OpenAI's ChatGPT is a pioneering artificial intelligence (AI) in the field of natural language processing, and it holds significant potential in medicine for providing treatment advice. Additionally, recent studies have demonstrated promising results using ChatGPT for emergency medicine triage. However, its diagnostic accuracy in the emergency department (ED) has not yet been evaluated. Objective: This study compares the diagnostic accuracy of ChatGPT with GPT-3.5 and GPT-4 and primary treating resident physicians in an ED setting. Methods: Among 100 adults admitted to our ED in January 2023 with internal medicine issues, the diagnostic accuracy was assessed by comparing the diagnoses made by ED resident physicians and those made by ChatGPT with GPT-3.5 or GPT-4 against the final hospital discharge diagnosis, using a point system for grading accuracy. Results: The study enrolled 100 patients with a median age of 72 (IQR 58.5-82.0) years who were admitted to our internal medicine ED primarily for cardiovascular, endocrine, gastrointestinal, or infectious diseases. GPT-4 outperformed both GPT-3.5 (P<.001) and ED resident physicians (P=.01) in diagnostic accuracy for internal medicine emergencies. Furthermore, across various disease subgroups, GPT-4 consistently outperformed GPT-3.5 and resident physicians. It demonstrated significant superiority in cardiovascular (GPT-4 vs ED physicians: P=.03) and endocrine or gastrointestinal diseases (GPT-4 vs GPT-3.5: P=.01). However, in other categories, the differences were not statistically significant. Conclusions: In this study, which compared the diagnostic accuracy of GPT-3.5, GPT-4, and ED resident physicians against a discharge diagnosis gold standard, GPT-4 outperformed both the resident physicians and its predecessor, GPT-3.5. Despite the retrospective design of the study and its limited sample size, the results underscore the potential of AI as a supportive diagnostic tool in ED settings. ", doi="10.2196/56110", url="https://www.jmir.org/2024/1/e56110" } @Article{info:doi/10.2196/48811, author="Marri, Shankar Shiva and Albadri, Warood and Hyder, Salman Mohammed and Janagond, B. Ajit and Inamadar, C.
Arun", title="Efficacy of an Artificial Intelligence App (Aysa) in Dermatological Diagnosis: Cross-Sectional Analysis", journal="JMIR Dermatol", year="2024", month="Jul", day="2", volume="7", pages="e48811", keywords="artificial intelligence", keywords="AI", keywords="AI-aided diagnosis", keywords="dermatology", keywords="mobile app", keywords="application", keywords="neural network", keywords="machine learning", keywords="dermatological", keywords="skin", keywords="computer-aided diagnosis", keywords="diagnostic", keywords="imaging", keywords="lesion", abstract="Background: Dermatology is an ideal specialty for artificial intelligence (AI)--driven image recognition to improve diagnostic accuracy and patient care. Lack of dermatologists in many parts of the world and the high frequency of cutaneous disorders and malignancies highlight the increasing need for AI-aided diagnosis. Although AI-based applications for the identification of dermatological conditions are widely available, research assessing their reliability and accuracy is lacking. Objective: The aim of this study was to analyze the efficacy of the Aysa AI app as a preliminary diagnostic tool for various dermatological conditions in a semiurban town in India. Methods: This observational cross-sectional study included patients over the age of 2 years who visited the dermatology clinic. Images of lesions from individuals with various skin disorders were uploaded to the app after obtaining informed consent. The app was used to make a patient profile, identify lesion morphology, plot the location on a human model, and answer questions regarding duration and symptoms. The app presented eight differential diagnoses, which were compared with the clinical diagnosis. The model's performance was evaluated using sensitivity, specificity, accuracy, positive predictive value, negative predictive value, and F1-score. Comparison of categorical variables was performed with the $\chi$2 test and statistical significance was considered at P<.05. Results: A total of 700 patients were part of the study. A wide variety of skin conditions were grouped into 12 categories. The AI model had a mean top-1 sensitivity of 71\% (95\% CI 61.5\%-74.3\%), top-3 sensitivity of 86.1\% (95\% CI 83.4\%-88.6\%), and all-8 sensitivity of 95.1\% (95\% CI 93.3\%-96.6\%). The top-1 sensitivities for diagnosis of skin infestations, disorders of keratinization, other inflammatory conditions, and bacterial infections were 85.7\%, 85.7\%, 82.7\%, and 81.8\%, respectively. In the case of photodermatoses and malignant tumors, the top-1 sensitivities were 33.3\% and 10\%, respectively. Each category had a strong correlation between the clinical diagnosis and the probable diagnoses (P<.001). Conclusions: The Aysa app showed promising results in identifying most dermatoses. 
", doi="10.2196/48811", url="https://derma.jmir.org/2024/1/e48811" } @Article{info:doi/10.2196/58491, author="Lu, Linken and Lu, Tangsheng and Tian, Chunyu and Zhang, Xiujun", title="AI: Bridging Ancient Wisdom and Modern Innovation in Traditional Chinese Medicine", journal="JMIR Med Inform", year="2024", month="Jun", day="28", volume="12", pages="e58491", keywords="traditional Chinese medicine", keywords="TCM", keywords="artificial intelligence", keywords="AI", keywords="diagnosis", doi="10.2196/58491", url="https://medinform.jmir.org/2024/1/e58491", url="http://www.ncbi.nlm.nih.gov/pubmed/38941141" } @Article{info:doi/10.2196/58551, author="Hall, Evelyn and Keyser, Laura and McKinney, Jessica and Pulliam, Samantha and Weinstein, Milena", title="Real-World Evidence From a Digital Health Treatment Program for Female Urinary Incontinence: Observational Study of Outcomes Following User-Centered Product Design", journal="JMIR Form Res", year="2024", month="Jun", day="27", volume="8", pages="e58551", keywords="urinary incontinence", keywords="digital health", keywords="pelvic floor muscle training", keywords="real-world", keywords="evidence", keywords="user-centered design", keywords="mobile phone", abstract="Background: Urinary incontinence (UI) affects millions of women with substantial health and quality-of-life impacts. Supervised pelvic floor muscle training (PFMT) is the recommended first-line treatment. However, multiple individual and institutional barriers impede women's access to skilled care. Evidence suggests that digital health solutions are acceptable and may be effective in delivering first-line incontinence treatment, although these technologies have not yet been leveraged at scale. Objective: The primary objective is to describe the effectiveness and safety of a prescribed digital health treatment program to guide PFMT for UI treatment among real-world users. The secondary objectives are to evaluate patient engagement following an updated user platform and identify the factors predictive of success. Methods: This retrospective cohort study of women who initiated device use between January 1, 2022, and June 30, 2023, included users aged ?18 years old with a diagnosis of stress, urgency, or mixed incontinence or a score of >33.3 points on the Urogenital Distress Inventory Short Form (UDI-6). Users are prescribed a 2.5-minute, twice-daily, training program guided by an intravaginal, motion-based device that pairs with a smartphone app. Data collected by the device or app include patient-reported demographics and outcomes, adherence to the twice-daily regimen, and pelvic floor muscle performance parameters, including angle change and hold time. Symptom improvement was assessed by the UDI-6 score change from baseline to the most recent score using paired 2-tailed t tests. Factors associated with meeting the UDI-6 minimum clinically important difference were evaluated by regression analysis. Results: Of 1419 users, 947 met inclusion criteria and provided data for analysis. The mean baseline UDI-6 score was 46.8 (SD 19.3), and the mean UDI-6 score change was 11.3 (SD 19.9; P<.001). Improvement was reported by 74\% (697/947) and was similar across age, BMI, and incontinence subtype. Mean adherence was 89\% (mean 12.5, SD 2.1 of 14 possible weekly uses) over 12 weeks. Those who used the device ?10 times per week were more likely to achieve symptom improvement. 
In multivariate logistic regression analysis, baseline incontinence symptom severity and maximum angle change during pelvic floor muscle contraction were significantly associated with meeting the UDI-6 minimum clinically important difference. Age, BMI, and UI subtype were not associated. Conclusions: This study provides real-world evidence to support the effectiveness and safety of a prescribed digital health treatment program for female UI. A digital PFMT program completed with visual guidance from a motion-based device yields significant results when executed $\geq$10 times per week over a period of 12 weeks. The program demonstrates high user engagement, with 92.9\% (880/947) of users adhering to the prescribed training regimen. First-line incontinence treatment, when implemented using this digital program, leads to statistically and clinically substantial symptom improvements across age and BMI categories and incontinence subtypes. ", doi="10.2196/58551", url="https://formative.jmir.org/2024/1/e58551" } @Article{info:doi/10.2196/58157, author="Meer, Andreas and Rahm, Philipp and Schwendinger, Markus and Vock, Michael and Grunder, Bettina and Demurtas, Jacopo and Rutishauser, Jonas", title="A Symptom-Checker for Adult Patients Visiting an Interdisciplinary Emergency Care Center and the Safety of Patient Self-Triage: Real-Life Prospective Evaluation", journal="J Med Internet Res", year="2024", month="Jun", day="27", volume="26", pages="e58157", keywords="safety", keywords="telemedicine", keywords="teletriage", keywords="symptom-checker", keywords="self-triage", keywords="self-assessment", keywords="triage", keywords="triaging", keywords="symptom", keywords="symptoms", keywords="validation", keywords="validity", keywords="telehealth", keywords="mHealth", keywords="mobile health", keywords="app", keywords="apps", keywords="application", keywords="applications", keywords="diagnosis", keywords="diagnoses", keywords="diagnostic", keywords="diagnostics", keywords="checker", keywords="checkers", keywords="check", keywords="web", keywords="neural network", keywords="neural networks", abstract="Background: Symptom-checkers have become important tools for self-triage, assisting patients to determine the urgency of medical care. To be safe and effective, these tools must be validated, particularly to avoid potentially hazardous undertriage without leading to inefficient overtriage. Only limited safety data from studies including small sample sizes have been available so far. Objective: The objective of our study was to prospectively investigate the safety of patients' self-triage in a large patient sample. We used SMASS (Swiss Medical Assessment System; in4medicine, Inc) pathfinder, a symptom-checker based on a computerized transparent neural network. Methods: We recruited 2543 patients into this single-center, prospective clinical trial conducted at the cantonal hospital of Baden, Switzerland. Patients with an Emergency Severity Index of 1-2 were treated by the team of the emergency department, while those with an index of 3-5 were seen at the walk-in clinic by general physicians. We compared the triage recommendation obtained by the patients' self-triage with the assessment of clinical urgency made by 3 successive interdisciplinary panels of physicians (panels A, B, and C). Using the Clopper-Pearson CI, we assumed that to confirm the symptom-checkers' safety, the upper confidence bound for the probability of a potentially hazardous undertriage should lie below 1\%. 
A potentially hazardous undertriage was defined as a triage in which either all (consensus criterion) or the majority (majority criterion) of the experts of the last panel (panel C) rated the triage of the symptom-checker to be ``rather likely'' or ``likely'' life-threatening or harmful. Results: Of the 2543 patients, 1227 (48.25\%) were female and 1316 (51.75\%) male. None of the patients reached the prespecified consensus criterion for a potentially hazardous undertriage. This resulted in an upper 95\% confidence bound of 0.1184\%. Further, 4 cases met the majority criterion. This resulted in an upper 95\% confidence bound for the probability of a potentially hazardous undertriage of 0.3616\%. The 2-sided 95\% Clopper-Pearson CI for the probability of overtriage (n=450 cases,17.69\%) was 16.23\% to 19.24\%, which is considerably lower than the figures reported in the literature. Conclusions: The symptom-checker proved to be a safe triage tool, avoiding potentially hazardous undertriage in a real-life clinical setting of emergency consultations at a walk-in clinic or emergency department without causing undesirable overtriage. Our data suggest the symptom-checker may be safely used in clinical routine. Trial Registration: ClinicalTrials.gov NCT04055298; https://clinicaltrials.gov/study/NCT04055298 ", doi="10.2196/58157", url="https://www.jmir.org/2024/1/e58157", url="http://www.ncbi.nlm.nih.gov/pubmed/38809606" } @Article{info:doi/10.2196/59267, author="Hirosawa, Takanobu and Harada, Yukinori and Mizuta, Kazuya and Sakamoto, Tetsu and Tokumasu, Kazuki and Shimizu, Taro", title="Evaluating ChatGPT-4's Accuracy in Identifying Final Diagnoses Within Differential Diagnoses Compared With Those of Physicians: Experimental Study for Diagnostic Cases", journal="JMIR Form Res", year="2024", month="Jun", day="26", volume="8", pages="e59267", keywords="decision support system", keywords="diagnostic errors", keywords="diagnostic excellence", keywords="diagnosis", keywords="large language model", keywords="LLM", keywords="natural language processing", keywords="GPT-4", keywords="ChatGPT", keywords="diagnoses", keywords="physicians", keywords="artificial intelligence", keywords="AI", keywords="chatbots", keywords="medical diagnosis", keywords="assessment", keywords="decision-making support", keywords="application", keywords="applications", keywords="app", keywords="apps", abstract="Background: The potential of artificial intelligence (AI) chatbots, particularly ChatGPT with GPT-4 (OpenAI), in assisting with medical diagnosis is an emerging research area. However, it is not yet clear how well AI chatbots can evaluate whether the final diagnosis is included in differential diagnosis lists. Objective: This study aims to assess the capability of GPT-4 in identifying the final diagnosis from differential-diagnosis lists and to compare its performance with that of physicians for case report series. Methods: We used a database of differential-diagnosis lists from case reports in the American Journal of Case Reports, corresponding to final diagnoses. These lists were generated by 3 AI systems: GPT-4, Google Bard (currently Google Gemini), and Large Language Models by Meta AI 2 (LLaMA2). The primary outcome was focused on whether GPT-4's evaluations identified the final diagnosis within these lists. None of these AIs received additional medical training or reinforcement. For comparison, 2 independent physicians also evaluated the lists, with any inconsistencies resolved by another physician. 
Results: The 3 AIs generated a total of 1176 differential diagnosis lists from 392 case descriptions. GPT-4's evaluations concurred with those of the physicians in 966 out of 1176 lists (82.1\%). The Cohen $\kappa$ coefficient was 0.63 (95\% CI 0.56-0.69), indicating a fair to good agreement between GPT-4 and the physicians' evaluations. Conclusions: GPT-4 demonstrated a fair to good agreement in identifying the final diagnosis from differential-diagnosis lists, comparable to physicians for case report series. Its ability to compare differential diagnosis lists with final diagnoses suggests its potential to aid clinical decision-making support through diagnostic feedback. While GPT-4 showed a fair to good agreement for evaluation, its application in real-world scenarios and further validation in diverse clinical environments are essential to fully understand its utility in the diagnostic process. ", doi="10.2196/59267", url="https://formative.jmir.org/2024/1/e59267" } @Article{info:doi/10.2196/48777, author="Li, Aoyu and Li, Jingwen and Chai, Jiali and Wu, Wei and Chaudhary, Suamn and Zhao, Juanjuan and Qiang, Yan", title="Detection of Mild Cognitive Impairment Through Hand Motor Function Under Digital Cognitive Test: Mixed Methods Study", journal="JMIR Mhealth Uhealth", year="2024", month="Jun", day="26", volume="12", pages="e48777", keywords="mild cognitive impairment", keywords="movement kinetics", keywords="digital cognitive test", keywords="dual task", keywords="mobile phone", abstract="Background: Early detection of cognitive impairment or dementia is essential to reduce the incidence of severe neurodegenerative diseases. However, currently available diagnostic tools for detecting mild cognitive impairment (MCI) or dementia are time-consuming, expensive, or not widely accessible. Hence, exploring more effective methods to assist clinicians in detecting MCI is necessary. Objective: In this study, we aimed to explore the feasibility and efficiency of assessing MCI through movement kinetics under tablet-based ``drawing and dragging'' tasks. Methods: We iteratively designed ``drawing and dragging'' tasks by conducting symposiums, programming, and interviews with stakeholders (neurologists, nurses, engineers, patients with MCI, healthy older adults, and caregivers). Subsequently, stroke patterns and movement kinetics were evaluated in healthy control and MCI groups by comparing 5 categories of features related to hand motor function (ie, time, stroke, frequency, score, and sequence). Finally, user experience with the overall cognitive screening system was investigated using structured questionnaires and unstructured interviews, and their suggestions were recorded. Results: The ``drawing and dragging'' tasks can detect MCI effectively, with an average accuracy of 85\% (SD 2\%). Using statistical comparison of movement kinetics, we discovered that the time- and score-based features are the most effective among all the features. Specifically, compared with the healthy control group, the MCI group showed a significant increase in the time they took for the hand to switch from one stroke to the next, with longer drawing times, slow dragging, and lower scores. In addition, patients with MCI had poorer decision-making strategies and visual perception of drawing sequence features, as evidenced by adding auxiliary information and losing more local details in the drawing. 
Feedback from user experience indicates that our system is user-friendly and facilitates screening for deficits in self-perception. Conclusions: The tablet-based MCI detection system quantitatively assesses hand motor function in older adults and further elucidates the cognitive and behavioral decline phenomenon in patients with MCI. This innovative approach serves to identify and measure digital biomarkers associated with MCI or Alzheimer dementia, enabling the monitoring of changes in patients' executive function and visual perceptual abilities as the disease advances. ", doi="10.2196/48777", url="https://mhealth.jmir.org/2024/1/e48777" } @Article{info:doi/10.2196/58398, author="Maxin, J. Anthony and Lim, H. Do and Kush, Sophie and Carpenter, Jack and Shaibani, Rami and Gulek, G. Bernice and Harmon, G. Kimberly and Mariakakis, Alex and McGrath, B. Lynn and Levitt, R. Michael", title="Smartphone Pupillometry and Machine Learning for Detection of Acute Mild Traumatic Brain Injury: Cohort Study", journal="JMIR Neurotech", year="2024", month="Jun", day="13", volume="3", pages="e58398", keywords="smartphone pupillometry", keywords="pupillary light reflex", keywords="biomarkers", keywords="digital health", keywords="mild traumatic brain injury", keywords="concussion", keywords="machine learning", keywords="artificial intelligence", keywords="AI", keywords="pupillary", keywords="pilot study", keywords="brain", keywords="brain injury", keywords="injury", keywords="diagnostic", keywords="pupillometer", keywords="neuroimaging", keywords="diagnosis", keywords="artificial", keywords="mobile phone", abstract="Background: Quantitative pupillometry is used in mild traumatic brain injury (mTBI) with changes in pupil reactivity noted after blast injury, chronic mTBI, and sports-related concussion. Objective: We evaluated the diagnostic capabilities of a smartphone-based digital pupillometer to differentiate patients with mTBI in the emergency department from controls. Methods: Adult patients diagnosed with acute mTBI with normal neuroimaging were evaluated in an emergency department within 36 hours of injury (control group: healthy adults). The PupilScreen smartphone pupillometer was used to measure the pupillary light reflex (PLR), and quantitative curve morphological parameters of the PLR were compared between mTBI and healthy controls. To address the class imbalance in our sample, a synthetic minority oversampling technique was applied. All possible combinations of PLR parameters produced by the smartphone pupillometer were then applied as features to 4 binary classification machine learning algorithms: random forest, k-nearest neighbors, support vector machine, and logistic regression. A 10-fold cross-validation technique stratified by cohort was used to produce accuracy, sensitivity, specificity, area under the curve, and F1-score metrics for the classification of mTBI versus healthy participants. Results: Of 12 patients with acute mTBI, 33\% (4/12) were female (mean age 54.1, SD 22.2 years), and 58\% (7/12) were White with a median Glasgow Coma Scale (GCS) of 15. Of the 132 healthy patients, 67\% (88/132) were female, with a mean age of 36 (SD 10.2) years and 64\% (84/132) were White with a median GCS of 15. 
Significant differences were observed in PLR recordings between healthy controls and patients with acute mTBI in the PLR parameters, namely (1) percent change (mean 34\%, SD 8.3\% vs mean 26\%, SD 7.9\%; P<.001), (2) minimum pupillary diameter (mean 34.8, SD 6.1 pixels vs mean 29.7, SD 6.1 pixels; P=.004), (3) maximum pupillary diameter (mean 53.6, SD 12.4 pixels vs mean 40.9, SD 11.9 pixels; P<.001), and (4) mean constriction velocity (mean 11.5, SD 5.0 pixels/second vs mean 6.8, SD 3.0 pixels/second; P<.001) between cohorts. After the synthetic minority oversampling technique, both cohorts had a sample size of 132 recordings. The best-performing binary classification model was a random forest model using the PLR parameters of latency, percent change, maximum diameter, minimum diameter, mean constriction velocity, and maximum constriction velocity as features. This model produced an overall accuracy of 93.5\%, sensitivity of 96.2\%, specificity of 90.9\%, area under the curve of 0.936, and F1-score of 93.7\% for differentiating between pupillary changes in mTBI and healthy participants. Absolute values cannot be provided for the performance percentages reported here because of the 10-fold cross-validation procedure used to obtain them. Conclusions: In this pilot study, quantitative smartphone pupillometry demonstrates the potential to be a useful tool in the future diagnosis of acute mTBI. ", doi="10.2196/58398", url="https://neuro.jmir.org/2024/1/e58398" } @Article{info:doi/10.2196/50344, author="Zawati, H. Ma'n and Lang, Michael", title="Does an App a Day Keep the Doctor Away? AI Symptom Checker Applications, Entrenched Bias, and Professional Responsibility", journal="J Med Internet Res", year="2024", month="Jun", day="5", volume="26", pages="e50344", keywords="artificial intelligence", keywords="applications", keywords="mobile health", keywords="mHealth", keywords="bias", keywords="biases", keywords="professional obligations", keywords="professional obligation", keywords="app", keywords="apps", keywords="application", keywords="symptom checker", keywords="symptom checkers", keywords="diagnose", keywords="diagnosis", keywords="self-diagnose", keywords="self-diagnosis", keywords="ethic", keywords="ethics", keywords="ethical", keywords="regulation", keywords="regulations", keywords="legal", keywords="law", keywords="laws", keywords="safety", keywords="mobile phone", doi="10.2196/50344", url="https://www.jmir.org/2024/1/e50344", url="http://www.ncbi.nlm.nih.gov/pubmed/38838309" } @Article{info:doi/10.2196/51822, author="Lefkovitz, Ilana and Walsh, Samantha and Blank, J. Leah and Jett{\'e}, Nathalie and Kummer, R. 
Benjamin", title="Direct Clinical Applications of Natural Language Processing in Common Neurological Disorders: Scoping Review", journal="JMIR Neurotech", year="2024", month="May", day="22", volume="3", pages="e51822", keywords="natural language processing", keywords="NLP", keywords="unstructured", keywords="text", keywords="machine learning", keywords="deep learning", keywords="neurology", keywords="headache disorders", keywords="migraine", keywords="Parkinson disease", keywords="cerebrovascular disease", keywords="stroke", keywords="transient ischemic attack", keywords="epilepsy", keywords="multiple sclerosis", keywords="cardiovascular", keywords="artificial intelligence", keywords="Parkinson", keywords="neurological", keywords="neurological disorder", keywords="scoping review", keywords="diagnosis", keywords="treatment", keywords="prediction", abstract="Background: Natural language processing (NLP), a branch of artificial intelligence that analyzes unstructured language, is being increasingly used in health care. However, the extent to which NLP has been formally studied in neurological disorders remains unclear. Objective: We sought to characterize studies that applied NLP to the diagnosis, prediction, or treatment of common neurological disorders. Methods: This review followed the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) standards. The search was conducted using MEDLINE and Embase on May 11, 2022. Studies of NLP use in migraine, Parkinson disease, Alzheimer disease, stroke and transient ischemic attack, epilepsy, or multiple sclerosis were included. We excluded conference abstracts, review papers, as well as studies involving heterogeneous clinical populations or indirect clinical uses of NLP. Study characteristics were extracted and analyzed using descriptive statistics. We did not aggregate measurements of performance in our review due to the high variability in study outcomes, which is the main limitation of the study. Results: In total, 916 studies were identified, of which 41 (4.5\%) met all eligibility criteria and were included in the final review. Of the 41 included studies, the most frequently represented disorders were stroke and transient ischemic attack (n=20, 49\%), followed by epilepsy (n=10, 24\%), Alzheimer disease (n=6, 15\%), and multiple sclerosis (n=5, 12\%). We found no studies of NLP use in migraine or Parkinson disease that met our eligibility criteria. The main objective of NLP was diagnosis (n=20, 49\%), followed by disease phenotyping (n=17, 41\%), prognostication (n=9, 22\%), and treatment (n=4, 10\%). In total, 18 (44\%) studies used only machine learning approaches, 6 (15\%) used only rule-based methods, and 17 (41\%) used both. Conclusions: We found that NLP was most commonly applied for diagnosis, implying a potential role for NLP in augmenting diagnostic accuracy in settings with limited access to neurological expertise. We also found several gaps in neurological NLP research, with few to no studies addressing certain disorders, which may suggest additional areas of inquiry. Trial Registration: Prospective Register of Systematic Reviews (PROSPERO) CRD42021228703; https://www.crd.york.ac.uk/PROSPERO/display\_record.php?RecordID=228703 ", doi="10.2196/51822", url="https://neuro.jmir.org/2024/1/e51822" } @Article{info:doi/10.2196/52577, author="Scott, E. Suzanne and Thompson, J. Matthew", title="``Notification! 
You May Have Cancer.'' Could Smartphones and Wearables Help Detect Cancer Early?", journal="JMIR Cancer", year="2024", month="May", day="20", volume="10", pages="e52577", keywords="wearables", keywords="early diagnosis", keywords="cancer", keywords="challenges", keywords="diagnosis", keywords="wearable", keywords="detect", keywords="detection", keywords="smartphone", keywords="cancer diagnosis", keywords="symptoms", keywords="monitoring", keywords="monitor", keywords="implementation", keywords="anxiety", keywords="health care service", keywords="mobile phone", doi="10.2196/52577", url="https://cancer.jmir.org/2024/1/e52577", url="http://www.ncbi.nlm.nih.gov/pubmed/38767941" } @Article{info:doi/10.2196/45115, author="Schnoor, Kyma and Talboom-Kamp, A. Esther P. W. and Hajti{\'c}, Muamer and Chavannes, H. Niels and Versluis, Anke", title="Facilitators of and Barriers to the Use of a Digital Self-Management Service for Diagnostic Testing: Focus Group Study With Potential Users", journal="JMIR Hum Factors", year="2024", month="May", day="10", volume="11", pages="e45115", keywords="eHealth", keywords="usability", keywords="self-management", keywords="diagnostic test service", keywords="diagnostic", keywords="testing", keywords="test service", keywords="perspective", keywords="focus group", keywords="user need", keywords="user testing", keywords="implementation", keywords="qualitative", keywords="test result", keywords="laboratory test", keywords="laboratory result", abstract="Background: Health care lags in digital transformation, despite the potential of technology to improve the well-being of individuals. The COVID-19 pandemic has accelerated the uptake of technology in health care and increased individuals' willingness to perform self-management using technology. A web-based service, Directlab Online, provides consumers with direct digital access to diagnostic test packages, which can digitally support the self-management of health. Objective: This study aims to identify the facilitators, barriers, and needs of Directlab Online, a self-management service for web-based access to diagnostic testing. Methods: A qualitative method was used from a potential user's perspective. The needs and future needs for, facilitators of, and barriers to the use of Directlab Online were evaluated. Semistructured focus group meetings were conducted in 2022. Two focus groups were focused on sexually transmitted infection test packages and 2 were focused on prevention test packages. Data analysis was performed according to the principles of the Framework Method. The Consolidated Framework for Implementation Research was used to categorize the facilitators and barriers. Results: In total, 19 participants, with a mean age of 34.32 (SD 14.70) years, participated in the focus groups. Important barriers were a lack of privacy information, too much and difficult information, and a commercial appearance. Important facilitators were the right amount of information, the right kind of tests, and the involvement of a health care professional. The need for a service such as Directlab Online was to ensure its availability for users' health and to maintain their health. Conclusions: According to the participants, facilitators and barriers were comprehension of the information, the goal of the website, and the overall appearance of the service. Although the service was developed in cocreation with health care professionals and users, the needs did not align. 
The users preferred understandable and adequate, but not excessive, information. In addition, they preferred other types of tests to be available on the service. For future research, it would be beneficial to focus on cocreation between the involved medical professionals and users to develop, improve, and implement a service such as Directlab Online. ", doi="10.2196/45115", url="https://humanfactors.jmir.org/2024/1/e45115", url="http://www.ncbi.nlm.nih.gov/pubmed/38728071" } @Article{info:doi/10.2196/44406, author="Gheisari, Mehdi and Ghaderzadeh, Mustafa and Li, Huxiong and Taami, Tania and Fern{\'a}ndez-Campusano, Christian and Sadeghsalehi, Hamidreza and Afzaal Abbasi, Aaqif", title="Mobile Apps for COVID-19 Detection and Diagnosis for Future Pandemic Control: Multidimensional Systematic Review", journal="JMIR Mhealth Uhealth", year="2024", month="Feb", day="22", volume="12", pages="e44406", keywords="COVID-19", keywords="detection", keywords="diagnosis", keywords="internet of things", keywords="cloud computing", keywords="mobile applications", keywords="mobile app", keywords="mobile apps", keywords="artificial intelligence: AI", keywords="mobile phone", keywords="smartphone", abstract="Background: In the modern world, mobile apps are essential for human advancement, and pandemic control is no exception. The use of mobile apps and technology for the detection and diagnosis of COVID-19 has been the subject of numerous investigations, although no thorough analysis of COVID-19 pandemic prevention has been conducted using mobile apps, creating a gap. Objective: With the intention of helping software companies and clinical researchers, this study provides comprehensive information regarding the different fields in which mobile apps were used to diagnose COVID-19 during the pandemic. Methods: In this systematic review, 535 studies were found after searching 5 major research databases (ScienceDirect, Scopus, PubMed, Web of Science, and IEEE). Of these, only 42 (7.9\%) studies concerned with diagnosing and detecting COVID-19 were chosen after applying inclusion and exclusion criteria using the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) protocol. Results: Mobile apps were categorized into 6 areas based on the content of these 42 studies: contact tracing, data gathering, data visualization, artificial intelligence (AI)--based diagnosis, rule- and guideline-based diagnosis, and data transformation. Patients with COVID-19 were identified via mobile apps using a variety of clinical, geographic, demographic, radiological, serological, and laboratory data. Most studies concentrated on using AI methods to identify people who might have COVID-19. Additionally, symptoms, cough sounds, and radiological images were used more frequently compared to other data types. Deep learning techniques, such as convolutional neural networks, performed comparatively better in the processing of health care data than other types of AI techniques, which improved the diagnosis of COVID-19. Conclusions: Mobile apps could soon play a significant role as a powerful tool for data collection, epidemic health data analysis, and the early identification of suspected cases. These technologies can work with the internet of things, cloud storage, 5th-generation technology, and cloud computing. Processing pipelines can be moved to mobile device processing cores using new deep learning methods, such as lightweight neural networks. 
In the event of future pandemics, mobile apps will play a critical role in rapid diagnosis using various image data and clinical symptoms. Consequently, the rapid diagnosis of these diseases can improve the management of their effects and obtain excellent results in treating patients. ", doi="10.2196/44406", url="https://mhealth.jmir.org/2024/1/e44406", url="http://www.ncbi.nlm.nih.gov/pubmed/38231538" } @Article{info:doi/10.2196/54274, author="Gannon, Hannah and Larsson, Leyla and Chimhuya, Simbarashe and Mangiza, Marcia and Wilson, Emma and Kesler, Erin and Chimhini, Gwendoline and Fitzgerald, Felicity and Zailani, Gloria and Crehan, Caroline and Khan, Nushrat and Hull-Bailey, Tim and Sassoon, Yali and Baradza, Morris and Heys, Michelle and Chiume, Msandeni", title="Development and Implementation of Digital Diagnostic Algorithms for Neonatal Units in Zimbabwe and Malawi: Development and Usability Study", journal="JMIR Form Res", year="2024", month="Jan", day="26", volume="8", pages="e54274", keywords="mobile health", keywords="mHealth", keywords="neonatology", keywords="digital health", keywords="mobile apps", keywords="newborn", keywords="Malawi", keywords="Zimbabwe", keywords="usability", keywords="clinical decision support", abstract="Background: Despite an increase in hospital-based deliveries, neonatal mortality remains high in low-resource settings. Due to limited laboratory diagnostics, there is significant reliance on clinical findings to inform diagnoses. Accurate, evidence-based identification and management of neonatal conditions could improve outcomes by standardizing care. This could be achieved through digital clinical decision support (CDS) tools. Neotree is a digital, quality improvement platform that incorporates CDS, aiming to improve neonatal care in low-resource health care facilities. Before this study, first-phase CDS development included developing and implementing neonatal resuscitation algorithms, creating initial versions of CDS to address a range of neonatal conditions, and a Delphi study to review key algorithms. Objective: This second-phase study aims to codevelop and implement neonatal digital CDS algorithms in Malawi and Zimbabwe. Methods: Overall, 11 diagnosis-specific web-based workshops with Zimbabwean, Malawian, and UK neonatal experts were conducted (August 2021 to April 2022) encompassing the following: (1) review of available evidence, (2) review of country-specific guidelines (Essential Medicines List and Standard Treatment Guidelines for Zimbabwe and Care of the Infant and Newborn, Malawi), and (3) identification of uncertainties within the literature for future studies. After agreement of clinical content, the algorithms were programmed into a test script, tested with the respective hospital's health care professionals (HCPs), and refined according to their feedback. Once finalized, the algorithms were programmed into the Neotree software and implemented at the tertiary-level implementation sites: Sally Mugabe Central Hospital in Zimbabwe and Kamuzu Central Hospital in Malawi, in December 2021 and May 2022, respectively. In Zimbabwe, usability was evaluated through 2 usability workshops and usability questionnaires: Post-Study System Usability Questionnaire (PSSUQ) and System Usability Scale (SUS). Results: Overall, 11 evidence-based diagnostic and management algorithms were tailored to local resource availability. These refined algorithms were then integrated into Neotree. 
Where national management guidelines differed, country-specific guidelines were created. In total, 9 HCPs attended the usability workshops and completed the SUS, among whom 8 (89\%) completed the PSSUQ. Both usability scores (SUS mean score 75.8 out of 100 [higher score is better]; PSSUQ overall score 2.28 out of 7 [lower score is better]) demonstrated high usability of the CDS function but highlighted issues around technical complexity, which continue to be addressed iteratively. Conclusions: This study describes the successful development and implementation of the only known neonatal CDS system, incorporated within a bedside data capture system with the ability to deliver up-to-date management guidelines, tailored to local resource availability. This study highlighted the importance of collaborative participatory design. Further implementation evaluation is planned to guide and inform the development of health system and program strategies to support newborn HCPs, with the ultimate goal of reducing preventable neonatal morbidity and mortality in low-resource settings. ", doi="10.2196/54274", url="https://formative.jmir.org/2024/1/e54274", url="http://www.ncbi.nlm.nih.gov/pubmed/38277198" } @Article{info:doi/10.2196/52377, author="Ponzo, Sonia and May, Merle and Tamayo-Elizalde, Miren and Bailey, Kerri and Shand, J. Alanna and Bamford, Ryan and Multmeier, Jan and Griessel, Ivan and Szulyovszky, Benedek and Blakey, William and Valentine, Sophie and Plans, David", title="App Characteristics and Accuracy Metrics of Available Digital Biomarkers for Autism: Scoping Review", journal="JMIR Mhealth Uhealth", year="2023", month="Nov", day="17", volume="11", pages="e52377", keywords="autism", keywords="diagnostics", keywords="digital biomarkers", keywords="digital health", keywords="mobile apps", keywords="neurodevelopmental conditions", abstract="Background: Diagnostic delays in autism are common, with the time to diagnosis being up to 3 years from the onset of symptoms. Such delays have a proven detrimental effect on individuals and families going through the process. Digital health products, such as mobile apps, can help close this gap due to their scalability and ease of access. Further, mobile apps offer the opportunity to make the diagnostic process faster and more accurate by providing additional and timely information to clinicians undergoing autism assessments. Objective: The aim of this scoping review was to synthesize the available evidence about digital biomarker tools to aid clinicians, researchers in the autism field, and end users in making decisions as to their adoption within clinical and research settings. Methods: We conducted a structured literature search on databases and search engines to identify peer-reviewed studies and regulatory submissions that describe app characteristics, validation study details, and accuracy and validity metrics of commercial and research digital biomarker apps aimed at aiding the diagnosis of autism. Results: We identified 4 studies evaluating 4 products: 1 commercial and 3 research apps. The accuracy of the identified apps varied between 28\% and 80.6\%. Sensitivity and specificity also varied, ranging from 51.6\% to 81.6\% and 18.5\% to 80.5\%, respectively. Positive predictive value ranged from 20.3\% to 76.6\%, and negative predictive value fluctuated between 48.7\% and 97.4\%. 
Further, we found a lack of details around participants' demographics and, where these were reported, important imbalances in sex and ethnicity in the studies evaluating such products. Finally, evaluation methods as well as accuracy and validity metrics of available tools were not clearly reported in some cases and varied greatly across studies. Different comparators were also used, with some studies validating their tools against the Diagnostic and Statistical Manual of Mental Disorders criteria and others through self-reported measures. Further, while in most cases, 2 classes were used for algorithm validation purposes, 1 of the studies reported a third category (indeterminate). These discrepancies substantially impact the comparability and generalizability of the results, thus highlighting the need for standardized validation processes and the reporting of findings. Conclusions: Despite their popularity, systematic evaluations and syntheses of the current state of the art of digital health products are lacking. Standardized and transparent evaluations of digital health tools in diverse populations are needed to assess their real-world usability and validity, as well as help researchers, clinicians, and end users safely adopt novel tools within clinical and research practices. ", doi="10.2196/52377", url="https://mhealth.jmir.org/2023/1/e52377", url="http://www.ncbi.nlm.nih.gov/pubmed/37976084" } @Article{info:doi/10.2196/50924, author="Watase, Teruhisa and Omiya, Yasuhiro and Tokuno, Shinichi", title="Severity Classification Using Dynamic Time Warping--Based Voice Biomarkers for Patients With COVID-19: Feasibility Cross-Sectional Study", journal="JMIR Biomed Eng", year="2023", month="Nov", day="6", volume="8", pages="e50924", keywords="voice biomarker", keywords="dynamic time warping", keywords="COVID-19", keywords="smartphone", keywords="severity classification", keywords="biomarker", keywords="feasibility study", keywords="illness", keywords="monitoring", keywords="respiratory disease", keywords="accuracy", keywords="logistic model", keywords="tool", keywords="model", abstract="Background: In Japan, individuals with mild COVID-19 illness previously required to be monitored in designated areas and were hospitalized only if their condition worsened to moderate illness or worse. Daily monitoring using a pulse oximeter was a crucial indicator for hospitalization. However, a drastic increase in the number of patients resulted in a shortage of pulse oximeters for monitoring. Therefore, an alternative and cost-effective method for monitoring patients with mild illness was required. Previous studies have shown that voice biomarkers for Parkinson disease or Alzheimer disease are useful for classifying or monitoring symptoms; thus, we tried to adapt voice biomarkers for classifying the severity of COVID-19 using a dynamic time warping (DTW) algorithm where voice wavelets can be treated as 2D features; the differences between wavelet features are calculated as scores. Objective: This feasibility study aimed to test whether DTW-based indices can generate voice biomarkers for a binary classification model using COVID-19 patients' voices to distinguish moderate illness from mild illness at a significant level. Methods: We conducted a cross-sectional study using voice samples of COVID-19 patients. Three kinds of long vowels were processed into 10-cycle waveforms with standardized power and time axes. 
The DTW-based indices were generated by all pairs of waveforms and tested with the Mann-Whitney U test ($\alpha$<.01) and verified with a linear discriminant analysis and confusion matrix to determine which indices were better for binary classification of disease severity. A binary classification model was generated based on a generalized linear model (GLM) using the most promising indices as predictors. The receiver operating characteristic curve/area under the curve (ROC/AUC) validated the model performance, and the confusion matrix calculated the model accuracy. Results: Participants in this study (n=295) were infected with COVID-19 between June 2021 and March 2022, were aged 20 years or older, and recuperated in Kanagawa prefecture. Voice samples (n=110) were selected from the participants' attribution matrix based on age group, sex, time of infection, and whether they had mild illness (n=61) or moderate illness (n=49). The DTW-based variance indices were found to be significant (P<.001, except for 1 of 6 indices), with a balanced accuracy in the range between 79\% and 88.6\% for the /a/, /e/, and /u/ vowel sounds. The GLM achieved a high balanced accuracy of 86.3\% (for /a/), 80.2\% (for /e/), and 88\% (for /u/) and ROC/AUC of 94.8\% (95\% CI 90.6\%-94.8\%) for /a/, 86.5\% (95\% CI 79.8\%-86.5\%) for /e/, and 95.6\% (95\% CI 92.1\%-95.6\%) for /u/. Conclusions: The proposed model can be a voice biomarker for an alternative and cost-effective method of monitoring the progress of COVID-19 patients in care. ", doi="10.2196/50924", url="https://biomedeng.jmir.org/2023/1/e50924", url="http://www.ncbi.nlm.nih.gov/pubmed/37982072" } @Article{info:doi/10.2196/49995, author="Fraser, Hamish and Crossland, Daven and Bacher, Ian and Ranney, Megan and Madsen, Tracy and Hilliard, Ross", title="Comparison of Diagnostic and Triage Accuracy of Ada Health and WebMD Symptom Checkers, ChatGPT, and Physicians for Patients in an Emergency Department: Clinical Data Analysis Study", journal="JMIR Mhealth Uhealth", year="2023", month="Oct", day="3", volume="11", pages="e49995", keywords="diagnosis", keywords="triage", keywords="symptom checker", keywords="emergency patient", keywords="ChatGPT", keywords="LLM", keywords="diagnose", keywords="self-diagnose", keywords="self-diagnosis", keywords="app", keywords="application", keywords="language model", keywords="accuracy", keywords="ChatGPT-3.5", keywords="ChatGPT-4.0", keywords="emergency", keywords="machine learning", abstract="Background: Diagnosis is a core component of effective health care, but misdiagnosis is common and can put patients at risk. Diagnostic decision support systems can play a role in improving diagnosis by physicians and other health care workers. Symptom checkers (SCs) have been designed to improve diagnosis and triage (ie, which level of care to seek) by patients. Objective: The aim of this study was to evaluate the performance of the new large language model ChatGPT (versions 3.5 and 4.0), the widely used WebMD SC, and an SC developed by Ada Health in the diagnosis and triage of patients with urgent or emergent clinical problems compared with the final emergency department (ED) diagnoses and physician reviews. Methods: We used previously collected, deidentified, self-report data from 40 patients presenting to an ED for care who used the Ada SC to record their symptoms prior to seeing the ED physician. Deidentified data were entered into ChatGPT versions 3.5 and 4.0 and WebMD by a research assistant blinded to diagnoses and triage. 
Diagnoses from all 4 systems were compared with the previously abstracted final diagnoses in the ED as well as with diagnoses and triage recommendations from three independent board-certified ED physicians who had blindly reviewed the self-report clinical data from Ada. Diagnostic accuracy was calculated as the proportion of the diagnoses from ChatGPT, Ada SC, WebMD SC, and the independent physicians that matched at least one ED diagnosis (stratified as top 1 or top 3). Triage accuracy was calculated as the number of recommendations from ChatGPT, WebMD, or Ada that agreed with at least 2 of the independent physicians or were rated ``unsafe'' or ``too cautious.'' Results: Overall, 30 and 37 cases had sufficient data for diagnostic and triage analysis, respectively. The rate of top-1 diagnosis matches for Ada, ChatGPT 3.5, ChatGPT 4.0, and WebMD was 9 (30\%), 12 (40\%), 10 (33\%), and 12 (40\%), respectively, with a mean rate of 47\% for the physicians. The rate of top-3 diagnostic matches for Ada, ChatGPT 3.5, ChatGPT 4.0, and WebMD was 19 (63\%), 19 (63\%), 15 (50\%), and 17 (57\%), respectively, with a mean rate of 69\% for physicians. The distribution of triage results for Ada was 62\% (n=23) agree, 14\% unsafe (n=5), and 24\% (n=9) too cautious; that for ChatGPT 3.5 was 59\% (n=22) agree, 41\% (n=15) unsafe, and 0\% (n=0) too cautious; that for ChatGPT 4.0 was 76\% (n=28) agree, 22\% (n=8) unsafe, and 3\% (n=1) too cautious; and that for WebMD was 70\% (n=26) agree, 19\% (n=7) unsafe, and 11\% (n=4) too cautious. The unsafe triage rate for ChatGPT 3.5 (41\%) was significantly higher (P=.009) than that of Ada (14\%). Conclusions: ChatGPT 3.5 had high diagnostic accuracy but a high unsafe triage rate. ChatGPT 4.0 had the poorest diagnostic accuracy, but a lower unsafe triage rate and the highest triage agreement with the physicians. The Ada and WebMD SCs performed better overall than ChatGPT. Unsupervised patient use of ChatGPT for diagnosis and triage is not recommended without improvements to triage accuracy and extensive clinical evaluation. ", doi="10.2196/49995", url="https://mhealth.jmir.org/2023/1/e49995", url="http://www.ncbi.nlm.nih.gov/pubmed/37788063" } @Article{info:doi/10.2196/37136, author="Coulibaly, Abou and Kouanda, S{\'e}ni", title="Effects of the Pregnancy and Newborn Diagnostic Assessment (PANDA) App on Antenatal Care Quality in Burkina Faso: Protocol for a Cluster Randomized Controlled Trial", journal="JMIR Res Protoc", year="2023", month="Aug", day="9", volume="12", pages="e37136", keywords="telemedicine", keywords="PANDA", keywords="pregnancy and newborn diagnostic assessment", keywords="quality", keywords="antenatal care", keywords="Burkina Faso", keywords="trial", keywords="pregnancy", keywords="pregnant", keywords="newborn", keywords="diagnostic", keywords="mobile app", keywords="prenatal care", keywords="randomized trial", keywords="first trimester", keywords="postpartum", keywords="qualitative research", keywords="maternity", keywords="prenatal", keywords="antenatal", keywords="mobile phone", abstract="Background: The Pregnancy and Newborn Diagnostic Assessment (PANDA) system is a digital clinical decision support tool that can facilitate diagnosis and decision-making by health care personnel in antenatal care (ANC). Studies conducted in Madagascar and Burkina Faso showed that PANDA is a feasible system acceptable to various stakeholders. 
Objective: This study primarily aims to evaluate the effects of the PANDA system on ANC quality at rural health facilities in Burkina Faso. The secondary objectives of this study are to test the effects of the PANDA system on women's satisfaction, women's knowledge on birth preparedness and complication readiness, maternal and child health service use, men's involvement in maternal health service utilization, and women's contraception use at 6 weeks postpartum. Further, we will identify the factors that hinder or promote such an app and contribute to cost-effectiveness analysis. Methods: This is a randomized controlled trial implementing the PANDA system in 2 groups of health facilities (intervention and comparison groups) randomized using a matched-pair method. We included pregnant women who were <20 weeks pregnant during their first antenatal consultation in health facilities, and we followed up with them until their sixth week postpartum. Thirteen health centers were included, and 423 and 272 women were enrolled in the intervention and comparison groups, respectively. The primary outcome is a binary variable derived from the quality score, coded 1 (yes) for women with at least 75\% of the total score and 0 if not. Data were collected electronically using tablets by directly interviewing the women and by extracting data from ANC registers, delivery registers, ANC cards, and health care records. The study procedures were standardized across all sites. We will compare unadjusted and adjusted primary outcome results (ANC quality scores) between the 2 study arms. We added a qualitative evaluation of the implementation of the PANDA system to identify barriers and catalysts. We also included an economic evaluation to determine whether the PANDA strategy is more cost-effective than the usual ANC strategy. Results: The enrollment ran from July 2020 to January 2021 due to the COVID-19 pandemic. Data collection ended in September 2022. Data analyses started in January 2023, ended in June 2023, and the results are expected to be published in February 2024. Conclusions: The PANDA system is one of the most comprehensive apps for ANC because it has many features. However, the use of computerized systems for ANC is limited. Therefore, our trial will be beneficial for evaluating the intrinsic capacity of the PANDA system to improve the quality of care. By including qualitative research and economic evaluation, our findings will be significant because electronic consultation registries are expected to be used for maternal health care in the future in Burkina Faso. 
Trial Registration: Pan-African Clinical Trials Registry (PACTR) PACTR202009861550402; https://pactr.samrc.ac.za/TrialDisplay.aspx?TrialID=12374 International Registered Report Identifier (IRRID): DERR1-10.2196/37136 ", doi="10.2196/37136", url="https://www.researchprotocols.org/2023/1/e37136", url="http://www.ncbi.nlm.nih.gov/pubmed/37556195" } @Article{info:doi/10.2196/42775, author="Verma, Neha and Buch, Bimal and Taralekar, Radha and Acharya, Soumyadipta", title="Diagnostic Concordance of Telemedicine as Compared With Face-to-Face Care in Primary Health Care Clinics in Rural India: Randomized Crossover Trial", journal="JMIR Form Res", year="2023", month="Jun", day="23", volume="7", pages="e42775", keywords="telemedicine", keywords="telehealth", keywords="eHealth", keywords="opensource", keywords="digital assistant", keywords="diagnostic concordance", keywords="COVID-19", keywords="primary care", keywords="rural health", keywords="teleconsultation", keywords="patient care", abstract="Background: With the COVID-19 pandemic, there was an increase and scaling up of provider-to-provider telemedicine programs that connect frontline health providers such as nurses and community health workers at primary care clinics with remote doctors at tertiary facilities to facilitate consultations for rural patients. Considering this new trend of increasing use of telemedicine, this study was conducted to generate evidence for patients, health providers, and policymakers to compare if provider-to-provider telemedicine-based care is equivalent to in-person care and is safe and acceptable in terms of diagnostic and treatment standards. Objective: This study aims to compare the diagnosis and treatment decisions from teleconsultations to those of in-person care in teleclinics in rural Gujarat. Methods: We conducted a diagnostic concordance study using a randomized crossover study design with 104 patients at 10 telemedicine primary care clinics. Patients reporting to 10 telemedicine primary care clinics were randomly assigned to first receive an in-person doctor consultation (59/104, 56.7\%) or to first receive a health worker--assisted telemedicine consultation (45/104, 43.3\%). The 2 groups were then switched, with the first group undergoing a telemedicine consultation following the in-person consultation and the second group receiving an in-person consultation after the teleconsultation. The in-person doctor and remote doctor were blinded to the diagnosis and management plan of the other. The diagnosis and treatment plan of in-person doctors was considered the gold standard. Results: We enrolled 104 patients reporting a range of primary health care issues into the study. We observed 74\% (77/104) diagnostic concordance and 79.8\% (83/104) concordance in the treatment plan between the in-person and remote doctors. No significant association was found between the diagnostic and treatment concordance and the order of the consultation (P=.65 and P=.81, respectively), the frontline health worker--doctor pair (both P=.93), the gender of the patient (both P>.99), or the mode of teleconsultation (synchronous vs asynchronous; P=.32 and P=.29, respectively), as evaluated using Fisher exact tests. A significant association was seen between the diagnostic and treatment concordance and the type of case (P=.004 and P=.03, respectively). The highest diagnostic concordance was seen in the management of hypertension (20/21, 95\% concordance; Cohen kappa=0.93) and diabetes (14/15, 93\% concordance; Cohen kappa=0.89). 
The lowest values were seen in cardiology (1/3, 33\%) and patients presenting with nonspecific symptoms (3/10, 30\%). The use of a digital assistant to facilitate the consultation resulted in increased adherence to evidence-based care protocols. Conclusions: The findings reflect that telemedicine can be a safe and acceptable alternative mode of care, especially in remote rural settings when in-person care is not accessible. Telemedicine also has advantages, including potential gains in health care--seeking behavior for patients, reduced costs for the patient, and improved health system efficiency by reducing overcrowding at tertiary health facilities. ", doi="10.2196/42775", url="https://formative.jmir.org/2023/1/e42775", url="http://www.ncbi.nlm.nih.gov/pubmed/37130015" } @Article{info:doi/10.2196/40718, author="Nannini, Simon and Penel, Nicolas and Bompas, Emmanuelle and Willaume, Thibault and Kurtz, Jean-Emmanuel and Gantzer, Justine", title="Shortening the Time Interval for the Referral of Patients With Soft Tissue Sarcoma to Expert Centers Using Mobile Health: Retrospective Study", journal="JMIR Mhealth Uhealth", year="2022", month="Nov", day="9", volume="10", number="11", pages="e40718", keywords="sarcoma", keywords="apps", keywords="mHealth", keywords="mobile health", keywords="health app", keywords="mobile app", keywords="referral", keywords="consultation", keywords="care coordination", keywords="tumor", keywords="cancer", keywords="oncology", keywords="soft tissue", keywords="connective tissue", keywords="prognosis", keywords="communication", keywords="interprofessional", keywords="patient management", keywords="physician", keywords="doctor", keywords="health care provider", keywords="specialist", keywords="general practitioner", keywords="GP", abstract="Background: According to guidelines, all patients with sarcoma must be managed from initial diagnosis at expert sarcoma centers. However, in everyday practice, the time interval to an expert center visit can be long, which delays presentation to an expert multidisciplinary tumor board and increases the risk of inappropriate management, negatively affecting local tumor control and prognosis. The advent of mobile health offers an easy way to facilitate communication and cooperation between general health care providers (eg, general practitioners and radiologists) and sarcoma experts. We developed a mobile app (Sar'Connect) based on the algorithm designed by radiologists from the French Sarcoma Group. Through a small number of easy-to-answer questions, Sar'Connect provides personalized advice for the management of patients and contact information for the closest expert center. Objective: This retrospective study is the first to assess this mobile app's potential benefits in reducing the time interval for patient referral to an expert center according to the initial clinical characteristics of the soft tissue tumor. Methods: From May to December 2021, we extracted tumor mass data for 78 patients discussed by the multidisciplinary tumor boards at 3 centers of the French Sarcoma Group. We applied the Sar'Connect algorithm to these data and estimated the time interval between the first medical description of the soft tissue mass and the referral to an expert center. We then compared this estimated time interval with the observed time interval. Results: We found that the use of Sar'Connect could potentially shorten the time interval to an expert center by approximately 7.5 months (P<.001). 
Moreover, for half (31/60, 52\%) of the patients with a malignant soft tissue tumor, Sar'Connect could have avoided inappropriate management outside of the reference center. We did not identify a significant determinant for shortening the time interval for referral. Conclusions: Overall, promoting the use of a simple mobile app is an innovative and straightforward means to potentially accelerate both the referral and management of patients with soft tissue sarcoma at expert centers. ", doi="10.2196/40718", url="https://mhealth.jmir.org/2022/11/e40718", url="http://www.ncbi.nlm.nih.gov/pubmed/36350680" } @Article{info:doi/10.2196/38364, author="Fraser, F. Hamish S. and Cohan, Gregory and Koehler, Christopher and Anderson, Jared and Lawrence, Alexis and Pate{\~n}a, John and Bacher, Ian and Ranney, L. Megan", title="Evaluation of Diagnostic and Triage Accuracy and Usability of a Symptom Checker in an Emergency Department: Observational Study", journal="JMIR Mhealth Uhealth", year="2022", month="Sep", day="19", volume="10", number="9", pages="e38364", keywords="mobile health", keywords="mHealth", keywords="symptom checker", keywords="diagnosis", keywords="user experience", abstract="Background: Symptom checkers are clinical decision support apps for patients, used by tens of millions of people annually. They are designed to provide diagnostic and triage advice and assist users in seeking the appropriate level of care. Little evidence is available regarding their diagnostic and triage accuracy with direct use by patients for urgent conditions. Objective: The aim of this study is to determine the diagnostic and triage accuracy and usability of a symptom checker in use by patients presenting to an emergency department (ED). Methods: We recruited a convenience sample of English-speaking patients presenting for care in an urban ED. Each consenting patient used a leading symptom checker from Ada Health before the ED evaluation. Diagnostic accuracy was evaluated by comparing the symptom checker's diagnoses and those of 3 independent emergency physicians viewing the patient-entered symptom data, with the final diagnoses from the ED evaluation. The Ada diagnoses and triage were also critiqued by the independent physicians. The patients completed a usability survey based on the Technology Acceptance Model. Results: A total of 40 (80\%) of the 50 participants approached completed the symptom checker assessment and usability survey. Their mean age was 39.3 (SD 15.9; range 18-76) years, and they were 65\% (26/40) female, 68\% (27/40) White, 48\% (19/40) Hispanic or Latino, and 13\% (5/40) Black or African American. Some cases had missing data or a lack of a clear ED diagnosis; 75\% (30/40) were included in the analysis of diagnosis, and 93\% (37/40) for triage. The sensitivity for at least one of the final ED diagnoses by Ada (based on its top 5 diagnoses) was 70\% (95\% CI 54\%-86\%), close to the mean sensitivity for the 3 physicians (on their top 3 diagnoses) of 68.9\%. The physicians rated the Ada triage decisions as 62\% (23/37) fully agree and 24\% (9/37) safe but too cautious. It was rated as unsafe and too risky in 22\% (8/37) of cases by at least one physician, in 14\% (5/37) of cases by at least two physicians, and in 5\% (2/37) of cases by all 3 physicians. Usability was rated highly; participants agreed or strongly agreed with the 7 Technology Acceptance Model usability questions with a mean score of 84.6\%, although ``satisfaction'' and ``enjoyment'' were rated low. 
Conclusions: This study provides preliminary evidence that a symptom checker can provide acceptable usability and diagnostic accuracy for patients with various urgent conditions. A total of 14\% (5/37) of symptom checker triage recommendations were deemed unsafe and too risky by at least two physicians based on the symptoms recorded, similar to the results of studies on telephone and nurse triage. Larger studies of diagnostic and triage performance with direct patient use in different clinical environments are needed. ", doi="10.2196/38364", url="https://mhealth.jmir.org/2022/9/e38364", url="http://www.ncbi.nlm.nih.gov/pubmed/36121688" } @Article{info:doi/10.2196/36872, author="Strutz, Nicole and Brodowski, Hanna and Kiselev, Joern and Heimann-Steinert, Anika and M{\"u}ller-Werdan, Ursula", title="App-Based Evaluation of Older People's Fall Risk Using the mHealth App Lindera Mobility Analysis: Exploratory Study", journal="JMIR Aging", year="2022", month="Aug", day="16", volume="5", number="3", pages="e36872", keywords="mobility", keywords="fall risk", keywords="smartphone", keywords="app", keywords="analysis", keywords="older people", keywords="accuracy", keywords="mobility restriction", abstract="Background: Falls and the risk of falling in older people pose a high risk for losing independence. As the risk of falling progresses over time, it is often not adequately diagnosed due to the long intervals between contacts with health care professionals. This leads to the risk of falling not being properly detected until the first fall. App-based software able to screen fall risks of older adults and to monitor the progress and presence of fall risk factors could detect a developing fall risk at an early stage prior to the first fall. As smartphones become more common in the elderly population, this approach is easily available and feasible. Objective: The aim of the study is to evaluate the app Lindera Mobility Analysis (LIN) against reference standards that determine the risk of falling, namely validated functional assessments of mobility. Methods: The LIN app was utilized in home- and community-dwelling older adults aged 65 years or more. The Berg Balance Scale (BBS), the Tinetti Test (TIN), and the Timed Up \& Go Test (TUG) were used as reference standards. In addition to descriptive statistics, data correlation and the comparison of the mean difference of analog measures (reference standards) and digital measures were tested. Spearman rank correlation analysis was performed and Bland-Altman (B-A) plots drawn. Results: Data from 42 participants could be obtained (n=25, 59.5\%, women). There was a significant correlation between the LIN app and the BBS (r=--0.587, P<.001), TUG (r=0.474, P=.002), and TIN (r=--0.464, P=.002). B-A plots showed only a few data points outside the predefined limits of agreement (LOA) when combining functional tests and results of LIN. Conclusions: The digital app LIN has the potential to detect the risk of falling in older people. Further steps in establishing the validity of the LIN app should include its clinical applicability. 
Trial Registration: German Clinical Trials Register DRKS00025352; https://tinyurl.com/65awrd6a ", doi="10.2196/36872", url="https://aging.jmir.org/2022/3/e36872", url="http://www.ncbi.nlm.nih.gov/pubmed/35972785" } @Article{info:doi/10.2196/34685, author="LeRouge, Cynthia and Durneva, Polina and Lyon, Victoria and Thompson, Matthew", title="Health Consumer Engagement, Enablement, and Empowerment in Smartphone-Enabled Home-Based Diagnostic Testing for Viral Infections: Mixed Methods Study", journal="JMIR Mhealth Uhealth", year="2022", month="Jun", day="30", volume="10", number="6", pages="e34685", keywords="smart HT", keywords="mHealth", keywords="patient engagement", keywords="patient enablement", keywords="patient empowerment", keywords="diagnostic testing", keywords="viral infection", keywords="patient activation", keywords="consumer health informatics", keywords="influenza", keywords="home testing", keywords="mobile phone", abstract="Background: Health consumers are increasingly taking a more substantial role in decision-making and self-care regarding their health. A range of digital technologies is available for laypeople to find, share, and generate health-related information that supports their health care processes. There is also innovation and interest in home testing enabled by smartphone technology (smartphone-supported home testing [smart HT]). However, few studies have focused on the process from initial engagement to acting on the test results, which involves multiple decisions. Objective: This study aimed to identify and model the key factors leading to health consumers' engagement and enablement associated with smart HT. We also explored multiple levels of health care choices resulting from health consumer empowerment and activation from smart HT use. Understanding the factors and choices associated with engagement, enablement, empowerment, and activation helps both research and practice to support the intended and optimal use of smart HT. Methods: This study reports the findings from 2 phases of a more extensive pilot study of smart HT for viral infection. In these 2 phases, we used mixed methods (semistructured interviews and surveys) to shed light on the situated complexities of health consumers making autonomous decisions to engage with, perform, and act on smart HT, supporting the diagnostic aspects of their health care. Interview (n=31) and survey (n=282) participants underwent smart HT testing for influenza in earlier pilot phases. The survey also extended the viral infection context to include questions related to potential smart HT use for SARS-CoV-2 diagnosis. Results: Our resulting model revealed the smart HT engagement and enablement factors, as well as choices resulting from empowerment and activation. The model included factors leading to engagement, specifically various intrinsic and extrinsic influences. Moreover, the model included various enablement factors, including the quality of smart HT and the personal capacity to perform smart HT. The model also explores various choices resulting from empowerment and activation from the perspectives of various stakeholders (public vs private) and concerning different levels of impact (personal vs distant). Conclusions: The findings provide insight into the nuanced and complex ways health consumers make decisions to engage with and perform smart HT and how they may react to positive results in terms of public-private and personal-distant dimensions. 
Moreover, the study illuminates the role that providers and smart HT sources can play to better support digitally engaged health consumers in the smart HT decision process. ", doi="10.2196/34685", url="https://mhealth.jmir.org/2022/6/e34685", url="http://www.ncbi.nlm.nih.gov/pubmed/35771605" } @Article{info:doi/10.2196/37970, author="Kwon, Soonil and Lee, So-Ryoung and Choi, Eue-Keun and Ahn, Hyo-Jeong and Song, Hee-Seok and Lee, Young-Shin and Oh, Seil and Lip, H. Gregory Y.", title="Comparison Between the 24-hour Holter Test and 72-hour Single-Lead Electrocardiogram Monitoring With an Adhesive Patch-Type Device for Atrial Fibrillation Detection: Prospective Cohort Study", journal="J Med Internet Res", year="2022", month="May", day="9", volume="24", number="5", pages="e37970", keywords="atrial fibrillation", keywords="diagnosis", keywords="electrocardiogram", keywords="wearable device", keywords="health monitoring", keywords="Holter", keywords="cardiac", keywords="arrhythmia", keywords="ECG", keywords="EKG", keywords="digital tool", keywords="cardiology", keywords="patient monitoring", keywords="outpatient clinic", keywords="cardiac health", keywords="diagnostic", keywords="patient", keywords="clinician", keywords="digital health", abstract="Background: There is insufficient evidence for the use of single-lead electrocardiogram (ECG) monitoring with an adhesive patch-type device (APD) over an extended period compared to that of the 24-hour Holter test for atrial fibrillation (AF) detection. Objective: In this paper, we aimed to compare AF detection by the 24-hour Holter test and 72-hour single-lead ECG monitoring using an APD among patients with AF. Methods: This was a prospective, single-center cohort study. A total of 210 patients with AF with clinical indications for the Holter test at cardiology outpatient clinics were enrolled in the study. The study participants were equipped with both the Holter device and APD for the first 24 hours. Subsequently, only the APD continued ECG monitoring for an additional 48 hours. AF detection during the first 24 hours was compared between the two devices. The diagnostic benefits of extended monitoring using the APD were evaluated. Results: A total of 200 patients (mean age 60 years; n=141, 70.5\% male; and n=59, 29.5\% female) completed 72-hour ECG monitoring with the APD. During the first 24 hours, both monitoring methods detected AF in the same 40/200 (20\%) patients (including 20 patients each with paroxysmal and persistent AF). Compared to the 24-hour Holter test, the APD increased the AF detection rate by 1.5-fold (58/200; 29\%) and 1.6-fold (64/200; 32\%) with 48- and 72-hour monitoring, respectively. With the APD, the number of newly discovered patients with paroxysmal AF was 20/44 (45.5\%), 18/44 (40.9\%), and 6/44 (13.6\%) at 24-, 48-, and 72-hour monitoring, respectively. Compared with 24-hour Holter monitoring, 72-hour monitoring with the APD increased the detection rate of paroxysmal AF by 2.2-fold (44/20). Conclusions: Compared to the 24-hour Holter test, AF detection could be improved with 72-hour single-lead ECG monitoring with the APD. ", doi="10.2196/37970", url="https://www.jmir.org/2022/5/e37970", url="http://www.ncbi.nlm.nih.gov/pubmed/35532989" } @Article{info:doi/10.2196/36977, author="Ramachandram, Dhanesh and Ramirez-GarciaLuna, Luis Jose and Fraser, J. Robert D. and Mart{\'i}nez-Jim{\'e}nez, Aurelio Mario and Arriaga-Caballero, E. 
Jesus and Allport, Justin", title="Fully Automated Wound Tissue Segmentation Using Deep Learning on Mobile Devices: Cohort Study", journal="JMIR Mhealth Uhealth", year="2022", month="Apr", day="22", volume="10", number="4", pages="e36977", keywords="wound", keywords="tissue segmentation", keywords="automated tissue identification", keywords="deep learning", keywords="mobile imaging", keywords="mobile phone", abstract="Background: Composition of tissue types within a wound is a useful indicator of its healing progression. Tissue composition is clinically used in wound healing tools (eg, Bates-Jensen Wound Assessment Tool) to assess risk and recommend treatment. However, wound tissue identification and the estimation of their relative composition is highly subjective. Consequently, incorrect assessments could be reported, leading to downstream impacts including inappropriate dressing selection, failure to identify wounds at risk of not healing, or failure to make appropriate referrals to specialists. Objective: This study aimed to measure inter- and intrarater variability in manual tissue segmentation and quantification among a cohort of wound care clinicians and determine if an objective assessment of tissue types (ie, size and amount) can be achieved using deep neural networks. Methods: A data set of 58 anonymized wound images of various types of chronic wounds from Swift Medical's Wound Database was used to conduct the inter- and intrarater agreement study. The data set was split into 3 subsets with 50\% overlap between subsets to measure intrarater agreement. In this study, 4 different tissue types (epithelial, granulation, slough, and eschar) within the wound bed were independently labeled by the 5 wound clinicians at 1-week intervals using a browser-based image annotation tool. In addition, 2 deep convolutional neural network architectures were developed for wound segmentation and tissue segmentation and were used in sequence in the workflow. These models were trained using 465,187 and 17,000 image-label pairs, respectively. This is the largest and most diverse reported data set used for training deep learning models for wound and wound tissue segmentation. The resulting models offer robust performance in diverse imaging conditions, are unbiased toward skin tones, and could execute in near real time on mobile devices. Results: A poor to moderate interrater agreement in identifying tissue types in chronic wound images was reported. A very poor Krippendorff $\alpha$ value of .014 for interrater variability when identifying epithelization was observed, whereas granulation was most consistently identified by the clinicians. The intrarater intraclass correlation (3,1), however, indicates that raters were relatively consistent when labeling the same image multiple times over a period. Our deep learning models achieved a mean intersection over union of 0.8644 and 0.7192 for wound and tissue segmentation, respectively. A cohort of wound clinicians, by consensus, rated 91\% (53/58) of the tissue segmentation results to be between fair and good in terms of tissue identification and segmentation quality. Conclusions: The interrater agreement study validates that clinicians exhibit considerable variability when identifying and visually estimating wound tissue proportion. The proposed deep learning technique provides objective tissue identification and measurements to assist clinicians in documenting the wound more accurately and could have a significant impact on wound care when deployed at scale. 
", doi="10.2196/36977", url="https://mhealth.jmir.org/2022/4/e36977", url="http://www.ncbi.nlm.nih.gov/pubmed/35451982" } @Article{info:doi/10.2196/36825, author="Ye, Siao and Sun, Kevin and Huynh, Duong and Phi, Q. Huy and Ko, Brian and Huang, Bin and Hosseini Ghomi, Reza", title="A Computerized Cognitive Test Battery for Detection of Dementia and Mild Cognitive Impairment: Instrument Validation Study", journal="JMIR Aging", year="2022", month="Apr", day="15", volume="5", number="2", pages="e36825", keywords="cognitive test", keywords="mild cognitive impairment", keywords="dementia", keywords="cognitive decline", keywords="repeatable battery", keywords="discriminant analysis", abstract="Background: Early detection of dementia is critical for intervention and care planning but remains difficult. Computerized cognitive testing provides an accessible and promising solution to address these current challenges. Objective: The aim of this study was to evaluate a computerized cognitive testing battery (BrainCheck) for its diagnostic accuracy and ability to distinguish the severity of cognitive impairment. Methods: A total of 99 participants diagnosed with dementia, mild cognitive impairment (MCI), or normal cognition (NC) completed the BrainCheck battery. Statistical analyses compared participant performances on BrainCheck based on their diagnostic group. Results: BrainCheck battery performance showed significant differences between the NC, MCI, and dementia groups, achieving 88\% or higher sensitivity and specificity (ie, true positive and true negative rates) for separating dementia from NC, and 77\% or higher sensitivity and specificity in separating the MCI group from the NC and dementia groups. Three-group classification found true positive rates of 80\% or higher for the NC and dementia groups and true positive rates of 64\% or higher for the MCI group. Conclusions: BrainCheck was able to distinguish between diagnoses of dementia, MCI, and NC, providing a potentially reliable tool for early detection of cognitive impairment. ", doi="10.2196/36825", url="https://aging.jmir.org/2022/2/e36825", url="http://www.ncbi.nlm.nih.gov/pubmed/35436212" } @Article{info:doi/10.2196/30724, author="Spadaro, Benedetta and Martin-Key, A. Nayra and Funnell, Erin and Bahn, Sabine", title="mHealth Solutions for Perinatal Mental Health: Scoping Review and Appraisal Following the mHealth Index and Navigation Database Framework", journal="JMIR Mhealth Uhealth", year="2022", month="Jan", day="17", volume="10", number="1", pages="e30724", keywords="digital mental health", keywords="perinatal mental health", keywords="pregnancy", keywords="MIND", keywords="mobile phone", abstract="Background: The ever-increasing pressure on health care systems has resulted in the underrecognition of perinatal mental disorders. Digital mental health tools such as apps could provide an option for accessible perinatal mental health screening and assessment. However, there is a lack of information regarding the availability and features of perinatal app options. Objective: This study aims to evaluate the current state of diagnostic and screening apps for perinatal mental health available on the Google Play Store (Android) and Apple App Store (iOS) and to review their features following the mHealth Index and Navigation Database framework. Methods: Following a scoping review approach, the Apple App Store and Google Play Store were systematically searched to identify perinatal mental health assessment apps. 
A total of 14 apps that met the inclusion criteria were downloaded and reviewed in a standardized manner using the mHealth Index and Navigation Database framework. The framework comprised 107 questions, allowing for a comprehensive assessment of app origin, functionality, engagement features, security, and clinical use. Results: Most apps were developed by for-profit companies (n=10), followed by private individuals (n=2) and trusted health care companies (n=2). Out of the 14 apps, 3 were available only on Android devices, 4 were available only on iOS devices, and 7 were available on both platforms. Approximately one-third of the apps (n=5) had been updated within the last 180 days. A total of 12 apps offered the Edinburgh Postnatal Depression Scale in its original version or in rephrased versions. Engagement, input, and output features included reminder notifications, connections to therapists, and free writing features. A total of 6 apps offered psychoeducational information and references. Privacy policies were available for 11 of the 14 apps, with a median Flesch-Kincaid reading grade level of 12.3. One app claimed to be compliant with the Health Insurance Portability and Accountability Act standards and 2 apps claimed to be compliant with General Data Protection Regulation. Of the apps that could be accessed in full (n=10), all appeared to fulfill the claims stated in their description. Only 1 app referenced a relevant peer-reviewed study. All the apps provided a warning for use, highlighting that the mental health assessment result should not be interpreted as a diagnosis or as a substitute for medical care. Only 3 apps allowed users to export or email their mental health test results. Conclusions: These results indicate that there are opportunities to improve perinatal mental health assessment apps. To this end, we recommend focusing on the development and validation of more comprehensive assessment tools, ensuring data protection and safety features are adequate for the intended app use, and improving data sharing features between users and health care professionals for timely support. ", doi="10.2196/30724", url="https://mhealth.jmir.org/2022/1/e30724", url="http://www.ncbi.nlm.nih.gov/pubmed/35037894" } @Article{info:doi/10.2196/31541, author="Lowe, Cabella and Hanuman Sing, Harry and Marsh, William and Morrissey, Dylan", title="Validation of a Musculoskeletal Digital Assessment Routing Tool: Protocol for a Pilot Randomized Crossover Noninferiority Trial", journal="JMIR Res Protoc", year="2021", month="Dec", day="13", volume="10", number="12", pages="e31541", keywords="mHealth", keywords="mobile health", keywords="eHealth", keywords="digital health", keywords="digital technology", keywords="musculoskeletal", keywords="triage", keywords="physiotherapy triage", keywords="validation", keywords="mobile phone", abstract="Background: Musculoskeletal conditions account for 16\% of global disability, resulting in a negative effect on millions of patients and an increasing demand for health care use. Digital technologies to improve health care outcomes and efficiency are considered a priority; however, innovations are rarely tested with sufficient rigor in clinical trials, which is the gold standard for clinical proof of safety and efficacy. We have developed a new musculoskeletal digital assessment routing tool (DART) that allows users to self-assess and be directed to the right care. DART requires validation in a real-world setting before implementation. 
Objective: This pilot study aims to assess the feasibility of a future trial by exploring the key aspects of trial methodology, assessing the procedures, and collecting exploratory data to inform the design of a definitive randomized crossover noninferiority trial to assess DART safety and effectiveness. Methods: We will collect data from 76 adults with a musculoskeletal condition presenting to general practitioners within a National Health Service (NHS) in England. Participants will complete both a DART assessment and a physiotherapist-led triage, with the order determined by randomization. The primary analysis will involve an absolute agreement intraclass correlation (A,1) estimate with 95\% CI between DART and the clinician for assessment outcomes signposting to condition management pathways. Data will be collected to allow the analysis of participant recruitment and retention, randomization, allocation concealment, blinding, data collection process, and bias. In addition, the impact of trial burden and potential barriers to intervention delivery will be considered. The DART user satisfaction will be measured using the system usability scale. Results: A UK NHS ethics submission was done during June 2021 and is pending approval; recruitment will commence in early 2022, with data collection anticipated to last for 3 months. The results will be reported in a follow-up paper in 2022. Conclusions: This study will inform the design of a randomized controlled crossover noninferiority study that will provide evidence concerning mobile health DART system clinical signposting in an NHS setting before real-world implementation. Success should produce evidence of a safe, effective system with good usability, potentially facilitating quicker and easier patient access to appropriate care while reducing the burden on primary and secondary care musculoskeletal services. This rigorous approach to mobile health system testing could be used as a guide for other developers of similar applications. Trial Registration: ClinicalTrials.gov NCT04904029; http://clinicaltrials.gov/ct2/show/NCT04904029 International Registered Report Identifier (IRRID): PRR1-10.2196/31541 ", doi="10.2196/31541", url="https://www.researchprotocols.org/2021/12/e31541", url="http://www.ncbi.nlm.nih.gov/pubmed/34898461" } @Article{info:doi/10.2196/26480, author="Marley, Gifty and Fu, Gengfeng and Zhang, Ye and Li, Jianjun and Tucker, D. Joseph and Tang, Weiming and Yu, Rongbin", title="Willingness of Chinese Men Who Have Sex With Men to Use Smartphone-Based Electronic Readers for HIV Self-testing: Web-Based Cross-sectional Study", journal="J Med Internet Res", year="2021", month="Nov", day="19", volume="23", number="11", pages="e26480", keywords="smartphone-based electronic reader", keywords="electronic readers", keywords="HIV self-testing", keywords="HIVST", keywords="self-testing", keywords="cellular phone--based readers", keywords="mHealth", abstract="Background: The need for strategies to encourage user-initiated reporting of results after HIV self-testing (HIVST) persists. Smartphone-based electronic readers (SERs) have been shown capable of reading diagnostics results accurately in point-of-care diagnostics and could bridge the current gaps between HIVST and linkage to care. Objective: Our study aimed to assess the willingness of Chinese men who have sex with men (MSM) in the Jiangsu province to use an SER for HIVST through a web-based cross-sectional study. 
Methods: From February to April 2020, we conducted a convenience web-based survey among Chinese MSM by using a pretested structured questionnaire. Survey items were adapted from previous HIVST feasibility studies and modified as required. Prior to answering reader-related questions, participants watched a video showcasing a prototype SER. Statistical analysis included descriptive analysis, chi-squared test, and multivariable logistic regression. P values less than .05 were deemed statistically significant. Results: Of 692 participants, 369 (53.3\%) were aged 26-40 years, 456 (65.9\%) had ever self-tested for HIV, and 493 (71.2\%) were willing to use an SER for HIVST. Approximately 98\% (483/493) of the willing participants, 85.3\% (459/538) of ever self-tested and never self-tested, and 40\% (46/115) of unwilling participants reported that SERs would increase their HIVST frequency. Engaging in unprotected anal intercourse with regular partners compared to consistently using condoms (adjusted odds ratio [AOR] 3.04, 95\% CI 1.19-7.74) increased the odds of willingness to use an SER for HIVST. Participants who had ever considered HIVST at home with a partner right before sex compared to those who had not (AOR 2.99, 95\% CI 1.13-7.90) were also more willing to use an SER for HIVST. Playing receptive roles during anal intercourse compared to playing insertive roles (AOR 0.05, 95\% CI 0.02-0.14) was associated with decreased odds of being willing to use an SER for HIVST. The majority of the participants (447/608, 73.5\%) preferred to purchase readers from local Centers of Disease Control and Prevention offices and 51.2\% (311/608) of the participants were willing to pay less than US \$4.70 for a reader device. Conclusions: The majority of the Chinese MSM, especially those with high sexual risk behaviors, were willing to use an SER for HIVST. Many MSM were also willing to self-test more frequently for HIV with an SER. Further research is needed to ascertain the diagnostic and real-time data-capturing capacity of prototype SERs during HIVST. ", doi="10.2196/26480", url="https://www.jmir.org/2021/11/e26480", url="http://www.ncbi.nlm.nih.gov/pubmed/34806988" } @Article{info:doi/10.2196/32345, author="Xiao, Jin and Meyerowitz, Cyril and Ragusa, Patricia and Funkhouser, Kimberly and Lischka, R. Tamara and Mendez Chagoya, Alberto Luis and Al Jallad, Nisreen and Wu, Tong Tong and Fiscella, Kevin and Ivie, Eden and Strange, Michelle and Collins, Jamie and Kopycka-Kedzierawski, T. Dorota and ", title="Assessment of an Innovative Mobile Dentistry eHygiene Model Amid the COVID-19 Pandemic in the National Dental Practice--Based Research Network: Protocol for Design, Implementation, and Usability Testing", journal="JMIR Res Protoc", year="2021", month="Oct", day="26", volume="10", number="10", pages="e32345", keywords="teledentistry", keywords="mDentistry", keywords="oral diseases", keywords="virtual visit", keywords="intraoral camera", keywords="pandemic response", keywords="COVID-19", keywords="mHealth", abstract="Background: Amid COVID-19, and other possible future infectious disease pandemics, dentistry needs to consider modified dental examination regimens that render quality care, are cost effective, and ensure the safety of patients and dental health care personnel (DHCP). 
Traditional dental examinations, which number more than 300 million per year in the United States, rely on person-to-person tactile examinations, pose challenges to infection control, and consume large quantities of advanced-level personal protective equipment (PPE). Therefore, our long-term goal is to develop an innovative mobile dentistry (mDent) model that takes these issues into account. This model supplements the traditional dental practice with virtual visits, supported by mobile devices such as mobile telephones, tablets, and wireless infrastructure. The mDent model leverages the advantages of digital mobile health (mHealth) tools such as intraoral cameras to deliver virtual oral examinations, treatment planning, and interactive oral health management, on a broad population basis. Conversion of the traditional dental examinations to mDent virtual examinations builds upon (1) the reliability of teledentistry, which uses intraoral photos and live videos to make diagnostic decisions, and (2) rapid advancement in mHealth tool utilization. Objective: In this pilot project, we designed a 2-stage implementation study to assess 2 critical components of the mDent model: virtual hygiene examination (eHygiene) and patient self-taken intraoral photos (SELFIE). Our specific aims are to (1) assess the acceptance and barriers of mDent eHygiene among patients and DHCP, (2) assess the economic impact of mDent eHygiene, and (3) assess the patient's capability to generate intraoral photos using mHealth tools (exploratory aim, SELFIE). Methods: This study will access the rich resources of the National Dental Practice-Based Research Network to recruit 12 dentists, 12 hygienists, and 144 patients from 12 practices. For aims 1 and 2, we will use role-specific questionnaires to collect quantitative data on eHygiene acceptance and economic impact. The questionnaire components include participant characteristics, the System Usability Scale, a dentist-patient communication scale, practice operation cost, and patient opportunity cost. We will further conduct a series of iterative qualitative research activities using individual interviews to further elicit feedback and suggestion for changes to the mDent eHygiene model. For aim 3, we will use mixed methods (quantitative and qualitative) to assess the patient's capability of taking intraoral photos, by analyzing obtained photos and recorded videos. Results: The study is supported by the US National Institute of Dental and Craniofacial Research. This study received ``single'' institutional review board approval in August 2021. Data collection and analysis are expected to conclude by December 2021 and March 2022, respectively. Conclusions: The study results will inform the logistics of conducting virtual dental examinations and empowering patients with mHealth tools, providing better safety and preserving PPE amid the COVID-19 and possible future pandemics. 
International Registered Report Identifier (IRRID): PRR1-10.2196/32345 ", doi="10.2196/32345", url="https://www.researchprotocols.org/2021/10/e32345", url="http://www.ncbi.nlm.nih.gov/pubmed/34597259" } @Article{info:doi/10.2196/31862, author="Popescu, Christina and Golden, Grace and Benrimoh, David and Tanguay-Sela, Myriam and Slowey, Dominique and Lundrigan, Eryn and Williams, J{\'e}r{\^o}me and Desormeau, Bennet and Kardani, Divyesh and Perez, Tamara and Rollins, Colleen and Israel, Sonia and Perlman, Kelly and Armstrong, Caitrin and Baxter, Jacob and Whitmore, Kate and Fradette, Marie-Jeanne and Felcarek-Hope, Kaelan and Soufi, Ghassen and Fratila, Robert and Mehltretter, Joseph and Looper, Karl and Steiner, Warren and Rej, Soham and Karp, F. Jordan and Heller, Katherine and Parikh, V. Sagar and McGuire-Snieckus, Rebecca and Ferrari, Manuela and Margolese, Howard and Turecki, Gustavo", title="Evaluating the Clinical Feasibility of an Artificial Intelligence--Powered, Web-Based Clinical Decision Support System for the Treatment of Depression in Adults: Longitudinal Feasibility Study", journal="JMIR Form Res", year="2021", month="Oct", day="25", volume="5", number="10", pages="e31862", keywords="clinical decision support system", keywords="major depressive disorder", keywords="artificial intelligence", keywords="feasibility", keywords="usability", keywords="mobile phone", abstract="Background: Approximately two-thirds of patients with major depressive disorder do not achieve remission during their first treatment. There has been increasing interest in the use of digital, artificial intelligence--powered clinical decision support systems (CDSSs) to assist physicians in their treatment selection and management, improving the personalization and use of best practices such as measurement-based care. Previous literature shows that for digital mental health tools to be successful, the tool must be easy for patients and physicians to use and feasible within existing clinical workflows. Objective: This study aims to examine the feasibility of an artificial intelligence--powered CDSS, which combines the operationalized 2016 Canadian Network for Mood and Anxiety Treatments guidelines with a neural network--based individualized treatment remission prediction. Methods: Owing to the COVID-19 pandemic, the study was adapted to be completed entirely remotely. A total of 7 physicians recruited outpatients diagnosed with major depressive disorder according to the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition criteria. Patients completed a minimum of one visit without the CDSS (baseline) and 2 subsequent visits where the CDSS was used by the physician (visits 1 and 2). The primary outcome of interest was change in appointment length after the introduction of the CDSS as a proxy for feasibility. Feasibility and acceptability data were collected through self-report questionnaires and semistructured interviews. Results: Data were collected between January and November 2020. A total of 17 patients were enrolled in the study; of the 17 patients, 14 (82\%) completed the study. There was no significant difference in appointment length between visits (introduction of the tool did not increase appointment length; F2,24=0.805; mean squared error 58.08; P=.46). In total, 92\% (12/13) of patients and 71\% (5/7) of physicians felt that the tool was easy to use; 62\% (8/13) of patients and 71\% (5/7) of physicians rated that they trusted the CDSS. 
Of the 13 patients, 6 (46\%) felt that the patient-clinician relationship significantly or somewhat improved, whereas 7 (54\%) felt that it did not change. Conclusions: Our findings confirm that the integration of the tool does not significantly increase appointment length and suggest that the CDSS is easy to use and may have positive effects on the patient-physician relationship for some patients. The CDSS is feasible and ready for effectiveness studies. Trial Registration: ClinicalTrials.gov NCT04061642; http://clinicaltrials.gov/ct2/show/NCT04061642 ", doi="10.2196/31862", url="https://formative.jmir.org/2021/10/e31862", url="http://www.ncbi.nlm.nih.gov/pubmed/34694234" } @Article{info:doi/10.2196/25777, author="Chirambo, Baxter Griphin and Thompson, Matthew and Hardy, Victoria and Ide, Nicole and Hwang, H. Phillip and Dharmayat, Kanika and Mastellos, Nikolaos and Heavin, Ciara and O'Connor, Yvonne and Muula, S. Adamson and Andersson, Bo and Carlsson, Sven and Tran, Tammy and Hsieh, Chen-Ling Jenny and Lee, Hsin-Yi and Fitzpatrick, Annette and Joseph Wu, Tsung-Shu and O'Donoghue, John", title="Effectiveness of Smartphone-Based Community Case Management on the Urgent Referral, Reconsultation, and Hospitalization of Children Aged Under 5 Years in Malawi: Cluster-Randomized, Stepped-Wedge Trial", journal="J Med Internet Res", year="2021", month="Oct", day="20", volume="23", number="10", pages="e25777", keywords="community case management", keywords="mobile health", keywords="pediatrics", keywords="childhood infection", keywords="mobile phone", abstract="Background: Integrated community case management (CCM) has led to reductions in child mortality in Malawi resulting from illnesses such as malaria, pneumonia, and diarrhea. However, adherence to CCM guidelines is often poor, potentially leading to inappropriate clinical decisions and poor outcomes. We determined the impact of an e-CCM app on the referral, reconsultation, and hospitalization rates of children presenting to village clinics in Malawi. Objective: We determined the impact of an electronic version of a smartphone-based CCM (e-CCM) app on the referral, reconsultation, and hospitalization rates of children presenting to village clinics in Malawi. Methods: We used a stepped-wedge, cluster-randomized trial to compare paper-based CCM (control) with and without the use of an e-CCM app on smartphones from November 2016 to February 2017. A total of 102 village clinics from 2 districts in northern Malawi were assigned to 1 of 6 clusters, which were randomized on the sequencing of the crossover from the control phase to the intervention phase as well as the duration of exposure in each phase. Children aged $\geq$2 months to <5 years who presented with acute illness were enrolled consecutively by health surveillance assistants. The primary outcome of urgent referrals to higher-level facilities was evaluated by using multilevel mixed effects models. A logistic regression model with the random effects of the cluster and the fixed effects for each step was fitted. The adjustment for potential confounders included baseline factors, such as patient age, sex, and the geographical location of the village clinics. Calendar time was adjusted for in the analysis. Results: A total of 6965 children were recruited---49.11\% (3421/6965) in the control phase and 50.88\% (3544/6965) in the intervention phase. 
After adjusting for calendar time, children in the intervention phase were more likely to be urgently referred to a higher-level health facility than children in the control phase (odds ratio [OR] 2.02, 95\% CI 1.27-3.23; P=.003). Overall, children in the intervention arm had lower odds of attending a repeat health surveillance assistant consultation (OR 0.45, 95\% CI 0.34-0.59; P<.001) or being admitted to a hospital (OR 0.75, 95\% CI 0.62-0.90; P=.002), but after adjusting for time, these differences were not significant (P=.07 for consultation; P=.30 for hospital admission). Conclusions: The addition of e-CCM decision support by using smartphones led to a greater proportion of children being referred to higher-level facilities, with no apparent increase in hospital admissions or repeat consultations in village clinics. Our findings provide support for the implementation of e-CCM tools in Malawi and other low- and middle-income countries with a need for ongoing assessments of effectiveness and integration with national digital health strategies. Trial Registration: ClinicalTrials.gov NCT02763345; https://clinicaltrials.gov/ct2/show/NCT02763345 ", doi="10.2196/25777", url="https://www.jmir.org/2021/10/e25777", url="http://www.ncbi.nlm.nih.gov/pubmed/34668872" } @Article{info:doi/10.2196/26602, author="Nida, Kedir Esmael and Bekele, Sisay and Geurts, Luc and Vanden Abeele, Vero", title="Acceptance of a Smartphone-Based Visual Field Screening Platform for Glaucoma: Pre-Post Study", journal="JMIR Form Res", year="2021", month="Sep", day="17", volume="5", number="9", pages="e26602", keywords="mHealth acceptance", keywords="UTAUT", keywords="glaucoma screening", keywords="mhealth for eye care", keywords="mhealth", keywords="glaucoma", keywords="visual", keywords="eye", keywords="ophthalmology", keywords="ophthalmic", keywords="mobile phone", abstract="Background: Glaucoma, the silent thief of sight, is a major cause of blindness worldwide. It is a burden for people in low-income countries, specifically countries where glaucoma-induced blindness accounts for 15\% of the total incidence of blindness. More than half the people living with glaucoma in low-income countries are unaware of the disease until it progresses to an advanced stage, resulting in permanent visual impairment. Objective: This study aims to evaluate the acceptability of the Glaucoma Easy Screener (GES), a low-cost and portable visual field screening platform comprising a smartphone, a stereoscopic virtual reality headset, and a gaming joystick. Methods: A mixed methods study that included 24 eye care professionals from 4 hospitals in Southwest Ethiopia was conducted to evaluate the acceptability of GES. A pre-post design was used to collect perspectives before and after using the GES by using questionnaires and semistructured interviews. A Wilcoxon signed-rank test was used to determine the significance of any change in the scores of the questionnaire items (two-tailed, 95\% CI; $\alpha$=.05). The questionnaire and interview questions were guided by the Unified Theory of Acceptance and Use of Technology. Results: Positive results were obtained both before and after use, suggesting the acceptance of mobile health solutions for conducting glaucoma screening by using a low-cost headset with a smartphone and a game controller. There was a significant increase (two-tailed, 95\% CI; $\alpha$=.05) in the average scores of 86\% (19/22) of postuse questionnaire items compared with those of preuse questionnaire items. 
Ophthalmic professionals perceived GES as easy to use and as a tool that enabled the conduct of glaucoma screening tests, especially during outreach to rural areas. However, positive evaluations are contingent on the accuracy of the tool. Moreover, ophthalmologists voiced the need to limit the tool to screening only (ie, not for making diagnoses). Conclusions: This study supports the feasibility of using a mobile device in combination with a low-cost virtual reality headset and classic controller for glaucoma screening in rural areas. GES has the potential to reduce the burden of irreversible blindness caused by glaucoma. However, further assessment of its sensitivity and specificity is required. ", doi="10.2196/26602", url="https://formative.jmir.org/2021/9/e26602", url="http://www.ncbi.nlm.nih.gov/pubmed/34533462" } @Article{info:doi/10.2196/24352, author="Flanagan, Olivia and Chan, Amy and Roop, Partha and Sundram, Frederick", title="Using Acoustic Speech Patterns From Smartphones to Investigate Mood Disorders: Scoping Review", journal="JMIR Mhealth Uhealth", year="2021", month="Sep", day="17", volume="9", number="9", pages="e24352", keywords="smartphone", keywords="data science", keywords="speech patterns", keywords="mood disorders", keywords="diagnosis", keywords="monitoring", abstract="Background: Mood disorders are commonly underrecognized and undertreated, as diagnosis is reliant on self-reporting and clinical assessments that are often not timely. Speech characteristics of those with mood disorders differ from those of healthy individuals. With the wide use of smartphones and the emergence of machine learning approaches, smartphones can be used to monitor speech patterns and aid the diagnosis and monitoring of mood disorders. Objective: The aim of this review is to synthesize research on using speech patterns from smartphones to diagnose and monitor mood disorders. Methods: Literature searches of major databases (Medline, PsycInfo, EMBASE, and CINAHL) initially identified 832 relevant articles using the search terms ``mood disorders'', ``smartphone'', ``voice analysis'', and their variants. Only 13 studies met the inclusion criteria: use of a smartphone for capturing voice data, a focus on diagnosing or monitoring a mood disorder(s), prospective recruitment of clinical populations, and publication in the English language. Articles were assessed by 2 reviewers, and data extracted included data type, classifiers used, methods of capture, and study results. Studies were analyzed using a narrative synthesis approach. Results: Studies showed that voice data alone had reasonable accuracy in predicting mood states and mood fluctuations based on objectively monitored speech patterns. While a fusion of different sensor modalities revealed the highest accuracy (97.4\%), nearly 80\% of included studies were pilot trials or feasibility studies without control groups and had small sample sizes ranging from 1 to 73 participants. Studies were also carried out over short or varying timeframes and had significant heterogeneity of methods in terms of the types of audio data captured, environmental contexts, classifiers, and measures to control for privacy and ambient noise. Conclusions: Approaches that allow smartphone-based monitoring of speech patterns in mood disorders are rapidly growing. The current body of evidence supports the value of speech patterns to monitor, classify, and predict mood states in real time. 
However, many challenges remain around the robustness, cost-effectiveness, and acceptability of such an approach, and further work is required to build on current research, reduce the heterogeneity of methodologies, and clinically evaluate the benefits and risks of such approaches. ", doi="10.2196/24352", url="https://mhealth.jmir.org/2021/9/e24352", url="http://www.ncbi.nlm.nih.gov/pubmed/34533465" } @Article{info:doi/10.2196/27547, author="Morgado Areia, Carlos and Santos, Mauro and Vollam, Sarah and Pimentel, Marco and Young, Louise and Roman, Cristian and Ede, Jody and Piper, Philippa and King, Elizabeth and Gustafson, Owen and Harford, Mirae and Shah, Akshay and Tarassenko, Lionel and Watkinson, Peter", title="A Chest Patch for Continuous Vital Sign Monitoring: Clinical Validation Study During Movement and Controlled Hypoxia", journal="J Med Internet Res", year="2021", month="Sep", day="15", volume="23", number="9", pages="e27547", keywords="clinical validation", keywords="chest patch", keywords="vital signs", keywords="remote monitoring", keywords="wearable", keywords="heart rate", keywords="respiratory rate", abstract="Background: The standard of care in general wards includes periodic manual measurements, with the data entered into track-and-trigger charts, either on paper or electronically. Wearable devices may support health care staff, improve patient safety, and promote early deterioration detection in the interval between periodic measurements. However, regulatory standards for ambulatory cardiac monitors estimating heart rate (HR) and respiratory rate (RR) do not specify performance criteria during patient movement or clinical conditions in which the patient's oxygen saturation varies. Therefore, further validation is required before clinical implementation and deployment of any wearable system that provides continuous vital sign measurements. Objective: The objective of this study is to determine the agreement between a chest-worn patch (VitalPatch) and a gold standard reference device for HR and RR measurements during movement and gradual desaturation (modeling a hypoxic episode) in a controlled environment. Methods: After the VitalPatch and gold standard devices (Philips MX450) were applied, participants performed different movements in seven consecutive stages: at rest, sit-to-stand, tapping, rubbing, drinking, turning pages, and using a tablet. Hypoxia was then induced, and the participants' oxygen saturation gradually reduced to 80\% in a controlled environment. The primary outcome measure was accuracy, defined as the mean absolute error (MAE) of the VitalPatch estimates when compared with HR and RR gold standards (3-lead electrocardiography and capnography, respectively). We defined these as clinically acceptable if the rates were within 5 beats per minute for HR and 3 respirations per minute (rpm) for RR. Results: Complete data sets were acquired for 29 participants. In the movement phase, the HR estimates were within prespecified limits for all movements. For RR, estimates were also within the acceptable range, with the exception of the sit-to-stand and turning page movements, showing an MAE of 3.05 (95\% CI 2.48-3.58) rpm and 3.45 (95\% CI 2.71-4.11) rpm, respectively. For the hypoxia phase, both HR and RR estimates were within limits, with an overall MAE of 0.72 (95\% CI 0.66-0.78) beats per minute and 1.89 (95\% CI 1.75-2.03) rpm, respectively. 
There were no significant differences in the accuracy of HR and RR estimations between normoxia ($\geq$90\%), mild (89.9\%-85\%), and severe hypoxia (<85\%). Conclusions: The VitalPatch was highly accurate throughout both the movement and hypoxia phases of the study, except for RR estimation during the two types of movements. This study demonstrated that VitalPatch can be safely tested in clinical environments to support earlier detection of cardiorespiratory deterioration. Trial Registration: ISRCTN Registry ISRCTN61535692; https://www.isrctn.com/ISRCTN61535692 ", doi="10.2196/27547", url="https://www.jmir.org/2021/9/e27547", url="http://www.ncbi.nlm.nih.gov/pubmed/34524087" } @Article{info:doi/10.2196/26608, author="Sahandi Far, Mehran and Eickhoff, B. Simon and Goni, Maria and Dukart, Juergen", title="Exploring Test-Retest Reliability and Longitudinal Stability of Digital Biomarkers for Parkinson Disease in the m-Power Data Set: Cohort Study", journal="J Med Internet Res", year="2021", month="Sep", day="13", volume="23", number="9", pages="e26608", keywords="health sciences", keywords="medical research", keywords="biomarkers", keywords="diagnostic markers", keywords="neurological disorders", keywords="Parkinson disease", keywords="mobile phone", abstract="Background: Digital biomarkers (DB), as captured using sensors embedded in modern smart devices, are a promising technology for home-based sign and symptom monitoring in Parkinson disease (PD). Objective: Despite extensive application in recent studies, test-retest reliability and longitudinal stability of DB have not been well addressed in this context. We utilized the large-scale m-Power data set to establish the test-retest reliability and longitudinal stability of gait, balance, voice, and tapping tasks in an unsupervised and self-administered daily life setting in patients with PD and healthy controls (HC). Methods: Intraclass correlation coefficients were computed to estimate the test-retest reliability of features that also differentiate between patients with PD and healthy volunteers. In addition, we tested for longitudinal stability of DB measures in PD and HC, as well as for their sensitivity to PD medication effects. Results: Among the features differing between PD and HC, only a few tapping and voice features had good to excellent test-retest reliabilities and medium to large effect sizes. All other features performed poorly in this respect. Only a few features were sensitive to medication effects. The longitudinal analyses revealed significant alterations over time across a variety of features and in particular for the tapping task. Conclusions: These results indicate the need for further development of more standardized, sensitive, and reliable DB for application in self-administered remote studies in patients with PD. Motivational, learning, and other confounders may cause variations in performance that need to be considered in DB longitudinal applications. 
", doi="10.2196/26608", url="https://www.jmir.org/2021/9/e26608", url="http://www.ncbi.nlm.nih.gov/pubmed/34515645" } @Article{info:doi/10.2196/28378, author="Chen, Chih-Hao and Lin, Haley Heng-Yu and Wang, Mao-Che and Chu, Yuan-Chia and Chang, Chun-Yu and Huang, Chii-Yuan and Cheng, Yen-Fu", title="Diagnostic Accuracy of Smartphone-Based Audiometry for Hearing Loss Detection: Meta-analysis", journal="JMIR Mhealth Uhealth", year="2021", month="Sep", day="10", volume="9", number="9", pages="e28378", keywords="audiometry", keywords="hearing loss", keywords="hearing test", keywords="mhealth", keywords="mobile health", keywords="digital health", keywords="meta-analysis", keywords="mobile phone", keywords="smartphone diagnostic test accuracy", abstract="Background: Hearing loss is one of the most common disabilities worldwide and affects both individual and public health. Pure tone audiometry (PTA) is the gold standard for hearing assessment, but it is often not available in many settings, given its high cost and demand for human resources. Smartphone-based audiometry may be equally effective and can improve access to adequate hearing evaluations. Objective: The aim of this systematic review is to synthesize the current evidence of the role of smartphone-based audiometry in hearing assessments and further explore the factors that influence its diagnostic accuracy. Methods: Five databases---PubMed, Embase, Cochrane Library, Web of Science, and Scopus---were queried to identify original studies that examined the diagnostic accuracy of hearing loss measurement using smartphone-based devices with conventional PTA as a reference test. A bivariate random-effects meta-analysis was performed to estimate the pooled sensitivity and specificity. The factors associated with diagnostic accuracy were identified using a bivariate meta-regression model. Study quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 tool. Results: In all, 25 studies with a total of 4470 patients were included in the meta-analysis. The overall sensitivity, specificity, and area under the receiver operating characteristic curve for smartphone-based audiometry were 89\% (95\% CI 83\%-93\%), 93\% (95\% CI 87\%-97\%), and 0.96 (95\% CI 0.93-0.97), respectively; the corresponding values for the smartphone-based speech recognition test were 91\% (95\% CI 86\%-94\%), 88\% (95\% CI 75\%-94\%), and 0.93 (95\% CI 0.90-0.95), respectively. Meta-regression analysis revealed that patient age, equipment used, and the presence of soundproof booths were significantly related to diagnostic accuracy. Conclusions: We have presented comprehensive evidence regarding the effectiveness of smartphone-based tests in diagnosing hearing loss. Smartphone-based audiometry may serve as an accurate and accessible approach to hearing evaluations, especially in settings where conventional PTA is unavailable. 
", doi="10.2196/28378", url="https://mhealth.jmir.org/2021/9/e28378/", url="http://www.ncbi.nlm.nih.gov/pubmed/34515644" } @Article{info:doi/10.2196/31129, author="Han, Changho and Song, Youngjae and Lim, Hong-Seok and Tae, Yunwon and Jang, Jong-Hwan and Lee, Tak Byeong and Lee, Yeha and Bae, Woong and Yoon, Dukyong", title="Automated Detection of Acute Myocardial Infarction Using Asynchronous Electrocardiogram Signals---Preview of Implementing Artificial Intelligence With Multichannel Electrocardiographs Obtained From Smartwatches: Retrospective Study", journal="J Med Internet Res", year="2021", month="Sep", day="10", volume="23", number="9", pages="e31129", keywords="wearables", keywords="smartwatches", keywords="asynchronous electrocardiogram", keywords="artificial intelligence", keywords="deep learning", keywords="automatic diagnosis", keywords="myocardial infarction", keywords="timely diagnosis", keywords="machine learning", keywords="digital health", keywords="cardiac health", keywords="cardiology", abstract="Background: When using a smartwatch to obtain electrocardiogram (ECG) signals from multiple leads, the device has to be placed on different parts of the body sequentially. The ECG signals measured from different leads are asynchronous. Artificial intelligence (AI) models for asynchronous ECG signals have barely been explored. Objective: We aimed to develop an AI model for detecting acute myocardial infarction using asynchronous ECGs and compare its performance with that of the automatic ECG interpretations provided by a commercial ECG analysis software. We sought to evaluate the feasibility of implementing multiple lead--based AI-enabled ECG algorithms on smartwatches. Moreover, we aimed to determine the optimal number of leads for sufficient diagnostic power. Methods: We extracted ECGs recorded within 24 hours from each visit to the emergency room of Ajou University Medical Center between June 1994 and January 2018 from patients aged 20 years or older. The ECGs were labeled on the basis of whether a diagnostic code corresponding to acute myocardial infarction was entered. We derived asynchronous ECG lead sets from standard 12-lead ECG reports and simulated a situation similar to the sequential recording of ECG leads via smartwatches. We constructed an AI model based on residual networks and self-attention mechanisms by randomly masking each lead channel during the training phase and then testing the model using various targeting lead sets with the remaining lead channels masked. Results: The performance of lead sets with 3 or more leads compared favorably with that of the automatic ECG interpretations provided by a commercial ECG analysis software, with 8.1\%-13.9\% gain in sensitivity when the specificity was matched. Our results indicate that multiple lead-based AI-enabled ECG algorithms can be implemented on smartwatches. Model performance generally increased as the number of leads increased (12-lead sets: area under the receiver operating characteristic curve [AUROC] 0.880; 4-lead sets: AUROC 0.858, SD 0.008; 3-lead sets: AUROC 0.845, SD 0.011; 2-lead sets: AUROC 0.813, SD 0.018; single-lead sets: AUROC 0.768, SD 0.001). Considering the short amount of time needed to measure additional leads, measuring at least 3 leads---ideally more than 4 leads---is necessary for minimizing the risk of failing to detect acute myocardial infarction occurring in a certain spatial location or direction. 
Conclusions: By developing an AI model for detecting acute myocardial infarction with asynchronous ECG lead sets, we demonstrated the feasibility of multiple lead-based AI-enabled ECG algorithms on smartwatches for automated diagnosis of cardiac disorders. We also demonstrated the necessity of measuring at least 3 leads for accurate detection. Our results can be used as reference for the development of other AI models using sequentially measured asynchronous ECG leads via smartwatches for detecting various cardiac disorders. ", doi="10.2196/31129", url="https://www.jmir.org/2021/9/e31129", url="http://www.ncbi.nlm.nih.gov/pubmed/34505839" } @Article{info:doi/10.2196/28192, author="Komatsu, Teppei and Sakai, Kenichiro and Iguchi, Yasuyuki and Takao, Hiroyuki and Ishibashi, Toshihiro and Murayama, Yuichi", title="Using a Smartphone Application for the Accurate and Rapid Diagnosis of Acute Anterior Intracranial Arterial Occlusion: Usability Study", journal="J Med Internet Res", year="2021", month="Aug", day="27", volume="23", number="8", pages="e28192", keywords="stroke", keywords="infarction", keywords="teleradiology", keywords="smartphone", keywords="telehealth", keywords="reperfusion", keywords="neurology", keywords="mHealth", keywords="application", keywords="mobile health", keywords="mobile applications", keywords="diagnosis", keywords="diagnostics", abstract="Background: Telestroke has developed rapidly as an assessment tool for patients eligible for reperfusion therapy. Objective: To investigate whether vascular neurologists can diagnose intracranial large vessel occlusion (LVO) as quickly and accurately using a smartphone application compared to a hospital-based desktop PC monitor. Methods: We retrospectively enrolled 108 consecutive patients with acute ischemic stroke in the middle cerebral artery territory who underwent magnetic resonance imaging (MRI) within 24 hours of their stroke onset. Two vascular neurologists, blinded to all clinical information, independently evaluated magnetic resonance angiography and fluid-attenuated inversion recovery images for the presence or absence of LVO in the internal carotid artery and middle cerebral artery (M1, M2, or M3) on both a smartphone application (Smartphone-LVO) and a hospital-based desktop PC monitor (PC-LVO). To evaluate the accuracy of an arterial occlusion diagnosis, interdevice variability between Smartphone-LVO and PC-LVO was analyzed using $\kappa$ statistics, and image interpretation time was compared between Smartphone-LVO and PC-LVO. Results: There was broad agreement between Smartphone-LVO and PC-LVO evaluations regarding the presence or absence of arterial occlusion (Reader 1: $\kappa$=0.94; P<.001 vs Reader 2: $\kappa$=0.89; P<.001), and interpretation times were similar between Smartphone-LVO and PC-LVO. Conclusions: The results indicate the evaluation of neuroimages using a smartphone application can provide an accurate and timely diagnosis of anterior intracranial arterial occlusion that can be shared immediately with members of the stroke team to support the management of patients with hyperacute ischemic stroke. 
", doi="10.2196/28192", url="https://www.jmir.org/2021/8/e28192", url="http://www.ncbi.nlm.nih.gov/pubmed/34448716" } @Article{info:doi/10.2196/25907, author="Brasier, Noe and Osthoff, Michael and De Ieso, Fiorangelo and Eckstein, Jens", title="Next-Generation Digital Biomarkers for Tuberculosis and Antibiotic Stewardship: Perspective on Novel Molecular Digital Biomarkers in Sweat, Saliva, and Exhaled Breath", journal="J Med Internet Res", year="2021", month="Aug", day="19", volume="23", number="8", pages="e25907", keywords="digital biomarkers", keywords="active tuberculosis", keywords="drug resistance", keywords="wearable", keywords="smart biosensors", keywords="iSudorology", keywords="infectious diseases", doi="10.2196/25907", url="https://www.jmir.org/2021/8/e25907", url="http://www.ncbi.nlm.nih.gov/pubmed/34420925" } @Article{info:doi/10.2196/28266, author="Kummer, Benjamin and Shakir, Lubaina and Kwon, Rachel and Habboushe, Joseph and Jett{\'e}, Nathalie", title="Usage Patterns of Web-Based Stroke Calculators in Clinical Decision Support: Retrospective Analysis", journal="JMIR Med Inform", year="2021", month="Aug", day="2", volume="9", number="8", pages="e28266", keywords="medical informatics", keywords="clinical informatics", keywords="mhealth", keywords="digital health", keywords="cerebrovascular disease", keywords="medical calculators", keywords="health information", keywords="health information technology", keywords="information technology", keywords="economic health", keywords="clinical health", keywords="electronic health records", abstract="Background: Clinical scores are frequently used in the diagnosis and management of stroke. While medical calculators are increasingly important support tools for clinical decisions, the uptake and use of common medical calculators for stroke remain poorly characterized. Objective: We aimed to describe use patterns in frequently used stroke-related medical calculators for clinical decisions from a web-based support system. Methods: We conducted a retrospective study of calculators from MDCalc, a web-based and mobile app--based medical calculator platform based in the United States. We analyzed metadata tags from MDCalc's calculator use data to identify all calculators related to stroke. Using relative page views as a measure of calculator use, we determined the 5 most frequently used stroke-related calculators between January 2016 and December 2018. For all 5 calculators, we determined cumulative and quarterly use, mode of access (eg, app or web browser), and both US and international distributions of use. We compared cumulative use in the 2016-2018 period with use from January 2011 to December 2015. Results: Over the study period, we identified 454 MDCalc calculators, of which 48 (10.6\%) were related to stroke. Of these, the 5 most frequently used calculators were the CHA2DS2-VASc score for atrial fibrillation stroke risk calculator (5.5\% of total and 32\% of stroke-related page views), the Mean Arterial Pressure calculator (2.4\% of total and 14.0\% of stroke-related page views), the HAS-BLED score for major bleeding risk (1.9\% of total and 11.4\% of stroke-related page views), the National Institutes of Health Stroke Scale (NIHSS) score calculator (1.7\% of total and 10.1\% of stroke-related page views), and the CHADS2 score for atrial fibrillation stroke risk calculator (1.4\% of total and 8.1\% of stroke-related page views). 
Web browser was the most common mode of access, accounting for 82.7\%-91.2\% of individual stroke calculator page views. Access originated most frequently from the most populated regions within the United States. Internationally, use originated mostly from English-language countries. The NIHSS score calculator demonstrated the greatest increase in page views (238.1\% increase) between the first and last quarters of the study period. Conclusions: The most frequently used stroke calculators were the CHA2DS2-VASc, Mean Arterial Pressure, HAS-BLED, NIHSS, and CHADS2. These were mainly accessed by web browser, from English-speaking countries, and from highly populated areas. Further studies should investigate barriers to stroke calculator adoption and the effect of calculator use on the application of best practices in cerebrovascular disease. ", doi="10.2196/28266", url="https://medinform.jmir.org/2021/8/e28266", url="http://www.ncbi.nlm.nih.gov/pubmed/34338647" } @Article{info:doi/10.2196/23109, author="Hirosawa, Takanobu and Harada, Yukinori and Ikenoya, Kohei and Kakimoto, Shintaro and Aizawa, Yuki and Shimizu, Taro", title="The Utility of Real-Time Remote Auscultation Using a Bluetooth-Connected Electronic Stethoscope: Open-Label Randomized Controlled Pilot Trial", journal="JMIR Mhealth Uhealth", year="2021", month="Jul", day="27", volume="9", number="7", pages="e23109", keywords="telemedicine", keywords="electronic stethoscope", keywords="simulator", keywords="remote auscultation", keywords="lung auscultation", keywords="cardiac auscultation", keywords="physical examination", abstract="Background: The urgent need for telemedicine has become clear in the COVID-19 pandemic. To facilitate telemedicine, the development and improvement of remote examination systems are required. A system combining an electronic stethoscope and Bluetooth connectivity is a promising option for remote auscultation in clinics and hospitals. However, the utility of such systems remains unknown. Objective: This study was conducted to assess the utility of real-time auscultation using a Bluetooth-connected electronic stethoscope compared to that of classical auscultation, using lung and cardiology patient simulators. Methods: This was an open-label, randomized controlled trial including senior residents and faculty in the department of general internal medicine of a university hospital. The only exclusion criterion was a refusal to participate. This study consisted of 2 parts: lung auscultation and cardiac auscultation. Each part contained a tutorial session and a test session. All participants attended a tutorial session, in which they listened to 15 sounds on the simulator using a classic stethoscope and were told the correct classification. Thereafter, participants were randomly assigned to either the real-time remote auscultation group (intervention group) or the classical auscultation group (control group) for test sessions. In the test sessions, participants had to classify a series of 10 lung sounds and 10 cardiac sounds, depending on the study part. The intervention group listened to the sounds remotely using the electronic stethoscope, a Bluetooth transmitter, and a wireless, noise-canceling, stereo headset. The control group listened to the sounds directly using a traditional stethoscope. The primary outcome was the test score, and the secondary outcomes were the rates of correct answers for each sound. Results: In total, 20 participants were included. 
There were no differences in age, sex, and years from graduation between the 2 groups in each part. The overall test score of lung auscultation in the intervention group (80/110, 72.7\%) was not different from that in the control group (71/90, 78.9\%; P=.32). The only lung sound for which the correct answer rate differed between groups was that of pleural friction rubs (P=.03); it was lower in the intervention group (3/11, 27\%) than in the control group (7/9, 78\%). The overall test score for cardiac auscultation in the intervention group (50/60, 83.3\%) was not different from that in the control group (119/140, 85.0\%; P=.77). There was no cardiac sound for which the correct answer rate differed between groups. Conclusions: The utility of a real-time remote auscultation system using a Bluetooth-connected electronic stethoscope was comparable to that of direct auscultation using a classic stethoscope, except for classification of pleural friction rubs. This means that most of the real world's essential cardiopulmonary sounds could be classified by a real-time remote auscultation system using a Bluetooth-connected electronic stethoscope. Trial Registration: UMIN-CTR UMIN000040828; https://tinyurl.com/r24j2p6s and UMIN-CTR UMIN000041601; https://tinyurl.com/bsax3j5f ", doi="10.2196/23109", url="https://mhealth.jmir.org/2021/7/e23109", url="http://www.ncbi.nlm.nih.gov/pubmed/34313598" } @Article{info:doi/10.2196/29336, author="{\'C}irkovi{\'c}, Aleksandar", title="Author's Reply to: Periodic Manual Algorithm Updates and Generalizability: A Developer's Response. Comment on ``Evaluation of Four Artificial Intelligence--Assisted Self-Diagnosis Apps on Three Diagnoses: Two-Year Follow-Up Study''", journal="J Med Internet Res", year="2021", month="Jun", day="16", volume="23", number="6", pages="e29336", keywords="artificial intelligence", keywords="machine learning", keywords="mobile apps", keywords="medical diagnosis", keywords="mHealth", keywords="symptom assessment", doi="10.2196/29336", url="https://www.jmir.org/2021/6/e29336", url="http://www.ncbi.nlm.nih.gov/pubmed/34132643" } @Article{info:doi/10.2196/26514, author="Gilbert, Stephen and Fenech, Matthew and Idris, Anisa and T{\"u}rk, Ewelina", title="Periodic Manual Algorithm Updates and Generalizability: A Developer's Response. 
Comment on ``Evaluation of Four Artificial Intelligence--Assisted Self-Diagnosis Apps on Three Diagnoses: Two-Year Follow-Up Study''", journal="J Med Internet Res", year="2021", month="Jun", day="16", volume="23", number="6", pages="e26514", keywords="artificial intelligence", keywords="machine learning", keywords="mobile apps", keywords="medical diagnosis", keywords="mHealth", keywords="symptom assessment", doi="10.2196/26514", url="https://www.jmir.org/2021/6/e26514", url="http://www.ncbi.nlm.nih.gov/pubmed/34132641" } @Article{info:doi/10.2196/26167, author="Yang, Yun Tien and Huang, Li and Malwade, Shwetambara and Hsu, Chien-Yi and Chen, Ching Yang", title="Diagnostic Accuracy of Ambulatory Devices in Detecting Atrial Fibrillation: Systematic Review and Meta-analysis", journal="JMIR Mhealth Uhealth", year="2021", month="Apr", day="9", volume="9", number="4", pages="e26167", keywords="atrial fibrillation", keywords="ambulatory devices", keywords="electrocardiogram", keywords="photoplethysmography", keywords="diagnostic accuracy", keywords="ubiquitous health", keywords="mobile health", keywords="technology", keywords="ambulatory device", abstract="Background: Atrial fibrillation (AF) is the most common cardiac arrhythmia worldwide. Early diagnosis of AF is crucial for preventing AF-related morbidity, mortality, and economic burden, yet the detection of the disease remains challenging. The 12-lead electrocardiogram (ECG) is the gold standard for the diagnosis of AF. Because of technological advances, ambulatory devices may serve as convenient screening tools for AF. Objective: The objective of this review was to investigate the diagnostic accuracy of 2 relatively new technologies used in ambulatory devices, non-12-lead ECG and photoplethysmography (PPG), in detecting AF. We performed a meta-analysis to evaluate the diagnostic accuracy of non-12-lead ECG and PPG compared to the reference standard, 12-lead ECG. We also conducted a subgroup analysis to assess the impact of study design and participant recruitment on diagnostic accuracy. Methods: This systematic review and meta-analysis was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guidelines. MEDLINE and EMBASE were systematically searched for articles published from January 1, 2015 to January 23, 2021. A bivariate model was used to pool estimates of sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), and area under the summary receiver operating curve (SROC) as the main diagnostic measures. Study quality was evaluated using the quality assessment of diagnostic accuracy studies (QUADAS-2) tool. Results: Our search resulted in 16 studies using either non-12-lead ECG or PPG for detecting AF, comprising 3217 participants and 7623 assessments. The pooled estimates of sensitivity, specificity, PLR, NLR, and diagnostic odds ratio for the detection of AF were 89.7\% (95\% CI 83.2\%-93.9\%), 95.7\% (95\% CI 92.0\%-97.7\%), 20.64 (95\% CI 10.10-42.15), 0.11 (95\% CI 0.06-0.19), and 224.75 (95\% CI 70.10-720.56), respectively, for the automatic interpretation of non-12-lead ECG measurements and 94.7\% (95\% CI 93.3\%-95.8\%), 97.6\% (95\% CI 94.5\%-99.0\%), 35.51 (95\% CI 18.19-69.31), 0.05 (95\% CI 0.04-0.07), and 730.79 (95\% CI 309.33-1726.49), respectively, for the automatic interpretation of PPG measurements. Conclusions: Both non-12-lead ECG and PPG offered high diagnostic accuracies for AF. 
Detection employing automatic analysis techniques may serve as a useful preliminary screening tool before administering a gold standard test, which generally requires competent physician analyses. Subgroup analysis indicated variations of sensitivity and specificity between studies that recruited low-risk and high-risk populations, warranting future validity tests in the general population. Trial Registration: PROSPERO International Prospective Register of Systematic Reviews CRD42020179937; https://www.crd.york.ac.uk/prospero/display\_record.php?RecordID=179937 ", doi="10.2196/26167", url="https://mhealth.jmir.org/2021/4/e26167", url="http://www.ncbi.nlm.nih.gov/pubmed/33835039" } @Article{info:doi/10.2196/22637, author="Aboueid, Stephanie and Meyer, Samantha and Wallace, R. James and Mahajan, Shreya and Chaurasia, Ashok", title="Young Adults' Perspectives on the Use of Symptom Checkers for Self-Triage and Self-Diagnosis: Qualitative Study", journal="JMIR Public Health Surveill", year="2021", month="Jan", day="6", volume="7", number="1", pages="e22637", keywords="self-assessment", keywords="symptom checkers", keywords="self-triage", keywords="self-diagnosis", keywords="young adults", keywords="digital platforms", keywords="internet", keywords="user experience", keywords="Google search", abstract="Background: Young adults often browse the internet for self-triage and diagnosis. More sophisticated digital platforms such as symptom checkers have recently become pervasive; however, little is known about their use. Objective: The aim of this study was to understand young adults' (18-34 years old) perspectives on the use of the Google search engine versus a symptom checker, as well as to identify the barriers and enablers for using a symptom checker for self-triage and self-diagnosis. Methods: A qualitative descriptive case study research design was used. Semistructured interviews were conducted with 24 young adults enrolled in a university in Ontario, Canada. All participants were given a clinical vignette and were asked to use a symptom checker (WebMD Symptom Checker or Babylon Health) while thinking out loud, and were asked questions regarding their experience. Interviews were audio-recorded, transcribed, and imported into the NVivo software program. Inductive thematic analysis was conducted independently by two researchers. Results: Using the Google search engine was perceived to be faster and more customizable (ie, ability to enter symptoms freely in the search engine) than a symptom checker; however, a symptom checker was perceived to be useful for a more personalized assessment. After having used a symptom checker, most of the participants believed that the platform needed improvement in the areas of accuracy, security and privacy, and medical jargon used. Given these limitations, most participants believed that symptom checkers could be more useful for self-triage than for self-diagnosis. Interestingly, more than half of the participants were not aware of symptom checkers prior to this study and most believed that this lack of awareness about the existence of symptom checkers hindered their use. Conclusions: Awareness related to the existence of symptom checkers and their integration into the health care system are required to maximize benefits related to these platforms. Addressing the barriers identified in this study is likely to increase the acceptance and use of symptom checkers by young adults. 
", doi="10.2196/22637", url="https://publichealth.jmir.org/2021/1/e22637", url="http://www.ncbi.nlm.nih.gov/pubmed/33404515" } @Article{info:doi/10.2196/18097, author="{\'C}irkovi{\'c}, Aleksandar", title="Evaluation of Four Artificial Intelligence--Assisted Self-Diagnosis Apps on Three Diagnoses: Two-Year Follow-Up Study", journal="J Med Internet Res", year="2020", month="Dec", day="4", volume="22", number="12", pages="e18097", keywords="artificial intelligence", keywords="machine learning", keywords="mobile apps", keywords="medical diagnosis", keywords="mHealth", abstract="Background: Consumer-oriented mobile self-diagnosis apps have been developed using undisclosed algorithms, presumably based on machine learning and other artificial intelligence (AI) technologies. The US Food and Drug Administration now discerns apps with learning AI algorithms from those with stable ones and treats the former as medical devices. To the author's knowledge, no self-diagnosis app testing has been performed in the field of ophthalmology so far. Objective: The objective of this study was to test apps that were previously mentioned in the scientific literature on a set of diagnoses in a deliberate time interval, comparing the results and looking for differences that hint at ``nonlocked'' learning algorithms. Methods: Four apps from the literature were chosen (Ada, Babylon, Buoy, and Your.MD). A set of three ophthalmology diagnoses (glaucoma, retinal tear, dry eye syndrome) representing three levels of urgency was used to simultaneously test the apps' diagnostic efficiency and treatment recommendations in this specialty. Two years was the chosen time interval between the tests (2018 and 2020). Scores were awarded by one evaluating physician using a defined scheme. Results: Two apps (Ada and Your.MD) received significantly higher scores than the other two. All apps either worsened in their results between 2018 and 2020 or remained unchanged at a low level. The variation in the results over time indicates ``nonlocked'' learning algorithms using AI technologies. None of the apps provided correct diagnoses and treatment recommendations for all three diagnoses in 2020. Two apps (Babylon and Your.MD) asked significantly fewer questions than the other two (P<.001). Conclusions: ``Nonlocked'' algorithms are used by self-diagnosis apps. The diagnostic efficiency of the tested apps seems to worsen over time, with some apps being more capable than others. Systematic studies on a wider scale are necessary for health care providers and patients to correctly assess the safety and efficacy of such apps and for correct classification by health care regulating authorities. ", doi="10.2196/18097", url="https://www.jmir.org/2020/12/e18097", url="http://www.ncbi.nlm.nih.gov/pubmed/33275113" } @Article{info:doi/10.2196/20031, author="Tsai, FS Vincent and Zhuang, Bin and Pong, Yuan-Hung and Hsieh, Ju-Ton and Chang, Hong-Chiang", title="Web- and Artificial Intelligence--Based Image Recognition For Sperm Motility Analysis: Verification Study", journal="JMIR Med Inform", year="2020", month="Nov", day="19", volume="8", number="11", pages="e20031", keywords="Male infertility", keywords="semen analysis", keywords="home sperm test", keywords="smartphone", keywords="artificial intelligence", keywords="cloud computing", keywords="telemedicine", abstract="Background: Human sperm quality fluctuates over time. Therefore, it is crucial for couples preparing for natural pregnancy to monitor sperm motility. 
Objective: This study verified the performance of an artificial intelligence--based image recognition and cloud computing sperm motility testing system (Bemaner, Createcare) composed of microscope and microfluidic modules and designed to adapt to different types of smartphones. Methods: Sperm videos were captured and uploaded to the cloud with an app. Analysis of sperm motility was performed by an artificial intelligence--based image recognition algorithm, and the results were then displayed. According to the number of motile sperm in the vision field, 47 (deidentified) videos of sperm were scored using 6 grades (0-5) by a male-fertility expert with 10 years of experience. Pearson product-moment correlation was calculated between the grades and the results (concentration of total sperm, concentration of motile sperm, and motility percentage) computed by the system. Results: Good correlation was demonstrated between the grades and results computed by the system for concentration of total sperm (r=0.65, P<.001), concentration of motile sperm (r=0.84, P<.001), and motility percentage (r=0.90, P<.001). Conclusions: This smartphone-based sperm motility test (Bemaner) accurately measures motility-related parameters and could potentially be applied toward the following fields: male infertility detection, sperm quality test during preparation for pregnancy, and infertility treatment monitoring. With frequent at-home testing, more data can be collected to help make clinical decisions and to conduct epidemiological research. ", doi="10.2196/20031", url="http://medinform.jmir.org/2020/11/e20031/", url="http://www.ncbi.nlm.nih.gov/pubmed/33211025" } @Article{info:doi/10.2196/23047, author="Lin, Haley Heng-Yu and Chu, Yuan-Chia and Lai, Ying-Hui and Cheng, Hsiu-Lien and Lai, Feipei and Cheng, Yen-Fu and Liao, Wen-Huei", title="A Smartphone-Based Approach to Screening for Sudden Sensorineural Hearing Loss: Cross-Sectional Validity Study", journal="JMIR Mhealth Uhealth", year="2020", month="Nov", day="11", volume="8", number="11", pages="e23047", keywords="sudden sensorineural hearing loss", keywords="hearing test", keywords="telemedicine", keywords="mobile apps", keywords="pure tone", keywords="audiometry", abstract="Background: Sudden sensorineural hearing loss (SSNHL) is an otologic emergency that warrants urgent management. Pure-tone audiometry remains the gold standard for definitively diagnosing SSNHL. However, in clinical settings such as primary care practices and urgent care facilities, conventional pure-tone audiometry is often unavailable. Objective: This study aimed to determine the correlation between hearing outcomes measured by conventional pure-tone audiometry and those measured by the proposed smartphone-based Ear Scale app and determine the diagnostic validity of the hearing scale differences between the two ears as obtained by the Ear Scale app for SSNHL. Methods: This cross-sectional study included a cohort of 88 participants with possible SSNHL who were referred to an otolaryngology clinic or emergency department at a tertiary medical center in Taipei, Taiwan, between January 2018 and June 2019. All participants underwent hearing assessments with conventional pure-tone audiometry and the proposed smartphone-based Ear Scale app consecutively. The gold standard for diagnosing SSNHL was defined as the pure-tone average (PTA) difference between the two ears being $\geq$30 dB HL. The hearing results measured by the Ear Scale app were presented as 20 stratified hearing scales. 
The hearing scale difference between the two ears was estimated to detect SSNHL. Results: The study sample comprised 88 adults with a mean age of 46 years, and 50\% (44/88) were females. PTA measured by conventional pure-tone audiometry was strongly correlated with the hearing scale assessed by the Ear Scale app, with a Pearson correlation coefficient of .88 (95\% CI .82-.92). The sensitivity of the 5--hearing scale difference (25 dB HL difference) between the impaired ear and the contralateral ear in diagnosing SSNHL was 95.5\% (95\% CI 87.5\%-99.1\%), with a specificity of 66.7\% (95\% CI 43.0\%-85.4\%). Conclusions: Our findings suggest that the proposed smartphone-based Ear Scale app can be useful in the evaluation of SSNHL in clinical settings where conventional pure-tone audiometry is not available. ", doi="10.2196/23047", url="http://mhealth.jmir.org/2020/11/e23047/", url="http://www.ncbi.nlm.nih.gov/pubmed/33174845" } @Article{info:doi/10.2196/24587, author="Porter, Paul and Claxton, Scott and Brisbane, Joanna and Bear, Natasha and Wood, Javan and Peltonen, Vesa and Della, Phillip and Purdie, Fiona and Smith, Claire and Abeyratne, Udantha", title="Diagnosing Chronic Obstructive Airway Disease on a Smartphone Using Patient-Reported Symptoms and Cough Analysis: Diagnostic Accuracy Study", journal="JMIR Form Res", year="2020", month="Nov", day="10", volume="4", number="11", pages="e24587", keywords="respiratory", keywords="medicine", keywords="diagnostic algorithm", keywords="telehealth", keywords="acute care", abstract="Background: Rapid and accurate diagnosis of chronic obstructive pulmonary disease (COPD) is problematic in acute care settings, particularly in the presence of infective comorbidities. Objective: The aim of this study was to develop a rapid smartphone-based algorithm for the detection of COPD in the presence or absence of acute respiratory infection and evaluate diagnostic accuracy on an independent validation set. Methods: Participants aged 40 to 75 years with or without symptoms of respiratory disease who had no chronic respiratory condition apart from COPD, chronic bronchitis, or emphysema were recruited into the study. The algorithm analyzed 5 cough sounds and 4 patient-reported clinical symptoms, providing a diagnosis in less than 1 minute. Clinical diagnoses were determined by a specialist physician using all available case notes, including spirometry where available. Results: The algorithm demonstrated high positive percent agreement (PPA) and negative percent agreement (NPA) with clinical diagnosis for COPD in the total cohort (N=252; PPA=93.8\%, NPA=77.0\%, area under the curve [AUC]=0.95), in participants with pneumonia or infective exacerbations of COPD (n=117; PPA=86.7\%, NPA=80.5\%, AUC=0.93), and in participants without an infective comorbidity (n=135; PPA=100.0\%, NPA=74.0\%, AUC=0.97). In those who had their COPD confirmed by spirometry (n=229), PPA was 100.0\% and NPA was 77.0\%, with an AUC of 0.97. Conclusions: The algorithm demonstrated high agreement with clinical diagnosis and rapidly detected COPD in participants presenting with or without other infective lung illnesses. The algorithm can be installed on a smartphone to provide bedside diagnosis of COPD in acute care settings, inform treatment regimens, and identify those at increased risk of mortality due to seasonal or other respiratory ailments. 
Trial Registration: Australian New Zealand Clinical Trials Registry ACTRN12618001521213; http://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=375939 ", doi="10.2196/24587", url="http://formative.jmir.org/2020/11/e24587/", url="http://www.ncbi.nlm.nih.gov/pubmed/33170129" }