Comparison of Diagnostic and Triage Accuracy of Ada Health and WebMD Symptom Checkers, ChatGPT, and Physicians for Patients in an Emergency Department: Clinical Data Analysis Study

doi:10.2196/49995

Journals

Infante A, Gaudino S, Orsini F, Del Ciello A, Gullì C, Merlino B, Natale L, Iezzi R, Sala E. Large language models (LLMs) in the evaluation of emergency radiology reports: performance of ChatGPT-4, Perplexity, and Bard. Clinical Radiology 2024;79(2):102
Ćirković A, Katz T. Exploring the Potential of ChatGPT-4 in Predicting Refractive Surgery Categorizations: Comparative Study. JMIR Formative Research 2023;7:e51798
Sallam M, Barakat M, Sallam M. A Preliminary Checklist (METRICS) to Standardize the Design and Reporting of Studies on Generative Artificial Intelligence–Based Models in Health Care Education and Practice: Development Study Involving a Literature Review. Interactive Journal of Medical Research 2024;13:e54704
Reis F, Lenz C. Performance of Artificial Intelligence (AI)-Powered Chatbots in the Assessment of Medical Case Reports: Qualitative Insights From Simulated Scenarios. Cureus 2024
Wang L, Chen X, Deng X, Wen H, You M, Liu W, Li Q, Li J. Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs. npj Digital Medicine 2024;7(1)
Xue Z, Zhang Y, Gan W, Wang H, She G, Zheng X. Quality and Dependability of ChatGPT and DingXiangYuan Forums for Remote Orthopedic Consultations: Comparative Analysis. Journal of Medical Internet Research 2024;26:e50882
Jindal A, Brandao-de-Resende C, Neo Y, Melo M, Day A. Enhancing Ophthalmic Triage: identification of new clinical features to support healthcare professionals in triage. Eye 2024;38(13):2536
Sheikh M, Barreto E, Miao J, Thongprayoon C, Gregoire J, Dreesman B, Erickson S, Craici I, Cheungpasitporn W. Evaluating ChatGPT's efficacy in assessing the safety of non-prescription medications and supplements in patients with kidney disease. DIGITAL HEALTH 2024;10
Frosolini A, Catarzi L, Benedetti S, Latini L, Chisci G, Franz L, Gennaro P, Gabriele G. The Role of Large Language Models (LLMs) in Providing Triage for Maxillofacial Trauma Cases: A Preliminary Study. Diagnostics 2024;14(8):839
Andreadis K, Newman D, Twan C, Shunk A, Mann D, Stevens E. Mixed methods assessment of the influence of demographics on medical advice of ChatGPT. Journal of the American Medical Informatics Association 2024;31(9):2002
Harada Y, Sakamoto T, Sugimoto S, Shimizu T. Longitudinal Changes in Diagnostic Accuracy of a Differential Diagnosis List Developed by an AI-Based Symptom Checker: Retrospective Observational Study. JMIR Formative Research 2024;8:e53985
Pressman S, Borna S, Gomez-Cabello C, Haider S, Forte A. AI in Hand Surgery: Assessing Large Language Models in the Classification and Management of Hand Injuries. Journal of Clinical Medicine 2024;13(10):2832
Yazaki M, Maki S, Furuya T, Inoue K, Nagai K, Nagashima Y, Maruyama J, Toki Y, Kitagawa K, Iwata S, Kitamura T, Gushiken S, Noguchi Y, Inoue M, Shiga Y, Inage K, Orita S, Nakada T, Ohtori S. Emergency Patient Triage Improvement through a Retrieval-Augmented Generation Enhanced Large-Scale Language Model. Prehospital Emergency Care 2025;29(3):203
Hoppe J, Auer M, Strüven A, Massberg S, Stremmel C. ChatGPT With GPT-4 Outperforms Emergency Department Physicians in Diagnostic Accuracy: Retrospective Analysis. Journal of Medical Internet Research 2024;26:e56110
Silverman A, Shung D, Stidham R, Kochhar G, Iacucci M. How Artificial Intelligence Will Transform Clinical Care, Research, and Trials for Inflammatory Bowel Disease. Clinical Gastroenterology and Hepatology 2025;23(3):428
Scott I, Miller T, Crock C. Using conversant artificial intelligence to improve diagnostic reasoning: ready for prime time? Medical Journal of Australia 2024;221(5):240
Tong L, Zhang C, Liu R, Yang J, Sun Z. Comparative performance analysis of large language models: ChatGPT-3.5, ChatGPT-4 and Google Gemini in glucocorticoid-induced osteoporosis. Journal of Orthopaedic Surgery and Research 2024;19(1)
Ghilzai U, Fiedler B, Ghali A, Singh A, Cass B, Young A, Ahmed A. ChatGPT provides acceptable responses to patient questions regarding common shoulder pathology. Shoulder & Elbow 2025;17(5):625
Colakca C, Ergın M, Ozensoy H, Sener A, Guru S, Ozhasenekler A. Emergency department triaging using ChatGPT based on emergency severity index principles: a cross-sectional study. Scientific Reports 2024;14(1)
Bedi S, Liu Y, Orr-Ewing L, Dash D, Koyejo S, Callahan A, Fries J, Wornow M, Swaminathan A, Lehmann L, Hong H, Kashyap M, Chaurasia A, Shah N, Singh K, Tazbaz T, Milstein A, Pfeffer M, Shah N. Testing and Evaluation of Health Care Applications of Large Language Models. JAMA 2025;333(4):319
Wu A. Chatting together: Using AI chatbots to improve diagnostic excellence. Journal of Patient Safety and Risk Management 2024;29(5):222
Hayat J, Lari M, AlHerz M, Lari A. The Utility and Limitations of Artificial Intelligence-Powered Chatbots in Healthcare. Cureus 2024
Ho C, Tian T, Ayers A, Aaron R, Phillips V, Wolf R, Mathioudakis N, Dai T, Klonoff D. Qualitative metrics from the biomedical literature for evaluating large language models in clinical decision-making: a narrative review. BMC Medical Informatics and Decision Making 2024;24(1)
Jin H, Kim E. Performance of GPT-3.5 and GPT-4 on the Korean Pharmacist Licensing Examination: Comparison Study. JMIR Medical Education 2024;10:e57451
Cano-Besquet S, Rice-Canetto T, Abou-El-Hassan H, Alarcon S, Zimmerman J, Issagholian L, Salomon N, Rojas I, Dhahbi J, Neeki M. ChatGPT4’s diagnostic accuracy in inpatient neurology: A retrospective cohort study. Heliyon 2024;10(24):e40964
Brochu B, Mirsky N, Thaller S. Evaluating ChatGPT’s efficacy in addressing common patient questions in plastic surgery consultations. Artificial Intelligence Surgery 2024;4(4):411
Arslan B, Nuhoglu C, Satici M, Altinbilek E. Evaluating LLM-based generative AI tools in emergency triage: A comparative study of ChatGPT Plus, Copilot Pro, and triage nurses. The American Journal of Emergency Medicine 2025;89:174
Naved B, Luo Y. Contrasting rule and machine learning based digital self triage systems in the USA. npj Digital Medicine 2024;7(1)
Oztermeli A. Is ChatGPT a Reliable Tool for Explaining Medical Terms? Cureus 2025
Vaira L, Lechien J, Abbate V, Gabriele G, Frosolini A, De Vito A, Maniaci A, Mayo‐Yáñez M, Boscolo‐Rizzo P, Saibene A, Maglitto F, Salzano G, Califano G, Troise S, Chiesa‐Estomba C, De Riu G. Enhancing AI Chatbot Responses in Health Care: The SMART Prompt Structure in Head and Neck Surgery. OTO Open 2025;9(1)
Kareemi H, Yadav K, Price C, Bobrovitz N, Meehan A, Li H, Goel G, Masood S, Grant L, Ben‐Yakov M, Michalowski W, Vaillancourt C. Artificial intelligence–based clinical decision support in the emergency department: A scoping review. Academic Emergency Medicine 2025;32(4):386
Huo B, Boyle A, Marfo N, Tangamornsuksan W, Steen J, McKechnie T, Lee Y, Mayol J, Antoniou S, Thirunavukarasu A, Sanger S, Ramji K, Guyatt G. Large Language Models for Chatbot Health Advice Studies. JAMA Network Open 2025;8(2):e2457879
Yun H, Bickmore T. Online Health Information–Seeking in the Era of Large Language Models: Cross-Sectional Web-Based Survey Study. Journal of Medical Internet Research 2025;27:e68560
Langmann E, Henking T, Joos S, Klemmt M, Müller R, Preiser C, Ranisch R, Koch R, Rieger M, Wetzel A, Wiesing U, Ehni H. Handlungsempfehlungen zum Einsatz von Symptom-Checker-Apps im Gesundheitskontext – basierend auf den Ergebnissen aus dem Projekt CHECK.APP [Recommendations for the use of symptom checker apps in the health care context, based on the results of the CHECK.APP project]. Ethik in der Medizin 2025;37(2):91
Tanaka C, Kinoshita T, Okada Y, Satoh K, Homma Y, Suzuki K, Yokobori S, Oda J, Otomo Y, Tagami T. Medical validity and layperson interpretation of emergency visit recommendations by the GPT model: A cross‐sectional study. Acute Medicine & Surgery 2025;12(1)
Suga T, Uehara O, Abiko Y, Toyofuku A. Evaluating Large Language Models for Burning Mouth Syndrome Diagnosis. Journal of Pain Research 2025;18:1387
Kopka M, von Kalckreuth N, Feufel M. Accuracy of online symptom assessment applications, large language models, and laypeople for self–triage decisions. npj Digital Medicine 2025;8(1)
Takita H, Kabata D, Walston S, Tatekawa H, Saito K, Tsujimoto Y, Miki Y, Ueda D. A systematic review and meta-analysis of diagnostic performance comparison between generative AI and physicians. npj Digital Medicine 2025;8(1)
Morjaria L, Gandhi B, Haider N, Mellon M, Sibbald M. Applications of Generative Artificial Intelligence in Electronic Medical Records: A Scoping Review. Information 2025;16(4):284
Akbasli I, Birbilen A, Teksam O. Leveraging large language models to mimic domain expert labeling in unstructured text-based electronic healthcare records in non-english languages. BMC Medical Informatics and Decision Making 2025;25(1)
Schmieding M, Kopka M, Bolanaki M, Napierala H, Altendorf M, Kuschick D, Piper S, Scatturin L, Schmidt K, Schorr C, Thissen A, Wäscher C, Heintze C, Möckel M, Balzer F, Slagman A. Impact of a Symptom Checker App on Patient-Physician Interaction Among Self-Referred Walk-In Patients in the Emergency Department: Multicenter, Parallel-Group, Randomized, Controlled Trial. Journal of Medical Internet Research 2025;27:e64028
Meyer N, Meyer J. A Practical Guide to the Utilization of ChatGPT in the Emergency Department: A Systematic Review of Current Applications, Future Directions, and Limitations. Cureus 2025
Alanazi H. Role of artificial intelligence in advancing immunology. Immunologic Research 2025;73(1)
Zou Y, Ye R, Gao Y, Zhou J, Li Y, Chen W, Zha F, Wang Y. Comparison of triage performance among DRP tool, ChatGPT, and outpatient rehabilitation doctors. Scientific Reports 2025;15(1)
Shan G, Chen X, Wang C, Liu L, Gu Y, Jiang H, Shi T. Comparing Diagnostic Accuracy of Clinical Professionals and Large Language Models: Systematic Review and Meta-Analysis. JMIR Medical Informatics 2025;13:e64963
Giuffrè M, You K, Pang Z, Kresevic S, Chung S, Chen R, Ko Y, Chan C, Saarinen T, Ajcevic M, Crocè L, Garcia-Tsao G, Gralnek I, Sung J, Barkun A, Laine L, Sekhon J, Stadie B, Shung D. Expert of Experts Verification and Alignment (EVAL) Framework for Large Language Models Safety in Gastroenterology. npj Digital Medicine 2025;8(1)
Wang L, Li J, Zhuang B, Huang S, Fang M, Wang C, Li W, Zhang M, Gong S. Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis. Journal of Medical Internet Research 2025;27:e64486
Zhang J, Zhou J, Zhou L, Ba Z. Extracting Multifaceted Characteristics of Patients With Chronic Disease Comorbidity: Framework Development Using Large Language Models. JMIR Medical Informatics 2025;13:e70096
Feldman M, Hoffer E, Conley J, Chang J, Chung J, Jernigan M, Lester W, Strasser Z, Chueh H. Dedicated AI Expert System vs Generative AI With Large Language Model for Clinical Diagnoses. JAMA Network Open 2025;8(5):e2512994
Yao G, Zhang W, Zhu Y, Wong U, Zhang Y, Yang C, Shen G, Li Z, Gao H. Comparing the accuracy of large language models and prompt engineering in diagnosing realworld cases. International Journal of Medical Informatics 2025;203:106026
Шостакович-Корецька Л, Копча В. ВИКОРИСТАННЯ ШТУЧНОГО ІНТЕЛЕКТУ В КЛІНІЧНІЙ МЕДИЦИНІ Й НАУКОВИХ ДОСЛІДЖЕННЯХ [Use of artificial intelligence in clinical medicine and scientific research]. Медична освіта 2025;(1):99
Todericiu I. Virtual Assistants: A Review of the Next Frontier in AI Interaction. Acta Universitatis Sapientiae, Informatica 2025;17(1)
Liang E, Pei S, Staibano P, van der Woerd B. Clinical applications of large language models in medicine and surgery: A scoping review. Journal of International Medical Research 2025;53(7)
Engberg E, Koch M, Lodefalk M, Schroeder S. Artificial intelligence, tasks, skills, and wages: Worker-level evidence from Germany. Research Policy 2025;54(8):105285
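Шостакович-Корецька Л. ЗАСТОСУВАННЯ ШТУЧНОГО ІНТЕЛЕКТУ В КЛІНІЧНІЙ МЕДИЦИНІ, НАУКОВИХ ДОСЛІДЖЕННЯХ ТА ОСВІТІ В УМОВАХ ВОЄННОГО СТАНУ [Application of artificial intelligence in clinical medicine, scientific research, and education under martial law]. Інфекційні хвороби 2025;(2):71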
Pankow A, Meißner-Bendzko N, Kaufeld J, Fouquette L, Cotte F, Gilbert S, Türk E, Das A, Terkamp C, Burmester G, Wagner A. Medical Expert Knowledge Meets AI to Enhance Symptom Checker Performance for Rare Disease Identification in Fabry Disease: Mixed Methods Study. JMIR AI 2025;4:e55001
Räz T, Pahud De Mortanges A, Reyes M. Explainable AI in medicine: challenges of integrating XAI into the future clinical routine. Frontiers in Radiology 2025;5
Gilardi N, Ballabio M, Ravera F, Ferrando L, Stabile M, Bellodi A, Talerico G, Cigolini B, Genova C, Carbone F, Montecucco F, Bracco C, Ballestrero A, Zoppoli G. Influence of medical educational background on the diagnostic quality of ChatGPT‐4 responses in internal medicine: A pilot study. European Journal of Clinical Investigation 2025;55(11)
Wang T, Jheng J, Tseng Y, Chen L, Chen Y. Evaluating GPT-4’s visual interpretation and clinical reasoning on emergency settings: A 5-year analysis. Journal of the Chinese Medical Association 2025;88(9):672
Salehin I, Tomal Ahmed Sajib M, Huda Badhon N, Sakibul Hassan Rifat M, Amin N, Nessa Moon N. Systematic Literature Review of LLM‐Large Language Model in Medical: Digital Health, Technology and Applications. Engineering Reports 2025;7(9)
Aalam W. Conventional Versus Transepithelial Photorefractive Keratectomy: A Review of Clinical Outcomes. The Open Ophthalmology Journal 2025;19(1)
Xu L, Zhao W, Huang X. Diagnosis and Triage Performance of Contemporary Large Language Models on Short Clinical Vignettes. Journal of Medical Systems 2025;49(1)
Aldhafeeri L, Aljumah F, Thabyan F, Alabbad M, AlShahrani S, Alanazi F, Al-Nafjan A. Generative AI Chatbots Across Domains: A Systematic Review. Applied Sciences 2025;15(20):11220
Rivera C, Himic V, Zwagerman N, Shah A, Ivan M, Komotar R, Aaronson D. Evaluating Large Language Models in the Image-Based Diagnosis of Intracranial Tumors. Cureus 2025
Wong A, Roberts M, Pantangco M, Arnold A, Philp A, Šlapeta J, Livingstone S. When used for veterinary triage, artificial intelligence models recognise emergencies but are more likely than veterinary staff to flag non‐urgent cases as urgent. Veterinary Record 2026;198(2)
Song P, Tang X, Lv X, Reis R, Chen X, Bai L, Su J. Artificial intelligence‐enabled digital biomedical engineering. BMEMat 2025;3(4)
Reis F, Agha-Mir-Salim L, Hickstein R, Reis M, Piper S, Balzer F, Boie S. Disclaimers and Referral Patterns for Medical Advice Across Urgency Levels: Large Language Model Evaluation Study. Journal of Medical Internet Research 2026;28:e84668
Han Y, Wei J, Wang J, Guo Y, Li S, Ye L. Assessing large language models as assistive tools in selecting first trial lens parameters for orthokeratology. Frontiers in Medicine 2026;13
Gu Y, Chen X, Shan G, Tao J, Xia Y, Gu Y, Huang P, Shi T. Development and validation of a deep learning-based emergency triage model: a feasibility and effectiveness study. BMC Emergency Medicine 2026;26(1)
Chen M, Wu Y, Ma J, Jia X, Gao C, Zhao F, Qiao Y. Independent and collaborative performance of large language models and healthcare professionals in diagnosis and triage. npj Digital Medicine 2026;9(1)
Kopka M, He L, Feufel M. Evaluating the accuracy of ChatGPT model versions for giving care-seeking advice. Communications Medicine 2026;6(1)
Mallinar N, Heydari A, Liu X, Faranesh A, Winslow B, Hammerquist N, Graef B, Speed C, Malhotra M, Patel S, Prieto J, McDuff D, Metwally A. A scalable framework for evaluating health language models. npj Digital Medicine 2026
Ding C, Bian M, Yuan M, Jiang L, Luo K, Chen P, Jiang Y, Xu J. Advancing medical AI through benchmarking and competition for specialty triage. npj Digital Medicine 2026;9(1)
Kopka M, Feufel M. Increasing Large Language Model Accuracy for Care-Seeking Advice Using Prompts Reflecting Human Reasoning Strategies in the Real World: Validation Study. JMIR Biomedical Engineering 2026;11:e88053
Xu Y, Prentice C, Torres-Rueda S, Meczner A, Multmeier J, Wickham A, Kelly L, Klepchukova A, Stsefanovich H, Zhaunova L. Economic evaluation of a digital symptom checker for endometriosis using a Markov decision process model. npj Digital Medicine 2026;9(1)
Gao S, Yu M, Zheng Y, Zhang M, Yang Z, Zhang J. Accuracy of the large language model ChatGPT in adult emergency department triage: a systematic review and meta-analysis. BMC Emergency Medicine 2026;26(1)
Oğuzlar F. Large language models in emergency medicine: a bibliometric study. Journal of Health Sciences and Medicine 2026;9(2):400
Alami I, Soboh Q, Taha H, Abu Aisheh M, Abushamma F. Generative Artificial Intelligence as a First Responder in Adolescent Testicular Torsion: A Case Report. Cureus 2026

Books/Policy Documents

Carchiolo V, Malgeri M, Sapari L. Management of Digital EcoSystems.
Reese V, Santare J, Lee R, Maddahi Y, Verges D. Artificial Intelligence in Medicine and Surgery - An Exploration of Current Trends, Potential Opportunities, and Evolving Threats, Volume 3.
Köbe P. Künstliche Intelligenz im Einsatz für die erfolgreiche Patientenreise [Artificial Intelligence in Use for the Successful Patient Journey].
Rana H, Gohel S, Patel D, Sharma M, Gandhi N, Suryawanshi M. Artificial Intelligence in Patient Counselling.
Sundaram S, Shyam A. ICT for Intelligent Systems.
Taneja K, Walia D, Tyagi A, Singh H, Mourya A. Explainable AI in Clinical Practice.
Jarial S, Patil N. D. C. Empowering Women in Agriculture through Artificial Intelligence.

Conference Proceedings

Yun H, Bickmore T. Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems. Framing Health Information: The Impact of Search Methods and Source Types on User Trust and Satisfaction in the Age of LLMs
Zhao S, Wang J. Proceedings of the 34th ACM SIGSOFT International Symposium on Software Testing and Analysis. Best practice for supply chain in LLM-assisted medical applications
Liu M, Su Y. Proceedings of the Twelfth International Symposium of Chinese CHI. Enhancing Elderly Patients' Decision-Making and Experience in Hospitals through Virtual Agents
Weerasekara T, Chandeepa C, Amarasooriya O, Hettiarachchi C. 2025 Moratuwa Engineering Research Conference (MERCon). EdgeCare: Privacy-Preserving Medical Advising System on Mobile Devices
Doughan Z, Hammoud R, Ghalayini M, Darweesh D, Chehade O, Itani S. 2025 Eighth International Conference on Advances in Biomedical Engineering (ICABME). Machine Learning Driven Symptoms Diagnosis System
Weerasekara T, Chandeepa C, Amarasuriya O, Hettiarachchi C. Proceedings of the ACM/IEEE International Conference on Connected Health: Applications, Systems and Engineering Technologies. Privacy-Preserving Medical Advising System on Mobile Devices: On-Device PHI Anonymization, Medical Report Retrieval, and Cloud-Based RAG
Wira Santosa K, Huda C. 2025 11th International Conference on Education and Technology (ICET). Enhancing Cancer Information Access with a Retrieval-Augmented Generation Chatbot
Zhou Z, Liu Y, Xie Y, Wang B, Yang X, Feng Z. Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems. DiagLink: A Dual-User Diagnostic Assistance System by Synergizing Experts with LLMs and Knowledge Graphs
