Search Results (1 to 10 of 3232 Results)

Benchmarking the Confidence of Large Language Models in Answering Clinical Questions: Cross-Sectional Evaluation Study

Our study corroborates GPT-4’s strong performance, particularly in psychiatry, where GPT-4o achieved 84.4% accuracy. However, our findings suggest that more cautious interpretation is needed, given the high confidence levels observed for incorrect answers. Xiong et al’s [17] work on LLM confidence elicitation aligns with our observations of overconfidence.

Mahmud Omar, Reem Agbareia, Benjamin S Glicksberg, Girish N Nadkarni, Eyal Klang

JMIR Med Inform 2025;13:e66917

A Brief Video-Based Intervention to Improve Digital Health Literacy for Individuals With Bipolar Disorder: Intervention Development and Results of a Single-Arm Quantitative Pilot Study

Participant recruitment occurred via promotion on CREST.BD social media pages, paid advertisements on Facebook, Instagram, and Twitter, emails to the CREST.BD mailing list, and health care providers or organizations associated with the CREST.BD network (eg, Hope+Me, a Toronto-based community organization offering peer support and counseling; Bipolar Support Club International, an online, peer-led organization offering support and education; and the Johns Hopkins Bipolar Disorder clinic, an academic psychiatry

Emma Morton, Sahil S Kanani, Natalie Dee, Rosemary Xinhe Hu, Erin E Michalak

J Particip Med 2025;17:e59806