
Chatbots in the Waiting Room: How Reliable Are They for Dermatology Patients?
Four consumer chatbots (ChatGPT 4o mini, Microsoft Copilot, Google Gemini Flash 1.5, and Perplexity) were queried about 25 common dermatologic conditions, and their answers were scored against AAD patient resources for accuracy, quality, readability, and misinformation. Overall accuracy was acceptable, but quality was only moderate: Copilot and Perplexity led, Gemini lagged, and ChatGPT gave the most thorough (and most verbose) responses while often failing to cite sources.
Chatbots tended to omit comparative treatment details, and their output sits at about a 10th‑grade reading level (well above the ~6th‑grade level generally recommended for patient materials), which risks misunderstanding. Reproducibility and timeliness are real limits: answers vary between sessions and may lag behind rapidly evolving evidence.
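The paper doesn't specify which readability formula was used; the Flesch–Kincaid grade level is a common choice for grading patient materials, so here is a minimal Python sketch of that metric (the vowel-group syllable counter is a rough assumption for illustration, not the study's method):

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic (assumption, not the study's method):
    # one syllable per run of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    # Flesch-Kincaid grade level:
    #   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)

# Example: a short, plain-language answer lands near the ~6th-grade target.
answer = "Eczema makes skin dry and itchy. A gentle moisturizer can help."
print(f"Grade level: {flesch_kincaid_grade(answer):.1f}")
```

Longer sentences and polysyllabic medical terminology drive both terms of the formula up quickly, which is one reason chatbot answers tend to land several grades above the target.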
Clinically, chatbots can be a helpful first stop, but they don’t replace tailored counseling or vetted patient materials. Want the condition‑level scores, methodology, and practical talking points to use in the clinic? Read the full paper for the data and suggested clinician messaging.
J Drugs Dermatol. 2025;24(10). doi:10.36849/JDD.9100
Blog write-up assisted by AI