INTRODUCTION
Machine learning (ML), a subset of artificial intelligence (AI), is a computational approach that utilizes input data to achieve specific goals without relying on explicitly programmed instructions.1 Large language models (LLMs), designed to interpret, generate, and refine human language, are becoming increasingly prevalent in healthcare due to their ability to support clinical decision-making. In dermatology, ML has been applied to diagnostic imaging to assist in identifying skin conditions such as melanoma, psoriasis, and eczema.2-5 It has also been employed to predict treatment responses, optimize personalized treatment plans, and improve patient outcomes.
Reliance on online health information is growing: an estimated 80% of internet users in the United States seek medical guidance through digital platforms.2 As AI-powered chatbots gain prominence, understanding the accuracy of the information they provide becomes critical, especially for chronic conditions like psoriasis.4 Recent studies of AI-generated medical advice have raised concerns about the accuracy and trustworthiness of such responses.6-9 For instance, while LLMs like ChatGPT and DeepSeek offer accessible health-related insights, research indicates that they may rely on non-evidence-based sources, heightening concerns about misinformation.
This study aims to evaluate the accuracy of psoriasis treatment recommendations provided by two widely used LLMs, ChatGPT and DeepSeek, against the clinical guidelines of the American Academy of Dermatology (AAD). We hypothesize that these AI models do not consistently align with evidence-based recommendations, which would underscore the need to critically assess AI-generated medical guidance in dermatology.





