ChatGPT and DeepSeek in Dermatologic Practice: Do Artificial Intelligence Models Adhere to Psoriasis Treatment Guidelines?

February 2026 | Volume 25 | Issue 2 | 133 | Copyright © February 2026


Published online January 31, 2026

Nebahat Demet Akpolat MDa, Emre Kaynak MDb, Ahmet Metin MDc

aClinic of Demet Akpolat, İstanbul, Turkey
bClinic of Emre Kaynak, İstanbul, Turkey
cPamukkale University, Denizli, Turkey

Abstract
Background: The integration of artificial intelligence (AI), particularly large language models (LLMs), into healthcare has rapidly expanded. In dermatology, machine learning is increasingly employed for diagnosis, prognosis, and treatment planning. However, the reliability of AI-generated recommendations, especially in chronic conditions such as psoriasis, remains insufficiently explored.
Objective: This study aims to evaluate the alignment of two prominent LLMs—ChatGPT and DeepSeek—with the clinical guidelines for psoriasis treatment issued by the American Academy of Dermatology (AAD).
Methods: Thirty-one guideline-based questions were formulated from the 2021 American Academy of Dermatology-National Psoriasis Foundation (AAD-NPF) guidelines. Each question was presented to ChatGPT-4 and DeepSeek. Responses were evaluated by two board-certified dermatologists blinded to the model source and rated as either concordant or discordant with the guidelines. Discrepancies were resolved by a third reviewer. Concordance rates were calculated according to the strength of recommendation, and inter-model agreement was assessed using Cohen's kappa coefficient (an illustrative calculation follows the abstract).
Results: Both ChatGPT and DeepSeek demonstrated an overall concordance rate of 87.1%. While both models fully aligned with strong recommendations, ChatGPT showed higher concordance with moderate recommendations (87.5% vs 81.3%), whereas DeepSeek outperformed ChatGPT in scenarios involving limited recommendations (66.7% vs 33.3%). Inter-model agreement was moderate (κ = 0.43).
Conclusion: Although ChatGPT and DeepSeek show promise in aligning with evidence-based dermatologic care, inconsistencies, particularly in cases with limited evidence, highlight the necessity of clinical oversight. AI may serve as a valuable adjunct in psoriasis management; however, its safe integration into practice requires careful validation and context-sensitive application.
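To make the reported statistics concrete, the sketch below recomputes an overall concordance rate and Cohen's kappa in Python. The per-question ratings are hypothetical: the source does not list which questions each model missed, so the pattern was chosen only to be consistent with the reported 87.1% concordance and κ = 0.43.

```python
# Minimal sketch of the agreement statistics reported in the abstract.
# Ratings: 1 = response concordant with the AAD-NPF guideline, 0 = discordant.
# The specific pattern below is HYPOTHETICAL but chosen to be consistent
# with the reported figures (27/31 = 87.1% for each model, kappa ~ 0.43).

def cohen_kappa(a, b):
    """Cohen's kappa for two binary raters: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n      # observed agreement
    pa, pb = sum(a) / n, sum(b) / n                  # marginal "concordant" rates
    p_e = pa * pb + (1 - pa) * (1 - pb)              # agreement expected by chance
    return (p_o - p_e) / (1 - p_e)

chatgpt  = [1] * 27 + [0] * 4                        # discordant on items 28-31
deepseek = [1] * 23 + [0, 0] + [1] * 4 + [0, 0]      # discordant on items 24-25, 30-31

print(f"overall concordance: {sum(chatgpt) / len(chatgpt):.1%}")    # 87.1%
print(f"inter-model kappa:   {cohen_kappa(chatgpt, deepseek):.2f}")  # 0.43
```

By the widely used Landis and Koch benchmarks, a κ between 0.41 and 0.60 corresponds to moderate agreement, which matches the interpretation reported above.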


INTRODUCTION

Machine learning (ML), a subset of artificial intelligence (AI), is a computational approach that learns from input data to achieve specific goals without relying on explicitly programmed instructions.1 Large language models (LLMs), designed to interpret, generate, and refine human language, are becoming increasingly prevalent in healthcare due to their ability to support clinical decision-making. In dermatology, ML has been applied to diagnostic imaging to assist in identifying skin conditions such as melanoma, psoriasis, and eczema.2-5 It has also been employed to predict treatment responses, optimize personalized treatment plans, and improve patient outcomes.
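To illustrate that definition, here is a minimal, self-contained Python sketch in which a classifier induces its decision rule from labeled examples rather than from hand-written instructions; the feature vectors are synthetic placeholders, not dermatologic data.

```python
# Minimal sketch: a model learns its decision rule from labeled examples
# rather than following hand-coded instructions. The "lesion feature"
# vectors below are synthetic placeholders, not real dermatologic data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic feature vectors (e.g., image-derived descriptors) with binary labels.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# No diagnostic rules are written by hand; the model fits them from the data.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"held-out accuracy: {accuracy_score(y_test, clf.predict(X_test)):.2f}")
```

The same learn-from-examples pattern, scaled up to convolutional networks trained on labeled clinical photographs, underlies the diagnostic-imaging applications cited above.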

With growing reliance on online health information, an estimated 80% of internet users in the United States seek medical guidance through digital platforms.2 As AI-powered chatbots gain prominence, understanding the accuracy of the information they provide becomes critical, especially for chronic conditions like psoriasis.4 Recent studies investigating the reliability of AI-generated medical advice have raised concerns regarding the accuracy and trustworthiness of such responses.6-9 For instance, while LLMs like ChatGPT and DeepSeek offer accessible health-related insights, research indicates that they may rely on non-evidence-based sources, heightening concerns about misinformation.

This study aims to evaluate the accuracy of psoriasis treatment recommendations provided by two widely used LLMs, ChatGPT and DeepSeek, against the clinical guidelines of the American Academy of Dermatology (AAD). We hypothesized that these AI models would not consistently align with evidence-based recommendations, underscoring the need for critical assessment of AI-generated medical guidance in dermatology.