Diagnostic Accuracy and Pitfalls of Publicly Available Artificial Intelligence Models for Nail Disorders - JDDonline - Journal of Drugs in Dermatology

Diagnostic Accuracy and Pitfalls of Publicly Available Artificial Intelligence Models for Nail Disorders

April 2026 | Volume 25 | Issue 4 | 349 | Copyright © April 2026


Published online March 30, 2026

Tanya Boghosian, BS (a); Haroutyun Joulfayan, BS (a); Nazar Boghosian, BS (b); Jacob Beer, MD (c,d); Aaron J. Russell, MD (e)

(a) Washington University School of Medicine, St. Louis, MO
(b) University of California, Los Angeles, Los Angeles, CA
(c) Beer Dermatology, West Palm Beach, FL
(d) Department of Dermatology and Cutaneous Surgery, University of Miami, Miami, FL
(e) Department of Medicine, Division of Dermatology, Washington University School of Medicine, St. Louis, MO

Abstract
Background: Artificial intelligence is increasingly applied to dermatology, yet its reliability in clinically high-stakes subspecialty domains such as nail disease remains poorly understood. This study evaluated the ability of widely available vision-language models (VLMs), namely ChatGPT-3.5, ChatGPT-4o, and Google Gemini, to diagnose common nail disorders from clinical images.
Methods: A total of 110 clinical images across 11 nail conditions (10 images per diagnosis) were compiled from peer-reviewed sources and confirmed by a board-certified dermatologist. Disorders included infectious, inflammatory, and neoplastic entities. Each VLM was queried with a standardized prompt requesting the top three differential diagnoses and confidence (1–10 scale) for the top choice.
Results: Google Gemini achieved 34% top-1 accuracy and 51% top-3 accuracy, followed by ChatGPT-4o (33% top-1; 51% top-3) and ChatGPT-3.5 (31% top-1; 45% top-3). VLMs performed best on onychomycosis (sensitivity 0.80; specificity 0.63), green nail syndrome (sensitivity 0.83; specificity 0.98), and onychocryptosis (sensitivity 0.50; specificity 1.00). Onychomycosis was also the most common incorrect guess among misclassified cases, often appearing as a default diagnosis across a wide range of unrelated presentations. Overall, current VLMs demonstrate poor performance in diagnosing common nail disorders, with moderate accuracy for select conditions such as onychomycosis. Notably, periungual warts, although frequently encountered in clinical practice, were routinely missed by all models.
Conclusions: These findings support the potential role of VLMs as adjunctive tools for pattern recognition but reaffirm that clinical judgment remains essential, especially in cases where diagnostic accuracy has critical implications for patient outcomes.
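The metrics reported in the abstract (top-1 accuracy, top-3 accuracy, and per-condition sensitivity and specificity) can be illustrated with a short sketch. This is not the authors' analysis code. The record layout, the function names, and the five toy records are hypothetical, and the sketch assumes that sensitivity and specificity are computed one-vs-rest from the top-1 guess, which the abstract does not specify.

```python
from typing import List, Tuple

# Hypothetical record layout: (confirmed diagnosis, ranked model guesses).
Record = Tuple[str, List[str]]


def top_k_accuracy(records: List[Record], k: int) -> float:
    """Fraction of images whose confirmed diagnosis is among the first k guesses."""
    if not records:
        raise ValueError("records must not be empty")
    hits = sum(1 for truth, guesses in records if truth in guesses[:k])
    return hits / len(records)


def sensitivity_specificity(records: List[Record], condition: str) -> Tuple[float, float]:
    """One-vs-rest sensitivity and specificity for one condition.

    Assumption: an image counts as a positive call only when the model's
    top-1 guess equals the condition.
    """
    tp = fn = tn = fp = 0
    for truth, guesses in records:
        predicted_positive = len(guesses) > 0 and guesses[0] == condition
        if truth == condition:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative image")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data for illustration only; these are not results from the study.
records: List[Record] = [
    ("onychomycosis", ["onychomycosis", "nail lichen planus", "onychocryptosis"]),
    ("onychomycosis", ["nail lichen planus", "onychomycosis", "green nail syndrome"]),
    ("nail lichen planus", ["onychomycosis", "nail lichen planus", "nail unit melanoma"]),
    ("periungual warts", ["onychomycosis", "onychocryptosis", "nail lichen planus"]),
    ("green nail syndrome", ["green nail syndrome", "onychomycosis", "nail unit melanoma"]),
]

top1 = top_k_accuracy(records, 1)
top3 = top_k_accuracy(records, 3)
sens, spec = sensitivity_specificity(records, "onychomycosis")
print(f"top-1 {top1:.2f}, top-3 {top3:.2f}, sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In the toy data, onychomycosis is the top guess for two images with other diagnoses, which lowers its specificity. This mirrors the "default diagnosis" pattern the abstract describes, where frequent overcalling of one condition raises its sensitivity at the cost of specificity.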

 

INTRODUCTION

Nail disorders encompass a broad spectrum of infectious, inflammatory, neoplastic, and traumatic conditions that may present with overlapping clinical features. They represent approximately 10% of all dermatologic conditions, accounting for over 21.1 million ambulatory visits in the United States (US) between 2007 and 2016 [1,2]. While some entities, such as onychomycosis and onychocryptosis, are frequently encountered in clinical practice [1], others, such as nail unit melanoma and nail lichen planus, are rare and diagnostically challenging [3,4]. Given their subtle presentations and visual similarity to benign mimics, nail diseases are a known source of diagnostic delay, with potential for significant patient morbidity when malignant or progressive pathology is missed [3]. Even among dermatologists, diagnostic delays are common, driven by limited clinical exposure during training and insufficient confidence in performing nail-specific evaluations and procedures [5].

Artificial intelligence (AI) has emerged as a promising adjunct in diagnostic medicine. AI-based tools have demonstrated strong performance in specialties such as radiology and ophthalmology, where pattern recognition is central to clinical reasoning [6]. In dermatology, AI has shown accuracy comparable to that of dermatologists in diagnosing skin cancer [7]. Applications now extend to teledermatology triage, decision support, and patient education [8], with public-facing vision-language models (VLMs) like ChatGPT and Gemini broadening access across clinical and consumer settings [9].

Despite these advances, most dermatologic AI studies have focused on pigmented skin lesions [7], with limited attention to the nail unit. Nail diseases are structurally and visually distinct from cutaneous lesions, requiring nuanced interpretation of morphology, color, and context. As such, diagnostic accuracy in this domain cannot be inferred from performance on broader dermatologic tasks. Yet there remains a dearth of