New Atrophic Acne Scar Classification: Reliability of Assessments Based on Size, Shape, and Number
June 2016 | Volume 15 | Issue 6 | Original Article | 693 | Copyright © June 2016
Sewon Kang MD,a Vicente Torres Lozada MD,b Vincenzo Bettoli MD,c Jerry Tan MD,d Maria Jose Rueda MD,e Alison Layton MB ChB,f Lauren Petit BS,g and Brigitte Dréno MD PhDh
aJohns Hopkins School of Medicine, Baltimore, MD
bJuarez Hospital, Mexico City, Mexico
cUniversity of Ferrara, Ferrara, Italy
dUniversity of Western Ontario, Windsor, Ontario, Canada
eGalderma, Fort Worth, TX
fHarrogate District Hospital, Harrogate, United Kingdom
gGalderma International, Sophia Antipolis, France
hHotel Dieu, Nantes, France
OBJECTIVES: Evaluate classification for atrophic acne scars by shape, size, and facial location and establish reliability in assessments.
METHODS: We conducted a non-interventional study with dermatologists performing live clinical assessments of atrophic acne scars. To objectively compare identification of lesions, individual lesions were marked on a high-resolution photo of the patient that was displayed on a computer during the clinical evaluation. The Jacob clinical classification system was used to define three primary shapes of scars: 1) icepick, 2) boxcar, and 3) rolling. To determine agreement for classification by size, independent technicians assessed the investigators’ markings on digital images. Two marked scars were considered to be at the same location if the distance between their centers was ≤ 60 pixels (approximately 3 mm). Raters assessed scars on the same patients twice (morning/afternoon). Aggregate models of rater assessments were created and analyzed for agreement.
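The co-localization rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `same_location` and `match_marks` are hypothetical, the greedy one-to-one pairing is an assumed strategy rather than the procedure used in the study, and the pixel scale is derived solely from the stated "60 pixels (approximately 3 mm)".

```python
import math

# Approximate image scale implied by "60 pixels (approximately 3 mm)".
PIXELS_PER_MM = 60 / 3

def same_location(a, b, max_px=60):
    """Return True if two marked centers (x, y in pixels) lie within max_px."""
    return math.dist(a, b) <= max_px

def match_marks(rater_a, rater_b, max_px=60):
    """Pair marks from two raters, each mark used at most once.

    Greedy nearest-neighbor pairing is an assumption made for illustration;
    the source does not describe how ties or multiple candidates were resolved.
    """
    unused = list(range(len(rater_b)))
    pairs = []
    for i, a in enumerate(rater_a):
        # Closest still-unmatched mark from the second rater, if any remain.
        best = min(unused, key=lambda j: math.dist(a, rater_b[j]), default=None)
        if best is not None and same_location(a, rater_b[best], max_px):
            pairs.append((i, best))
            unused.remove(best)
    return pairs
```

For example, marks at (100, 100) and (130, 140) are 50 pixels apart and would count as the same scar, while marks more than 60 pixels apart would not.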
RESULTS: Mean scar counts per subject ranged from 15.75 to 40.25. Approximately 50% of scars were identified by all raters and ~75% of scars were identified by at least 2 of 3 raters (weak agreement, Kappa pairwise agreement 0.30). Agreement between consecutive counts was moderate, with Kappa index ranging from 0.26 to 0.47 (after exclusion of one outlier investigator who had significantly higher counts than all others). Shape classifications of icepick, boxcar, and rolling differed significantly between raters and even for the same raters at consecutive sessions (P<.001 and P=0.4, respectively). Analysis showed that only 65% of scars were identical in both sessions. We also found a size threshold for detection: agreement among investigators, including inter-rater reliability, was poor for very small scars (<2 mm). The repeatability of identification of scars ≥ 2.0 mm was acceptable, and increasing scar size was positively correlated with agreement. Reliability improved when only scars >2 mm were included.
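For readers less familiar with the Kappa statistic reported above, a minimal sketch of pairwise Cohen's kappa for two raters labeling the same scars follows. The exact kappa variant and data layout used in the study are not given here, so the function `cohen_kappa` and the example labels are illustrative assumptions only.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

Kappa corrects raw percent agreement for agreement expected by chance, which is why a sizeable share of commonly identified scars can still correspond to a low kappa value.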
CONCLUSIONS: While intuitively it makes sense that describing scar morphology could guide treatment, we have shown that shape-based evaluations are subjective and do not readily yield strong agreement. Until there is a more objective way to evaluate morphology that is readily available to practicing clinicians, we propose that size should be considered a primary characteristic for scar classification systems. We further suggest classification of <2 mm, 2-4 mm, and >4 mm based on how the size would likely affect diagnostic and therapeutic choices. Finally, we recommend that scars <2 mm not be included in a clinical classification but should be evaluated by an objective method that may be refined in the future.
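The proposed size classes can be expressed as a simple binning rule. In this sketch, the function name `size_class` is hypothetical, and assigning scars of exactly 2 mm and exactly 4 mm to the middle class is an assumption, since the boundaries are given only as "<2 mm, 2-4 mm, and >4 mm".

```python
def size_class(diameter_mm):
    """Assign a scar diameter in millimeters to one of the three proposed classes."""
    if diameter_mm <= 0:
        raise ValueError("diameter must be positive")
    if diameter_mm < 2:
        return "<2 mm"
    if diameter_mm <= 4:  # assumption: both boundary values fall in the middle class
        return "2-4 mm"
    return ">4 mm"
```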
J Drugs Dermatol. 2016;15(6):693-702.