INTRODUCTION
Patients increasingly rely on the Internet for health information, medical advice, telemedicine, and treatment. Given the vast and ever-expanding repository of online patient education materials (PEMs), ongoing oversight is essential, as misinformation or misinterpretation can have serious consequences.1
Health literacy plays a critical role in patient care, with lower literacy levels consistently associated with poorer health outcomes. The National Institutes of Health (NIH) and the Centers for Disease Control and Prevention (CDC) recommend that PEMs be written at or below an eighth-grade reading level.2,3 Despite these guidelines, many existing PEMs remain inaccessible to the average reader, often because of complex medical terminology, dense formatting, and other readability barriers.4,5
In this study, we assessed the ability of generative artificial intelligence (AI) to revise commonly available online materials for psoriasis (PsO) and hidradenitis suppurativa (HS), 2 prevalent dermatologic conditions, so they align with the recommended reading level.
MATERIALS AND METHODS
PEMs for HS and PsO were collected from the top 3 hospitals in each northeastern US state, based on the 2024 to 2025 US News and World Report Best Hospitals regional rankings. The northeast region was defined to include Connecticut, Maine, Massachusetts, New Hampshire, New Jersey, New York, Pennsylvania, Rhode Island, and Vermont. When multiple hospitals tied for a top-3 position, all tied institutions were included but counted as a single ranked entity, and the next highest-ranked hospitals were added so that 3 rank positions were represented, even if those hospitals were not technically in the top 3. For example, in New York, 4 hospitals tied for first place; all 4 were included, along with lower-ranked hospitals to maintain top-3 representation after accounting for ties.
PEMs were identified by searching each hospital’s website for information related to HS and PsO. If no relevant PEMs were available, the hospital was excluded. The PEMs were converted to a text-only format, excluding any audiovisual multimedia. Readability was calculated using 7 validated readability formulas: the Flesch Reading Ease Score, Gunning Fog Index, Flesch-Kincaid Grade Level, Coleman-Liau Index, Simple Measure of Gobbledygook (SMOG) Index, Automated Readability Index, and Linsear Write Formula.6
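For illustration, these 7 scores can be computed programmatically. The sketch below is a minimal example that assumes the open-source Python textstat package; the study does not specify which scoring tool was used, and the helper function shown here is hypothetical.

```python
# Illustrative only: the study does not state which software scored readability.
# This sketch assumes the open-source "textstat" package, which implements the
# same 7 validated formulas listed above.
import textstat


def score_pem(text: str) -> dict:
    """Return the 7 readability scores used in this study for one PEM (hypothetical helper)."""
    return {
        "Flesch Reading Ease": textstat.flesch_reading_ease(text),
        "Gunning Fog Index": textstat.gunning_fog(text),
        "Flesch-Kincaid Grade Level": textstat.flesch_kincaid_grade(text),
        "Coleman-Liau Index": textstat.coleman_liau_index(text),
        "SMOG Index": textstat.smog_index(text),
        "Automated Readability Index": textstat.automated_readability_index(text),
        "Linsear Write Formula": textstat.linsear_write_formula(text),
    }
```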
ChatGPT-4 was used to rewrite the PEMs. The original texts were uploaded with instructions to apply the following modifications: (1) limit the total number of polysyllabic words to less than 30, (2) limit sentences to less than 10 words, (3) limit paragraphs to less than 5 sentences, (4) eliminate as much medical jargon as possible without compromising accuracy, (5) when eliminating medical jargon is not possible, provide a brief explanation of the relevant concept, and (6) overall, rewrite the text as if speaking to an eighth grader. These parameters were based on recommendations from the NIH and CDC.2,3,7 Two independent authors reviewed all ChatGPT-modified PEMs to ensure content validity.
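As an illustration of how these 6 constraints translate into a single rewriting prompt, the sketch below packages them for the OpenAI API. This is an assumption for demonstration only: the study uploaded texts to ChatGPT-4 directly, and the model identifier, client usage, and function name here are not from the study.

```python
# Illustrative sketch only: the study used the ChatGPT-4 interface directly.
# This shows how the same 6 rewriting constraints could be issued via the
# OpenAI API; the model name and function are assumptions, not the study's method.
from openai import OpenAI

REWRITE_PROMPT = (
    "Rewrite the following patient education material. "
    "(1) Limit the total number of polysyllabic words to less than 30. "
    "(2) Limit sentences to less than 10 words. "
    "(3) Limit paragraphs to less than 5 sentences. "
    "(4) Eliminate as much medical jargon as possible without compromising accuracy. "
    "(5) When eliminating medical jargon is not possible, provide a brief explanation "
    "of the relevant concept. "
    "(6) Overall, rewrite this as if you were speaking to an eighth grader."
)


def rewrite_pem(original_text: str) -> str:
    """Hypothetical helper: send one PEM and the 6 constraints to the model."""
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": REWRITE_PROMPT},
            {"role": "user", "content": original_text},
        ],
    )
    return response.choices[0].message.content
```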
The modified PEMs were then reassessed using the same readability metrics. The average scores for each readability tool were mapped to corresponding grade levels to determine the mean change in readability before and after ChatGPT modification, and paired t-tests evaluated the significance of this change.
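A minimal sketch of this before-versus-after comparison, assuming the per-PEM grade-level scores for one readability tool are available as paired lists (the function and variable names are hypothetical), follows:

```python
# Illustrative sketch of the statistical comparison for one readability tool,
# assuming paired per-PEM grade-level scores before and after ChatGPT modification.
# Names are hypothetical, not from the study.
import numpy as np
from scipy.stats import ttest_rel


def compare_readability(before: list[float], after: list[float]) -> None:
    before_arr, after_arr = np.asarray(before), np.asarray(after)
    mean_change = (after_arr - before_arr).mean()        # mean change in grade level
    t_stat, p_value = ttest_rel(before_arr, after_arr)   # paired t-test
    print(f"Mean change: {mean_change:.2f} grade levels "
          f"(t = {t_stat:.2f}, p = {p_value:.4f})")
```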
RESULTS
At baseline, the majority of PEMs for both PsO and HS exceeded the recommended eighth-grade reading level across nearly all assessed readability metrics. For PsO, average readability scores ranged from 6.21 to 11.25; only the SMOG Index and Linsear Write Formula yielded averages at or below the recommended level (Table 1). For HS, average baseline readability scores ranged from 7.82 to 12.58, and all metrics except the Linsear Write Formula exceeded the recommended level (Table 2). Means and standard deviations for each readability metric are summarized in Tables 1 and 2.