INTRODUCTION
ChatGPT (OpenAI), a widely used large language model (LLM), was publicly released prior to the 2024 dermatology residency application cycle, giving applicants access to an advanced tool capable of generating or enhancing personal statements in human-like prose.1 Personal statements are one way residency programs identify applicants for interviews.2 However, studies in specialties such as anesthesiology and plastic surgery suggest that program directors have difficulty differentiating between AI-generated and human-written essays, raising questions about the use of LLMs in applications.3,4 This manuscript investigates trends in LLM usage in dermatology residency personal statements and explores ethical considerations of transparency in the application process.
MATERIALS AND METHODS
We deidentified and compiled dermatology personal statements submitted through ERAS during the 2022, 2023, and 2024 application cycles. We generated positive-control essays using the version of ChatGPT available to applicants during each application cycle: GPT-3 for 2022 and GPT-3.5 for 2023 and 2024. Both models were prompted with, "Create a personal statement for someone applying to dermatology residency incorporating a person's life experiences." Personal statements known to be human-written from prior application cycles served as negative controls.
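For readers who wish to reproduce the control-generation step programmatically, a minimal sketch follows. It assumes access through the OpenAI Python API rather than the web interface described above, and the model identifier and batch size are illustrative; only the prompt is taken from the study.

```python
# Minimal sketch of positive-control generation via the OpenAI Python API.
# The study used the ChatGPT interface directly; the model name below is
# an assumption standing in for the version available in a given cycle.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = ("Create a personal statement for someone applying to dermatology "
          "residency incorporating a person's life experiences.")

def generate_controls(model: str, n: int) -> list[str]:
    """Generate n positive-control essays with the given model."""
    essays = []
    for _ in range(n):
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": PROMPT}],
        )
        essays.append(response.choices[0].message.content)
    return essays

controls = generate_controls("gpt-3.5-turbo", n=10)  # illustrative model/count
```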
We analyzed applicants' personal statements (n=1500), AI-generated positive controls (n=20), and human-written negative controls (n=9) using 3 AI detection tools: ZeroGPT, Quillbot, and Scribbr.5-7 These tools were selected for their reported accuracy and availability at no cost. ZeroGPT and Quillbot reported the percentage of text that was AI-generated, whereas Scribbr reported the percent chance that the text was AI-generated.
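Because the 3 detectors report scores on slightly different scales (percentage of AI-generated text versus percent chance of AI generation), it helps to record their outputs in a single tidy table before analysis. The sketch below assumes scores were transcribed from each web tool; the column names and values are placeholders, not study data.

```python
# Minimal sketch of recording detector outputs in a tidy table.
# All values are placeholders for illustration, not study data.
import pandas as pd

records = [
    # essay_id, cycle, group, tool, score (percent, per each tool's definition)
    ("ps_0001", 2024, "applicant", "ZeroGPT", 12.4),
    ("ps_0001", 2024, "applicant", "Quillbot", 8.0),
    ("ps_0001", 2024, "applicant", "Scribbr", 22.0),
]
scores = pd.DataFrame(
    records, columns=["essay_id", "cycle", "group", "tool", "score"]
)

# Mean detection score per tool and application cycle
print(scores.groupby(["tool", "cycle"])["score"].mean())
```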
To attain a broader understanding of the data and of differences between application cycles, we conducted multiple statistical analyses in Minitab, including an assessment of whether application cycle influenced AI detection scores for each detection software.
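As an illustration of this comparison, the sketch below runs a one-way ANOVA on detection scores grouped by cycle for a single tool. The study's analyses were performed in Minitab; the choice of a one-way ANOVA here, and all score values, are assumptions for demonstration only.

```python
# Minimal sketch: testing whether application cycle influenced detection
# scores for one tool. One-way ANOVA is shown as a standard approach;
# it is not necessarily the exact test used in the study.
from scipy import stats

# Placeholder score lists per cycle for a single detector (not study data)
scores_2022 = [5.0, 12.1, 3.4, 8.8]
scores_2023 = [14.2, 9.7, 20.3, 11.5]
scores_2024 = [18.9, 25.4, 16.0, 21.7]

f_stat, p_value = stats.f_oneway(scores_2022, scores_2023, scores_2024)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```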