Artificial Intelligence in Interventional Pain: Can ChatGPT Match Fellowship-Level Expertise?: A Comparative Analysis of Accuracy and Readability

Pain Physician. 2026 May;29(3):251-257.

ABSTRACT

BACKGROUND: The transforaminal epidural steroid injection (TFESI) is a widely used interventional procedure for managing radicular pain. Although TFESI is well established as a safe and effective treatment, patients frequently seek detailed explanations regarding its procedural steps, expected outcomes, and potential risks. Artificial intelligence (AI)-based platforms, particularly large language models (LLMs) such as ChatGPT, have emerged as accessible sources of periprocedural medical information. However, the accuracy, readability, and empathy of AI-generated responses in the context of interventional pain management remain uncertain.

OBJECTIVES: To compare the accuracy and readability of responses generated by ChatGPT and fellowship-trained pain medicine physicians to common patient questions about TFESIs and to assess the potential utility of AI in patient education and periprocedural guidance.

STUDY DESIGN: A cross-sectional comparative study.

METHODS: Twenty frequently asked patient questions about TFESIs were retrospectively identified from pain clinic consultations and submitted individually to ChatGPT-4o and to fellowship-level physicians. Two interventional pain specialists independently evaluated all responses for accuracy and empathy using a 5-point Likert scale; discrepancies were resolved by a third reviewer. Readability was analyzed using the Readable® tool kit across 7 indices: Flesch Reading Ease Score (FRES), Flesch-Kincaid Grade Level (FKGRL), Gunning Fog Index, SMOG Index, Coleman-Liau Index, average word and sentence length, and estimated overall reach.
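
For readers unfamiliar with the readability indices named above, the following is a minimal sketch of the two Flesch measures using their standard published formulas. It is not the Readable® implementation, whose internals are not described in the abstract. The function names and the vowel-group syllable counter are assumptions made for illustration; commercial tools count syllables and sentences with more refined rules, so their scores will differ somewhat.

```python
import re


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula; higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level formula; approximates a US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption): one syllable per run of vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_counts(text: str) -> tuple[int, int, int]:
    """Return (words, sentences, syllables) using simple regex tokenization."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(words), len(sentences), syllables


if __name__ == "__main__":
    # Hypothetical sample sentence, not taken from the study's responses.
    sample = "The injection places medicine near the irritated nerve. Relief may take a few days."
    w, s, syl = text_counts(sample)
    print(round(flesch_reading_ease(w, s, syl), 1))
    print(round(flesch_kincaid_grade(w, s, syl), 1))
```

Longer sentences and more syllables per word lower the Reading Ease score and raise the grade level, which is the pattern the study reports for ChatGPT's answers.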

RESULTS: Both sources delivered highly accurate responses. However, ChatGPT’s answers had significantly lower FRES scores, reflecting reduced reading ease, and higher scores across all other readability indices, indicating greater linguistic complexity and lower accessibility. ChatGPT’s responses therefore required a higher level of education to understand. Although empathy scores for ChatGPT were lower than the physicians’, the difference was not statistically significant.

LIMITATIONS: This study assessed a single AI platform (ChatGPT-4o). Accuracy and empathy ratings were performed subjectively by 2 pain specialists, which may limit generalizability. Additionally, AI-generated responses can vary with software updates, reducing reproducibility over time.

CONCLUSION: ChatGPT provides accurate information regarding TFESIs but demonstrates lower readability and a somewhat less empathetic tone (a difference that was not statistically significant) than answers given by fellowship-trained physicians. With targeted improvements in clarity and patient-centered communication, AI holds potential as a useful adjunct in patient education and clinical support.

PMID: 42263306

By Nevin Manimala