Readability, Quality, Understandability, and Actionability of ChatGPT Generated GI Patient Education Versus AGA Patient Center

Dig Dis Sci. 2026 Jul 2. doi: 10.1007/s10620-026-10087-5. Online ahead of print.

ABSTRACT

BACKGROUND AND AIMS: Patients increasingly use the internet and artificial intelligence chatbots to obtain health information, yet the readability, quality, understandability, and actionability of AI-generated gastrointestinal patient education remain unclear. This study compared gastrointestinal patient education from a professional society website with content generated by ChatGPT using validated health literacy instruments.

METHODS: In this cross-sectional comparative study, 50 gastrointestinal patient education topics from the American Gastroenterological Association patient information website were paired with ChatGPT-generated responses using standardized prompts. Readability was assessed using the Flesch-Kincaid Grade Level, quality of treatment information was evaluated using the DISCERN instrument, and understandability and actionability were assessed using the Patient Education Materials Assessment Tool; scoring was performed by two blinded reviewers. Paired t tests were used to compare mean scores between sources, and intraclass correlation coefficients (ICCs) were used to assess interrater reliability between reviewers.
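The two calculations named in the methods can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names (`count_syllables`, `flesch_kincaid_grade`, `paired_t_statistic`) are hypothetical, and the syllable counter is a rough vowel-group heuristic assumed here for brevity, so its grades will differ somewhat from dedicated readability software. The grade formula itself is the standard Flesch-Kincaid Grade Level, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59, and the paired t statistic is the mean of the per-topic differences divided by its standard error.

```python
import math
import re
from statistics import mean, stdev


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per vowel group, minimum 1."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level using naive sentence and word splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


def paired_t_statistic(a: list[float], b: list[float]) -> float:
    """t statistic for paired samples: mean difference / standard error."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))
```

In a design like the one described, `a` and `b` would hold the 50 per-topic scores for each source in the same topic order, and the statistic would be referred to a t distribution with n − 1 = 49 degrees of freedom. The summary values reported in the abstract are not enough to reproduce the t statistics, because the standard deviation of the paired differences is not given.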

RESULTS: Fifty paired topics were analyzed. The mean Flesch-Kincaid Grade Level was higher for ChatGPT than for GI website materials (10.33 ± 1.5 vs 8.72 ± 1.7; mean difference, 1.61; P < .001). Differences in DISCERN scores (63.5 ± 5.7 vs 64.3 ± 5.4; mean difference, -0.8), PEMAT understandability (87.9% ± 6.9% vs 86.5% ± 7.8%; mean difference, 1.4%; P = .33), and PEMAT actionability (78.6% ± 9.8% vs 77.9% ± 10.2%; mean difference, 0.6%; P = .73) were not statistically significant. Interrater reliability was excellent across all measures, with intraclass correlation coefficients of 0.97 (95% CI, 0.95-0.99) for PEMAT understandability, 0.96 (95% CI, 0.94-0.98) for PEMAT actionability, and 0.99 (95% CI, 0.98-0.99) for DISCERN.

CONCLUSION: ChatGPT-generated gastrointestinal patient education was similar to professional society materials in quality, understandability, and actionability but was written at a significantly higher reading level. Improving readability may enhance accessibility and support the safe integration of AI-generated patient education.

PMID:42390721 | DOI:10.1007/s10620-026-10087-5

By Nevin Manimala