
Improving the Readability of Institutional Heart Failure-Related Patient Education Materials Using GPT-4: Observational Study

JMIR Cardio. 2025 Jul 8;9:e68817. doi: 10.2196/68817.

ABSTRACT

BACKGROUND: Heart failure management involves comprehensive lifestyle modifications such as daily weight monitoring, fluid and sodium restriction, and blood pressure monitoring. These place additional responsibility on patients and caregivers, and successful adherence often requires extensive counseling and understandable patient education materials (PEMs). Prior research has shown that PEMs related to cardiovascular disease often exceed the American Medical Association’s recommended fifth- to sixth-grade reading level. The large language model (LLM) ChatGPT may be a useful tool for improving PEM readability.

OBJECTIVE: We aim to assess the readability of heart failure-related PEMs from prominent cardiology institutions and evaluate GPT-4’s ability to improve these metrics while maintaining accuracy and comprehensiveness.

METHODS: A total of 143 heart failure-related PEMs were collected from the websites of the top 10 institutions listed on the 2022-2023 US News & World Report for “Best Hospitals for Cardiology, Heart & Vascular Surgery.” PEMs were individually entered into GPT-4 (version updated July 20, 2023), preceded by the prompt, “Please explain the following in simpler terms.” Readability was assessed using the Flesch Reading Ease score, Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index, Coleman-Liau Index, Simple Measure of Gobbledygook Index, and Automated Readability Index. The accuracy and comprehensiveness of revised GPT-4 PEMs were assessed by a board-certified cardiologist.
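Two of the metrics named above, the Flesch Reading Ease score and the FKGL, are simple functions of average sentence length and average syllables per word. The following is a minimal sketch of those two standard formulas, not the tooling used in the study (the abstract does not say which software computed the scores). The syllable counter is a rough vowel-group heuristic, and all function names are illustrative assumptions.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups and drop one
    for a likely silent trailing 'e'. Real tools use better rules."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> tuple[int, int, int]:
    """Return (sentence count, word count, syllable count) for the text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    s, w, syl = text_stats(text)
    return 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    s, w, syl = text_stats(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59


if __name__ == "__main__":
    sample = "Weigh yourself every day. Eat less salt."
    print(round(flesch_reading_ease(sample), 1))
    print(round(flesch_kincaid_grade(sample), 1))
```

Because both formulas depend only on sentence length and syllable density, shorter sentences and shorter words lower the FKGL, which is the kind of change a simplification prompt would be expected to produce.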

RESULTS: For 143 institutional heart failure-related PEMs analyzed, the median FKGL was 10.3 (IQR 7.9-13.1; high school sophomore) compared to 7.3 (IQR 6.1-8.5; seventh grade) for GPT-4’s revised PEMs (P<.001). Of the 143 institutional PEMs, there were 13 (9.1%) below the sixth-grade reading level, which improved to 33 (23.1%) after revision by GPT-4 (P<.001). No revised GPT-4 PEMs were graded as less accurate or less comprehensive compared to institutional PEMs. A total of 33 (23.1%) GPT-4 PEMs were graded as more comprehensive.
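The summary figures above (median, IQR, and the share of PEMs below the sixth-grade level) can be derived from a list of per-document FKGL scores. The sketch below shows one way to do that with the Python standard library. The function name, the strict `< 6.0` threshold, and the inclusive quartile method are assumptions, since the abstract does not specify them, and the sample scores are made up for illustration.

```python
from statistics import median, quantiles


def summarize_grades(grades: list[float], threshold: float = 6.0) -> dict:
    """Summarize grade-level scores: median, IQR bounds, and the number
    and percentage of documents strictly below the threshold (assumed)."""
    q1, _, q3 = quantiles(grades, n=4, method="inclusive")
    n_below = sum(1 for g in grades if g < threshold)
    return {
        "median": median(grades),
        "iqr": (q1, q3),
        "n_below": n_below,
        "pct_below": round(100 * n_below / len(grades), 1),
    }


if __name__ == "__main__":
    # Hypothetical scores, not data from the study.
    print(summarize_grades([5.0, 7.0, 9.0, 11.0]))
    # The reported percentages follow directly from the counts in the abstract.
    print(round(100 * 13 / 143, 1), round(100 * 33 / 143, 1))
```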

CONCLUSIONS: GPT-4 significantly improved the readability of institutional heart failure-related PEMs. The model may be a promising adjunct to care provided by a licensed health care professional for patients living with heart failure. Further rigorous testing and validation are needed to investigate its safety, efficacy, and impact on patient health literacy.

PMID:40627825 | DOI:10.2196/68817

By Nevin Manimala
