Evaluating ChatGPT responses on obstructive sleep apnea for patient education

J Clin Sleep Med. 2023 Jul 24. doi: 10.5664/jcsm.10728. Online ahead of print.

ABSTRACT

STUDY OBJECTIVES: To evaluate the quality of ChatGPT responses to questions about obstructive sleep apnea (OSA) for patient education, and to assess how prompting the chatbot influences the correctness, estimated grade level, and references of its answers.

METHODS: The same 24 questions were submitted to ChatGPT in four separate query sessions, which differed in the initial prompt: no prompting, patient-friendly prompting, physician-level prompting, and prompting for statistics/references. Answers were scored on a hierarchical scale: incorrect, partially correct, correct, correct with either a statistic or a referenced citation (“correct+”), or correct with both a statistic and a citation (“perfect”). The Flesch-Kincaid (FK) grade level and the publication years of any citations were recorded for each answer. The proportions of responses at or above each score threshold were compared across prompt types using chi-squared tests, and the relationship between prompt type and grade level was assessed using ANOVA.
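For readers unfamiliar with the Flesch-Kincaid grade level, it is computed from average sentence length and average syllables per word: 0.39 × (words / sentences) + 11.8 × (syllables / words) - 15.59. The following is a minimal Python sketch of that formula. The regex-based sentence splitting and the vowel-group syllable counter are simplifying assumptions, not the tool the authors used, so its scores will differ somewhat from dedicated readability software.

```python
import re


def count_syllables(word):
    """Approximate syllables by counting vowel groups (a heuristic, not a dictionary lookup)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent ("name" -> 1), but keep the "-le" ending ("apple" -> 2).
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level of `text` using naive sentence and word splitting."""
    # Sentences end at '.', '!' or '?'; decimals such as "12.45" would be miscounted.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59
```

Longer sentences and more polysyllabic words both raise the score, which is why physician-level or reference-heavy answers tend to grade higher than patient-friendly ones.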

RESULTS: Across all prompts (n=96 responses), 69 answers (71.9%) were at least correct. The proportions of responses that were at least partially correct (p=0.387) or at least correct (p=0.453) did not differ by prompt type, whereas the proportions that were at least correct+ (p<0.001) or perfect (p<0.001) did. Statistics/references prompting accounted for 74 of the 77 references provided (96.1%). Responses to patient-friendly prompting had a lower mean grade level (12.45 ± 2.32) than responses to no prompting (14.15 ± 1.59), physician-level prompting (14.27 ± 2.09), or statistics/references prompting (15.00 ± 2.26) (p<0.0001).
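The threshold comparisons ("at least correct+", and so on) and the grade-level ANOVA can be outlined in code. This is a hedged sketch, not the authors' analysis: `at_least_table`, `pearson_chi2`, and `anova_f_from_summary` are hypothetical helpers, the chi-squared statistic is computed without continuity correction, and the ANOVA helper assumes a standard equal-variance one-way ANOVA computed from group means, sample standard deviations, and group sizes.

```python
# Hierarchical scoring scale from the Methods, ordered from lowest to highest.
SCALE = ["incorrect", "partially correct", "correct", "correct+", "perfect"]


def at_least_table(scores_by_prompt, threshold):
    """2 x k table: row 0 counts responses at or above `threshold`, row 1 those below; one column per prompt."""
    cut = SCALE.index(threshold)
    above, below = [], []
    for scores in scores_by_prompt.values():
        n_above = sum(1 for s in scores if SCALE.index(s) >= cut)
        above.append(n_above)
        below.append(len(scores) - n_above)
    return [above, below]


def pearson_chi2(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("a row or column total is zero; the test is undefined")
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def anova_f_from_summary(means, sds, ns):
    """One-way ANOVA F statistic and degrees of freedom from per-group means, sample SDs, and sizes."""
    k = len(means)
    total_n = sum(ns)
    grand_mean = sum(m * n for m, n in zip(means, ns)) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for m, n in zip(means, ns))
    ss_within = sum((n - 1) * s ** 2 for s, n in zip(sds, ns))
    df_between, df_within = k - 1, total_n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A p-value can then be read from the chi-squared and F distributions, for example with `scipy.stats.chi2.sf(stat, dof)` and `scipy.stats.f.sf(f_stat, df_between, df_within)`. Applying the ANOVA helper to the reported grade levels would additionally require assuming that the ± values are standard deviations and that each prompt type contributed 24 responses.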

CONCLUSIONS: Overall, ChatGPT provides appropriate answers to most questions on OSA regardless of prompting. Although patient-friendly prompting lowered the response grade level, all responses remained above the reading level recommended for presenting medical information to patients. Given ChatGPT’s rapid adoption, sleep experts may wish to further scrutinize its medical literacy and utility for patients.

PMID:37485676 | DOI:10.5664/jcsm.10728

By Nevin Manimala