Obes Surg. 2025 Aug 1. doi: 10.1007/s11695-025-08115-w. Online ahead of print.
ABSTRACT
BACKGROUND: Large language models (LLMs) can generate human-like, empathetic responses within seconds. Their potential to support physician-patient communication in bariatric surgery care, in terms of comprehensibility, empathy, and completeness, needs to be evaluated.
METHODS: We collected 200 real-world questions from patient support groups, initial consultations, and follow-up visits, which were answered by GPT-4o and two human bariatric experts. An independent bariatric expert then blindly evaluated the responses for their overall quality, accuracy, and comprehensiveness. If needed, the responses were corrected, and the correction time was documented. Afterwards, bariatric patients (n = 189) across Germany rated the responses, assessing each one on its clarity, empathy, and completeness.
RESULTS: The LLM required significantly less time (2.7 vs. 87.2 s, p < 0.0001) and generated longer responses (607 vs. 262 characters, p = 0.001) than human experts. LLM-generated responses were rated significantly higher by patients in terms of clarity (4.8 vs. 4.6), completeness (4.5 vs. 3.4), and empathy (4.1 vs. 3.2, all p < 0.0001). In total, 64.9% of patients preferred LLM-generated responses, while 18.5% preferred physician responses. Notably, patients with a lower level of education showed a stronger preference for LLM responses over physician responses.
CONCLUSION: LLMs could act as assistants for physicians, helping to improve response efficiency while maintaining accuracy under physician oversight. This approach could optimize physician time management and enhance patient satisfaction in bariatric care communication.
PMID:40748576 | DOI:10.1007/s11695-025-08115-w