JAMA Netw Open. 2026 May 1;9(5):e2616556. doi: 10.1001/jamanetworkopen.2026.16556.
ABSTRACT
IMPORTANCE: High-quality discharge summaries are essential for safe care transitions but contribute substantially to clinician documentation burden and burnout. Although retrospective studies suggest that large language models (LLMs) can generate clinical summaries of quality comparable to those written by physicians, prospective data on their safety, utility, and association with clinician well-being in clinical environments are lacking.
OBJECTIVE: To evaluate the safety, use, and association with clinician burden of MedAgentBrief, an LLM-based agentic workflow for generating hospital course summaries, during prospective clinical deployment.
DESIGN, SETTING, AND PARTICIPANTS: This single-arm prospective pilot quality improvement study encompassed hospital discharges at 1 academic inpatient medicine unit from August 1 to October 11, 2025, with baseline comparisons drawn from April 9 to July 31, 2025.
INTERVENTION: A custom agentic LLM workflow using Gemini 2.5 Pro generated draft hospital course summaries nightly from each patient's history and physical note and daily progress notes. Drafts were securely emailed to physicians daily for review and optional use.
MAIN OUTCOMES AND MEASURES: The primary outcome was physician-reported potential for and severity of harm from unedited summaries (Agency for Healthcare Research and Quality Common Format Harm Scale). Secondary outcomes included use rate, error types (omissions, inaccuracies, and hallucinations), time spent in discharge summaries (electronic health record logs), and changes in cognitive burden (NASA Task Load Index; score range, 0-100, with higher scores indicating greater cognitive burden) and burnout (Stanford Professional Fulfillment Index Work Exhaustion Scale; score range, 0-4, with higher scores indicating greater burnout).
RESULTS: Among 384 hospital discharges, the system generated 1274 summaries. Physicians used artificial intelligence (AI) content in 219 cases (57.0%). Feedback on 100 summaries (88 of 219 used summaries [40.2%] and 12 of 165 unused summaries [7.3%]) noted omissions (25 summaries [25.0%]) and inaccuracies (20 summaries [20.0%]) but rare hallucinations (2 summaries [2.0%]). Physicians rated 88 unedited summaries (88.0%) as having no harm potential and 1 (1.0%) as likely to cause moderate harm; no severe harm was reported. Mean physician burnout scores decreased significantly from before to after the intervention (1.75; 95% CI, 1.16-2.34 vs 1.20; 95% CI, 0.71-1.69; P = .03). Time savings were heterogeneous: 5 of 7 physicians with matched baseline data (71.4%) showed reductions in median documentation time, and the overall reduction from baseline to pilot was up to 2.9 minutes, a nonsignificant difference (median, 10.7 minutes; 95% CI, 7.4-13.3 minutes vs 7.8 minutes; 95% CI, 5.1-11.7 minutes; P = .13).
CONCLUSIONS AND RELEVANCE: In this study, an LLM-based agentic workflow produced hospital course summaries that were frequently used with minimal risk of harm identified. The intervention was associated with a reduction in physician burnout, supporting the viability of AI summarization to mitigate documentation burden.
PMID:42101844 | DOI:10.1001/jamanetworkopen.2026.16556