Medical Record Abstraction for Quality Improvement in Sepsis Care Using Artificial Intelligence: A Cluster Randomized Trial

JAMA Netw Open. 2026 Jun 1;9(6):e2611885. doi: 10.1001/jamanetworkopen.2026.11885.

ABSTRACT

IMPORTANCE: Hospital quality reporting remains a manual, costly process with critical limitations as a mechanism to improve care outcomes.

OBJECTIVE: To assess whether near-real-time quality measurement, enabled by large language models (LLMs), can improve quality performance as measured by the Centers for Medicare & Medicaid Services (CMS) Severe Sepsis and Septic Shock Management Bundle (SEP-1) quality metric.

DESIGN, SETTING, AND PARTICIPANTS: This single-blind, unstratified, cluster randomized trial was conducted between December 13, 2024, and July 8, 2025, at 2 academic emergency departments (EDs) within the University of California, San Diego (UCSD) health system. Participants included all 66 attending physicians who practiced in the UCSD EDs and worked more than 3 shifts per month prior to study initiation.

INTERVENTION: Participants were randomized either to receive targeted feedback, based on LLM-determined compliance with SEP-1, at the time of patient discharge or to follow the standard process.

MAIN OUTCOMES AND MEASURES: The primary outcome was overall compliance with SEP-1. Secondary outcomes included expert agreement with the LLM SEP-1 determination, 30-day mortality, and intensive care unit admissions of patients with severe sepsis and/or septic shock in the ED. Effect sizes were estimated from a mixed-effects logistic regression model with the intervention group as a fixed effect and a random intercept for physician.
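The model described above can be written out as a sketch. The notation is an assumption introduced here for illustration, not taken from the article: Y_ij indicates SEP-1 compliance for patient encounter i treated by physician j, T_j indicates that physician j was assigned to the intervention group, and u_j is the physician-level random intercept. The normal distribution for u_j is the conventional choice and is assumed rather than stated in the abstract.

```latex
\operatorname{logit}\,\Pr(Y_{ij} = 1) = \beta_0 + \beta_1 T_j + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

Under this formulation, exp(beta_1) is the odds ratio for the intervention conditional on physician, and the random intercept accounts for correlation among patients treated by the same physician.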

RESULTS: The study population included 66 physicians who treated 301 patients (121 in the control group and 180 in the intervention group; median age, 64.3 [IQR, 51.1-75.7] years; 171 [56.8%] male; 52 [17.3%] with chronic kidney disease; 52 [17.3%] with chronic heart failure) who met CMS inclusion criteria for SEP-1. Physicians in the control group had a SEP-1 compliance rate of 70.1%, while those in the intervention group had a rate of 82.9%. Assignment to the intervention group resulted in a 13.0% absolute improvement in SEP-1 compliance (95% CI, 2.5%-23.4%; odds ratio, 2.10 [95% CI, 1.15-3.81]; P = .02) in the mixed-effects model. The largest difference between the intervention group and control group was in noncompletion of the 30-mL/kg fluid bolus component (3 of 180 [1.7%] vs 16 of 121 [13.2%]), a documentation-sensitive component of the quality measure. Agreement between LLM determination and expert review was 92%. No significant differences existed in intensive care unit admissions or 30-day mortality.
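As a rough arithmetic check on the figures above, the following Python sketch computes a crude (unadjusted) risk difference and odds ratio from the two reported compliance rates, and recomputes the fluid bolus noncompletion percentages from the reported counts. It does not reproduce the mixed-effects model, which accounts for clustering by physician, so small differences from the reported 13.0% and 2.10 are expected. The function names are illustrative and not from the article.

```python
def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1.0 - p)


def risk_difference(p_intervention: float, p_control: float) -> float:
    """Crude absolute difference between two proportions."""
    return p_intervention - p_control


def odds_ratio(p_intervention: float, p_control: float) -> float:
    """Crude odds ratio between two proportions (ignores clustering)."""
    return odds(p_intervention) / odds(p_control)


# SEP-1 compliance rates as reported in the abstract.
control_rate = 0.701
intervention_rate = 0.829

crude_rd = risk_difference(intervention_rate, control_rate)
crude_or = odds_ratio(intervention_rate, control_rate)

# Fluid bolus noncompletion, from the reported counts.
bolus_intervention_pct = round(3 / 180 * 100, 1)
bolus_control_pct = round(16 / 121 * 100, 1)

print(f"Crude risk difference: {crude_rd * 100:.1f} percentage points")
print(f"Crude odds ratio: {crude_or:.2f}")
print(f"Bolus noncompletion: {bolus_intervention_pct}% vs {bolus_control_pct}%")
```

The crude values (about 12.8 percentage points and an odds ratio of about 2.07) sit close to the model-based estimates, which suggests the physician-level random intercept shifts the point estimates only slightly.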

CONCLUSIONS AND RELEVANCE: In this cluster randomized trial of artificial intelligence (AI)-enabled medical record abstraction for sepsis care, rapid assessment of SEP-1 performance and targeted feedback improved overall compliance with the measure. Integrating AI-driven quality measurement into clinical practice may address limitations in existing hospital quality reporting and better support a learning health system.

TRIAL REGISTRATION: ClinicalTrials.gov Identifier: NCT07581340.

PMID:42348212 | DOI:10.1001/jamanetworkopen.2026.11885

By Nevin Manimala