Nursing judgment in the age of generative artificial intelligence: A cross-national study on clinical decision-making performance among emergency nurses

Int J Nurs Stud. 2025 Sep 12;172:105216. doi: 10.1016/j.ijnurstu.2025.105216. Online ahead of print.

ABSTRACT

BACKGROUND: Clinical decision-making is a core competency in emergency nursing, requiring rapid and accurate assessments. With the growing integration of Generative Artificial Intelligence in healthcare, there is a pressing need to understand its potential as a clinical decision support tool. While Generative Artificial Intelligence models show high accuracy and consistency, their ability to navigate complex, context-sensitive scenarios remains in question.

OBJECTIVES: This study aimed to compare the clinical decision-making performance of emergency nurses from Israel and Italy with Generative Artificial Intelligence models (Claude-3.5, ChatGPT-4.0, and Gemini-1.5). It evaluated differences in severity assessment, hospitalization decisions, and test selection, while exploring the influence of demographic and professional characteristics on decision accuracy.

METHODS: A prospective observational study was conducted among 82 emergency nurses (49 from Italy, 33 from Israel), each independently evaluating five standardized clinical cases. Their decisions were compared with those generated by Generative Artificial Intelligence models using a structured evaluation rubric. Statistical analyses included ANOVA, chi-square tests, logistic regression, and receiver operating characteristic curve analysis to assess predictive accuracy.

RESULTS: Generative Artificial Intelligence models exhibited higher overall decision accuracy and stronger alignment with expert recommendations. However, notable discrepancies emerged in hospitalization decisions and severity assessments. For example, in Case 2, Generative Artificial Intelligence rated severity as level 1, while Italian and Israeli nurses rated it at 1.98 and 2.23, respectively (F = 199, P < 0.01). In Case 1, only 4.1% of Italian nurses recommended hospitalization compared to 30.3% of Israeli nurses, whereas all Generative Artificial Intelligence models advised hospitalization. Nurses showed greater variability in test selection and severity judgments, reflecting their use of clinical intuition and contextual reasoning. Demographics such as age, gender, and years of experience did not significantly predict accuracy.
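The Case 1 hospitalization percentages can be turned back into approximate head counts using the group sizes given in METHODS (49 Italian and 33 Israeli nurses). The sketch below does that and then runs a Pearson chi-square test on the resulting 2x2 table. It is an illustration only: the counts are inferred from the rounded percentages, and the abstract does not say which test the authors applied to this particular comparison. The function name `chi_square_2x2` and all variable names are hypothetical.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    Returns the statistic and the p-value for 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Group sizes come from METHODS; percentages come from RESULTS (Case 1).
italy_n, israel_n = 49, 33
# ASSUMPTION: counts are reconstructed from the rounded percentages.
italy_yes = round(0.041 * italy_n)
israel_yes = round(0.303 * israel_n)

table = [
    [italy_yes, italy_n - italy_yes],     # Italy: hospitalize / do not
    [israel_yes, israel_n - israel_yes],  # Israel: hospitalize / do not
]
stat, p_value = chi_square_2x2(table)
print(f"Italy {italy_yes}/{italy_n}, Israel {israel_yes}/{israel_n}")
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
```

One expected cell count in this table falls just below 5, so an exact test (for example, Fisher's exact test) may be the more cautious choice for a comparison of this size.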

CONCLUSIONS: Generative Artificial Intelligence models demonstrated consistency and expert alignment but lacked the contextual sensitivity vital in emergency care. These results highlight the potential of Generative Artificial Intelligence as a clinical decision-support tool while emphasizing the continued importance of human clinical judgment.

PMID: 40975905 | DOI: 10.1016/j.ijnurstu.2025.105216

By Nevin Manimala