Indian J Orthop. 2025 Jun 11;59(9):1413-1419. doi: 10.1007/s43465-025-01430-5. eCollection 2025 Sep.
ABSTRACT
INTRODUCTION: On-the-go (OTG) referencing is defined as the referencing clinicians perform within a narrow time frame during patient care, and it often plays a significant role in decision-making. The common sources for these references have shifted over the years from hard copies of textbooks and journals to online platforms. With the introduction of artificial intelligence (AI)-based large language model (LLM) platforms, clinicians are increasingly relying on them for OTG references, expecting them to synthesize the available evidence into facts that support sound clinical decisions. This study aims to compare the answers given by various LLM platforms with the answers obtained by clinicians using conventional referencing, and to grade the relevance of the answers provided by these platforms.
METHODS: Three commonly used AI-based LLM chat platforms, ChatGPT Version 4 (GPT-4), Microsoft Bing Chat, and Google Bard, were selected for the study. 250 OTG clinical queries were collated from orthopaedic practitioners, along with their answers and the references used. The queries were posed to the LLMs, and their answers were compared with the human answers and graded for relevance and for the level of evidence (LOE) of the references cited in support.
RESULTS: We did not find any significant difference between the AI-LLM models tested regarding the relevance of the generated answers to the clinical queries raised (p = 0.110). ChatGPT performed significantly better on queries that required numerical answers (p = 0.006), whereas the performance of Bard (p = 0.503) and Bing (p = 0.545) did not differ by query type. We noted a statistically significant difference in the LOE of the answers obtained (p < 0.001). When the three LLMs were ranked against the LOE of the human references, the human references ranked best, followed by Bing, ChatGPT, and Bard.
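Note: the abstract does not name the statistical tests used. As an illustration only, the minimal sketch below shows how ordinal relevance grades from the three platforms could be compared with a Kruskal-Wallis test; the variable names and grade values are hypothetical placeholders, not the study's data.

```python
# Illustrative sketch only: the abstract does not specify the test actually used.
# Hypothetical ordinal relevance grades (e.g., 1 = irrelevant ... 4 = fully relevant)
# assigned to answers from each platform for the same set of queries.
from scipy.stats import kruskal

chatgpt_grades = [4, 3, 4, 2, 4, 3, 4, 4, 3, 2]   # placeholder values
bing_grades    = [3, 4, 4, 3, 4, 2, 4, 3, 4, 3]   # placeholder values
bard_grades    = [3, 3, 4, 2, 3, 3, 4, 2, 3, 3]   # placeholder values

# The Kruskal-Wallis H-test compares the distributions of ordinal grades
# across independent groups without assuming normality.
statistic, p_value = kruskal(chatgpt_grades, bing_grades, bard_grades)
print(f"H = {statistic:.3f}, p = {p_value:.3f}")
# A p-value above 0.05 would indicate no significant difference in relevance
# grades across platforms, consistent with the reported p = 0.110.
```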
CONCLUSION: Of the three tools compared, Bing Chat cited references with a relatively higher LOE when answering OTG questions. All three AI-LLM tools show promising results for OTG referencing. We propose that customization to the medical domain and regulatory policies are needed before their use can be recommended.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s43465-025-01430-5.
PMID:41054750 | PMC:PMC12496309 | DOI:10.1007/s43465-025-01430-5