Int Endod J. 2025 Mar 2. doi: 10.1111/iej.14217. Online ahead of print.
ABSTRACT
AIMS: This study aimed to evaluate the accuracy and consistency of responses given by two versions of Chat Generative Pre-trained Transformer (ChatGPT), ChatGPT-4 and ChatGPT-4o, to multiple-choice questions on undergraduate endodontic education topics, asked at different times of the day and on different days.
METHODOLOGY: In total, 60 multiple-choice, text-based questions covering 6 topics of undergraduate endodontic education were prepared. Each question was posed to ChatGPT-4 and ChatGPT-4o 3 times a day (morning, noon, and evening) on 3 consecutive days. The accuracy and consistency of the two AIs were compared using SPSS and R software (p < .05, 95% confidence intervals).
RESULTS: The accuracy rate of ChatGPT-4o (92.8%) was significantly higher than that of ChatGPT-4 (81.7%; p < .001). The question groups affected the accuracy rates of both AIs (p < .001), whereas the times at which the questions were asked did not affect the accuracy of either AI (p > .05). There was no statistically significant difference in consistency rate between ChatGPT-4 and ChatGPT-4o (p = .123), and the question groups did not affect the consistency of either AI (p > .05).
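A minimal sketch of how the reported accuracy comparison could be checked, assuming each of the 60 questions was answered 9 times per model (3 times a day on 3 consecutive days, i.e. 540 responses per chatbot); the counts are back-calculated from the reported percentages, and the chi-square test in Python below is only illustrative, not the authors' SPSS/R analysis:

# Illustrative sketch, not the study's actual analysis (done in SPSS and R).
# Assumes 540 responses per model: 60 questions x 3 times/day x 3 days.
from scipy.stats import chi2_contingency

total = 60 * 3 * 3                       # 540 responses per chatbot (assumed)
correct_gpt4o = round(0.928 * total)     # ~501 correct, from the reported 92.8%
correct_gpt4 = round(0.817 * total)      # ~441 correct, from the reported 81.7%

table = [
    [correct_gpt4o, total - correct_gpt4o],   # ChatGPT-4o: correct / incorrect
    [correct_gpt4,  total - correct_gpt4],    # ChatGPT-4:  correct / incorrect
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4g}")      # p well below .001, in line with the abstract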
CONCLUSIONS: According to the results of this study, the accuracy of ChatGPT-4o was higher than that of ChatGPT-4. These findings indicate that AI chatbots can be used in dental education; however, the limitations and potential risks associated with AI must also be considered.
PMID:40025853 | DOI:10.1111/iej.14217