BMC Med Educ. 2026 Jan 17. doi: 10.1186/s12909-026-08621-0. Online ahead of print.
ABSTRACT
BACKGROUND: In recent years, advances in artificial intelligence (AI) have led to the widespread integration of large language models and their chatbot applications into many fields, including dental education. This study aimed to evaluate the accuracy of ChatGPT-4 Omni (ChatGPT-4o) and Gemini Advanced in answering multiple-choice questions from the Turkish Dentistry Specialization Exam (DUS) across a range of disciplines.
METHODS: A total of 1,504 multiple-choice questions from 10 years of DUS exams were analyzed to compare the accuracy of ChatGPT-4o and Gemini Advanced. The questions were categorized into Fundamental Medical Sciences (n = 514) and Clinical Dental Sciences (n = 990). Each question was submitted to both chatbots, resulting in 3,008 responses. Accuracy was assessed using the official answer keys. Chi-square tests and Bonferroni post-hoc analyses were used to compare accuracy across disciplines and examine year-based variations.
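As an illustration only (not the study's code), the overall accuracy comparison described above could be run as a chi-square test on a 2x2 contingency table of correct versus incorrect answers per chatbot, with Bonferroni adjustment when the test is repeated across sub-disciplines. The Python/scipy sketch below uses counts back-calculated from the reported overall accuracy rates, so the figures are approximate:

    # Minimal sketch of the chi-square comparison; counts are approximations
    # derived from the reported overall accuracy rates (84% and 81.8% of 1,504).
    from scipy.stats import chi2_contingency

    # Rows: chatbot; columns: [correct, incorrect]
    observed = [
        [1263, 241],  # ChatGPT-4o, approx. 84% of 1,504
        [1230, 274],  # Gemini Advanced, approx. 81.8% of 1,504
    ]

    chi2, p_value, dof, expected = chi2_contingency(observed)
    print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")

    # When the same comparison is repeated across k sub-disciplines, a Bonferroni
    # correction compares each p-value against alpha / k (e.g. 0.05 / k).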
RESULTS: ChatGPT-4o achieved an overall accuracy rate of 84%, while Gemini Advanced achieved 81.8% (p = 0.110). For the Fundamental Medical Sciences questions, no statistically significant differences were observed across sub-disciplines, with overall accuracies of 92.6% for ChatGPT-4o and 93.4% for Gemini Advanced. For the Clinical Dental Sciences questions, ChatGPT-4o outperformed Gemini Advanced in Prosthetic Dentistry (p = 0.013) and Dentomaxillofacial Radiology (p = 0.001), whereas Gemini Advanced showed higher accuracy in Pediatric Dentistry (p = 0.008). Across all Clinical Dental Sciences questions, ChatGPT-4o achieved an accuracy of 79.5%, compared to 75.8% for Gemini Advanced, and this difference was statistically significant (p = 0.046).
CONCLUSIONS: AI-based chatbots demonstrate strong potential in answering multiple-choice dentistry questions, although their accuracy varied across disciplines and subject areas. These findings highlight the potential educational implications of integrating AI into dental curricula, particularly as supplementary tools for exam preparation and knowledge reinforcement. Nevertheless, cautious integration is required to ensure that AI supports, rather than replaces, critical thinking and professional expertise.
PMID:41547810 | DOI:10.1186/s12909-026-08621-0