
Performance Comparison of a Domain-Specific Chatbot and General-Purpose Chatbots in Dental Traumatology

Dent Traumatol. 2025 Nov 28. doi: 10.1111/edt.70039. Online ahead of print.

ABSTRACT

BACKGROUND: The use of artificial intelligence chatbots in dental traumatology has increased, yet concerns about their reliability remain unaddressed. This study aims to evaluate the accuracy of a new AI chatbot, Dental Trauma Evo, in responding to queries on dental fractures and luxations.

MATERIALS AND METHODS: A total of 45 questions, comprising multiple-choice questions (MCQs), true/false, and yes/no formats, were created and validated in accordance with the International Association of Dental Traumatology's position statement on fractures and luxations. Over nine consecutive days, the questions were submitted simultaneously to four chatbots in incognito mode: ChatGPT-4o, DeepSeek R1, Google Gemini 2.5, and Dental Trauma Evo. The responses were assessed for accuracy and consistency, and Fisher's exact test was used for statistical analysis.

RESULTS: Dental Trauma Evo achieved the highest overall accuracy (85.43%), followed by Google Gemini (81.72%), DeepSeek (80.24%), and ChatGPT-4o (79.75%). By question type, ChatGPT-4o, Google Gemini, and Dental Trauma Evo performed best on yes/no, true/false, and MCQ items, respectively. The association between question type and chatbot performance was not statistically significant (p > 0.05).
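As a rough illustration of the analysis, if one assumes each chatbot answered the 45 questions once per day over the nine days (405 responses each, a detail not stated explicitly in the abstract), the correct-answer counts can be reconstructed from the reported accuracies, and a pairwise Fisher's exact test can be sketched in pure Python. The counts and the specific pairwise comparison below are illustrative assumptions, not the study's own analysis.

```python
from math import comb

# Assumption (not stated in the abstract): each chatbot answered the
# 45 questions once per day over 9 days, i.e. 405 responses each.
TRIALS = 45 * 9

# Correct-answer counts reconstructed from the reported accuracies.
correct = {
    "Dental Trauma Evo": 346,   # 346/405 ~ 85.43%
    "Google Gemini 2.5": 331,   # 331/405 ~ 81.72%
    "DeepSeek R1": 325,         # 325/405 ~ 80.24%
    "ChatGPT-4o": 323,          # 323/405 ~ 79.75%
}

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    r1, c1, n = a + b, a + c, a + b + c + d
    denom = comb(n, r1)

    def prob(k):  # P(first cell == k) under fixed margins
        return comb(c1, k) * comb(n - c1, r1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, r1 - (n - c1)), min(r1, c1)
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= p_obs * (1 + 1e-9))

# Illustrative pairwise comparison: highest vs. lowest overall accuracy.
a, c = correct["Dental Trauma Evo"], correct["ChatGPT-4o"]
p = fisher_exact_two_sided(a, TRIALS - a, c, TRIALS - c)
print(f"Dental Trauma Evo vs. ChatGPT-4o: p = {p:.4f}")
```

Note that this pairwise comparison of overall accuracy is not the comparison the abstract reports (which examined question type across chatbots); it is shown only to illustrate the method on the reconstructed counts.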

CONCLUSION: Dental Trauma Evo, a chatbot developed from the International Association of Dental Traumatology guidelines, showed favorable preliminary performance in the current study. Further research, clinical validation, and model enhancements are necessary before it can be effectively implemented in practice.

PMID:41312575 | DOI:10.1111/edt.70039

By Nevin Manimala
