Exploring artificial intelligence chatbots in pediatric fluoride education: a cross-sectional study

Sci Rep. 2025 Nov 29. doi: 10.1038/s41598-025-28857-y. Online ahead of print.

ABSTRACT

Large language model (LLM)-based chatbots are increasingly integrated into healthcare communication, offering accessible and interactive information. These artificial intelligence (AI) tools have the potential to influence caregiver health behaviors when tailored to user needs and literacy levels. In pediatric dentistry, fluoride remains a cornerstone of caries prevention but is also subject to public concerns and online misinformation, underscoring the need for reliable digital communication. This observational and exploratory study evaluated the performance of three advanced AI chatbots (ChatGPT-4.o, Google Gemini Pro, and DeepSeek V3) in providing fluoride-related information to parents and caregivers in the context of pediatric oral health. Twenty fluoride-related questions, derived from American Academy of Pediatric Dentistry (AAPD) guideline themes, were presented to each chatbot in standardized sessions. Responses were independently evaluated by three blinded reviewers using validated tools: EQIP, DISCERN, Global Quality Scale (GQS), Flesch Reading Ease Score (FRES), Flesch-Kincaid Reading Grade Level (FKRGL), and the iThenticate similarity index. These instruments assessed quality, reliability, readability, and originality. Inter-rater reliability was confirmed with intraclass correlation coefficients (ICCs). Statistical analyses were conducted using ANOVA or Kruskal-Wallis tests with appropriate post-hoc methods. ChatGPT-4.o achieved significantly higher EQIP (M = 4.32, SD = 0.43) and DISCERN (M = 4.20, SD = 0.48) scores than Gemini Pro and DeepSeek V3 (p < 0.001), indicating superior reliability and informational quality. Although FRES (median = 68.5, p = 0.12) and the similarity index (≤ 10%, p = 0.54) showed no significant differences among the chatbots, ChatGPT consistently produced more readable and original content. FKRGL differences were borderline (p = 0.041) but were not retained after correction, and GQS outcomes were comparable.
These findings suggest that ChatGPT’s superior performance is not only statistically significant but also practically relevant for enhancing parental comprehension of fluoride use. Among the evaluated models, ChatGPT-4.o demonstrated the clearest and most reliable fluoride communication. Its higher EQIP and DISCERN scores highlight its potential as a supportive tool for caregiver education in pediatric dentistry. Nonetheless, these systems should be implemented cautiously, complemented with professional oversight, and continuously validated to prevent misinformation and ensure safe clinical integration.
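
For readers unfamiliar with the two readability indices named in the abstract, the sketch below shows how FRES and FKRGL are conventionally computed from sentence, word, and syllable counts. It is an illustration only, not the study's scoring pipeline: the abstract does not say which software produced the scores, and the function names `count_syllables` and `readability` and the vowel-group syllable heuristic are assumptions made here. Dedicated readability tools count syllables more carefully, so their scores will differ somewhat.

```python
import re
from typing import Tuple


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not the study's method):
    count vowel groups and drop a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> Tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid grade level) for text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    # Standard published coefficients for both formulas.
    fres = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return fres, fkgl


if __name__ == "__main__":
    sample = "Fluoride helps protect teeth. Use a small amount of toothpaste."
    fres, fkgl = readability(sample)
    print(f"FRES = {fres:.1f}, FKRGL = {fkgl:.1f}")
```

Higher FRES values indicate easier text, and FKRGL approximates a US school grade. On the conventional Flesch scale, scores of 60 to 70 are usually described as plain English, which is the band containing the median FRES of 68.5 reported above.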

PMID:41318741 | DOI:10.1038/s41598-025-28857-y

By Nevin Manimala