
Comparative Evaluation of ChatGPT-4o and Grok-3 on Cleft Lip and Palate and Presurgical Infant Orthopedics: A Multidisciplinary Assessment by Orthodontists, Pediatricians, and Plastic Surgeons

Cleft Palate Craniofac J. 2025 Sep 16:10556656251378591. doi: 10.1177/10556656251378591. Online ahead of print.

ABSTRACT

Objective: This study aimed to evaluate and compare the accuracy, clarity, and clinical applicability of 2 state-of-the-art large language models (LLMs), Chat Generative Pretrained Transformer (ChatGPT)-4o and Grok-3, in generating health information related to cleft lip and palate (CLP) and presurgical infant orthopedics (PSIO). To ensure a multidisciplinary perspective, experts from orthodontics, pediatrics, and plastic surgery independently evaluated the responses.

Methods: Six structured questions addressing general and presurgical aspects of CLP were submitted to both ChatGPT-4o and Grok-3. Forty-five blinded specialists (15 from each specialty) assessed the 12 generated responses using 2 validated instruments: the DISCERN tool and the Global Quality Scale (GQS). We conducted interspecialty comparisons to explore variations in model evaluation.

Results: We observed no statistically significant differences between ChatGPT-4o and Grok-3 in DISCERN or GQS scores (P > .05). However, pediatricians consistently assigned higher ratings than orthodontists and plastic surgeons in terms of reliability, clarity, and treatment-related content. Patient-directed questions received higher overall scores than those aimed at healthcare professionals. Grok-3 performed slightly better on questions about PSIO, whereas ChatGPT-4o provided more comprehensive and structured answers.

Conclusion: Both LLMs demonstrated notable potential in producing readable, informative responses about CLP and PSIO. While they may aid in patient communication and support clinical education, professional oversight remains critical to ensure medical accuracy. The inclusion of Grok-3 in this orthodontic evaluation provides valuable insights and sets the stage for future research on artificial intelligence integration in interdisciplinary cleft care.
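
The abstract does not say which statistical test produced the model comparison (P > .05). As a rough illustration only, a paired comparison of two models rated by the same raters could be sketched with an exact sign-flip permutation test. The function name, the choice of test, and the scores below are all assumptions made for this example; they are not the authors' method or data.

```python
from itertools import product


def paired_sign_flip_p_value(scores_a, scores_b):
    """Exact two-sided sign-flip permutation test on paired scores.

    Hypothetical helper for illustration; the study's actual test is not
    stated in the abstract. Enumerates all 2**n sign patterns, so it is
    only practical for small n.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    if not scores_a:
        raise ValueError("at least one pair is required")

    n = len(scores_a)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / n)

    extreme = 0
    total = 0
    # Under the null hypothesis each paired difference is equally likely
    # to be positive or negative, so every sign pattern is equally likely.
    for signs in product((1, -1), repeat=n):
        flipped_mean = sum(s * d for s, d in zip(signs, diffs)) / n
        if abs(flipped_mean) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


if __name__ == "__main__":
    # Made-up GQS-style ratings (1 to 5) from eight hypothetical raters.
    hypothetical_model_a = [4, 3, 5, 4, 3, 4, 5, 3]
    hypothetical_model_b = [4, 4, 4, 4, 3, 5, 4, 3]
    p = paired_sign_flip_p_value(hypothetical_model_a, hypothetical_model_b)
    print(f"two-sided p-value: {p:.3f}")
```

A permutation test is used here because ordinal rating scales such as GQS rarely satisfy normality assumptions. With 45 raters, exact enumeration would be infeasible, and a random sample of sign patterns would be needed instead.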

PMID: 40956923 | DOI: 10.1177/10556656251378591

By Nevin Manimala
