J Craniofac Surg. 2025 Apr 17. doi: 10.1097/SCS.0000000000011399. Online ahead of print.
ABSTRACT
Advancements in natural language processing (NLP) have led to the emergence of large language models (LLMs) as potential tools for patient consultations. This study investigates the ability of 2 reasoning-capable models, Deepseek-R1 and GPT o1-preview, to provide diagnostic and treatment recommendations for orofacial clefts. A cross-sectional comparative study was conducted using 20 questions derived from Google Trends and expert experience, with both models responding to each query. Readability was assessed using the Flesch Reading Ease Score (FRES), the Flesch-Kincaid Grade Level (FKGL), sentence count, and percentage of complex words. No statistically significant differences were found between the 2 models' responses in FKGL (P = 0.064) or FRES (P = 0.56). Physician evaluation using a 4-point Likert scale assessed accuracy, clarity, relevance, and trustworthiness, with Deepseek-R1 achieving significantly higher overall ratings (P = 0.041), although GPT o1-preview exhibited notable empathy in certain clinical scenarios. The 2 models displayed complementary strengths, indicating potential for clinical consultation applications. Future research should focus on integrating these strengths within medical-specific LLMs to generate more reliable, empathetic, and personalized treatment recommendations.
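For reference, the abstract does not restate how FRES and FKGL are computed; assuming the study used the standard Flesch formulas, with W = total words, S = total sentences, and Y = total syllables, they are:

FRES = 206.835 - 1.015 (W / S) - 84.6 (Y / W)
FKGL = 0.39 (W / S) + 11.8 (Y / W) - 15.59

Higher FRES values indicate easier text, while FKGL approximates the U.S. school grade level required to understand it, which is why the two metrics are reported together.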
PMID:40245329 | DOI:10.1097/SCS.0000000000011399