ScientificWorldJournal. 2026 Mar 3;2026:5479774. doi: 10.1155/tswj/5479774. eCollection 2026.
ABSTRACT
OBJECTIVE: This study aimed to evaluate and compare the scientific reliability of three large language models (LLMs), Perplexity, iASK, and ChatGPT 4o mini, based on their responses to orthodontic-related queries.
MATERIALS AND METHODS: The three LLMs were prompted with 10 clinical orthodontic questions, and their responses were assessed independently by two evaluators using a structured scoring system (0-10). Statistical analyses, including Pearson and Spearman correlations, Cronbach's alpha, and the Wilcoxon signed-rank test, were performed to determine inter-evaluator reliability and differences in model performance.
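The analyses named above can be sketched in Python with scipy and numpy. The score arrays below are hypothetical placeholders (the study's actual evaluator scores are not given in the abstract), and Cronbach's alpha is computed directly from its definition, treating the two evaluators as "items":

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, wilcoxon

# Hypothetical 0-10 scores from two evaluators for the same 10 responses
eval1 = np.array([7, 5, 8, 6, 7, 4, 9, 5, 6, 8], dtype=float)
eval2 = np.array([6, 5, 8, 7, 7, 5, 9, 4, 6, 8], dtype=float)

# Inter-evaluator agreement
r, _ = pearsonr(eval1, eval2)      # Pearson r
rho, _ = spearmanr(eval1, eval2)   # Spearman rho

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)
items = np.vstack([eval1, eval2])              # shape: (k items, n responses)
k = items.shape[0]
item_vars = items.var(axis=1, ddof=1).sum()
total_var = items.sum(axis=0).var(ddof=1)
alpha = k / (k - 1) * (1 - item_vars / total_var)

# Paired comparison of two models' scores (hypothetical per-question means)
model_a = np.array([8, 7, 7, 6, 8, 7, 9, 6, 7, 7], dtype=float)
model_b = np.array([5, 6, 5, 4, 6, 5, 7, 4, 5, 5], dtype=float)
stat, p = wilcoxon(model_a, model_b)
```

With only two evaluators, Cronbach's alpha here reduces to a function of their covariance; the Wilcoxon signed-rank test is the appropriate paired nonparametric choice given small samples of ordinal scores.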
RESULTS: Perplexity achieved the highest mean score (7.2), followed by iASK (5.4) and ChatGPT 4o mini (5.2). High consistency between evaluators was observed (Cronbach's alpha = 0.947). A significant difference was noted between Perplexity and both ChatGPT 4o mini and iASK (p = 0.002). Pearson and Spearman correlations indicated strong agreement between evaluators (r = 0.982, ρ = 1.000).
CONCLUSION: Perplexity demonstrated superior performance in orthodontic-related queries compared to ChatGPT 4o mini and iASK. The findings highlight the importance of evaluating AI models for clinical applicability and reliability.
PMID:41789391 | PMC:PMC12957766 | DOI:10.1155/tswj/5479774