
Performance of Large Language Models in Differentiating Systemic Lupus Erythematosus From Mimicking Conditions Using the 2019 EULAR/ACR Criteria: A Comparative Analysis

Int J Rheum Dis. 2026 May;29(5):e70713. doi: 10.1111/1756-185x.70713.

ABSTRACT

INTRODUCTION: Systemic lupus erythematosus (SLE) is difficult to diagnose because its clinical manifestations are diverse and overlap with those of other autoimmune conditions. Large language models (LLMs) may help clinicians reach decisions faster. This study aimed to evaluate the performance of four LLMs in differentiating SLE from clinically mimicking conditions.

METHODS: A retrospective diagnostic accuracy study was conducted involving 100 patients at a rheumatology center: 50 patients with confirmed SLE and 50 non-SLE patients with conditions including rheumatoid arthritis, systemic sclerosis, axial spondyloarthritis, psoriatic arthritis, myositis, ANCA-associated vasculitis, mixed connective tissue disease, undifferentiated connective tissue disease, and fibromyalgia. Four LLMs were evaluated: DeepSeek, ChatGPT 4.0, Claude Sonnet 4, and Gemini. The 2019 European Alliance of Associations for Rheumatology/American College of Rheumatology (EULAR/ACR) classification criteria were applied. Diagnostic accuracy, positive predictive value (PPV), negative predictive value (NPV), and area under the receiver operating characteristic curve (AUC) were calculated. IBM SPSS Statistics version 25 was used for all analyses.
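As an illustration of how the metrics named above follow from a 2x2 confusion matrix, here is a minimal Python sketch. The study itself used SPSS, so this is not the authors' code; the `ConfusionCounts` class, the `diagnostic_metrics` function, and the example counts are all hypothetical and chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing a model's SLE / non-SLE call with the confirmed diagnosis."""
    tp: int  # SLE patients labeled SLE
    fn: int  # SLE patients labeled non-SLE
    tn: int  # non-SLE patients labeled non-SLE
    fp: int  # non-SLE patients labeled SLE


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return the standard diagnostic accuracy metrics as proportions in [0, 1].

    No guard against empty rows or columns: a zero denominator raises
    ZeroDivisionError, which is acceptable for a sketch.
    """
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Hypothetical counts for illustration only (not taken from the study).
example = ConfusionCounts(tp=40, fn=10, tn=45, fp=5)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the cohort is balanced (50 SLE, 50 non-SLE), PPV and NPV computed this way reflect a 50% prevalence and would differ in a population where SLE is rarer.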

RESULTS: Gemini achieved the highest performance, with an accuracy of 96% (95% CI: 91.2-100.0%), sensitivity of 94% (95% CI: 89.3-98.7%), specificity of 98% (95% CI: 93.1-100.0%), and an AUC of 0.960. ChatGPT 4.0 and Claude Sonnet 4 exhibited comparable accuracy. DeepSeek recorded the lowest performance.
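The abstract does not report the underlying confusion matrix. As a consistency check only, and assuming all 50 SLE and all 50 non-SLE cases received a classification from Gemini, the reported percentages correspond to 47 true positives, 3 false negatives, 49 true negatives, and 1 false positive:

```latex
\begin{align*}
\text{sensitivity} &= \frac{TP}{TP + FN} = \frac{47}{50} = 0.94 \\
\text{specificity} &= \frac{TN}{TN + FP} = \frac{49}{50} = 0.98 \\
\text{accuracy}    &= \frac{TP + TN}{N} = \frac{47 + 49}{100} = 0.96 \\
\frac{\text{sensitivity} + \text{specificity}}{2} &= \frac{0.94 + 0.98}{2} = 0.96
\end{align*}
```

The last line matches the reported AUC of 0.960, which is what one would expect if the AUC was computed from a binary SLE / non-SLE output rather than a continuous score; the abstract does not state how the AUC was derived, so this is an inference.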

CONCLUSION: Gemini demonstrated significant potential to assist clinicians in differentiating SLE from mimicking conditions. Nevertheless, prospective validation in real-world clinical settings is required before these tools can be reliably integrated into clinical practice.

PMID: 42178941 | DOI: 10.1111/1756-185x.70713

By Nevin Manimala
