Nevin Manimala Statistics

Evaluation of the Quality and Reliability of ChatGPT-4’s Responses on Allergen Immunotherapy Using Validated Instruments for Health Information Quality Assessment

Clin Transl Allergy. 2025 Dec;15(12):e70130. doi: 10.1002/clt2.70130.

ABSTRACT

BACKGROUND: Chat Generative Pre-Trained Transformer 4 (ChatGPT-4) is a rapidly evolving large language model (LLM) with potential applications in medical education and patient care. Although Allergen Immunotherapy (AIT) can change the course of allergic diseases, it can also leave patients with doubts, and they may turn to readily available resources such as ChatGPT-4 to resolve them. This study aimed to use validated tools to evaluate the quality, reliability, and readability of the information ChatGPT-4 provides about AIT.

METHODS: Twenty-four questions were selected in accordance with the EAACI clinical guidelines on AIT and submitted to ChatGPT-4. Independent reviewers evaluated the ChatGPT-4 responses using three validated tools: the DISCERN instrument (quality), the JAMA Benchmark criteria (reliability), and the Flesch-Kincaid Readability Tests (readability). Descriptive statistics summarized the findings across categories.
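
For readers unfamiliar with the readability step, the sketch below shows how the two standard Flesch-Kincaid measures (Flesch Reading Ease and Flesch-Kincaid Grade Level) can be computed from sentence, word, and syllable counts. It is only an illustration and not the authors' tooling: the function names `count_syllables` and `readability` are hypothetical, and the syllable counter is a rough vowel-group heuristic, so scores will differ somewhat from those of dedicated readability software.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, discount a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for `text`."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)

    # Standard published coefficients for the two Flesch-Kincaid formulas.
    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade_level
```

Lower Reading Ease and higher Grade Level both indicate harder text, so long sentences built from polysyllabic terms such as "immunotherapy" push a response toward the difficult end of the scale.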

RESULTS: ChatGPT-4 responses were generally rated as “fair quality” on DISCERN, with strengths in classification/formulations and special populations. Notably, the model provided good-quality responses on the preventive effects of AIT in children and on premedication to reduce adverse reactions. However, JAMA Benchmark scores consistently indicated “insufficient information” (median = 0-1), primarily because authorship, attribution, disclosure, and currency were absent. Readability analyses showed that the responses required college graduate-level reading skills, with most classified as “very difficult” to understand. Overall, ChatGPT-4 demonstrated fair quality, insufficient reliability, and difficult readability for patients.
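
The JAMA Benchmark result can be made concrete with a small sketch: each response earns one point for each of the four criteria it satisfies (authorship, attribution, disclosure, currency), and the scores are summarized by their median. The function names and the example ratings below are hypothetical and are not data from the study; the band labels above "insufficient information" are an assumption based on common usage, since the abstract reports only the lowest band.

```python
from statistics import median

# The four JAMA Benchmark criteria named in the abstract.
JAMA_CRITERIA = ("authorship", "attribution", "disclosure", "currency")


def jama_score(flags: dict) -> int:
    """One point per satisfied criterion, giving a total between 0 and 4."""
    return sum(1 for criterion in JAMA_CRITERIA if flags.get(criterion, False))


def jama_band(score: float) -> str:
    """Map a score to a label. Cut-offs above 1 are assumed, not taken from the abstract."""
    if score <= 1:
        return "insufficient information"
    if score <= 3:
        return "partially sufficient information"
    return "completely sufficient information"


# Hypothetical reviewer ratings for three responses (illustrative only).
ratings = [
    {"authorship": False, "attribution": False, "disclosure": False, "currency": False},
    {"authorship": False, "attribution": True, "disclosure": False, "currency": False},
    {"authorship": False, "attribution": False, "disclosure": False, "currency": False},
]

scores = [jama_score(r) for r in ratings]
summary = median(scores)
print(scores, summary, jama_band(summary))
```

A chatbot answer that names no author, cites no sources, declares no conflicts of interest, and carries no date scores 0, which is why uncited responses tend to fall into the lowest band regardless of how accurate their content is.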

CONCLUSIONS: ChatGPT-4 provides generally well-structured responses on AIT but lacks reliability and readability for clinical or patient-directed use. Until specialized, reference-based models are developed, healthcare professionals should supervise its use, particularly in sensitive areas such as dosing and safety.

PMID:41319041 | DOI:10.1002/clt2.70130

By Nevin Manimala
