
Assessment of ChatGPT-3.5 performance on the medical genetics specialist exam

Lab Med. 2025 Jul 12:lmaf038. doi: 10.1093/labmed/lmaf038. Online ahead of print.

ABSTRACT

INTRODUCTION: Artificial intelligence is increasingly used in medical education and testing. ChatGPT, developed by OpenAI, has shown mixed results on various medical exams, but its performance in medical laboratory genetics remains unknown.

METHODS: This study assessed ChatGPT-3.5 using 456 publicly available questions from the Polish national specialist exam in medical laboratory genetics. Questions were categorized by topic and by complexity (simple vs complex), and each question was submitted to ChatGPT in 3 separate sessions. Accuracy and consistency were evaluated statistically.
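
The abstract does not include the submission protocol itself, but a minimal sketch of the loop it describes might look as follows. This is an illustrative reconstruction in Python, not the authors' code: the model name, prompt wording, sampling settings, and question format (a `text` field plus a single-letter answer `key`) are all assumptions.

```python
# Hypothetical sketch of the evaluation loop described in Methods:
# each question is submitted to ChatGPT in 3 sessions and the letter
# answers are collected for accuracy/consistency scoring.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_once(question_text: str) -> str:
    """Submit one multiple-choice question and return the raw reply."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # which GPT-3.5 variant was used is an assumption
        messages=[
            {"role": "system",
             "content": "Answer the multiple-choice question with a single letter."},
            {"role": "user", "content": question_text},
        ],
        temperature=0,  # the abstract does not state sampling settings
    )
    return response.choices[0].message.content.strip()

def evaluate(questions: list[dict], sessions: int = 3) -> list[list[bool]]:
    """Return, per session, whether each answer matched the key."""
    return [
        [ask_once(q["text"]).upper().startswith(q["key"]) for q in questions]
        for _ in range(sessions)
    ]
```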

RESULTS: ChatGPT answered 59% of the 456 exam questions correctly, a statistically significant result (P < .001). Accuracy differed by category: 71% for calculation-based questions, approximately 60% for genetic methods and genetic alterations, and only 37% for clinical case-based questions. Question complexity also affected performance: Simple questions yielded 63% accuracy vs 43% for complex questions (P = .001). Performance was stable across the 3 repeated sessions, with no statistically significant differences between runs (P = .43).
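
For readers who want to see how significance checks of this kind are computed, a sketch with SciPy is shown below. The per-group question counts are invented to match the rounded percentages above (the abstract reports only percentages and P values), and the choice of tests, a binomial test against chance and chi-square tests of independence, is an assumption rather than a detail taken from the paper.

```python
# Illustrative re-creation of the significance checks reported above.
# All per-group counts are invented to match the rounded percentages;
# the abstract reports only percentages and P values.
from scipy.stats import binomtest, chi2_contingency

# Overall accuracy vs chance: 269/456 ~ 59% correct. A 5-option
# multiple-choice format (chance = 0.20) is an assumption.
overall = binomtest(k=269, n=456, p=0.20, alternative="greater")
print(f"overall vs chance: P = {overall.pvalue:.3g}")

# Simple vs complex questions (hypothetical split of the 456 items):
#              correct  incorrect
table = [[230, 135],  # simple  -> 230/365 ~ 63%
         [ 39,  52]]  # complex ->  39/91  ~ 43%
chi2, p, dof, _ = chi2_contingency(table)
print(f"simple vs complex: chi2 = {chi2:.2f}, P = {p:.3g}")

# Stability across the 3 repeated sessions (hypothetical counts
# of correct vs incorrect answers per run).
sessions = [[269, 187], [263, 193], [274, 182]]
_, p, _, _ = chi2_contingency(sessions)
print(f"across sessions: P = {p:.2f}")
```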

DISCUSSION: ChatGPT-3.5 demonstrated moderate accuracy and stable performance on a specialist exam in medical genetics. Although it may support education in this field, the tool’s limitations in complex, domain-specific reasoning suggest the need for further development before broader implementation.

PMID:40654165 | DOI:10.1093/labmed/lmaf038
