Med Ultrason. 2025 Apr 29. doi: 10.11152/mu-4505. Online ahead of print.
ABSTRACT
AIM: To evaluate the effectiveness of two large language models, ChatGPT-4 and Claude 3, in improving the accuracy of question responses by a senior and a junior sonologist.
MATERIAL AND METHODS: A senior and a junior sonologist were given a practice exam. After answering the questions, they reviewed the responses and explanations provided by ChatGPT-4 and Claude 3. The accuracy and scores before and after incorporating the models’ input were analyzed to compare their effectiveness.
RESULTS: No statistically significant differences were found between the two models’ response scores for any section (all p>0.05). For the junior sonologist, both ChatGPT-4 (p=0.039) and Claude 3 (p=0.039) significantly improved scores in basic knowledge. The responses provided by ChatGPT-4 also significantly improved scores in relevant professional knowledge (p=0.038), though its explanations did not (p=0.077). Across all exam sections, both models’ responses and explanations significantly improved the junior sonologist’s scores (all p<0.05). For the senior sonologist, both ChatGPT-4’s responses (p=0.022) and explanations (p=0.034) improved scores in basic knowledge, as did Claude 3’s explanations (p=0.003). Across all sections, Claude 3’s explanations significantly improved the senior sonologist’s scores (p=0.041).
CONCLUSION: ChatGPT-4 and Claude 3 significantly improved the sonologists’ examination performance, particularly in basic knowledge.
PMID:40349377 | DOI:10.11152/mu-4505