Br J Ophthalmol. 2024 Jul 17:bjo-2023-324526. doi: 10.1136/bjo-2023-324526. Online ahead of print.
ABSTRACT
BACKGROUND: Large language models (LLMs), such as ChatGPT, have considerable implications for various medical applications. However, ChatGPT’s training primarily draws from English-centric internet data and is not tailored explicitly to the medical domain. Thus, an ophthalmic LLM in Chinese is clinically essential for both healthcare providers and patients in mainland China.
METHODS: We developed an LLM of ophthalmology (MOPH) using Chinese corpora and evaluated its performance in three clinical scenarios: ophthalmic board exams in Chinese, answering evidence-based medicine-oriented ophthalmic questions and diagnosing clinical vignettes. Additionally, we compared MOPH’s performance to that of human doctors.
RESULTS: In the ophthalmic exam, MOPH’s average score closely aligned with the mean score of trainees (64.7 (range 62-68) vs 66.2 (range 50-92), p=0.817), and MOPH achieved a score above 60 in all seven mock exams. In answering ophthalmic questions, 83.3% (25/30) of MOPH’s responses adhered to Chinese guidelines (Likert scale 4-5). Only 6.7% (2/30, Likert scale 1-2) and 10% (3/30, Likert scale 3) of responses were rated by reviewers as ‘poor or very poor’ and as containing ‘potentially misinterpretable inaccuracies’, respectively. In diagnostic accuracy, the rate of correct diagnosis was higher for ophthalmologists than for MOPH (96.1% vs 81.1%), but the difference was not statistically significant (p>0.05).
CONCLUSION: This study demonstrated the promising performance of MOPH, a Chinese-specific ophthalmic LLM, in diverse clinical scenarios. MOPH has potential real-world applications in Chinese-language ophthalmology settings.
PMID:39019566 | DOI:10.1136/bjo-2023-324526