JMIR Med Inform. 2023 Nov 28;11:e48933. doi: 10.2196/48933.
BACKGROUND: This research integrates a comparative analysis of the performance of human researchers and OpenAI’s ChatGPT in systematic review tasks and describes an assessment of the application of natural language processing (NLP) models in clinical practice through a review of 5 studies.
OBJECTIVE: This study aimed to evaluate the reliability between ChatGPT and human researchers in extracting key information from clinical articles, and to investigate the practical use of NLP in clinical settings as evidenced by selected studies.
METHODS: The study design comprised a systematic review of clinical articles executed independently by human researchers and ChatGPT. The level of agreement between and within raters for parameter extraction was assessed using the Fleiss and Cohen κ statistics.
RESULTS: The comparative analysis revealed a high degree of concordance between ChatGPT and human researchers for most parameters, with less agreement for study design, clinical task, and clinical implementation. The review identified 5 significant studies that demonstrated the diverse applications of NLP in clinical settings. These studies’ findings highlight the potential of NLP to improve clinical efficiency and patient outcomes in various contexts, from enhancing allergy detection and classification to improving quality metrics in psychotherapy treatments for veterans with posttraumatic stress disorder.
CONCLUSIONS: Our findings underscore the potential of NLP models, including ChatGPT, in performing systematic reviews and other clinical tasks. Despite certain limitations, NLP models present a promising avenue for enhancing health care efficiency and accuracy. Future studies must focus on broadening the range of clinical applications and exploring the ethical considerations of implementing NLP applications in health care settings.