JMIR Med Educ. 2026 Mar 20;12:e82885. doi: 10.2196/82885.
ABSTRACT
BACKGROUND: Generative artificial intelligence tools such as ChatGPT are increasingly used by medical students for self-directed learning. Although these models demonstrate linguistic fluency, their reliability as supplementary resources for preclinical education remains uncertain. In particular, comparisons with evidence-based references such as UpToDate are lacking.
OBJECTIVE: This study evaluated the similarity between responses generated by ChatGPT (with GPT-4o mini) to preclinical medical education questions and corresponding content from UpToDate, to assess ChatGPT's potential as an adjunctive learning tool.
METHODS: We conducted a cross-sectional comparison study using 150 first-order questions derived from a preclinical question bank at a single allopathic institution under the oversight of a medical educator with more than 25 years of teaching experience. Each question was entered into ChatGPT 10 times in separate chat sessions, and responses from UpToDate were retrieved from the most relevant articles. The responses were preprocessed through lemmatization, stop-word removal, punctuation removal, and numeric normalization. Similarity between ChatGPT and UpToDate responses was quantified using term frequency-inverse document frequency (TF-IDF) cosine similarity. To determine whether the observed similarities exceeded chance, ChatGPT outputs were compared with a null distribution generated from randomized text.
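The pipeline above (preprocessing, TF-IDF weighting, cosine similarity, and a chance baseline from randomized text) can be sketched as follows. This is a minimal illustrative reconstruction, not the authors' published code: lemmatization is approximated by lowercasing only (a real lemmatizer such as NLTK's WordNet lemmatizer falls outside the standard library), the stop-word list is a small placeholder, and the "randomized text" null is read here as random token strings of matched length drawn from the pooled vocabulary.

```python
import math
import random
import re
from collections import Counter

# Placeholder stop-word list; the study presumably used a full English list.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for"}

def preprocess(text):
    """Lowercase, normalize numbers, strip punctuation, drop stop words."""
    text = text.lower()
    text = re.sub(r"\d+(\.\d+)?", "<num>", text)   # numeric normalization
    tokens = re.findall(r"[a-z<>]+", text)          # keeps words, drops punctuation
    return [t for t in tokens if t not in STOP_WORDS]

def tfidf_vectors(doc_a, doc_b):
    """TF-IDF vectors over a two-document corpus (e.g. a ChatGPT answer
    and the matched UpToDate passage), with smoothed IDF."""
    docs = [Counter(preprocess(doc_a)), Counter(preprocess(doc_b))]
    vocab = set(docs[0]) | set(docs[1])
    vectors = []
    for counts in docs:
        total = sum(counts.values()) or 1
        vec = {}
        for term in vocab:
            tf = counts[term] / total
            df = sum(1 for d in docs if term in d)
            idf = math.log((1 + len(docs)) / (1 + df)) + 1
            vec[term] = tf * idf
        vectors.append(vec)
    return vectors

def cosine_similarity(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def null_distribution(reference, vocab_pool, length, n_iter=200, seed=0):
    """Chance baseline: similarity of random token strings of matched
    length to the reference text (our reading of 'randomized text')."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_iter):
        rand_doc = " ".join(rng.choices(vocab_pool, k=length))
        u, v = tfidf_vectors(rand_doc, reference)
        sims.append(cosine_similarity(u, v))
    return sims

if __name__ == "__main__":
    # Hypothetical response pair for illustration only.
    gpt = "Beta blockers reduce heart rate and myocardial oxygen demand."
    utd = "Beta blockers lower heart rate, decreasing myocardial oxygen demand."
    u, v = tfidf_vectors(gpt, utd)
    print(f"observed cosine similarity: {cosine_similarity(u, v):.3f}")
```

An observed similarity would then be judged significant by comparing it against the empirical null distribution (e.g. whether it exceeds the upper tail of the randomized-text similarities), mirroring the study's nonrandomness check.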
RESULTS: ChatGPT responses demonstrated statistically significant similarity to UpToDate in 59.3% (89/150) of questions. Across subject areas, pharmacology showed the highest concordance (mean cosine similarity 0.338, SD 0.134), followed by pathology (mean 0.321, SD 0.142), microbiology (mean 0.297, SD 0.108), biochemistry (mean 0.296, SD 0.120), and immunology (mean 0.275, SD 0.102). All subject-level similarity scores exceeded those generated from randomized text, confirming that the observed overlap was nonrandom.
CONCLUSIONS: ChatGPT with GPT-4o mini exhibited moderate but meaningful alignment with UpToDate across preclinical topics, performing best in fact-based disciplines such as pharmacology. Although it is not a substitute for evidence-based resources, ChatGPT may serve as an accessible adjunctive tool for medical students. Integration into preclinical learning should be coupled with artificial intelligence literacy training to promote responsible use and critical appraisal.
PMID:41861392 | DOI:10.2196/82885