
Examining Artificial Intelligence Chatbots’ Responses in Providing Human Papillomavirus Vaccine Information for Young Adults: Qualitative Content Analysis

JMIR Public Health Surveill. 2026 Feb 18;12:e79720. doi: 10.2196/79720.

ABSTRACT

BACKGROUND: The growing use of artificial intelligence (AI) chatbots for seeking health-related information is concerning, as these tools were not originally developed to deliver medical guidance. The quality of AI chatbots’ responses depends heavily on their training data and is often limited in medical contexts because they are not specifically trained on the medical literature. Findings on the quality of health-related AI chatbot responses are mixed: some studies found that response quality surpassed physicians’ answers, while others revealed occasional major errors and low readability. This study addresses a critical gap by examining the performance of various AI chatbots in a complex, misinformation-rich environment.

OBJECTIVE: This study examined AI chatbots’ responses to human papillomavirus (HPV)-related questions by analyzing structure, linguistic features, information accuracy and currency, and vaccination stance.

METHODS: We conducted a qualitative content analysis following the approach outlined by Schreier to examine 4 selected AI chatbots’ (ChatGPT 4, Claude 3.7 Sonnet, DeepSeek V3, and Docus [General AI Doctor]) responses to HPV vaccine questions. These questions, posed from the perspective of simulated young adults, were adapted from items on the Vaccine Conspiracy Beliefs Scale and from Google Trends. The selection criteria for AI chatbots included popularity, accessibility, countries of origin, response update methods, and intended use. Two researchers, each simulating a 22-year-old man or woman, collected 8 conversations between February 22 and 28, 2025. We used a deductive approach to develop initial code groups, then an inductive approach to generate codes. The responses were analyzed against a comprehensive codebook, with codes examining response structure, linguistic features, information accuracy and currency, and vaccination stance. We also assessed readability using the Flesch-Kincaid Grade Level and the Flesch Reading Ease Score.
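For reference, the two readability metrics named above follow standard published formulas based on average sentence length and syllables per word. The sketch below (not from the study) takes syllable counts as an input, since automatic syllable estimation is heuristic and varies across tools:

```python
# Standard Flesch-Kincaid readability formulas (published coefficients).
# Syllable counts are supplied by the caller because automatic syllable
# estimation is heuristic and tool-dependent.

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate U.S. school grade needed."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores (toward 100) mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

# Example: a 100-word passage with 5 sentences and 150 syllables
print(flesch_kincaid_grade(100, 5, 150))   # ≈ 9.91 (about 10th grade)
print(flesch_reading_ease(100, 5, 150))    # ≈ 59.6 (fairly easy)
```

On these scales, the study's reported means (e.g., grade levels of roughly 10.7 to 13.2) correspond to high-school- to college-level text, above the 6th-to-8th-grade level typically recommended for patient-facing health materials.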

RESULTS: All AI chatbots cited evidence-based sources from reputable health organizations. We found no fabricated information or inaccuracies in numerical data. For complex questions, all AI chatbots appropriately deferred to health care professionals. All AI chatbots maintained a neutral or provaccine stance, consistent with scientific consensus. Mean response lengths and their ranges varied [word count; ChatGPT: 436.4 (218-954); Claude: 188.0 (138-255); DeepSeek: 510.0 (325-735); and Docus: 159.4 (61-200)], as did readability [Flesch-Kincaid Grade Level; ChatGPT: 10.7 (6.0-14.9); Claude: 13.2 (7.7-17.8); DeepSeek: 11.3 (7.0-14.7); and Docus: 12.2 (8.9-15.5); Flesch Reading Ease Score; ChatGPT: 46.8 (25.4-72.2); Claude: 32.5 (6.3-67.3); DeepSeek: 43.7 (22.8-67.4); and Docus: 40.5 (19.6-58.2)]. ChatGPT and Claude offered personalized responses, while DeepSeek and Docus did not. Occasionally, some responses included broken or irrelevant links and medical jargon.

CONCLUSIONS: Amid an online environment saturated with misinformation, AI chatbots have the potential to serve as an alternative to conventional online platforms (websites and social media) as a source of accurate HPV-related information. Improvements in readability, personalization, and link accuracy are still needed. Furthermore, we recommend that users treat AI chatbots as complements to, not replacements for, health care professionals’ guidance in clinical settings.

PMID:41707197 | DOI:10.2196/79720
