Assessing the Adherence of ChatGPT Chatbots to Public Health Guidelines for Smoking Cessation: Content Analysis

J Med Internet Res. 2025 Jan 30;27:e66896. doi: 10.2196/66896.

ABSTRACT

BACKGROUND: Large language model (LLM) artificial intelligence chatbots using generative language can offer smoking cessation information and advice. However, little is known about the reliability of the information provided to users.

OBJECTIVE: This study aims to examine whether 3 ChatGPT chatbots (the World Health Organization's Sarah, BeFreeGPT, and BasicGPT) provide reliable information on how to quit smoking.

METHODS: A list of quit-smoking queries (n=12) was generated from frequent Google searches related to "how to quit smoking." Each query was given to each chatbot, and responses were analyzed for their adherence to an index developed from the US Preventive Services Task Force public health guidelines for quitting smoking and counseling principles. Responses were independently coded by 2 reviewers, and differences were resolved by a third coder.

RESULTS: Across chatbots and queries, chatbot responses were, on average, rated as adherent to 57.1% of the items on the adherence index. Sarah's adherence (72.2%) was significantly higher than that of BeFreeGPT (50%) and BasicGPT (47.8%; P<.001). The majority of chatbot responses used clear language (97.3%) and included a recommendation to seek out professional counseling (80.3%). About half of the responses included a recommendation to consider nicotine replacement therapy (52.7%), a recommendation to seek social support from friends and family (55.6%), and information on how to deal with cravings when quitting smoking (44.4%). The least common element was information about considering the use of non-nicotine prescription drugs (14.1%). Finally, misinformation was present in 22% of responses. The queries that were most challenging for the chatbots included "how to quit smoking cold turkey," "…with vapes," "…with gummies," "…with a necklace," and "…with hypnosis." All chatbots showed resilience to adversarial attacks intended to derail the conversation.

CONCLUSIONS: LLM chatbots varied in their adherence to quit-smoking guidelines and counseling principles. While the chatbots reliably provided some types of information, they omitted other types and occasionally provided misinformation, especially in response to queries about less evidence-based methods of quitting. LLM chatbot instructions can be revised to compensate for these weaknesses.

PMID:39883917 | DOI:10.2196/66896

By Nevin Manimala