Evaluating Generative AI Psychotherapy Chatbots Used by Youth: Cross-Sectional Study

JMIR Ment Health. 2025 Dec 10;12:e79838. doi: 10.2196/79838.

ABSTRACT

BACKGROUND: Many youth rely on direct-to-consumer generative artificial intelligence (GenAI) chatbots for mental health support, yet the quality of the psychotherapeutic capabilities of these chatbots is understudied.

OBJECTIVE: This study aimed to comprehensively evaluate and compare the quality of widely used GenAI chatbots with psychotherapeutic capabilities using the Conversational Agent for Psychotherapy Evaluation II (CAPE-II) framework.

METHODS: In this cross-sectional study, trained raters used the CAPE-II framework to rate the quality of 5 chatbots from GenAI platforms widely used by youth. Raters role-played personas of youth with mental health challenges to prompt the chatbots and sustain conversations. Chatbot responses were generated from August to October 2024. The primary outcomes were rated scores in 9 sections. The proportion of high-quality ratings (binary rating of 1) in each section was compared between chatbots using Bonferroni-corrected chi-square tests.
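The statistical comparison described above can be sketched in Python. The counts below are hypothetical placeholders (the abstract does not report per-chatbot section counts); the sketch assumes one 5×2 chi-square test per CAPE-II section, with the significance threshold Bonferroni-corrected across the 9 sections. The Pearson chi-square statistic is computed by hand to keep the example self-contained:

```python
# Hypothetical counts of (high-quality, low-quality) ratings for the
# 5 chatbots in ONE CAPE-II section -- illustrative only, not the paper's data.
table = [
    [20, 4],
    [15, 9],
    [22, 2],
    [10, 14],
    [18, 6],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

# Pearson chi-square statistic: sum over cells of (observed - expected)^2 / expected,
# where expected = row_total * col_total / grand_total under independence.
chi2 = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (observed - expected) ** 2 / expected

dof = (len(table) - 1) * (len(table[0]) - 1)  # (5 - 1) * (2 - 1) = 4

# Bonferroni correction: divide the nominal alpha by the number of tests,
# here the 9 CAPE-II sections.
alpha = 0.05 / 9
```

The p-value for `chi2` would then be taken from the chi-square distribution with `dof` degrees of freedom (e.g., via `scipy.stats.chi2.sf`) and compared against the corrected `alpha`.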

RESULTS: While GenAI chatbots were found to be accessible (104/120 high-quality ratings, 86.7%) and to avoid harmful statements and misinformation (71/80, 89%), they performed poorly in their therapeutic approach (14/45, 31%) and in their ability to monitor and assess risk (31/80, 39%). Privacy policies were difficult to understand, and information on chatbot model training and knowledge was unavailable, resulting in low scores. Bonferroni-corrected chi-square tests showed statistically significant differences in chatbot quality in the background, therapeutic approach, and monitoring and risk evaluation sections. Qualitatively, raters perceived most chatbots as having strong conversational abilities but found them undermined by recurring issues, including fabricated content and poor handling of crisis situations.

CONCLUSIONS: Direct-to-consumer GenAI chatbots are unsafe for the millions of youth who use them. While they demonstrate strengths in accessibility and conversational capability, they pose unacceptable risks through improper crisis handling and a lack of transparency regarding privacy and model training. Immediate reforms are needed, including standardized quality audits such as the CAPE-II framework. These findings provide actionable targets for platforms, regulators, and policymakers seeking to protect youth who turn to chatbots for mental health support.

PMID:41370787 | DOI:10.2196/79838

By Nevin Manimala
