User Experience and Early Clinical Outcomes of a Mental Wellness Chatbot for Depression and Anxiety: Pilot Evaluation Mixed Methods Study

JMIR Form Res. 2026 Apr 14;10:e90644. doi: 10.2196/90644.

ABSTRACT

BACKGROUND: Artificial intelligence-powered conversational agents (ie, chatbots) are increasingly popular outlets for users seeking psychological support, yet little is known about how users experience early-stage prototypes or which therapeutic processes contribute to clinical improvement. A transparent evaluation of emerging chatbot prototypes is needed to clarify if, how, and why artificial intelligence companions work and to guide their continued development.

OBJECTIVE: This mixed methods pilot study evaluated user experience, acceptability, and preliminary clinical signals for an early-stage mental wellness chatbot. We also examined whether baseline symptom severity moderated clinical improvement.

METHODS: Three sequential cohorts (n=125) completed a 2-week, incentivized chatbot exposure (approximately 60 min per week). Participants provided first-impression ratings, qualitative feedback, and pre-post assessments of depressive symptoms (PHQ-8 [Patient Health Questionnaire-8]), anxiety symptoms (GAD-7 [Generalized Anxiety Disorder-7]), psychological distress, well-being, and loneliness. Statistical models estimated symptom change and tested interactions with baseline symptom severity. Mixed methods analysis integrated quantitative outcomes with large language model-assisted qualitative content analysis of open-ended responses.

RESULTS: Participants described the chatbot as accessible, easy to use, and emotionally validating, while citing limitations in personalization and conversational depth. Qualitative responses consistently highlighted early therapeutic processes such as emotional validation, goal setting, and perceived attunement. Regression models showed significant pre-post reductions in depressive (Hedges g=-0.32) and anxiety (g=-0.32) symptoms, alongside modest improvements in distress and well-being. Baseline severity moderated improvement, with marginal effects indicating larger predicted reductions at higher PHQ-8 and GAD-7 baseline scores (eg, PHQ-8=15: g=-0.84; GAD-7=15: g=-0.62).

CONCLUSIONS: This pilot provides a comprehensive view of early chatbot development and suggests promising user experiences and preliminary symptom improvements under structured pilot conditions. By integrating experiential and exploratory clinical data, the study identifies candidate process targets to inform ongoing refinement. Findings support continued development and demonstrate procedural feasibility for progression to larger, longer-term trials evaluating engagement and clinical outcomes under more naturalistic conditions.

PMID:41980262 | DOI:10.2196/90644

By Nevin Manimala