Artif Intell Med. 2025 Nov 13;171:103305. doi: 10.1016/j.artmed.2025.103305. Online ahead of print.
ABSTRACT
Depression is a significant global health issue with increasing prevalence. Current diagnostic methods rely on subjective observations and questionnaires, which often leads to underestimation of the condition and insufficient treatment. This study investigates voice-based markers for detecting depressive symptoms using a novel system of virtual humans (VHs) capable of engaging in open-ended conversations, unlike previous research, which relied primarily on structured clinical interview formats. A total of 101 participants (42 with depressive symptoms) each engaged in six casual social interactions with VHs simulating basic emotions, forming the DEPTALK dataset. Speech recordings and their automatic transcriptions were processed with state-of-the-art pre-trained transformer-based models to generate embeddings. We first employed a conversation-level aggregation strategy, combining embeddings across each dialogue and classifying them with Extreme Gradient Boosting. A single model trained on all six conversations per participant outperformed emotion-specific models, achieving F1 scores of 0.566 for speech, 0.329 for text, and 0.648 for the multimodal fusion, indicating that aggregating emotionally diverse interactions exposes stronger depression cues. To capture temporal dynamics, we further implemented a turn-level aggregation strategy using Gated Recurrent Units (GRUs) trained on all conversations. This approach improved performance for text (F1 = 0.505) and maintained competitive results for speech (F1 = 0.541), although the multimodal GRU model (F1 = 0.556) did not surpass the best conversation-level model. Overall, the findings suggest that in casual conversations, depressive symptoms are primarily conveyed through prosody, with the addition of semantic context further enhancing detection.
This study advances the understanding of speech-based depression patterns in simulated social interactions and highlights the potential of VHs for more objective detection of depressive symptoms.
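The conversation-level aggregation and multimodal fusion described above can be illustrated with a minimal sketch. The abstract does not state how embeddings were combined or which F1 variant was reported, so mean pooling over turns, fusion by concatenation, and positive-class F1 are assumptions made here for illustration only. The function names `mean_pool`, `fuse`, and `f1_score` are hypothetical, and the classifier itself (Extreme Gradient Boosting in the study) is omitted.

```python
from statistics import fmean


def mean_pool(turn_embeddings):
    """Average per-turn vectors into one conversation-level vector.

    Assumption: mean pooling. The abstract only says embeddings were
    combined across each dialogue, not how.
    """
    if not turn_embeddings:
        raise ValueError("need at least one turn embedding")
    dim = len(turn_embeddings[0])
    if any(len(vec) != dim for vec in turn_embeddings):
        raise ValueError("all turn embeddings must have the same length")
    # zip(*...) iterates over embedding dimensions (columns).
    return [fmean(column) for column in zip(*turn_embeddings)]


def fuse(speech_vec, text_vec):
    """Fuse modalities by concatenation (an assumed fusion scheme)."""
    return list(speech_vec) + list(text_vec)


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Toy values, not data from the study.
speech = mean_pool([[1.0, 2.0], [3.0, 4.0]])  # two turns, 2-d speech vectors
text = mean_pool([[0.0], [1.0]])              # two turns, 1-d text vectors
features = fuse(speech, text)                 # input row for a classifier
```

In this sketch, each conversation yields one fused feature vector that a gradient-boosted classifier could consume. The turn-level strategy differs in that it would keep the per-turn sequence and pass it to a recurrent model instead of pooling it first.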
PMID:41241976 | DOI:10.1016/j.artmed.2025.103305