Scalable depression monitoring with smartphone speech using a multimodal benchmark and topic analysis

NPJ Digit Med. 2026 Feb 28. doi: 10.1038/s41746-026-02486-9. Online ahead of print.

ABSTRACT

Objective, scalable biomarkers are needed for continuous monitoring of major depressive disorder. Smartphone-collected speech is promising, yet clinically useful signals remain elusive. We analyzed 3151 weekly voice diaries from 284 German-speaking adults (128 MDD, 156 controls) to predict Beck Depression Inventory (BDI) scores. Sentence-embedding models outperformed lexical and acoustic baselines: Qwen3-8B achieved MAE 4.65 and R² 0.34, and stacked generalization of multilingual-E5 with Qwen3-8B further improved performance (MAE 4.37, R² 0.41). Audio embeddings added little incremental value. In an MDD-only analysis, multilingual-E5 was the top single modality (MAE 6.74, R² 0.20). To aid interpretation, BERTopic uncovered six coherent themes; BDI scores were highest for “Distress & care”, supporting clinical face validity. Together, LLM embeddings paired with lightweight topic analysis capture the dominant signal of depression severity in everyday speech and offer a scalable route to ecologically valid digital phenotyping.
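
To illustrate what stacked generalization of two embedding models involves, here is a minimal sketch. It is not the authors' pipeline. The abstract does not name the base learners, the meta-learner, or the cross-validation scheme, so the ridge regressors, the participant-grouped folds, and every function and variable name below (ridge_fit, out_of_fold, emb_a, emb_b) are assumptions made for illustration. The data are synthetic stand-ins for the two embedding matrices and the BDI scores. For brevity the meta-learner reuses the same folds as the base learners, so the evaluation is not fully nested.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w

def ridge_predict(model, X):
    w, b = model
    return X @ w + b

def out_of_fold(X, y, groups, n_folds=5, alpha=1.0):
    """Predict every sample with a model that never saw its participant."""
    fold_of_group = {g: i % n_folds for i, g in enumerate(np.unique(groups))}
    folds = np.array([fold_of_group[g] for g in groups])
    preds = np.empty(len(y))
    for k in range(n_folds):
        test = folds == k
        model = ridge_fit(X[~test], y[~test], alpha)
        preds[test] = ridge_predict(model, X[test])
    return preds

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Synthetic stand-ins (assumption): 40 participants, 5 diaries each,
# two 16-dimensional "embedding" views of one latent severity signal.
rng = np.random.default_rng(0)
n_participants, diaries_each, dim = 40, 5, 16
groups = np.repeat(np.arange(n_participants), diaries_each)
n = len(groups)
latent = rng.normal(size=n)
emb_a = np.outer(latent, rng.normal(size=dim)) + rng.normal(size=(n, dim))
emb_b = np.outer(latent, rng.normal(size=dim)) + 2.0 * rng.normal(size=(n, dim))
y = 10.0 + 5.0 * latent + rng.normal(size=n)  # synthetic score, not real BDI data

# Level 0: one base model per embedding, predictions made out of fold.
oof_a = out_of_fold(emb_a, y, groups)
oof_b = out_of_fold(emb_b, y, groups)

# Level 1: the meta-learner sees only the base models' out-of-fold predictions.
meta_X = np.column_stack([oof_a, oof_b])
stacked = out_of_fold(meta_X, y, groups, alpha=1e-3)

for name, pred in [("view A", oof_a), ("view B", oof_b), ("stacked", stacked)]:
    print(f"{name}: MAE={mae(y, pred):.2f}  R2={r2(y, pred):.2f}")
```

Assigning folds by participant rather than by diary keeps one person's recordings from appearing in both the training and test sets. With repeated weekly diaries per participant, that overlap would otherwise inflate the apparent accuracy.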

PMID:41764298 | DOI:10.1038/s41746-026-02486-9

By Nevin Manimala