Systematic review of machine learning and deep learning models for EEG-based detection of depression

J Psychiatr Res. 2026 Apr 24;199:113-121. doi: 10.1016/j.jpsychires.2026.04.030. Online ahead of print.

ABSTRACT

OBJECTIVE: Depression is a leading cause of global disability, motivating the development of objective and scalable diagnostic approaches. Quantitative electroencephalography (QEEG) combined with machine learning (ML) and deep learning (DL) techniques has gained increasing attention for depression detection. This systematic review aimed to critically examine and descriptively compare the methodologies, performance metrics, and limitations of ML- and DL-based models applied to EEG data for depression detection.

METHODS: A systematic review was conducted in accordance with PRISMA 2020 guidelines. Seven electronic databases (PubMed, Scopus, IEEE Xplore, ScienceDirect, Web of Science, SAGE Journals, and MDPI) were searched for peer-reviewed studies published between 2020 and 2024. Eligible studies included human participants, used EEG signals for depression detection, and applied ML or DL algorithms. Extracted information comprised algorithm type, sample size, EEG acquisition parameters, validation strategies, and reported performance metrics, which were synthesized descriptively across studies. Risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool.

RESULTS: A total of 42 studies met the inclusion criteria, including 23 ML-based and 19 DL-based investigations. Reported classification accuracy ranged from approximately 76% to 100%. DL studies showed a higher mean reported accuracy than ML studies (93.92% vs. 90.78%); however, this difference was not statistically significant in the exploratory non-parametric comparison. Near-perfect performance values were frequently observed in studies with small sample sizes and subject-dependent or exclusively internal validation strategies, raising concerns regarding overfitting and limited generalizability. Studies relying on publicly available datasets tended to report more stable performance. QUADAS-2 assessment revealed recurrent risk-of-bias concerns, particularly in the domains of patient selection and index test conduct.
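
The abstract does not name the non-parametric test used to compare DL and ML accuracies. As an illustration only, the sketch below assumes a two-sided Mann-Whitney U test with midranks for ties and a normal approximation without continuity correction. The function name `mann_whitney_u` and the accuracy lists are hypothetical placeholders, not values extracted from the review.

```python
import math


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p-value).

    Assumption: normal approximation with tie correction and no
    continuity correction; adequate for a rough exploratory check only.
    """
    # Pool both samples, tagging each value with its group (0 = x, 1 = y).
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n1, n2 = len(x), len(y)
    n = n1 + n2

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of 1-based ranks i+1 .. j
        t = j - i
        tie_term += t**3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        # All pooled values are identical, so there is no evidence of a difference.
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_two_sided


# Hypothetical per-study accuracies (percent), for illustration only.
dl_accuracy = [99.1, 96.4, 93.0, 88.5, 97.2]
ml_accuracy = [94.0, 90.3, 85.7, 91.8, 89.9]
u_stat, p_value = mann_whitney_u(dl_accuracy, ml_accuracy)
```

A rank-based test of this kind compares the two distributions of reported accuracies without assuming normality, which suits a small, heterogeneous set of studies. A non-significant result under such a test does not establish that the two families of models perform equally.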

CONCLUSIONS: Both ML and DL approaches demonstrate potential for EEG-based depression detection, but reported performance differences between them should be interpreted cautiously. Although DL studies tended to report higher accuracy values, this pattern was not statistically significant in exploratory analyses and was strongly influenced by sample size, validation strategy, and methodological design. Future research should prioritize larger and more diverse samples, subject-independent or external validation strategies, and standardized reporting frameworks to enhance methodological rigor and clinical applicability.
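
The difference between subject-dependent and subject-independent validation can be made concrete with a minimal leave-one-subject-out sketch. It assumes each EEG epoch carries a subject identifier; the helper name `leave_one_subject_out` and the identifiers shown are hypothetical and do not come from any reviewed study.

```python
from collections import defaultdict


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per fold.

    Every epoch from the held-out subject goes to the test set, so no
    subject contributes data to both training and testing in a fold.
    """
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)

    for held_out in sorted(by_subject):
        test_idx = by_subject[held_out]
        train_idx = [
            i
            for sid, idxs in by_subject.items()
            if sid != held_out
            for i in idxs
        ]
        yield held_out, train_idx, test_idx


# Hypothetical epoch-to-subject mapping: five epochs from three subjects.
epoch_subjects = ["s1", "s1", "s2", "s3", "s3"]
folds = list(leave_one_subject_out(epoch_subjects))
```

Splitting epochs at random instead would let segments from the same person appear in both training and test sets, which is one way a subject-dependent design can inflate reported accuracy.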

PMID:42056808 | DOI:10.1016/j.jpsychires.2026.04.030

By Nevin Manimala