Identifying postpartum depression subtypes using natural language processing and clinical notes

BMJ Ment Health. 2026 Mar 23;29(1):e302066. doi: 10.1136/bmjment-2025-302066.

ABSTRACT

BACKGROUND: Postpartum depression (PPD) remains vastly underdiagnosed, and its clinical heterogeneity is not well understood. Diagnosis codes in electronic health records (EHRs) alone may not identify all PPD cases, highlighting a need for novel detection approaches.

OBJECTIVE: To develop a transformer-based natural language processing (NLP) method to identify patients with PPD from clinical notes in EHRs and to examine demographic and clinical heterogeneity among identified cases.

METHODS: Clinical notes from 64 426 patients who gave birth between 2010 and 2023 at a major US academic medical centre were used to develop and evaluate the NLP method. By augmenting the NLP output with International Classification of Diseases (ICD-9/10) diagnosis codes, three subgroups of individuals with PPD were identified: patients identified by ICD only (PPD-ICD), NLP only (PPD-NLP) and both ICD and NLP (PPD-BOTH). Demographics, mental health and substance use disorders (SUDs), antidepressant treatment, behavioural therapy and healthcare utilisation were compared across PPD subgroups and a non-PPD control group. Longitudinal associations of depression and anxiety were also examined.
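The subgroup definitions above amount to a set partition of the cohort by two flags. The following is a minimal sketch of that partition, not the authors' pipeline: the function name `ppd_subgroups`, the patient identifiers and the inputs are hypothetical, and it assumes the NLP method and the ICD lookup have each already produced a set of flagged patient IDs.

```python
def ppd_subgroups(all_patients, icd_flagged, nlp_flagged):
    """Partition a cohort into the four groups named in the abstract.

    All arguments are iterables of patient identifiers (hypothetical here).
    Flags for patients outside the cohort are ignored.
    """
    cohort = set(all_patients)
    icd = set(icd_flagged) & cohort
    nlp = set(nlp_flagged) & cohort
    return {
        "PPD-ICD": icd - nlp,             # ICD diagnosis code only
        "PPD-NLP": nlp - icd,             # flagged from clinical notes only
        "PPD-BOTH": icd & nlp,            # flagged by both sources
        "non-PPD": cohort - (icd | nlp),  # control group
    }


# Hypothetical toy cohort, for illustration only.
cohort = {"p1", "p2", "p3", "p4", "p5", "p6"}
icd_flagged = {"p1", "p2", "p3"}
nlp_flagged = {"p3", "p4"}

groups = ppd_subgroups(cohort, icd_flagged, nlp_flagged)
for name, members in groups.items():
    print(name, sorted(members))
```

Because the four groups are mutually exclusive and together cover the cohort, every patient contributes to exactly one comparison group.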

FINDINGS: The NLP method identified an additional 29.6% of patients whose clinical notes indicated symptoms suggestive of PPD but who lacked an ICD diagnosis. Significant variation was observed among PPD subgroups in comorbid psychiatric disorders, SUDs, treatment patterns and healthcare utilisation. During the 24 months post-delivery, the PPD-BOTH subgroup exhibited the highest rates of anxiety disorder diagnoses (vs PPD-ICD: OR 1.69, 95% CI 1.49 to 1.93; vs PPD-NLP: OR 4.46, 95% CI 3.82 to 5.22), antidepressant prescriptions (vs PPD-ICD: OR 1.95, 95% CI 1.71 to 2.22; vs PPD-NLP: OR 5.98, 95% CI 5.11 to 7.01) and mental health outpatient visits (vs PPD-ICD: OR 1.45, 95% CI 1.24 to 1.70; vs PPD-NLP: OR 4.94, 95% CI 3.90 to 6.31), suggesting higher symptom severity (all p<0.001). Comorbid depression and anxiety diagnoses were most prevalent during the postpartum period and declined over time.
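For readers unfamiliar with the OR and 95% CI notation above, the sketch below shows how an unadjusted odds ratio and a Wald confidence interval follow from a 2x2 table. The abstract does not state how the reported ORs were estimated (they may come from adjusted regression models), so this illustrates the quantity only and is not a reproduction of the analysis. The function name `odds_ratio_wald_ci` and the counts are hypothetical.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.959964):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: group 1 with the outcome     b: group 1 without the outcome
    c: group 2 with the outcome     d: group 2 without the outcome
    z: normal quantile (1.959964 gives a 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study:
# 60 of 100 patients in group 1 and 30 of 100 in group 2 have the outcome.
or_, lo, hi = odds_ratio_wald_ci(60, 40, 30, 70)
print(f"OR {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

An interval that excludes 1 corresponds to a difference in odds between the two groups that is significant at the matching level, which is how the comparisons above can be read.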

CONCLUSIONS: Augmenting NLP-based identification with ICD codes yielded more individuals with distinct demographic and clinical profiles, demonstrating the method’s ability to improve case detection and characterise heterogeneity.

CLINICAL IMPLICATIONS: Given that PPD is underdiagnosed and undertreated, this novel approach demonstrates further potential for NLP in healthcare settings to capture more cases, enabling earlier and more personalised interventions that reach patients who may otherwise be overlooked.

PMID:41871883 | DOI:10.1136/bmjment-2025-302066

By Nevin Manimala
