Respiratory sound-based AI screening of asthma and COPD via multi-feature fusion and CatBoost classification

Sci Rep. 2026 Jun 1. doi: 10.1038/s41598-026-54803-7. Online ahead of print.

ABSTRACT

Asthma and chronic obstructive pulmonary disease (COPD) are significant global health burdens, and conventional diagnosis relies on resource-intensive spirometry. This paper presents a reproducible multimodal respiratory sound screening model that combines complementary acoustic and clinical representations. The proposed method fuses handcrafted spectral-temporal features (MFCCs, chroma, spectral contrast, tonnetz, mel-spectrogram, tempogram) with precomputed cough and vowel embeddings and structured clinical metadata, and classifies the fused features with a class-weighted CatBoost ensemble on the standardized AIRS Kaggle benchmark dataset. The model achieves an overall accuracy of 90.3% with class-wise F1-scores of 0.945 (Healthy), 0.915 (Asthma), and 0.842 (COPD). Systematic ablation experiments confirm the importance of multimodal fusion (-7.8% accuracy without full feature fusion), the attention mechanism (-4.9%), and data augmentation (-6.7%). Additional metrics, namely the Matthews Correlation Coefficient (MCC = 0.856) and Cohen’s Kappa (κ = 0.849), confirm robust classification under class imbalance. Structured multimodal feature fusion with gradient boosting enables scalable, reproducible respiratory disease screening applicable to telemedicine. Future work should address prospective validation on diverse, multi-institutional clinical cohorts.
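
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how per-class F1, the multiclass Matthews Correlation Coefficient, and Cohen's Kappa can be derived from a confusion matrix. It is not the authors' evaluation code: the function name `summarize` and the three-class matrix in the demo are hypothetical, and the counts are invented purely for illustration, not taken from the paper. The sketch assumes a non-empty square matrix with true classes on rows and predicted classes on columns.

```python
from math import sqrt


def summarize(cm):
    """Compute accuracy, per-class F1, multiclass MCC and Cohen's kappa.

    cm[i][j] = number of samples whose true class is i and predicted class is j.
    Assumes a non-empty square matrix with at least one sample.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))

    # Row sums are true-class counts; column sums are predicted-class counts.
    true_counts = [sum(cm[k]) for k in range(n)]
    pred_counts = [sum(cm[i][k] for i in range(n)) for k in range(n)]

    # F1_k = 2*TP / (2*TP + FP + FN); the denominator equals row sum + column sum.
    f1 = []
    for k in range(n):
        denom = true_counts[k] + pred_counts[k]
        f1.append(2 * cm[k][k] / denom if denom else 0.0)

    # Sum over classes of (true count * predicted count), shared by MCC and kappa.
    agree = sum(t * p for t, p in zip(true_counts, pred_counts))

    # Multiclass MCC in its confusion-matrix form.
    mcc_den = sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    mcc = (correct * total - agree) / mcc_den if mcc_den else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = correct / total
    p_expected = agree / total ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected) if p_expected != 1 else 0.0

    return {"accuracy": p_observed, "f1": f1, "mcc": mcc, "kappa": kappa}


if __name__ == "__main__":
    # Hypothetical counts for (Healthy, Asthma, COPD); NOT the paper's results.
    toy = [[50, 3, 2], [4, 30, 6], [1, 5, 19]]
    for name, value in summarize(toy).items():
        print(name, value)
```

Both MCC and Kappa discount agreement that would be expected from the class frequencies alone, which is why they are commonly reported alongside accuracy when classes are imbalanced.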

PMID:42225794 | DOI:10.1038/s41598-026-54803-7

By Nevin Manimala