Front Med (Lausanne). 2026 May 14;13:1804544. doi: 10.3389/fmed.2026.1804544. eCollection 2026.
ABSTRACT
BACKGROUND: Chronic obstructive pulmonary disease (COPD) is a chronic respiratory disease characterized by persistent respiratory symptoms and progressive airflow limitation. Acute exacerbations of COPD (AECOPD) are a major cause of hospitalization and death among patients with COPD. This study aims to identify risk factors for AECOPD and to develop an accurate, interpretable predictive model using statistical and machine learning methods.
METHODS: We retrospectively analyzed data from 2,102 COPD patients admitted between 1 January 2019 and 31 December 2024. The primary outcome was AECOPD severity, defined as the need for treatment escalation. LASSO regression was used for initial feature selection to identify candidate risk factors. To assess predictive performance on later, unseen data, the dataset was split chronologically: the earliest 70% of observations formed the training set and the remaining 30% the test set. Six machine learning algorithms were then used to build and compare models. To improve interpretability, we used SHapley Additive exPlanations (SHAP) to quantify the contribution of each variable to the model's predictions.
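The abstract does not describe how the chronological 70/30 split was implemented. The following is a minimal standard-library Python sketch under stated assumptions: each record is a dict with a hypothetical `admission_date` field, the helper `chronological_split` is hypothetical, and a fractional cutoff is floored (the abstract does not say how it was rounded).

```python
from datetime import date, timedelta


def chronological_split(records, train_fraction=0.7):
    """Split records into training and test sets by admission date.

    Records are sorted by their (hypothetical) "admission_date" field; the
    earliest `train_fraction` of them become the training set and the rest
    the test set. Nothing is shuffled, so every test record was admitted on
    or after the last training admission.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    ordered = sorted(records, key=lambda r: r["admission_date"])
    # Assumption: the cutoff index is floored; the abstract does not state
    # how a fractional cutoff was rounded.
    cutoff = int(len(ordered) * train_fraction)
    return ordered[:cutoff], ordered[cutoff:]


# Toy example (not study data): ten admissions listed newest-first, to show
# that chronological order is restored by sorting before the split.
start = date(2019, 1, 1)
records = [{"id": i, "admission_date": start + timedelta(days=30 * i)}
           for i in reversed(range(10))]
train, test = chronological_split(records)
print(len(train), len(test))  # 7 3
print(max(r["admission_date"] for r in train)
      <= min(r["admission_date"] for r in test))  # True
```

Unlike a random split, a chronological split keeps later admissions out of training, which approximates applying the model to future patients. Under the flooring assumption, 2,102 records would yield 1,471 training and 631 test observations.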
RESULTS: Among the six machine learning models, the extreme gradient boosting (XGBoost) model performed best, achieving an area under the receiver operating characteristic curve (AUC) of 0.960 (95% confidence interval [CI]: 0.940-0.980) in the training set and 0.824 (95% CI: 0.804-0.844) in the test set. In the test set, accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were 0.805, 0.65, 0.872, 0.669, and 0.859, respectively. SHAP analysis identified creatinine (CREA), neutrophil percentage (NEU%), D-dimer, brain natriuretic peptide (BNP), white blood cell count (WBC), and hypertension (HTN) as important contributors to the model output.
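The test-set metrics reported above are standard confusion-matrix quantities, and the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below shows these definitions in plain Python on toy data that is not from the study. The function names are hypothetical, the 0.5 decision threshold is an assumption because the abstract does not report one, and the confidence intervals reported in the study are not computed here.

```python
import math


def classification_metrics(y_true, y_score, threshold=0.5):
    """Threshold predicted probabilities and derive confusion-matrix metrics.

    Assumption: a prediction is positive when its score is >= `threshold`;
    the abstract does not report the threshold used in the study.
    """
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        # Return NaN rather than raising when a denominator is empty.
        return num / den if den else math.nan

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),  # true-positive rate
        "specificity": ratio(tn, tn + fp),  # true-negative rate
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
    }


def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative.

    This Mann-Whitney formulation counts ties as one half and is O(P * N),
    which is adequate for small illustrations.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (not from the study): 1 = treatment escalation needed, 0 = not.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
print(classification_metrics(y_true, y_score))
print(round(roc_auc(y_true, y_score), 3))  # 0.933
```

In practice a library routine such as scikit-learn's `roc_auc_score` would normally be used; the pairwise version is shown only because it makes the definition explicit.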
CONCLUSION: The XGBoost model developed in this study demonstrates robust performance in predicting AECOPD risk using routinely collected clinical and laboratory data. The integration of SHAP analysis enhances model transparency, supporting its potential utility in clinical risk stratification and early intervention.
PMID:42221119 | PMC:PMC13216505 | DOI:10.3389/fmed.2026.1804544