Machine learning and SHAP values explain the association between social determinants of health and post-stroke depression

BMC Public Health. 2025 Aug 21;25(1):2868. doi: 10.1186/s12889-025-24220-y.

ABSTRACT

OBJECTIVE: To develop and validate a machine learning model that integrates social determinants of health (SDoH) for assessing post-stroke depression (PSD), and to examine the association between SDoH and disease outcomes.

METHODS: Data were acquired from the National Health and Nutrition Examination Survey. Logistic regression was employed to analyse the association between SDoH and PSD, whereas Cox regression was used to assess the association between SDoH and all-cause mortality in PSD. The Boruta algorithm was employed for feature selection, and four machine learning (ML) models were constructed (CatBoost, Logistic Regression, Multilayer Perceptron, and Random Forest) and evaluated for predictive performance, calibration, and clinical applicability. SHAP values were computed to quantify the contribution of each feature in the model with the highest predictive performance.
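To make the SHAP step concrete, the sketch below computes exact Shapley values for a toy three-feature logistic model by enumerating every feature coalition. It is an illustration only, not the study's pipeline: the feature names, coefficients, and baseline values are hypothetical, and "absent" features are simply set to a single baseline value, which is a simplification of what SHAP implementations do with a background dataset.

```python
from itertools import combinations
from math import exp, factorial


def predict(x):
    """Hypothetical logistic model; the coefficients are invented for illustration."""
    weights = {"sdoh_score": 0.8, "age": 0.02, "sleep_problems": 0.5}
    z = -3.0 + sum(weights[name] * x[name] for name in weights)
    return 1.0 / (1.0 + exp(-z))


def shapley_values(model, instance, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    features = list(instance)
    n = len(features)

    def value(coalition):
        x = {f: (instance[f] if f in coalition else baseline[f]) for f in features}
        return model(x)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude f.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical individual and reference ("baseline") profile.
instance = {"sdoh_score": 5, "age": 70, "sleep_problems": 1}
baseline = {"sdoh_score": 1, "age": 60, "sleep_problems": 0}
phi = shapley_values(predict, instance, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.4f}")
```

The contributions sum to the difference between the prediction for the individual and the prediction for the baseline profile, which is the additivity property that makes SHAP values readable as a per-feature breakdown of one prediction. Exact enumeration grows exponentially with the number of features, so it is only practical for small examples like this one.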

RESULTS: Logistic regression analysis revealed a significant positive association between SDoH and PSD prevalence (p for trend < 0.0001). Compared with the other three models, CatBoost (AUC = 0.966) demonstrated the best overall predictive performance. Moreover, decision curve analysis (DCA) and calibration curves indicated that the CatBoost model had considerable clinical utility and consistent predictive performance. Ten-fold cross-validation further supported the model's robustness and generalization ability.
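For readers unfamiliar with the two evaluation measures named above, the following minimal sketch shows how AUC and the net benefit plotted in a decision curve can be computed from predicted probabilities. The labels and scores are made-up toy values, not data from the study, and the function names are illustrative.

```python
def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, scores, threshold):
    """Decision-curve net benefit: TP/n - (FP/n) * pt / (1 - pt)."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy labels (1 = PSD, 0 = no PSD) and hypothetical predicted probabilities.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.2, 0.9]

print("AUC:", auc(y_true, scores))
print("Net benefit at 0.5:", net_benefit(y_true, scores, 0.5))
# "Treat all" reference strategy: everyone is classified as positive.
print("Treat-all net benefit at 0.5:", net_benefit(y_true, [1.0] * len(y_true), 0.5))
```

On this toy data the AUC is 0.9375 and the net benefit at a threshold of 0.5 is 0.375, compared with 0.0 for the treat-all strategy. A decision curve repeats the net benefit calculation across a range of thresholds and compares the model against the treat-all and treat-none (net benefit 0) strategies.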

CONCLUSIONS: A linear relationship exists between SDoH and PSD, with CatBoost demonstrating the best performance in predicting PSD. SHAP values emphasize the importance of SDoH.

PMID:40841950 | DOI:10.1186/s12889-025-24220-y

By Nevin Manimala
