Health Qual Life Outcomes. 2025 Aug 26;23(1):80. doi: 10.1186/s12955-025-02412-9.
ABSTRACT
BACKGROUND: Arthritis is a prevalent chronic disease that substantially impacts patients’ quality of life (QoL). Identifying key determinants associated with arthritis is critical for targeted interventions, but traditional statistical methods often struggle with complex interactions, and existing machine learning (ML) approaches frequently lack the interpretability needed to guide clinical decisions. This study applies a comprehensive, explainable machine learning (XAI) workflow to identify and interpret key QoL-related predictors of arthritis status in a large national cohort.
METHODS: Data were obtained from 15,011 participants aged > 45 years in the 2020 China Health and Retirement Longitudinal Study (CHARLS). We initially selected 55 potential QoL-related predictors spanning demographic, functional, pain, psychosocial, and lifestyle domains. Feature engineering was performed to create aggregate scores, indicators, and binned variables. Missing data were handled using imputation combined with missing indicator variables. A LightGBM-based feature selection process then identified 68 key predictors from the expanded, engineered feature set. Nine ML models (Logistic Regression, RandomForest, GradientBoosting, LightGBM, CatBoost, XGBoost, DecisionTree, NaiveBayes, and KNN) were trained on SMOTE-resampled training data, with hyperparameters optimized via Optuna using recall as the objective. Performance was evaluated on a held-out test set using the Area Under the ROC Curve (AUC), Average Precision (AP), Recall, Specificity, Precision, and F1-score. SHapley Additive exPlanations (SHAP) analysis was applied to the best-performing model (GradientBoosting) for interpretation.
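The abstract does not say which imputation method was paired with the missing indicator variables. As a minimal sketch, assuming median imputation (a hypothetical choice) and plain Python lists in place of a data-frame library, the step could look like this. The medians and the set of indicator columns are learned on the training split only, so the test split does not leak into them. The names `fit_imputer` and `transform` are illustrative and do not come from the study.

```python
from statistics import median
from typing import List, Optional, Tuple

Row = List[Optional[float]]  # None marks a missing value


def fit_imputer(train_rows: List[Row]) -> Tuple[List[float], List[int]]:
    """Learn per-column medians and which columns had any missing values.

    Median imputation is an assumption; the source only says 'imputation'.
    """
    n_cols = len(train_rows[0])
    medians: List[float] = []
    missing_cols: List[int] = []
    for j in range(n_cols):
        observed = [r[j] for r in train_rows if r[j] is not None]
        medians.append(float(median(observed)) if observed else 0.0)
        if len(observed) < len(train_rows):
            missing_cols.append(j)
    return medians, missing_cols


def transform(rows: List[Row], medians: List[float],
              missing_cols: List[int]) -> List[List[float]]:
    """Fill gaps with training medians, then append one 0/1 indicator
    for each column that had missing values in the training data."""
    out = []
    for r in rows:
        filled = [medians[j] if v is None else float(v) for j, v in enumerate(r)]
        indicators = [1.0 if r[j] is None else 0.0 for j in missing_cols]
        out.append(filled + indicators)
    return out


train = [[1.0, None], [3.0, 4.0], [None, 6.0]]
medians, missing_cols = fit_imputer(train)
print(transform(train, medians, missing_cols))
# [[1.0, 5.0, 0.0, 1.0], [3.0, 4.0, 0.0, 0.0], [2.0, 6.0, 1.0, 0.0]]
```

Applying `transform` to the test split with the same `medians` and `missing_cols` keeps its columns aligned with the training matrix, which matters once the indicator columns are fed into feature selection and model training.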
RESULTS: Several models achieved strong predictive performance, with GradientBoosting yielding the highest AUC (0.767, 95% CI: 0.752-0.782) and high AP (0.678, 95% CI: 0.655-0.702). SHAP analysis identified multi-site pain burden (particularly knee/leg pain and pain location count), age, self-rated health, sleep quality, functional limitations (ADL counts/scores), and negative affect as the most influential predictors driving arthritis status prediction.
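The abstract reports AUC and AP with 95% confidence intervals but does not state how the intervals were obtained. Assuming a nonparametric percentile bootstrap over the held-out test set (a common choice, not confirmed by the source), the computation can be sketched in pure Python as below. `roc_auc`, `average_precision`, and `bootstrap_ci` are hypothetical helper names. AUC uses the rank-sum (Mann-Whitney) identity with average ranks for ties, and AP is the step-wise sum, over distinct score thresholds, of recall gain times precision.

```python
import random
from typing import Callable, List, Sequence, Tuple


def roc_auc(y: Sequence[int], s: Sequence[float]) -> float:
    """AUC via the Mann-Whitney rank-sum identity; tied scores share an average rank."""
    order = sorted(range(len(s)), key=lambda i: s[i])
    ranks = [0.0] * len(s)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and s[order[j + 1]] == s[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    pos_rank_sum = sum(r for r, t in zip(ranks, y) if t == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(y: Sequence[int], s: Sequence[float]) -> float:
    """AP = sum over distinct thresholds of (recall gain) * precision."""
    order = sorted(range(len(s)), key=lambda i: -s[i])
    n_pos = sum(y)
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(order):
        threshold = s[order[i]]
        # Consume every sample tied at this threshold before scoring it.
        while i < len(order) and s[order[i]] == threshold:
            if y[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


def bootstrap_ci(y: Sequence[int], s: Sequence[float],
                 metric: Callable[[List[int], List[float]], float],
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI; resamples lacking either class are skipped."""
    rng = random.Random(seed)
    n = len(y)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if 0 < sum(yb) < n:
            stats.append(metric(yb, [s[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y, s))                       # 0.75
print(round(average_precision(y, s), 4))   # 0.8333
print(bootstrap_ci(y, s, roc_auc, n_boot=200, seed=1))
```

On a real test set the same helpers would be called with the observed arthritis labels and the model's predicted probabilities. The toy four-sample input only demonstrates the arithmetic and says nothing about the study's reported intervals.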
CONCLUSIONS: This study successfully applied an XAI pipeline to identify and rank key QoL-related factors predictive of arthritis status in a large Chinese cohort, achieving robust model performance. Pain burden, age, subjective health, sleep, functional status, and psychological well-being are critical determinants. These interpretable findings can inform risk stratification and guide targeted interventions focusing on these key areas to potentially improve arthritis management.
PMID:40859240 | DOI:10.1186/s12955-025-02412-9