Interpretable machine learning models for stroke risk prediction in patients with newly diagnosed atrial fibrillation

NPJ Digit Med. 2026 Apr 7;9(1):289. doi: 10.1038/s41746-026-02470-3.

ABSTRACT

Atrial fibrillation (AF) is the most common sustained arrhythmia and a leading cause of ischemic stroke. Existing risk scores, such as CHA₂DS₂-VASc, offer limited predictive accuracy and fail to capture complex clinical patterns. To improve generalizability and clinical utility, we developed and externally validated clinically interpretable machine learning models using only age, comorbidities, and medication use to predict 1-year stroke risk in patients with newly diagnosed AF. Both logistic regression (LR) and Platt-calibrated extreme gradient boosting (XGB) models achieved high discrimination in internal (AUCs = 0.915 and 0.914) and external validation cohorts (AUCs = 0.877-0.886), significantly outperforming CHA₂DS₂-VASc (AUCs = 0.614-0.621; p < 0.001). Calibration curves and decision curve analysis confirmed strong clinical utility. Long-term follow-up demonstrated superior risk stratification and treatment responsiveness in LR-defined high-risk groups. These models provide accurate, individualized stroke risk estimates to guide direct oral anticoagulant (DOAC) initiation in real-world hospital settings.
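The decision curve analysis mentioned above rests on a quantity usually called net benefit. It weighs true positives against false positives at a chosen risk threshold. The sketch below illustrates the standard formula, NB = TP/n - (FP/n) * pt/(1 - pt), and compares a model with a "treat everyone" strategy. It is not the authors' code. The function names, the toy labels and probabilities, and the 0.2 threshold are all assumptions made for illustration, and none of the numbers come from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    y_true: 0/1 outcomes (1 = stroke within the prediction window).
    y_prob: predicted risks in [0, 1], same order as y_true.
    threshold: risk threshold pt, strictly between 0 and 1.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("y_true and y_prob must be non-empty and equal length")

    n = len(y_true)
    # Count patients flagged as high risk, split by actual outcome.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy cohort (illustrative only, not study data).
toy_outcomes = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
toy_risks = [0.80, 0.10, 0.05, 0.60, 0.30, 0.02, 0.15, 0.40, 0.25, 0.08]

model_nb = net_benefit(toy_outcomes, toy_risks, threshold=0.2)
treat_all_nb = net_benefit_treat_all(toy_outcomes, threshold=0.2)
print(f"model: {model_nb:.3f}, treat all: {treat_all_nb:.3f}, treat none: 0.000")
```

A decision curve repeats this calculation across a range of thresholds. A model is considered clinically useful at a given threshold when its net benefit exceeds both the treat-all value and zero (treat none).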

PMID:41946928 | DOI:10.1038/s41746-026-02470-3

By Nevin Manimala