BioData Min. 2026 Jun 25. doi: 10.1186/s13040-026-00579-5. Online ahead of print.
ABSTRACT
BACKGROUND: Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative disease with median survival of 3-5 years. Patient responses to treatments vary widely, highlighting the need for personalized care. Clustering patients based on disease progression could improve prognosis, guide clinical decision-making, and optimize clinical trial design. This study aimed to identify robust ALS patient clusters using ALS Functional Rating Scale-Revised (ALSFRS-R) scores and to determine diagnostic parameters predictive of cluster membership, enabling earlier stratification and targeted management.
METHODS: Data from the Tours ALS center registry (April 1997-October 2023) were analyzed; after preprocessing to handle missing or aberrant data, 353 patients monitored every three months between January 2004 and July 2023 with ALSFRS-R, clinical, biological, and demographic data were retained. A weakly supervised approach labeled patient pairs based on their ALSFRS-R sequences. These labels were used to train a classifier that learned a distance for use with off-the-shelf clustering algorithms. Multiple configurations were tested, varying the clustering algorithm, the dimensionality reduction method, and the number of clusters. A Random Forest (RF) model predicted cluster membership from diagnostic parameters. The optimal clustering was selected using the silhouette score and validated with Kaplan-Meier survival analysis. Stability and robustness were assessed with the Adjusted Rand Index (ARI) and the silhouette score, respectively. Predictive performance was evaluated using specificity, sensitivity, positive predictive value (PPV), and negative predictive value (NPV). Diagnostic parameters associated with clusters were identified using Kruskal-Wallis and chi-squared tests for continuous and categorical variables, respectively.
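As an illustration of the stability measure named above, the following is a minimal sketch of the Adjusted Rand Index in its standard contingency-table form, comparing two cluster assignments of the same patients. The function name and the toy labels are assumptions made for this example; the abstract does not state how the partitions being compared were generated or which implementation was used.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """ARI between two partitions of the same items (illustrative sketch)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have equal length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two items are required")

    # Count item pairs that fall in the same cell of the contingency table,
    # in the same cluster of A, and in the same cluster of B.
    same_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    same_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    same_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = same_a * same_b / comb(n, 2)  # agreement expected by chance
    maximum = (same_a + same_b) / 2          # upper bound on the index
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (same_both - expected) / (maximum - expected)


# Hypothetical labels: the same grouping under different cluster names.
run_1 = [0, 0, 1, 1, 2, 2]
run_2 = [1, 1, 2, 2, 0, 0]
score = adjusted_rand_index(run_1, run_2)
```

Because the index depends only on which items are grouped together, relabeling the clusters leaves it unchanged; values near 0 indicate agreement no better than chance.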
RESULTS: Three clusters (n = 139, 121, 93) were identified, demonstrating strong separation (silhouette ≈ 0.6) and high stability of results (ARI ≈ 0.7). Survival differed significantly among clusters: over 50% of patients in the third cluster survived beyond 50 months, compared with fewer than 25% in the other clusters. Thirteen diagnostic parameters, including ALSFRS-R subscores, IgG levels, albumin quotient, and time to diagnosis, were key predictors of cluster membership. Cluster prediction achieved specificity and NPV ≈ 0.75, with sensitivity and PPV close to those of state-of-the-art methods.
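For readers unfamiliar with how the four reported metrics apply to a three-cluster problem, the sketch below computes them for one cluster against the rest. The one-vs-rest treatment, the function name, and the toy labels are assumptions made for illustration; the abstract does not say how the metrics were aggregated across clusters, and the data here are invented, not drawn from the study.

```python
def one_vs_rest_metrics(y_true, y_pred, cluster):
    """Sensitivity, specificity, PPV and NPV for one cluster versus the others."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == cluster and truth == cluster:
            tp += 1  # correctly assigned to the cluster
        elif pred == cluster:
            fp += 1  # assigned to the cluster but belongs elsewhere
        elif truth == cluster:
            fn += 1  # belongs to the cluster but assigned elsewhere
        else:
            tn += 1  # correctly kept out of the cluster

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Invented example: true versus predicted cluster for eight patients.
true_clusters = [0, 0, 0, 1, 1, 2, 2, 2]
predicted_clusters = [0, 0, 1, 1, 1, 2, 0, 2]
metrics_cluster_0 = one_vs_rest_metrics(true_clusters, predicted_clusters, cluster=0)
```

Repeating the call for each cluster and averaging is one common way to obtain a single figure per metric, though other aggregation schemes exist.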
CONCLUSION: This framework successfully stratifies ALS patients into clinically meaningful clusters, revealing underlying disease heterogeneity and providing strong prognostic insight. Such classification can facilitate personalized care, guide therapeutic decisions, and inform the design of targeted interventions to improve outcomes.
CLINICAL TRIAL NUMBER: Not applicable.
PMID:42351201 | DOI:10.1186/s13040-026-00579-5