Alzheimers Res Ther. 2026 Jan 8. doi: 10.1186/s13195-025-01950-0. Online ahead of print.
ABSTRACT
BACKGROUND AND OBJECTIVE: Clinical interpretation of genetic findings in Alzheimer’s disease (AD) is frequently complicated by the many missense variants of uncertain significance in AD-associated genes. Conventional computational prediction tools often overlook disease-specific pathophysiological context and offer limited disease relevance and interpretability. The present study therefore aimed to develop a novel, interpretable framework for predicting the pathogenicity of AD missense variants by integrating enrichment patterns from transcriptomic and proteomic data with machine learning methods.
METHODS: A cross-sectional variant-level analysis was performed using publicly available databases. Missense variants in APOE, APP, PSEN1, PSEN2, SORL1, and TREM2 reported in AD patients were retrieved from Alzforum and compared with missense variants from individuals without neurological diseases, as cataloged in the gnomAD v2.1.1 non-neuro subset. Variants were annotated with tissue-specific expression, secondary structure, relative solvent accessibility, and other functional features using tools such as AlphaFold. Enrichment of specific features was assessed with Fisher’s exact tests, with Bonferroni correction for multiple comparisons. Because PSEN1 showed the strongest enrichment signals, six machine-learning algorithms were trained on PSEN1 variants to distinguish AD-associated variants from gnomAD variants, using a 10 × 5 nested cross-validation scheme. External validation was conducted using PSEN1 missense variants from ClinVar annotated as pathogenic/likely pathogenic or benign/likely benign. Model performance was compared with SIFT and PolyPhen-2, and interpretability was evaluated by feature ablation and SHapley Additive exPlanations analyses.
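The enrichment step described in the methods can be illustrated with a minimal sketch: a two-sided Fisher’s exact test on a 2 × 2 table per feature, followed by Bonferroni adjustment across features. The abstract does not describe the study’s actual implementation, so everything here is an assumption made for illustration: the function names, the feature names, and in particular the counts, which are invented and are not data from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout (illustrative): a = AD variants with the feature,
    b = AD variants without it, c = control variants with the feature,
    d = control variants without it.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum probabilities of all tables no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(p_values)
    return {name: min(1.0, p * m) for name, p in p_values.items()}


# Hypothetical counts per feature, NOT taken from the study:
# (AD with feature, AD without, control with feature, control without)
tables = {
    "buried_residue": (30, 10, 12, 48),
    "helix": (20, 20, 30, 30),
    "transmembrane": (25, 15, 20, 40),
}

raw = {name: fisher_exact_two_sided(*counts) for name, counts in tables.items()}
adjusted = bonferroni(raw)

for name in tables:
    print(f"{name}: raw p = {raw[name]:.3g}, Bonferroni p = {adjusted[name]:.3g}")
```

Bonferroni adjustment is conservative: with m features tested, a feature is called enriched only if its raw p-value is below the chosen significance level divided by m.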
RESULTS: AD-associated variants were significantly enriched for some transcriptomic or proteomic features, and PSEN1 variants contributed substantially to this enrichment. Random forest and gradient boosting models achieved high performance in the internal training dataset and maintained high recall in the external validation dataset, outperforming SIFT and approaching the performance of PolyPhen-2. Relative solvent accessibility was the most discriminative individual feature, while regional and topological features provided complementary discriminative power.
CONCLUSIONS: This integrative, multi-omics framework links disease-specific enrichment patterns with interpretable gene-level machine learning for AD missense variants. The results highlight the importance of expression level, structural context, and other functional features for PSEN1 variant pathogenicity and may help prioritize variants for functional studies. Further validation in additional genes and independent cohorts is warranted before any clinical application.
PMID:41508098 | DOI:10.1186/s13195-025-01950-0