EECFS: Efficient Ensemble Causal Feature Selection for High-Dimensional Molecular Data

J Chem Inf Model. 2026 Jun 16. doi: 10.1021/acs.jcim.6c00965. Online ahead of print.

ABSTRACT

High-dimensional feature spaces combined with limited sample sizes present substantial challenges for biological prediction tasks. Traditional feature selection methods rely on statistical associations between features and labels, whereas causal feature selection identifies features that are causally related to the target, improving interpretability and robustness. Among causal approaches, constraint-based methods identify the Markov blanket of the target variable through conditional independence tests. However, existing constraint-based ensemble strategies are computationally demanding, particularly during the spouse-discovery stage. To address this limitation, we propose EECFS, a novel ensemble causal feature selection algorithm that reduces computational cost through an efficient spouse-discovery strategy. Extensive evaluations on 16 Bayesian network datasets and 17 real-world datasets demonstrate that EECFS achieves improved efficiency while maintaining competitive or superior predictive performance compared with 11 representative methods. Furthermore, we extend causal feature selection to the task of synonymous variant effect prediction and develop CFDPSM. From an initial pool of 23,866 features spanning the DNA, RNA, and protein molecular levels, CFDPSM identifies a compact set of 30 Markov blanket features. The experimental results show that it outperforms 13 existing variant effect prediction methods while providing enhanced interpretability.

PMID:42302235 | DOI:10.1021/acs.jcim.6c00965
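
As a rough illustration of the conditional independence tests that constraint-based methods rely on, the sketch below implements a Fisher z test on partial correlations. It is not the EECFS algorithm, whose details the abstract does not give. It assumes roughly Gaussian, linearly related features, and the function names, parameters, and toy data are hypothetical.

```python
import math
import numpy as np


def fisher_z_ci_test(data, x, y, cond=(), alpha=0.05):
    """Test whether column x is independent of column y given columns in cond.

    Assumes approximately Gaussian, linearly related variables (an
    illustrative choice, not something stated in the abstract).
    Returns (independent, p_value).
    """
    idx = [x, y, *cond]
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.pinv(corr)  # precision matrix of the selected columns
    # Partial correlation of x and y given cond, read off the precision matrix.
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = max(min(r, 0.999999), -0.999999)  # keep the log below finite
    n = data.shape[0]
    z = 0.5 * math.log((1 + r) / (1 - r))  # Fisher z-transform
    stat = math.sqrt(n - len(cond) - 3) * abs(z)
    p_value = math.erfc(stat / math.sqrt(2))  # two-sided normal p-value
    return p_value > alpha, p_value


def make_chain_data(n=2000, seed=0):
    """Toy causal chain X -> Z -> Y, returned as columns [X, Z, Y]."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    z = x + rng.normal(size=n)
    y = z + rng.normal(size=n)
    return np.column_stack([x, z, y])


data = make_chain_data()
marginal = fisher_z_ci_test(data, 0, 2)               # X vs Y, no conditioning
conditional = fisher_z_ci_test(data, 0, 2, cond=(1,))  # X vs Y given Z
print("marginal:", marginal)
print("given Z:", conditional)
```

In this toy chain, X and Y are strongly associated on their own, but conditioning on Z should remove most of that association. A Markov blanket search repeats tests of this kind to discard features that become independent of the target once the right conditioning set is found.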

By Nevin Manimala
