Reliability-Aware Deep Learning Framework for Chemical Genotoxicity Prediction with Uncertainty Quantification

J Chem Inf Model. 2026 May 20. doi: 10.1021/acs.jcim.6c00885. Online ahead of print.

ABSTRACT

Genotoxicity assessment is crucial for drug development and chemical safety evaluation. However, traditional experimental approaches are time-consuming, resource-intensive, and raise ethical concerns related to animal testing. Although computational models offer efficient alternatives, most existing approaches treat all data points equally, disregarding the heterogeneous quality of public database records, and rarely address predictive uncertainty. We present a reliability-aware framework for genotoxicity prediction using a curated data set of 8,389 compounds annotated with experimental reliability tiers reflecting protocol quality, reproducibility, and review status. A two-step hierarchical learning strategy is employed. First, a message-passing neural network trained on high- and medium-reliability data is used to evaluate low-reliability samples and assign adaptive weights. Second, these weights are incorporated into conventional machine learning models, including random forest, support vector machine (SVM), and logistic regression, using molecular fingerprints. To address predictive uncertainty, we integrate conformal prediction, which provides distribution-free, finite-sample coverage guarantees for individual predictions. Random forest and RBF-kernel SVM achieved AUC values of 0.8613 and 0.8582, respectively, with Brier scores of 0.1523 and 0.1530. Conformal prediction attained 90.7% empirical coverage at α = 0.1 and identified 35.8% of test compounds as ambiguous. By incorporating data reliability and uncertainty quantification, the proposed framework provides a more transparent and uncertainty-aware approach to computational genotoxicity prediction.
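
The abstract does not say which conformal variant the authors used, so the following is only a minimal sketch of one common choice, split (inductive) conformal prediction for a binary classifier. It assumes the nonconformity score is 1 minus the predicted probability of the true label, and that a held-out calibration set is available. The function names `conformal_threshold` and `prediction_set` and the toy probabilities are hypothetical and are not taken from the paper.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Score threshold from a calibration set (split conformal, assumed variant).

    cal_probs  : predicted P(genotoxic) for each calibration compound
    cal_labels : true labels, 1 = genotoxic, 0 = non-genotoxic
    alpha      : target error rate (0.1 targets 90% coverage)
    """
    # Nonconformity score: 1 - probability assigned to the true label.
    scores = sorted(
        1.0 - (p if y == 1 else 1.0 - p) for p, y in zip(cal_probs, cal_labels)
    )
    n = len(scores)
    # Finite-sample correction: take the ceil((n + 1)(1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        # Too few calibration points for this alpha: every label must be kept.
        return float("inf")
    return scores[k - 1]


def prediction_set(p_pos, threshold):
    """Labels whose nonconformity score does not exceed the threshold."""
    labels = set()
    if 1.0 - p_pos <= threshold:  # score if the compound were genotoxic
        labels.add(1)
    if p_pos <= threshold:        # score if the compound were non-genotoxic
        labels.add(0)
    return labels


# Toy calibration data (illustrative values only, not from the study).
cal_probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.55]
cal_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0]
threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.1)

for p in (0.95, 0.5, 0.05):
    labels = prediction_set(p, threshold)
    status = "ambiguous" if len(labels) == 2 else "confident"
    print(p, sorted(labels), status)
```

Under this sketch, a compound is "ambiguous" when both labels remain in its prediction set. The coverage guarantee of this procedure is marginal: on average over exchangeable calibration and test data, the true label falls inside the set with probability at least 1 − α.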

PMID:42160670 | DOI:10.1021/acs.jcim.6c00885

By Nevin Manimala