
Out-of-distribution generalization enhances protein function annotation for low-homology sequences

Brief Bioinform. 2026 May 4;27(3):bbag243. doi: 10.1093/bib/bbag243.

ABSTRACT

Understanding protein functions in biological processes is pivotal for disease elucidation and drug discovery. Despite notable progress, existing approaches primarily focus on function transfer under in-distribution (ID) settings, where training and test proteins exhibit high sequence similarity. As a result, their performance often degrades when applied to novel, diverse, and low-homology protein sequences, posing a major challenge for the out-of-distribution (OOD) generalization encountered in practice. To this end, we develop ProteinScore, a graph transformer approach tailored to improve protein function prediction in OOD settings. ProteinScore integrates a label-invariant variational subgraph generator with self-supervised contrastive learning, thereby identifying meaningful substructures within proteins. By highlighting informative features while filtering out redundant ones, ProteinScore improves generalization to diverse and low-homology sequences. Experiments on datasets with both experimentally resolved and AlphaFold2-predicted structures demonstrate that ProteinScore consistently outperforms strong baselines and provides biologically meaningful interpretability by accurately identifying binding sites. In addition, ProteinScore generalizes effectively to two additional downstream tasks, drug-target interaction classification and subcellular localization prediction, achieving superior predictive performance and reliable interpretability.

PMID:42184116 | DOI:10.1093/bib/bbag243

By Nevin Manimala
