Brief Bioinform. 2026 Jan 7;27(1):bbaf713. doi: 10.1093/bib/bbaf713.
ABSTRACT
RNA performs a variety of functions within cells and is implicated in various human diseases. Because genes encoding druggable proteins account for only a small portion of the genome, interest in developing drugs that target RNAs has been growing. Precise prediction of small-molecule binding sites across different classes of RNAs is therefore important. In this study, we introduce a lightweight deep learning program for predicting RNA-drug binding sites, called compound binding site prediction for RNA (CoBRA). Our approach uses residue-level embeddings derived from a pre-trained RNA language model, without relying on any structural information. These embeddings capture the contextual and statistical properties of each nucleotide and serve as input to a multi-layer perceptron classifier that labels each nucleotide as binding or non-binding. The model was trained on the TR60 and HARIBOSS datasets and tested on four independent benchmark sets. CoBRA achieves a relative improvement of 22.1% in the Matthews correlation coefficient and a 45.6% increase in sensitivity over existing state-of-the-art RNA-ligand binding site prediction methods that use structural information. These results demonstrate that sequence-based language model embeddings, which do not require explicit coordinate or distance information, can match or outperform structure-based methods. This makes CoBRA a flexible tool for predicting binding sites across diverse RNA targets.
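The per-nucleotide classification and the two reported metrics can be illustrated with a minimal sketch. It is not the CoBRA implementation: the abstract does not give the network architecture, embedding dimension, or decision threshold, so the single hidden layer, the ReLU activation, the 0.5 threshold, the toy weights, and all function names below are assumptions made for illustration only. The metric functions follow the standard definitions of sensitivity and the Matthews correlation coefficient.

```python
import math

def mlp_forward(embedding, w1, b1, w2, b2):
    """Assumed one-hidden-layer MLP: ReLU hidden units, sigmoid output.

    Returns the predicted probability that one nucleotide is a binding site.
    """
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, embedding)) + b)
        for row, b in zip(w1, b1)
    ]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))

def predict_sites(embeddings, params, threshold=0.5):
    """Binary label per nucleotide (1 = binding); the threshold is an assumption."""
    return [1 if mlp_forward(e, *params) >= threshold else 0 for e in embeddings]

def confusion(y_true, y_pred):
    """Counts of true positives, false positives, true negatives, false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

def sensitivity(y_true, y_pred):
    """Fraction of true binding nucleotides that were predicted as binding."""
    tp, _, _, fn = confusion(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

def mcc(y_true, y_pred):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    tp, fp, tn, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def relative_improvement(new, baseline):
    """Relative change of a metric with respect to a baseline value."""
    return (new - baseline) / baseline

# Toy example: two-dimensional "embeddings" and hand-picked weights (hypothetical).
toy_params = ([[1.0, 0.0], [0.0, -1.0]], [0.0, 0.0], [2.0, 3.0], -1.0)
toy_embeddings = [[1.0, 2.0], [0.0, 0.0], [0.2, -0.1]]
toy_predictions = predict_sites(toy_embeddings, toy_params)

# Toy labels for an 8-nucleotide sequence (made up, not from any benchmark).
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
print(toy_predictions, sensitivity(y_true, y_pred), mcc(y_true, y_pred))
```

In practice the embeddings would come from the pre-trained RNA language model, one vector per nucleotide, and the weights would be learned from labeled binding sites rather than set by hand.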
PMID:41520231 | DOI:10.1093/bib/bbaf713