Knowledge-guided Bayesian biclustering model for omics data with noisy graphs

Biometrics. 2026 Apr 9;82(2):ujag070. doi: 10.1093/biomtc/ujag070.

ABSTRACT

Extracting biologically meaningful information from high-dimensional, heterogeneous omics data is a key challenge in many biomedical studies. Among biomedical applications, disease subtyping is of particular interest because of its critical role in improving diagnosis and designing personalized treatments, and biclustering has become a widely used statistical method for subtyping. In addition, it has been demonstrated across various statistical learning methods that incorporating biological graph knowledge, such as gene regulatory networks, can significantly enhance variable selection, prediction accuracy, and model interpretability. However, existing graph-guided methods, while often yielding promising results, tend to overlook potential misspecifications in the graphs, such as false positive (FP) and false negative (FN) edges. Ignoring this noise can lead to suboptimal identification of biclusters. It is therefore essential to develop biclustering methods that can handle noisy graphs effectively while still providing biological insight. We propose a Bayesian denoising knowledge-guided biclustering method that can integrate multiple graphs simultaneously. The incorporated graphs, viewed as noisy versions of the underlying true graph, are denoised by modeling FP and FN errors. A Markov chain Monte Carlo algorithm is developed to estimate the biclusters. Extensive simulation studies and real data analyses, including gene expression and proteomics datasets of Alzheimer’s disease, have been conducted to validate the superior performance of the proposed method.
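
As a rough illustration of the noisy-graph idea in the abstract, the sketch below treats each observed graph as a copy of a true graph corrupted by FP and FN edge errors, then computes a per-edge posterior from several such copies. This is not the authors' model: it assumes independent edges, FP and FN rates that are known and shared across graphs, and a fixed prior edge probability, whereas the paper estimates its quantities by MCMC. The function names `observe_noisy_graph` and `edge_posterior` are hypothetical.

```python
import numpy as np

def observe_noisy_graph(true_adj, fp_rate, fn_rate, rng):
    """Draw one noisy copy of a symmetric 0/1 adjacency matrix (assumed error model)."""
    p = true_adj.shape[0]
    iu = np.triu_indices(p, k=1)            # each unordered node pair once
    truth = true_adj[iu].astype(bool)
    u = rng.random(truth.size)
    # A true edge is kept with probability 1 - fn_rate;
    # an absent edge is spuriously added with probability fp_rate.
    obs = np.where(truth, u >= fn_rate, u < fp_rate)
    noisy = np.zeros((p, p), dtype=int)
    noisy[iu] = obs
    return noisy + noisy.T                   # symmetrize, diagonal stays zero

def edge_posterior(noisy_graphs, fp_rate, fn_rate, prior=0.5):
    """Posterior probability of a true edge for each pair, with rates assumed known."""
    stack = np.stack(noisy_graphs)           # shape (K, p, p)
    present = stack.sum(axis=0)              # graphs in which the edge appears
    absent = stack.shape[0] - present
    log_edge = (np.log(prior)
                + present * np.log(1 - fn_rate) + absent * np.log(fn_rate))
    log_none = (np.log(1 - prior)
                + present * np.log(fp_rate) + absent * np.log(1 - fp_rate))
    post = 1.0 / (1.0 + np.exp(log_none - log_edge))
    np.fill_diagonal(post, 0.0)              # no self-loops
    return post
```

Under these assumptions, an edge that appears in most of the supplied graphs receives a high posterior probability, while an edge that appears in only one graph is treated as a likely false positive. Both rates must lie strictly between 0 and 1 for the logarithms to be defined.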

PMID:42145180 | DOI:10.1093/biomtc/ujag070

By Nevin Manimala