NAR Genom Bioinform. 2025 Dec 31;7(4):lqaf205. doi: 10.1093/nargab/lqaf205. eCollection 2025 Dec.
ABSTRACT
Accurate identification of the tissues affected by human diseases is important for understanding disease etiology and developing new treatment strategies. In this study, we develop a logistic regression-based method named DEDUCE (disease tissue detection using logistic regression) that combines large-scale genomics data and machine learning to address this problem. The central hypothesis is that most disease-associated genes are expressed specifically in affected tissues. DEDUCE takes advantage of newly emerged data on disease-related genes as well as tissue-specific gene expression data. The unique feature of DEDUCE is that it takes into account the strength of gene-disease associations. When we applied DEDUCE to a total of 3,261,324 gene-disease associations collected from DisGeNET, covering 30,170 diseases and 21,666 genes, we identified 216 significant tissue-disease pairs comprising 120 unique diseases and 37 unique tissues. Many of these pairs suggest potential explanations for disease pathogenesis. The results were highly consistent with previous findings and were supported by empirical plots and gene set enrichment analysis. Overall, DEDUCE shows great potential for uncovering novel pathogenesis mechanisms of complex diseases. In-depth analysis and experimental validation are required to fully understand the discovered tissue-trait associations and their enriched genes.
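The general idea of relating disease association to tissue-specific expression with a logistic regression can be illustrated with a minimal sketch. This is not the DEDUCE implementation, whose model details are not given in the abstract. It assumes, purely for illustration, that each gene has one tissue-specificity score, that the outcome is whether the gene is associated with the disease, and that association strength enters as a per-gene weight. The function name `fit_weighted_logistic` and all data values are hypothetical.

```python
import math

def fit_weighted_logistic(x, y, w, lr=0.1, n_iter=2000):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by gradient ascent on the
    weighted log-likelihood. Returns (b0, b1)."""
    b0 = b1 = 0.0
    total = sum(w)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += wi * (yi - p)        # gradient w.r.t. intercept
            g1 += wi * (yi - p) * xi   # gradient w.r.t. slope
        b0 += lr * g0 / total
        b1 += lr * g1 / total
    return b0, b1

# Hypothetical toy data for one tissue-disease pair (one entry per gene).
specificity = [2.1, 1.8, 1.5, 0.1, 0.2, -0.3, -1.0, -0.6]  # tissue-specificity score
associated  = [1,   1,   1,   1,   0,   0,    0,    0]     # 1 = disease-associated gene
score       = [0.9, 0.8, 0.7, 0.1, 0.0, 0.0,  0.0,  0.0]   # association strength

# Assumption: associated genes are weighted by association strength,
# non-associated genes get unit weight.
weights = [s if a == 1 else 1.0 for s, a in zip(score, associated)]

b0, b1 = fit_weighted_logistic(specificity, associated, weights)
print(f"intercept={b0:.3f}, slope={b1:.3f}")
```

Under these assumptions, a positive slope indicates that genes more specifically expressed in the tissue are more likely to be disease-associated. A real analysis would also need a significance test and multiple-testing correction across tissue-disease pairs.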
PMID:41480592 | PMC:PMC12754781 | DOI:10.1093/nargab/lqaf205