Comparative review of artificial intelligence for transcriptomic biomarker discovery in coronavirus disease 2019 (COVID-19)

Brief Bioinform. 2026 May 4;27(3):bbag249. doi: 10.1093/bib/bbag249.

ABSTRACT

The Coronavirus Disease 2019 (COVID-19) pandemic has highlighted the importance of reliable molecular biomarkers in clinical use. Traditional statistical approaches remain popular, but the high dimensionality of transcriptomic data presents challenges for these conventional methods. Artificial intelligence (AI) algorithms have emerged as highly advantageous for handling such complex datasets, yet there has been little evaluation of these approaches in COVID-19 transcriptomic studies. This review aims to evaluate studies that used AI for transcriptomic biomarker discovery in COVID-19, assessing their study designs, methodologies, and outcomes. Based on a comprehensive literature search across five databases (Web of Science Core Collection, Scopus, PubMed/MEDLINE, IEEE Xplore Digital Library, and LitCovid) covering December 2019 to March 2025, this review selected 63 studies for a narrative synthesis organized into four key sections: (i) The Landscape of AI-Driven COVID-19 Transcriptomics, (ii) Limitations of Studies, (iii) A Proposed AI-Driven Transcriptomics Framework, and (iv) Clinical Translation Challenges, Opportunities, and Future Directions. Our analysis revealed limitations in data quality, sample size, and heterogeneity, as well as methodological weaknesses in validation and interpretability. We therefore propose an evidence-informed workflow that addresses these limitations in study design while acknowledging real-world constraints. We further discuss the emerging potential of agentic AI systems as a promising solution to current limitations. By bridging methodological gaps with translational considerations, this review can inform pandemic response strategies for future emerging infectious diseases.

Key Points

The reviewed studies mainly applied AI to the diagnosis and severity stratification of COVID-19 patients.

The limitations of current studies included small sample sizes, reliance on public datasets lacking detailed metadata, batch effects and data heterogeneity that reduce model robustness, a lack of external validation, risks of data leakage and circular validation that inflate performance metrics, and challenges in model interpretability.

An evidence-informed AI-driven framework is proposed that acknowledges real-world constraints, including small pandemic cohort sizes, domain shift from viral evolution, and resource-limited settings, with emerging agentic AI systems offering potential solutions.
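The data leakage and circular validation risk named in the key points can be illustrated with a minimal sketch. It is not taken from the review: the function names (`make_noise_data`, `top_genes`, `cv_accuracy`), the nearest-centroid classifier, and the cohort dimensions are all assumptions chosen for illustration. The data are pure noise, so no gene carries real signal. Selecting "biomarker" genes on the full cohort before cross-validation still tends to produce an optimistic accuracy, whereas repeating the selection inside each training fold does not.

```python
import random


def make_noise_data(n_samples, n_genes, seed):
    """Simulated expression matrix of pure noise with balanced labels unrelated to X."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y


def class_means(X, y, rows, genes):
    """Per-class mean of each gene in `genes`, computed only from samples in `rows`."""
    means = {}
    for label in (0, 1):
        members = [i for i in rows if y[i] == label]
        means[label] = [sum(X[i][g] for i in members) / len(members) for g in genes]
    return means


def top_genes(X, y, rows, k):
    """Indices of the k genes with the largest between-class mean gap within `rows`."""
    all_genes = range(len(X[0]))
    m = class_means(X, y, rows, all_genes)
    gap = [abs(a - b) for a, b in zip(m[0], m[1])]
    return sorted(all_genes, key=lambda g: gap[g], reverse=True)[:k]


def predict(X, means, genes, i):
    """Nearest-centroid prediction for sample i using the selected genes."""
    dist = {
        label: sum((X[i][g] - mu) ** 2 for g, mu in zip(genes, means[label]))
        for label in (0, 1)
    }
    return min(dist, key=dist.get)


def cv_accuracy(X, y, k, n_folds, leaky):
    """Cross-validated accuracy.

    leaky=True  selects genes once on ALL samples (test labels leak into selection).
    leaky=False repeats gene selection inside each training fold.
    """
    all_rows = list(range(len(X)))
    global_genes = top_genes(X, y, all_rows, k) if leaky else None
    correct = 0
    for fold in range(n_folds):
        test = [i for i in all_rows if i % n_folds == fold]
        train = [i for i in all_rows if i % n_folds != fold]
        genes = global_genes if leaky else top_genes(X, y, train, k)
        means = class_means(X, y, train, genes)
        correct += sum(predict(X, means, genes, i) == y[i] for i in test)
    return correct / len(X)


if __name__ == "__main__":
    # Hypothetical small cohort: 40 samples, 2000 genes, 10 selected genes.
    X, y = make_noise_data(n_samples=40, n_genes=2000, seed=0)
    print("leaky selection :", cv_accuracy(X, y, k=10, n_folds=5, leaky=True))
    print("nested selection:", cv_accuracy(X, y, k=10, n_folds=5, leaky=False))
```

Because the labels are unrelated to the simulated expression values, any accuracy well above chance in the leaky run comes from the selection step having seen the held-out samples. Keeping every data-dependent step inside the training fold, and confirming results on an external cohort where one is available, guards against this kind of inflation.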

PMID: 42184107 | DOI: 10.1093/bib/bbag249

By Nevin Manimala