An integrated single-cell transcriptomics and explainable AI approach for cancer stemness biomarker discovery in non-small cell lung cancer

Comput Biol Chem. 2026 Apr 22;124(Pt 1):109086. doi: 10.1016/j.compbiolchem.2026.109086. Online ahead of print.

ABSTRACT

Non-small cell lung cancer (NSCLC) contains rare cancer stem cells (CSCs) that contribute to relapse and drug resistance. Bulk RNA-seq overlooks these cells because it averages expression across all cell types, whereas standard single-cell RNA-seq (scRNA-seq) analyses often struggle to identify rare CSC states reliably. To address this, we developed an scRNA-seq pipeline that integrates machine learning with explainable AI (XAI) to detect CSC-like epithelial cells in NSCLC (GSE198099, n = 2 patients; analyses supported by effect size-based validation despite limited statistical power). Patient-derived scRNA-seq profiles underwent quality control and batch correction using scVI and were annotated using CellTypist. A 45-gene stemness score was used to identify candidate CSC-like states. Four machine learning models (Logistic Regression, LightGBM, XGBoost, and CatBoost) were trained to refine the identification of CSC-like states. SHAP-based feature attribution analyses converged on six key biomarkers: DLL1, ITGA6, ATXN2, NOTCH1, DCLK1, and PUM1. These biomarkers are involved in regulating transcription, adhesion, cytoskeletal dynamics, and post-transcriptional control. Pathway analysis and validation using TCGA data provided supportive evidence for the biological relevance of these biomarkers. This framework offers a methodologically reproducible approach to reveal rare CSC-like states with improved mechanistic clarity and provides candidate biomarkers for studying NSCLC tumor plasticity.
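
The score-then-classify-then-attribute idea in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' pipeline. The abstract does not say how the 45-gene score is computed, how cells are labeled, or how the models are configured, so everything below is an assumption: the data are synthetic, the two-gene signature is a hypothetical stand-in for the 45-gene set, the score is a simple "signature mean minus background mean", the 80th-percentile cutoff is arbitrary, and only a logistic regression is fitted. The attribution uses the closed form for a linear model, phi_j = w_j * (x_j - mean_j), which coincides with SHAP values in log-odds space when features are treated as independent; the tree-based models named in the abstract would need a dedicated SHAP implementation.

```python
import math
import random

def stemness_score(cell, gene_set):
    """Mean expression of the signature genes minus the cell's mean over all genes."""
    background = sum(cell.values()) / len(cell)
    signature_mean = sum(cell[g] for g in gene_set) / len(gene_set)
    return signature_mean - background

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent for logistic regression; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def linear_shap(w, X):
    """Per-cell attributions in log-odds space: phi_j = w_j * (x_j - mean_j)."""
    d = len(w)
    means = [sum(row[j] for row in X) / len(X) for j in range(d)]
    return [[w[j] * (row[j] - means[j]) for j in range(d)] for row in X]

# Synthetic toy data (assumption): 200 cells, 4 genes, 40 cells with an
# elevated signature. Gene names are labels only, not real measurements.
rng = random.Random(0)
genes = ["DLL1", "ITGA6", "ACTB", "GAPDH"]
signature = ["DLL1", "ITGA6"]  # hypothetical stand-in for the 45-gene set
cells = []
for i in range(200):
    stem_like = i < 40
    cells.append({
        g: rng.gauss(2.0 if (stem_like and g in signature) else 1.0, 0.3)
        for g in genes
    })

# Label candidate CSC-like cells by an arbitrary 80th-percentile score cutoff.
scores = [stemness_score(c, signature) for c in cells]
cutoff = sorted(scores)[int(0.8 * len(scores))]
labels = [1 if s >= cutoff else 0 for s in scores]

# Fit the classifier, then rank genes by mean absolute attribution.
X = [[c[g] for g in genes] for c in cells]
w, b = fit_logistic(X, labels)
phi = linear_shap(w, X)
importance = {
    g: sum(abs(row[j]) for row in phi) / len(phi) for j, g in enumerate(genes)
}
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking)
```

For each cell, the attributions sum to that cell's log-odds minus the mean log-odds over all cells, which is the additivity property that makes SHAP-style rankings comparable across genes. Because the labels here are derived from the score itself, the ranking on this toy data says nothing about real biology.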

PMID:42066389 | DOI:10.1016/j.compbiolchem.2026.109086

By Nevin Manimala
