Nevin Manimala Statistics

Harmonizing heterogeneous single-cell gene expression data with individual-level covariate information

Bioinform Adv. 2025 Aug 9;5(1):vbaf189. doi: 10.1093/bioadv/vbaf189. eCollection 2025.

ABSTRACT

MOTIVATION: The growing availability of single-cell RNA sequencing (scRNA-seq) data highlights the necessity for robust integration methods to uncover both shared and unique cellular features across samples. These datasets often exhibit technical variations and biological differences, complicating integrative analyses. While numerous integration methods have been proposed, many fail to account for individual-level covariates or are limited to discrete variables.

RESULTS: To address these limitations, we propose scINSIGHT2, a generalized linear latent variable model that accommodates both continuous covariates, such as age, and discrete factors, such as disease conditions. Through both simulation studies and real-data applications, we demonstrate that scINSIGHT2 accurately harmonizes scRNA-seq datasets, whether from single or multiple sources. These results highlight scINSIGHT2’s utility in capturing meaningful biological insights from scRNA-seq data while accounting for individual-level variation.
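As a rough illustration of what a generalized linear latent variable model with individual-level covariates can look like, the sketch below simulates count data in which each cell inherits a continuous covariate (age) and a discrete factor (disease condition) from the individual it was sampled from. This is not the scINSIGHT2 implementation or its exact model: the Poisson likelihood, the log link, the function name `simulate_gllvm_counts`, and all parameter values are assumptions chosen only to make the idea concrete.

```python
import numpy as np

def simulate_gllvm_counts(n_individuals=8, cells_per_individual=25,
                          n_genes=50, n_factors=3, seed=0):
    """Simulate counts from a generic Poisson GLLVM (illustrative assumption,
    not the scINSIGHT2 model): log(mu) = X @ beta + Z @ loadings."""
    rng = np.random.default_rng(seed)

    # Individual-level covariates: one continuous (age), one discrete (condition).
    age = rng.uniform(20, 80, size=n_individuals)
    condition = rng.integers(0, 2, size=n_individuals)  # 0 = control, 1 = disease
    age_std = (age - age.mean()) / age.std()            # standardize for a stable scale

    # Each cell inherits the covariates of the individual it came from.
    individual_of_cell = np.repeat(np.arange(n_individuals), cells_per_individual)
    n_cells = individual_of_cell.size
    X = np.column_stack([
        np.ones(n_cells),                # intercept
        age_std[individual_of_cell],     # continuous covariate
        condition[individual_of_cell],   # discrete factor as a 0/1 indicator
    ])

    # Gene-specific covariate effects and low-dimensional latent cell factors.
    beta = rng.normal(0.0, 0.3, size=(X.shape[1], n_genes))
    Z = rng.normal(size=(n_cells, n_factors))
    loadings = rng.normal(0.0, 0.5, size=(n_factors, n_genes))

    # Log link: the linear predictor is the log of the Poisson mean.
    log_mu = X @ beta + Z @ loadings
    counts = rng.poisson(np.exp(log_mu))
    return X, Z, counts, individual_of_cell

X, Z, counts, individual_of_cell = simulate_gllvm_counts()
print(X.shape, Z.shape, counts.shape)
```

In a model of this kind, the covariate term absorbs variation explained by age and condition, so the latent factors `Z` are left to describe cell-level structure; fitting such a model to real data would additionally require an estimation procedure, which is not shown here.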

AVAILABILITY AND IMPLEMENTATION: The scINSIGHT2 method has been implemented as an R package, which is available at https://github.com/yudimu/scINSIGHT2/.

PMID:40874236 | PMC:PMC12380451 | DOI:10.1093/bioadv/vbaf189

By Nevin Manimala
