Nevin Manimala Statistics

DeepIMB: Imputation of non-biological zero counts in microbiome data

Genes Genomics. 2025 Nov 6. doi: 10.1007/s13258-025-01693-0. Online ahead of print.

ABSTRACT

BACKGROUND: The high prevalence of non-biological zero counts, arising from low sequencing depth and sampling variation, presents a significant challenge in microbiome data analysis. These zeros can distort taxon abundance distributions and hinder the identification of true biological signals, complicating downstream analyses.

OBJECTIVE: We propose DeepIMB, a deep learning-based imputation method designed to accurately identify and impute non-biological zero counts in microbiome data while preserving biological integrity.

METHODS: DeepIMB operates in two main phases. First, it identifies non-biological zeros using a gamma-normal mixture model applied to the normalized, log-transformed taxon count matrix. Second, it imputes these zeros with a deep neural network model that integrates diverse sources of information, including taxon abundances, sample covariates, and phylogenetic distances, thereby learning complex, nonlinear relationships within microbiome data.
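The identification phase can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the mixture parameters have already been estimated (the fitting step, typically done by EM, is omitted), and the function names, the library-size scaling of 1e6, the pseudocount of 1.01, the example parameter values, and the 0.5 decision threshold are all hypothetical choices made for illustration. The idea is that, for one taxon, a gamma component models values produced by non-biological zeros and a normal component models true abundances, and a value is flagged when the posterior probability of the gamma component is high.

```python
import math


def log_normalize(counts, scale=1e6, pseudo=1.01):
    """Scale one sample to a common library size, then log10-transform.

    The scale and pseudocount are assumptions; a pseudocount above 1
    keeps transformed zeros strictly positive, so the gamma density
    below is defined for them.
    """
    total = sum(counts)
    return [math.log10(c / total * scale + pseudo) for c in counts]


def gamma_pdf(x, shape, rate):
    """Density of a Gamma(shape, rate) distribution at x."""
    if x <= 0:
        return 0.0
    log_pdf = (shape * math.log(rate) - math.lgamma(shape)
               + (shape - 1) * math.log(x) - rate * x)
    return math.exp(log_pdf)


def normal_pdf(x, mean, sd):
    """Density of a Normal(mean, sd) distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def dropout_posterior(x, weight, shape, rate, mean, sd):
    """Posterior probability that x came from the gamma component.

    `weight` is the mixing proportion of the gamma component.
    """
    g = weight * gamma_pdf(x, shape, rate)
    n = (1 - weight) * normal_pdf(x, mean, sd)
    total = g + n
    return g / total if total > 0 else 0.0


# Hypothetical fitted parameters for a single taxon.
params = dict(weight=0.3, shape=0.5, rate=1.0, mean=3.0, sd=1.0)

sample = log_normalize([0, 10, 990])
# Flag values whose posterior exceeds an assumed threshold of 0.5.
flags = [dropout_posterior(x, **params) > 0.5 for x in sample]
```

With these illustrative parameters, only the transformed zero is flagged for imputation; the two observed counts fall under the normal component and are left unchanged.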

RESULTS: By leveraging integrated information from multiple data types, DeepIMB accurately imputes non-biological zeros while preserving true biological signals. In our two simulation studies, DeepIMB outperformed existing imputation methods in terms of mean squared error, Pearson correlation coefficient, and Wasserstein distance.
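For readers unfamiliar with the three evaluation criteria, the sketch below shows one way they could be computed between a vector of true values and the corresponding imputed values. The function names are hypothetical, and the Wasserstein function assumes two equal-size, equally weighted one-dimensional samples, in which case the distance reduces to the mean absolute difference between sorted values. How the study itself aggregated these metrics across taxa and samples is not stated in the abstract.

```python
import math


def mean_squared_error(truth, imputed):
    """Average squared difference between paired values."""
    return sum((t - i) ** 2 for t, i in zip(truth, imputed)) / len(truth)


def pearson(truth, imputed):
    """Pearson correlation coefficient between paired values."""
    n = len(truth)
    mean_t = sum(truth) / n
    mean_i = sum(imputed) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(truth, imputed))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    return cov / math.sqrt(var_t * var_i)


def wasserstein_1d(a, b):
    """1-Wasserstein distance between two equal-size empirical samples."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-size samples")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)
```

Lower mean squared error and Wasserstein distance, and higher Pearson correlation, indicate imputed values closer to the truth.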

CONCLUSION: DeepIMB effectively addresses the challenges posed by non-biological zeros in microbiome data. By improving the quality of the data and the reliability of downstream analyses, DeepIMB represents a significant advancement in microbiome research methodologies.

PMID: 41196474 | DOI: 10.1007/s13258-025-01693-0

By Nevin Manimala
