Gigascience. 2026 Mar 23:giag030. doi: 10.1093/gigascience/giag030. Online ahead of print.
ABSTRACT
Strain-level metagenomic classification is essential for understanding microbial diversity and functional potential, yet it remains challenging, particularly when sample composition is unknown and reference databases are large and redundant. Here we present MADRe, a modular and scalable pipeline for long-read strain-level metagenomic classification based on Metagenome Assembly-Driven Database Reduction. Beyond system-level integration, MADRe introduces statistical strategies that leverage assembly-derived genomic context to guide database reduction and probabilistic read reassignment. Specifically, it combines three components to achieve sensitive and precise strain-level classification: long-read metagenome assembly, contig-to-reference reassignment within an expectation-maximization framework for reference reduction, and probabilistic reassignment of read mappings against the reduced database. We extensively evaluated MADRe on simulated datasets, mock communities, and a real anaerobic digester sludge metagenome. Across diverse similarity and coverage conditions, MADRe consistently improves precision by reducing false-positive strain detections. MADRe’s design allows users to run the database reduction step or the read classification step independently. Used on its own, the read classification step performs on par with the other tools tested. MADRe is open source and publicly available at https://github.com/lbcb-sci/MADRe.
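To illustrate the general idea behind expectation-maximization reassignment of ambiguous mappings, the following is a minimal, generic sketch and not MADRe's actual implementation. The function name `em_reassign`, the input layout (query ID mapped to candidate references with positive mapping scores), and the toy strain names are all hypothetical choices made for this example. The sketch estimates relative reference abundances and then assigns each ambiguous query (a contig or a read) to its most probable reference.

```python
from collections import defaultdict


def em_reassign(candidates, n_iter=50, tol=1e-9):
    """Generic EM reassignment sketch (illustrative assumption, not MADRe's code).

    candidates: dict mapping a query ID (e.g. contig or read) to a dict
                {reference ID: positive mapping score}.
    Returns (theta, assignments): estimated relative abundance per reference
    and the most probable reference for each query.
    """
    refs = sorted({r for hits in candidates.values() for r in hits})
    n_queries = len(candidates)

    # Start from a uniform abundance estimate over all candidate references.
    theta = {r: 1.0 / len(refs) for r in refs}

    for _ in range(n_iter):
        counts = defaultdict(float)

        # E-step: split each query fractionally across its candidate
        # references, weighted by current abundance times mapping score.
        for hits in candidates.values():
            total = sum(theta[r] * score for r, score in hits.items())
            for r, score in hits.items():
                counts[r] += theta[r] * score / total

        # M-step: new abundance is the share of fractional counts per reference.
        new_theta = {r: counts[r] / n_queries for r in refs}

        # Stop early once the estimates no longer change appreciably.
        delta = max(abs(new_theta[r] - theta[r]) for r in refs)
        theta = new_theta
        if delta < tol:
            break

    # Final hard assignment: the reference with the highest posterior weight.
    assignments = {
        query: max(hits, key=lambda r: theta[r] * hits[r])
        for query, hits in candidates.items()
    }
    return theta, assignments


if __name__ == "__main__":
    # Toy input: one query maps uniquely to strainA, two map equally well
    # to strainA and strainB (hypothetical names).
    toy = {
        "contig1": {"strainA": 1.0},
        "contig2": {"strainA": 1.0, "strainB": 1.0},
        "contig3": {"strainA": 1.0, "strainB": 1.0},
    }
    theta, assignments = em_reassign(toy)
    print(theta)
    print(assignments)
```

In the toy input, the uniquely mapped query pulls abundance toward one reference, so the ambiguous queries are progressively reassigned to it and the unsupported reference loses weight. This is the kind of mechanism by which redundant references could be pruned or false-positive strain calls reduced. How MADRe scores mappings and sets thresholds is not specified in the abstract.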
PMID:41871361 | DOI:10.1093/gigascience/giag030