MetaCerberus: distributed highly parallelized HMM-based processing for robust functional annotation across the tree of life

Bioinformatics. 2024 Feb 29:btae119. doi: 10.1093/bioinformatics/btae119. Online ahead of print.

ABSTRACT

MOTIVATION: MetaCerberus is a massively parallel, fast, low-memory, scalable annotation tool for inferring gene function from single genomes to metacommunities. It provides a fast, low-memory HMM/HMMER-based tool, a combination that has so far been elusive. It offers scalable gene elucidation against major public databases, including KEGG (KO), COGs, CAZy, and FOAM, as well as virus-specific databases, including VOGs and PHROGs.

RESULTS: MetaCerberus is 1.3x as fast as eggNOG-mapper v2 on a single node while using 5x less memory in an exclusively HMM/HMMER mode. In a direct comparison, MetaCerberus provides better annotation of viruses, phages, and archaeal viruses than DRAM, Prokka, or InterProScan. MetaCerberus annotates more KOs across domains than DRAM, with a 186x smaller database and 63x less memory. MetaCerberus is fully integrated for automatic analysis of statistics and pathways using differential statistics tools (i.e., DESeq2 and edgeR), pathway enrichment (GAGE R), and pathview R. MetaCerberus provides a novel tool for unlocking the biosphere across the tree of life at scale.

AVAILABILITY: MetaCerberus is written in Python and distributed under a BSD-3 license. The source code is freely available at https://github.com/raw-lab/metacerberus, is compatible with Python 3, and works on both Mac OS X and Linux. MetaCerberus can also be installed using bioconda: mamba create -n metacerberus -c bioconda -c conda-forge metacerberus

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

PMID:38426351 | DOI:10.1093/bioinformatics/btae119

By Nevin Manimala