Categories
Nevin Manimala Statistics

Energy entropy vector: a novel approach for efficient microbial genomic sequence analysis and classification

Brief Bioinform. 2025 Sep 6;26(5):bbaf459. doi: 10.1093/bib/bbaf459.

ABSTRACT

With the rapid development of genomic sequencing technologies, there is an increasing demand for efficient and accurate sequence analysis methods. However, existing methods face challenges in handling long, variable-length sequences and large-scale datasets. To address these issues, we propose a novel encoding method-Energy Entropy Vector (EEV). This method encodes gene sequences of arbitrary length into fixed-dimensional vector representations by modeling nucleotide energy characteristics based on information entropy. Experiments conducted on five microbial datasets demonstrate that, compared to traditional alignment-free methods, EEV achieves higher accuracy in convex hull classification and species classification tasks, with improvements of 15% to 30% in family-level classification. In phylogenetic tree construction, EEV significantly accelerates the process relative to multiple sequence alignment methods while maintaining high tree quality, enabling rapid and accurate phylogenetic reconstruction. Moreover, EEV supports flexible dimensional expansion by superimposing nucleotide energies, enhancing its ability to represent complex genomic sequences while effectively alleviating sparsity issues in high-dimensional representations. This study provides an efficient gene encoding strategy for large-scale genomic analysis and evolutionary research.

PMID:40914969 | DOI:10.1093/bib/bbaf459

By Nevin Manimala

Portfolio Website for Nevin Manimala