Phylogenetic Methods Meet Deep Learning

Genome Biol Evol. 2025 Sep 19:evaf177. doi: 10.1093/gbe/evaf177. Online ahead of print.

ABSTRACT

Deep learning (DL) has been widely used in various scientific fields, but its integration into phylogenetics has been slower, primarily due to the complex nature of phylogenetic data. Studies that apply DL to sequencing data often limit their analyses to 4-taxon trees. Many of these studies serve as “proof of principle” and perform similarly to traditional phylogeny reconstruction methods. Newer approaches, such as encoding training data as compact bijective ladderized vectors or using transformers, enable the handling of much larger trees and genomic data sets. This short perspective focuses on the application of DL in phylogenetics and introduces the prevalent DL architectures. We highlight potential problems in the field by discussing the risks of using simulation-based training data, and we emphasize the importance of reproducibility and robustness in computational estimates. Finally, we explore promising research areas, including the combination of phylogenetics and population genetics in DL, the analysis of neighbor dependencies, and the potential to significantly reduce computational cost compared to traditional methods. This perspective illustrates the potential of DL to complement traditional phylogeny reconstruction methods and to advance phylogenetic analysis, especially in computationally demanding tasks such as model selection or estimating branch support values.

PMID: 40973626 | DOI: 10.1093/gbe/evaf177

By Nevin Manimala