Sci Rep. 2026 Mar 30. doi: 10.1038/s41598-026-44124-0. Online ahead of print.
ABSTRACT
Comparative analyses of nucleotide sequences across diverse taxa, including viruses, bacteria, plants, and mammals, consistently reveal patch-type sequence identities of around 45%. These identities consist of short stretches of matching nucleotides interrupted by mismatches. Similar identity patterns emerge in alignments of randomly shuffled or scrambled sequences. These findings suggest that patch-type identities reflect intrinsic statistical properties of the four-letter genetic alphabet. Such patterns likely function as recognition signals for illegitimate recombination, a mechanism that promotes sequence insertions, exchanges, and rearrangements without extensive homology. Patch-type identities have been observed at integration sites of foreign DNA and may play a role in evolutionary innovation and rapid diversification (e.g., SARS-CoV-2). Simulation data support the ideas that the frequency and length distribution of matching segments can be predicted by statistical models based on base composition and that such segments may nonetheless create local environments conducive to recombination. Further, the statistical architecture of the genetic alphabet encodes not only biological information but also the potential for genome remodeling and adaptation during evolution. By bridging fundamental sequence properties with biological outcomes, this study provides a framework for exploring how randomness at the nucleotide sequence level can give rise to order and complexity across the tree of life.
PMID:41905996 | DOI:10.1038/s41598-026-44124-0
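The statement that the frequency and length distribution of matching segments can be predicted from base composition can be illustrated with a minimal sketch. It is not the authors' simulation: the abstract does not describe their alignment method, and the function names, sequence length, and trial count below are all hypothetical choices made for illustration. The sketch assumes a simple ungapped, position-by-position comparison of independently shuffled copies of one sequence. Under that assumption the per-position match probability is p = sum of squared base frequencies, and run lengths of consecutive matches are approximately geometric with mean 1/(1 - p). Because it uses no gaps, it does not attempt to reproduce the roughly 45% identity reported above.

```python
import random
from collections import Counter


def base_frequencies(seq):
    """Relative frequency of each base in seq."""
    counts = Counter(seq)
    return {base: n / len(seq) for base, n in counts.items()}


def match_probability(freqs):
    """Chance that two independent draws from the same composition agree."""
    return sum(f * f for f in freqs.values())


def match_runs(a, b):
    """Lengths of maximal runs of identical positions (ungapped comparison)."""
    runs, current = [], 0
    for x, y in zip(a, b):
        if x == y:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def shuffled(seq, rng):
    """Return a shuffled copy of seq; base composition is preserved."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def simulate(seq, trials=200, seed=0):
    """Compare pairs of independent shuffles of seq.

    Returns (fraction of matching positions, mean length of match runs).
    """
    rng = random.Random(seed)
    matched, total, runs = 0, 0, []
    for _ in range(trials):
        r = match_runs(shuffled(seq, rng), shuffled(seq, rng))
        runs.extend(r)
        matched += sum(r)
        total += len(seq)
    return matched / total, sum(runs) / len(runs)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical test sequence: 2000 bases drawn uniformly from ACGT.
    seq = "".join(rng.choice("ACGT") for _ in range(2000))
    p = match_probability(base_frequencies(seq))
    identity, mean_run = simulate(seq)
    print(f"identity: predicted {p:.3f}, simulated {identity:.3f}")
    print(f"mean run: predicted {1 / (1 - p):.3f}, simulated {mean_run:.3f}")
```

For a sequence with roughly equal base frequencies, the predicted ungapped identity is close to 0.25 and the predicted mean run length close to 1.33. A skewed composition raises both, which is the sense in which base composition alone fixes the expected patch statistics in this simplified model.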