Front Biosci (Landmark Ed). 2025 Nov 27;30(11):45912. doi: 10.31083/FBL45912.
ABSTRACT
BACKGROUND: The nucleotide “words” (k-mers) of the genome exhibit two essentially universal properties that follow probabilistically from the Conservation of Hartley-Shannon Information (CoHSI): (1) a Zipfian rank-ordered distribution of frequencies and (2) universal inverse symmetry. Here, we ask whether these two properties are also present in the transcriptome, a question of interest given the strong and specific structure/function constraints on RNAs, especially on the protein-coding sequences (CDS).
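As a concrete illustration of the two properties named above, the following Python sketch counts overlapping k-mers on a single strand, rank-orders them by frequency (the input to a Zipf plot), and pairs each k-mer with its reverse complement. It assumes that the "inverse" of a k-mer is its reverse complement, which this abstract does not spell out; the function names are hypothetical and this is not the authors' pipeline.

```python
from collections import Counter

# Complement map for the DNA alphabet; RNA input is converted U -> T first.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
ALPHABET = frozenset("ACGT")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of a k-mer (A<->T, C<->G, then reversed)."""
    return kmer.translate(COMPLEMENT)[::-1]


def count_kmers(seq: str, k: int) -> Counter:
    """Count overlapping k-mers on one strand, skipping windows with non-ACGT symbols."""
    seq = seq.upper().replace("U", "T")  # assumption: treat RNA in the DNA alphabet
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if ALPHABET.issuperset(word):
            counts[word] += 1
    return counts


def rank_frequency(counts: Counter) -> list:
    """Return (rank, k-mer, count) tuples; rank 1 = most frequent, ties sorted alphabetically."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, n) for rank, (word, n) in enumerate(ordered, start=1)]


def inverse_pairs(counts: Counter) -> list:
    """Pair each observed k-mer with its reverse complement (the assumed 'inverse').

    Each pair is reported once as (k-mer, its count, reverse-complement count);
    reverse-complement palindromes such as 'CG' pair with themselves.
    """
    pairs, seen = [], set()
    for word in sorted(counts):
        if word in seen:
            continue
        rc = reverse_complement(word)
        seen.update((word, rc))
        pairs.append((word, counts[word], counts[rc]))  # Counter returns 0 if rc is absent
    return pairs


if __name__ == "__main__":
    toy = "ATGGCTTAAGCCATGTTAGGCTAA"  # illustrative sequence, not real data
    c = count_kmers(toy, 3)
    print(rank_frequency(c)[:5])
    print(inverse_pairs(c)[:5])
```

Keeping one entry per reverse-complement pair avoids counting each pair twice when the pairs are later regressed against each other.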
METHODS: CDS and ncRNA (non-coding RNA) databases were accessed at e!Ensembl. To determine whether k-mer frequencies follow a power law, statistical tests of both necessity (linearity on log-log axes) and sufficiency (confidence that a power-law distribution could not be rejected) were applied. Compliance with inverse symmetry was assessed by linearity and residual standard error.
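The necessity and inverse-symmetry checks described above both reduce to a simple linear regression. The sketch below (Python with NumPy; the function names are hypothetical) fits ordinary least squares and reports the slope, adjusted R² and residual standard error, the quantities quoted in this abstract. It assumes that necessity is judged on log10(count) against log10(rank) and that inverse symmetry is judged by regressing reverse-complement counts on k-mer counts using raw counts; the abstract does not specify the scaling or how pairs are oriented. The sufficiency test (for example, a Clauset-style bootstrap goodness-of-fit test) and p-values are omitted.

```python
import numpy as np


def linear_fit_stats(x, y):
    """OLS fit y = intercept + slope * x with adjusted R^2 and residual standard error.

    For one predictor, adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - 2) and the residual
    standard error is sqrt(SS_res / (n - 2)), as in R's summary(lm(y ~ x)).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    if n < 3:
        raise ValueError("need at least 3 points")
    slope, intercept = np.polyfit(x, y, 1)  # coefficients, highest degree first
    residuals = y - (intercept + slope * x)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    if ss_tot == 0.0:
        raise ValueError("y is constant; R^2 is undefined")
    r2 = 1.0 - ss_res / ss_tot
    return {
        "slope": float(slope),
        "intercept": float(intercept),
        "adj_r2": 1.0 - (1.0 - r2) * (n - 1) / (n - 2),
        "rse": (ss_res / (n - 2)) ** 0.5,
    }


def zipf_necessity(counts):
    """Necessity check (sketch): linearity of log10(count) against log10(rank)."""
    freqs = np.sort(np.asarray(counts, dtype=float))[::-1]  # rank 1 = largest count
    freqs = freqs[freqs > 0]  # log of zero is undefined
    ranks = np.arange(1, freqs.size + 1)
    return linear_fit_stats(np.log10(ranks), np.log10(freqs))


def inverse_symmetry(kmer_counts, revcomp_counts):
    """Inverse-symmetry check (sketch): regress reverse-complement counts on k-mer counts.

    Exact symmetry gives slope 1, intercept 0, adjusted R^2 = 1 and RSE = 0.
    Raw counts are used here; the original analysis may scale them differently.
    """
    return linear_fit_stats(kmer_counts, revcomp_counts)
```

Under this reading, a perfect power law gives an adjusted R² of 1 for `zipf_necessity`, and the CDS values reported below (slope 0.91, adjusted R² 0.77) would indicate departures from the ideal slope of 1 and from a perfect fit.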
RESULTS: The CDS and non-coding RNAs of 53 species were analyzed separately, and the data are presented as short movies. The results were consistent across all species analyzed; taking the bonobo (Pan paniscus) as a representative species, the following results were obtained. For the Zipfian distribution of k-mer frequencies, the CDS passed statistically robust tests of both necessity (adjusted R² = 0.9932, p ≤ 2.2 × 10⁻¹⁶) and sufficiency; for non-coding RNAs, the test of necessity was robust (adjusted R² = 0.9982, p ≤ 2.2 × 10⁻¹⁶). Perturbations of inverse symmetry were observed in both the CDS (slope = 0.91, adjusted R² = 0.77) and non-coding RNAs (slope = 1.02, adjusted R² = 0.84). The disruption of inverse symmetry in the CDS particularly affected the 3- and 6-mers and was shown to be associated with codon (especially stop codon) frequency in the open reading frame.
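To see why the reading frame can disturb inverse symmetry at k = 3, note that in a complete open reading frame the stop codons TAA, TAG and TGA occur in frame only as the terminal codon, whereas their reverse complements TTA, CTA and TCA are ordinary sense codons (Leu, Leu and Ser in the standard code). The toy Python sketch below (hypothetical names; not the authors' analysis) tallies each stop codon and its reverse complement across all overlapping positions and in frame 0 only.

```python
from collections import Counter

STOP_CODONS = ("TAA", "TAG", "TGA")  # standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Reverse complement of a DNA word (TAA -> TTA, TAG -> CTA, TGA -> TCA)."""
    return kmer.translate(COMPLEMENT)[::-1]


def all_3mers(cds: str) -> Counter:
    """Count 3-mers at every overlapping position, ignoring the reading frame."""
    cds = cds.upper().replace("U", "T")
    return Counter(cds[i:i + 3] for i in range(len(cds) - 2))


def in_frame_codons(cds: str, frame: int = 0) -> Counter:
    """Count non-overlapping codons read from `frame` (0, 1 or 2); a trailing partial codon is dropped."""
    cds = cds.upper().replace("U", "T")
    return Counter(cds[i:i + 3] for i in range(frame, len(cds) - 2, 3))


def stop_codon_report(cds: str) -> dict:
    """For each stop codon, give (stop count, reverse-complement count) overall and in frame 0."""
    overall = all_3mers(cds)
    frame0 = in_frame_codons(cds, 0)
    report = {}
    for stop in STOP_CODONS:
        rc = reverse_complement(stop)
        report[stop] = {
            "revcomp": rc,
            "all_positions": (overall[stop], overall[rc]),
            "in_frame": (frame0[stop], frame0[rc]),
        }
    return report


if __name__ == "__main__":
    # Toy ORF: ATG TTA TTA GCT TAA (illustrative only)
    for stop, row in stop_codon_report("ATGTTATTAGCTTAA").items():
        print(stop, row)
```

For a CDS without internal stops, each stop codon's `in_frame` count is at most 1 (the terminal codon), while the reverse-complement counts are unconstrained; stop codons can still appear out of frame, as TAG does in the toy sequence.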
CONCLUSIONS: Whereas the CoHSI-predicted Zipfian distribution of k-mer frequencies was observed in both the protein-coding and non-coding RNAs of 53 species, compliance with inverse symmetry was weaker. This weakening was more pronounced in the CDS than in the non-coding portions of the transcriptome and may be associated with the need to maintain the integrity of the reading frame in the CDS. These results illustrate the general principle that local perturbations of an overall CoHSI-guided equilibrium state of a biological system can provide insight into the underlying causes of such perturbations.
PMID:41351398 | DOI:10.31083/FBL45912