G3 (Bethesda). 2026 Apr 24:jkag100. doi: 10.1093/g3journal/jkag100. Online ahead of print.
ABSTRACT
Identifying population structure from genetic data is a key challenge, for which several statistical methods have been developed, including F-statistics, which measure the average correlation in allele frequency differences between two pairs of populations. F-statistics are typically applied to a subset of genetic variation within the common allele frequency band, available through microarrays and SNP enrichment techniques. Recent advances in sequencing technology make it increasingly feasible to generate whole-genome sequencing data, both ancient and modern, which not only enable querying nearly every base of the genome but also contain numerous rare variants. Rare variants, with their more population-specific distribution, allow detection of recent population structure at much finer resolution than common variants, an opportunity that has so far been under-exploited. Here, we develop a new statistical method, RAS (Rare Allele Sharing), for summarizing rare allele frequency correlations, similar to F-statistics but with flexible ascertainment on allele frequencies. We test RAS on both published and simulated data and find that RAS, with appropriate ascertainment, has better resolution than genome-wide F-statistics in identifying population structure caused by recent demographic events. Leveraging this, we further develop the use of RAS to compute ancestry proportions accurately in cases of recently diverged and closely related source populations. We have implemented the new statistical methods as an R package and a command-line tool. In summary, our method can provide new perspectives for identifying and modeling population structure, allowing us to understand more subtle relationships among populations in the recent human past.
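The contrast drawn above between a genome-wide F-statistic and one restricted by allele frequency can be illustrated with a minimal sketch. It is not the RAS estimator itself, which the abstract does not define; it only shows the standard f4 form, the mean over sites of (a - b)(c - d), computed once over all sites and once over sites that pass a frequency filter. The function names `f4` and `ascertain`, the use of a separate reference panel for ascertainment, the 5% cutoff, and the toy frequencies are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def ascertain(ref_freq: Sequence[float], min_freq: float, max_freq: float) -> List[int]:
    """Return indices of sites whose reference-panel allele frequency
    lies in the half-open band (min_freq, max_freq].

    Assumption: ascertainment is done on a reference panel that is
    separate from the four populations being compared.
    """
    return [i for i, f in enumerate(ref_freq) if min_freq < f <= max_freq]


def f4(
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
    d: Sequence[float],
    keep: Optional[Sequence[int]] = None,
) -> float:
    """Mean over sites of (a - b) * (c - d).

    If `keep` is given, only those site indices contribute; otherwise
    every site is used (the genome-wide case).
    """
    sites = list(range(len(a))) if keep is None else list(keep)
    if not sites:
        raise ValueError("no sites left after ascertainment")
    total = sum((a[i] - b[i]) * (c[i] - d[i]) for i in sites)
    return total / len(sites)


# Toy allele frequencies at four sites (hypothetical values).
pop_a = [0.0, 0.5, 0.1, 0.4]
pop_b = [0.0, 0.5, 0.0, 0.4]
pop_c = [0.1, 0.6, 0.1, 0.3]
pop_d = [0.1, 0.6, 0.0, 0.3]
reference = [0.02, 0.55, 0.03, 0.35]

# Keep only sites that are rare (at most 5%) but not absent in the reference.
rare_sites = ascertain(reference, min_freq=0.0, max_freq=0.05)

f4_all = f4(pop_a, pop_b, pop_c, pop_d)
f4_rare = f4(pop_a, pop_b, pop_c, pop_d, keep=rare_sites)
print(rare_sites, f4_all, f4_rare)
```

In this toy input the only site where the two population pairs differ is a rare one, so restricting to rare sites averages the same signal over fewer sites and yields a larger value than the genome-wide average. Real data would also require standard errors, for example from a block jackknife, which this sketch omits.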
PMID:42035364 | DOI:10.1093/g3journal/jkag100