Genetics. 2025 May 20:iyaf094. doi: 10.1093/genetics/iyaf094. Online ahead of print.
ABSTRACT
Accurate reconstruction of pedigrees from genetic data remains a challenging problem. Many relationship categories (e.g., half-sibships versus avuncular pairs) can be difficult to distinguish without external information. Pedigree inference algorithms are often trained on European-descent families in urban locations. Thus, existing methods tend to perform poorly in endogamous populations, for which there may be reticulations within the pedigrees and elevated haplotype sharing. We present a simple, rapid algorithm that initially uses only high-confidence first-degree relationships to seed a machine learning step based on summary statistics of identity-by-descent (IBD) sharing. One of these statistics, our “haplotype score”, is novel and can be used to: (1) distinguish half-sibling pairs from avuncular or grandparent-grandchild pairs; and (2) assign individuals to ancestor versus descendant generations. We test our approach in a sample of ∼700 individuals from northern Namibia, sampled from an endogamous population called the Himba. Due to a culture of concurrent relationships in the Himba, there is a high proportion of half-sibships. We accurately identify first- through fourth-degree relationships and distinguish between various second-degree relationships: half-sibships, avuncular pairs, and grandparent-grandchild pairs. We further validate our approach in a second African-descent dataset, the Barbados Asthma Genetics Study (BAGS), and in a European-descent founder population from Quebec. Accurate reconstruction of relatives facilitates estimation of allele frequencies, tracing of allele trajectories, improved phasing, heritability estimation, and other population genomic analyses.
PMID:40393068 | DOI:10.1093/genetics/iyaf094
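The abstract notes that half-sibling, avuncular, and grandparent-grandchild pairs are hard to tell apart. A minimal sketch of why, assuming the textbook expectation for an outbred pedigree that relatives of degree d share on average 2^-d of their alleles IBD: all three second-degree types have the same expected value, so total sharing alone cannot separate them. This is not the authors' algorithm, and it does not reproduce the "haplotype score", which the abstract does not define. The function names and the log2-midpoint cutoffs are illustrative assumptions.

```python
import math

# Relationship types that all fall in the second-degree class.
SECOND_DEGREE_TYPES = ("half-sibling", "avuncular", "grandparent-grandchild")


def expected_ibd_fraction(degree: int) -> float:
    """Expected proportion of alleles shared IBD for degree-d relatives.

    Assumes an outbred pedigree (no endogamy), where the expectation is 2**-d.
    """
    return 2.0 ** -degree


def infer_degree(ibd_fraction: float, max_degree: int = 4):
    """Assign the degree whose expected sharing is nearest on a log2 scale.

    Hypothetical helper: cutoffs sit at the log2 midpoints between adjacent
    degrees. Returns None when sharing is non-positive or implies a
    relationship more distant than max_degree.
    """
    if ibd_fraction <= 0:
        return None
    degree = math.floor(-math.log2(ibd_fraction) + 0.5)
    degree = max(degree, 0)  # 0 means identical genomes (duplicate or twin)
    return degree if degree <= max_degree else None


# Every second-degree type has the same expected sharing, so a classifier
# that only sees total sharing cannot distinguish them.
expected_by_type = {t: expected_ibd_fraction(2) for t in SECOND_DEGREE_TYPES}
```

In an endogamous population, elevated background haplotype sharing inflates the observed fraction, so fixed cutoffs like these would tend to assign pairs to a closer degree than their true one. That is consistent with the abstract's motivation for using additional IBD summary statistics.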