G3 (Bethesda). 2025 Jan 30:jkaf018. doi: 10.1093/g3journal/jkaf018. Online ahead of print.
ABSTRACT
Genetic studies of Plasmodium parasites increasingly feature relatedness estimates. However, various aspects of malaria parasite relatedness estimation are not fully understood. For example, relatedness estimates based on whole-genome-sequence (WGS) data often exceed those based on sparser data types. Systematic bias in relatedness estimation is well documented in the literature geared towards diploid organisms, but largely unknown within the malaria community. We characterise systematic bias in malaria parasite relatedness estimation using three complementary approaches: theoretically, under a non-ancestral statistical model of pairwise relatedness; numerically, under a simulation model of ancestry; and empirically, using data on parasites sampled from Guyana and Colombia. We show that allele frequency estimates encode, locus-by-locus, relatedness averaged over the set of sampled parasites used to compute them. Plugging sample allele frequencies into models of pairwise relatedness can lead to systematic underestimation. However, systematic underestimation can be viewed as population-relatedness calibration, i.e., a way of generating measures of relative relatedness. Systematic underestimation is unavoidable when relatedness is estimated assuming independence between genetic markers. It is mitigated when relatedness is estimated using WGS data under a hidden Markov model (HMM) that exploits linkage between proximal markers. The extent of mitigation is unknowable when a HMM is fit to sparser data, but downstream analyses that use high relatedness thresholds are relatively robust regardless. In summary, practitioners can either resolve to use relative relatedness estimated under independence, or try to estimate absolute relatedness under a HMM. We propose various tools to help practitioners evaluate their situation on a case-by-case basis.
PMID:39883524 | DOI:10.1093/g3journal/jkaf018