Guidelines for standardising the application of discriminant analysis of principal components to genotype data

Mol Ecol Resour. 2022 Aug 30. doi: 10.1111/1755-0998.13706. Online ahead of print.

ABSTRACT

Despite the popularity of discriminant analysis of principal components (DAPC) for studying population structure, there has been little discussion of best practice for this method. In this work, I provide guidelines for standardising the application of DAPC to genotype datasets. An often-overlooked fact is that DAPC generates a model describing genetic differences among a set of populations defined by a researcher. Appropriate parameterisation of this model is critical for obtaining biologically meaningful results. I show that the number of leading PC axes used as predictors of among-population differences, p_axes, should not exceed the k – 1 biologically informative PC axes that are expected for k effective populations in a genotype dataset. This k – 1 criterion for specifying p_axes is more appropriate than the widely used proportional variance criterion, which often results in a choice of p_axes ≫ k – 1. DAPC parameterised with no more than the leading k – 1 PC axes: (1) is more parsimonious; (2) captures maximal among-population variation on biologically relevant predictors; (3) is less sensitive to unintended interpretations of population structure; and (4) is more generally applicable to independent sample sets. Assessing model fit should be routine practice and aids interpretation of population structure. It is imperative that researchers articulate their study goals, that is, whether they are testing a priori expectations or studying de novo inferred populations, because this has implications for how their DAPC results should be interpreted. The discussion and practical recommendations in this work provide the molecular ecology community with a roadmap for using DAPC in population genetic investigations.
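
The contrast between the two criteria for choosing p_axes can be illustrated with a minimal sketch. This is not code from the paper: the function names, the 80% variance threshold, and the eigenvalues below are all hypothetical, chosen only to show how a proportional variance rule can select far more axes than k – 1 when a few leading PC axes carry the population structure and many trailing axes each carry a small share of variance.

```python
def p_axes_k_minus_1(k, n_available):
    """Upper bound on retained PC axes under the k - 1 criterion.

    k is the number of effective populations; n_available is the number
    of PC axes that the PCA actually produced.
    """
    if k < 2:
        raise ValueError("need at least two populations")
    return min(k - 1, n_available)


def p_axes_proportional_variance(eigenvalues, threshold=0.8):
    """Smallest number of leading PC axes whose cumulative share of the
    total variance reaches the threshold (threshold is an assumed value).
    """
    total = sum(eigenvalues)
    cumulative = 0.0
    for n_axes, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += ev
        if cumulative / total >= threshold:
            return n_axes
    return len(eigenvalues)


# Hypothetical PCA eigenvalues for a dataset with k = 4 populations:
# three large leading axes, followed by 40 small trailing axes.
eigenvalues = [30.0, 18.0, 12.0] + [1.0] * 40
k = 4

n_k_minus_1 = p_axes_k_minus_1(k, len(eigenvalues))
n_prop_var = p_axes_proportional_variance(eigenvalues, threshold=0.8)

print("k - 1 criterion:", n_k_minus_1)
print("proportional variance criterion:", n_prop_var)
```

In this made-up example the three leading axes explain 60% of the variance, so an 80% threshold pulls in a further 20 trailing axes, whereas the k – 1 criterion stops at three. Real eigenvalue spectra will differ, and k itself has to be justified by the study design.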

PMID:36039574 | DOI:10.1111/1755-0998.13706

By Nevin Manimala