Read Length Dominates Phylogenetic Placement Accuracy of Ancient DNA Reads

Mol Biol Evol. 2025 Jan 17:msaf006. doi: 10.1093/molbev/msaf006. Online ahead of print.

ABSTRACT

A common problem when analyzing ancient DNA (aDNA) data is to identify the species which corresponds to the recovered aDNA sequence(s). The standard approach is to deploy sequence similarity based tools, such as BLAST. However, as aDNA reads may frequently stem from unsampled taxa due to extinction, it is likely that there is no exact match in any database. As a consequence, these tools may not be able to accurately place such reads in a phylogenetic context. Phylogenetic placement is a technique where a read is placed onto a specific branch of a phylogenetic reference tree, which allows for a substantially finer resolution when identifying reads. Prior applications of phylogenetic placement has deployed only on data from extant sources. Therefore, it is unclear how the aDNA damage affects phylogenetic placement’s applicability to aDNA data. To investigate how aDNA damage affects placement accuracy, we re-implemented a statistical model of aDNA damage. We deploy this model, along with a modified version of the existing assessment pipeline PEWO, to seven empirical datasets with four leading tools: APPLES, EPA-NG, pplacer, and RAPPAS. We explore the aDNA damage parameter space via a grid search in order to identify the aDNA damage factors that exhibit the largest impact on placement accuracy. We find that the frequency of DNA backbone nicks (and consequently read length) has the, by far, largest impact on aDNA read placement accuracy, and that other factors, such as misincorporations, have a negligible effect on overall placement accuracy.

PMID:39823473 | DOI:10.1093/molbev/msaf006

By Nevin Manimala