bioRxiv [Preprint]. 2025 Jun 2:2024.09.30.615220. doi: 10.1101/2024.09.30.615220.
ABSTRACT
We present MARLOWE, a computational tool for characterizing the source organism(s) of unknown forensic biological samples. MARLOWE is intended to address a gap in the application of proteomics data analysis to forensics. It produces a list of potential source organisms from confident peptide tags derived from de novo peptide sequencing, using a statistical approach that assigns peptides to organisms probabilistically on the basis of a broad sequence database. As a result, the algorithm assumes no a priori knowledge of potential sources, and because peptides are assigned taxonomically and then scored probabilistically, the results are unbiased (within the constraints of the sequence database). In a proof-of-concept study, we examined MARLOWE's performance on two datasets: the Biodiversity dataset and the Bacillus cereus superspecies dataset. MARLOWE not only correctly characterized the true contributors in single-source samples and binary mixtures in the Biodiversity dataset but also provided sufficient specificity to distinguish species within a bacterial superspecies group. We also compared MARLOWE's results with those of MiCId, a leading microbial identification/characterization tool based on proteomics database search. Across 225 mass spectrometry data files, the two tools performed comparably, with slightly higher accuracy and specificity for MiCId. At the species level, MARLOWE achieved a specificity of 91.4% at 5% FDR. These results suggest that MARLOWE is suitable for generating candidate identifications for single-organism samples and binary mixtures, which can provide forensic leads and aid in selecting appropriate follow-on analyses.
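The abstract does not spell out MARLOWE's scoring model, so the following is only an illustration of the general idea of assigning shared peptides to organisms probabilistically and then ranking candidate sources. As a stand-in assumption (not MARLOWE's published method), each peptide tag that matches k organisms in the database contributes 1/k of a unit of evidence to each of them, and the per-organism totals are normalized into fractions. The input format, the function name `score_organisms`, and the placeholder peptides and organism names are all hypothetical.

```python
from collections import defaultdict


def score_organisms(peptide_hits):
    """Rank candidate source organisms from peptide-to-organism matches.

    Hypothetical input: a dict mapping each confident peptide tag to the set of
    organisms whose database proteins contain it. Simplified equal-split rule
    (an assumption, not MARLOWE's actual scoring): a peptide shared by k
    organisms gives 1/k of a unit of evidence to each.
    """
    scores = defaultdict(float)
    assigned = 0  # number of peptides that matched at least one organism
    for peptide, organisms in peptide_hits.items():
        if not organisms:
            continue  # unmatched peptides carry no taxonomic evidence
        assigned += 1
        share = 1.0 / len(organisms)
        for org in organisms:
            scores[org] += share
    if assigned == 0:
        return []
    # Normalize so the fractions sum to 1, then sort by score (ties by name).
    ranked = [(org, s / assigned) for org, s in scores.items()]
    return sorted(ranked, key=lambda kv: (-kv[1], kv[0]))


# Placeholder example: three peptide tags with overlapping organism matches.
hits = {
    "PEPTIDEA": {"org_A", "org_B"},
    "PEPTIDEB": {"org_A"},
    "PEPTIDEC": {"org_A", "org_B", "org_C"},
}
ranked = score_organisms(hits)
print(ranked)
```

In this toy case org_A receives 1/2 + 1 + 1/3 = 11/6 units of evidence out of 3 assigned peptides (a fraction of 11/18), so it ranks first. Peptides unique to one organism dominate the ranking, which is the intuition behind using species-specific peptides to separate closely related organisms; a real tool would additionally need a statistical model and FDR control, which this sketch omits.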
PMID:40501933 | PMC:PMC12157597 | DOI:10.1101/2024.09.30.615220