J Am Soc Mass Spectrom. 2026 Feb 17. doi: 10.1021/jasms.5c00372. Online ahead of print.
ABSTRACT
Untargeted tandem mass spectrometry (MS/MS)-based metabolomics enables broad characterization of small molecules in complex samples, yet the majority of spectra in a typical experiment remain unannotated, limiting biological interpretation. Reference data-driven (RDD) metabolomics addresses this gap by contextualizing spectra through comparison to curated, metadata-annotated reference data sets, allowing inference of spectrum origins without requiring exact structural identification. Here, we present an open-source RDD metabolomics platform comprising a user-friendly web application and a Python software package that performs RDD analyses directly from molecular networking outputs generated by GNPS. The tools support visualization and statistical analysis of RDD results, including interactive bar plots, heat maps, principal component analysis, and Sankey diagrams. We illustrate the approach using a hierarchical reference data set of 3500 food items to derive dietary patterns from stool metabolomics data of omnivore and vegan participants. The analysis reveals clear separation of the dietary groups, demonstrating how RDD metabolomics can extract biologically meaningful patterns from otherwise unannotated spectra. Thus, the RDD metabolomics platform removes technical barriers to the adoption of RDD analysis by the metabolomics community, with the functionality freely available at https://github.com/bittremieuxlab/gnps-rdd and https://gnps-rdd.bittremieuxlab.org/.
PMID:41701920 | DOI:10.1021/jasms.5c00372
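
The core idea described in the abstract, inferring the origin of a spectrum from the metadata of the reference spectra it clusters with, can be illustrated with a minimal sketch. This is not the API of the gnps-rdd package. The function name `rdd_counts`, the input layout (a mapping from cluster ID to the files contributing spectra to that cluster), and all file names and category labels are hypothetical and chosen only for illustration. The sketch assumes that a molecular networking output has already been reduced to such a mapping.

```python
from collections import Counter


def rdd_counts(cluster_files, reference_metadata, sample_files):
    """Count, per study sample, the clusters shared with each reference category.

    cluster_files:      dict of cluster ID -> set of file names with a spectrum in that cluster
    reference_metadata: dict of reference file name -> category label (e.g. a food type)
    sample_files:       iterable of study sample file names
    """
    counts = {sample: Counter() for sample in sample_files}
    for files in cluster_files.values():
        # Reference categories represented in this cluster (each counted once per cluster).
        categories = {reference_metadata[f] for f in files if f in reference_metadata}
        if not categories:
            continue  # no reference spectrum in the cluster, so nothing can be inferred
        for sample in counts:
            if sample in files:
                counts[sample].update(categories)
    return counts


# Toy input, invented for illustration only.
clusters = {
    1: {"stool_omnivore.mzML", "beef.mzML"},
    2: {"stool_omnivore.mzML", "stool_vegan.mzML", "lentil.mzML", "chickpea.mzML"},
    3: {"stool_vegan.mzML", "kale.mzML"},
    4: {"stool_vegan.mzML"},  # unmatched cluster: contributes no counts
}
reference = {
    "beef.mzML": "meat",
    "lentil.mzML": "legume",
    "chickpea.mzML": "legume",
    "kale.mzML": "vegetable",
}
counts = rdd_counts(clusters, reference, ["stool_omnivore.mzML", "stool_vegan.mzML"])
for sample, per_category in counts.items():
    print(sample, dict(per_category))
```

In this sketch a cluster contributes at most one count per category to each sample, so two legume reference files in the same cluster are not double counted. A per-sample table of such counts is the kind of input that bar plots, heat maps, or principal component analysis could then summarize.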