
The Power of Many: An Ensemble Approach to Spectral Similarity

J Am Soc Mass Spectrom. 2025 Sep 5. doi: 10.1021/jasms.5c00176. Online ahead of print.

ABSTRACT

Quantifying the similarity between two mass spectra (a known reference mass spectrum and an unidentified sample mass spectrum) is at the heart of compound identification workflows in gas chromatography-mass spectrometry (GC-MS). The reference spectrum most similar to the sample is assigned as its identification (provided some quantitative similarity threshold, e.g., 80%, is met), so accurately measuring similarity is essential. Significant research has gone into developing metrics for this purpose, each attempting to improve upon existing methods by incorporating GC-MS-specific information (e.g., peak ratios or retention times) or by adopting various statistical and algorithmic frameworks. While this active development has produced a plethora of similarity metrics with demonstrated value across different contexts, it has also created confusion about which metric should serve as a global standard. No metric is currently accepted as the standard because different metrics have performed best in different contexts. In this work, we propose an ensemble approach to spectral similarity scoring that combines the information captured by existing similarity metrics into an improved, globally representative similarity metric, as a step toward establishing a global standard. The resulting ensemble metrics are evaluated on over 88,000 spectra of varying complexity and rank the correct reference spectrum as the top-matching candidate for a sample more accurately than the rankings generated by individual similarity scores.
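The abstract does not say how the individual scores are combined, so the following is only a minimal Python sketch of the general idea under stated assumptions. Spectra are assumed to be dicts mapping nominal m/z to intensity. Three common metrics (plain cosine, a mass-weighted cosine with illustrative exponents, and Jaccard overlap of peak positions) stand in for the paper's metric collection, and a plain arithmetic mean stands in for the ensemble rule. The function names (`cosine`, `weighted_cosine`, `jaccard`, `ensemble_score`, `identify`) are hypothetical, and the 0.8 default threshold mirrors the 80% example above.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# Assumption: a spectrum maps nominal m/z -> non-negative intensity.
Spectrum = Dict[int, float]
Metric = Callable[[Spectrum, Spectrum], float]


def _aligned(a: Spectrum, b: Spectrum) -> Tuple[List[int], List[float], List[float]]:
    """Put both spectra on a shared m/z axis, filling missing peaks with 0."""
    mzs = sorted(set(a) | set(b))
    return mzs, [a.get(m, 0.0) for m in mzs], [b.get(m, 0.0) for m in mzs]


def _cos(x: List[float], y: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0 if either is empty)."""
    num = sum(p * q for p, q in zip(x, y))
    den = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return num / den if den else 0.0


def cosine(a: Spectrum, b: Spectrum) -> float:
    """Plain cosine (dot-product) similarity of raw intensities."""
    _, x, y = _aligned(a, b)
    return _cos(x, y)


def weighted_cosine(a: Spectrum, b: Spectrum,
                    mz_pow: float = 3.0, int_pow: float = 0.6) -> float:
    """Cosine after weighting each peak as mz**mz_pow * intensity**int_pow.
    Hypothetical exponents chosen for illustration, not taken from the paper."""
    mzs, x, y = _aligned(a, b)
    wx = [m ** mz_pow * p ** int_pow for m, p in zip(mzs, x)]
    wy = [m ** mz_pow * q ** int_pow for m, q in zip(mzs, y)]
    return _cos(wx, wy)


def jaccard(a: Spectrum, b: Spectrum) -> float:
    """Fraction of peak positions shared by both spectra (ignores intensities)."""
    pa = {m for m, v in a.items() if v > 0}
    pb = {m for m, v in b.items() if v > 0}
    union = pa | pb
    return len(pa & pb) / len(union) if union else 0.0


# Hypothetical metric set standing in for the paper's collection of metrics.
METRICS: List[Metric] = [cosine, weighted_cosine, jaccard]


def ensemble_score(a: Spectrum, b: Spectrum,
                   metrics: List[Metric] = METRICS) -> float:
    """Unweighted mean of the individual scores (a stand-in ensemble rule)."""
    return sum(f(a, b) for f in metrics) / len(metrics)


def identify(sample: Spectrum, library: Dict[str, Spectrum],
             threshold: float = 0.8) -> Tuple[Optional[str], List[Tuple[float, str]]]:
    """Rank library entries by ensemble score, best first; accept the top hit
    only if its score meets the threshold, otherwise return None."""
    ranked = sorted(
        ((ensemble_score(sample, ref), name) for name, ref in library.items()),
        reverse=True,
    )
    if not ranked or ranked[0][0] < threshold:
        return None, ranked
    return ranked[0][1], ranked
```

Calling `identify(sample, library)` returns the best-matching reference name (or `None` when the top score falls below the threshold) together with the full ranking, which is the quantity the abstract evaluates: whether the correct reference lands at rank one. Averaging scores is the simplest possible ensemble; rank averaging or learned per-metric weights are obvious alternatives, and the abstract does not state which approach the authors use.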

PMID:40911348 | DOI:10.1021/jasms.5c00176

By Nevin Manimala
