Comprehensive statistical and machine learning framework for identification of metabolomic biomarkers in breast cancer

Metabolomics. 2025 Jun 14;21(4):78. doi: 10.1007/s11306-025-02265-9.

ABSTRACT

INTRODUCTION: Breast cancer is the most common cancer among women, with its burden increasing over the past decades. Early diagnosis significantly improves survival rates and reduces lethality. Innovative technologies are being developed for early detection, making accurate tumor identification crucial.

OBJECTIVES: The research aims to identify significant metabolomic biomarkers that can help detect tumor progression and thereby contribute to early breast cancer diagnosis.

METHODS: A dataset of 228 metabolites from breast cancer patients and healthy individuals was curated from the Metabolomics Workbench Database. Statistical tests and Machine Learning (ML) algorithms were applied for feature selection, assessing normality, variance homogeneity, and significance. Recursive Feature Elimination (RFE) with a Random Forest (RF) classifier was then used to identify a minimal set of six significant metabolites with strong predictive potential. A Ridge Classifier was employed for classification, achieving 83% accuracy in distinguishing between cancerous and healthy individuals.
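The selection and classification steps described in METHODS could be sketched roughly as follows with scikit-learn. This is only an illustration under stated assumptions, not the authors' pipeline: the data are synthetic (generated with `make_classification`, not the Metabolomics Workbench data), and the sample count, train/test split, number of trees, RFE `step`, and Ridge `alpha` are all hypothetical choices. The statistical screening for normality, variance homogeneity, and significance is omitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in for the metabolite matrix
# (200 hypothetical samples x 228 features; labels 0 = healthy, 1 = cancer).
X, y = make_classification(
    n_samples=200,
    n_features=228,
    n_informative=10,
    n_redundant=0,
    random_state=0,
)

# Hold out a test set so selection and fitting never see the evaluation data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Recursive Feature Elimination driven by Random Forest feature importances.
# step=0.2 drops a fixed 20% of the original feature count per round
# (a speed-oriented assumption); the target of six features follows the abstract.
selector = RFE(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    n_features_to_select=6,
    step=0.2,
)
selector.fit(X_train, y_train)
selected = np.flatnonzero(selector.support_)  # column indices of kept features

# Ridge Classifier trained on the six selected features only.
clf = RidgeClassifier(alpha=1.0)
clf.fit(selector.transform(X_train), y_train)
accuracy = accuracy_score(y_test, clf.predict(selector.transform(X_test)))

print("Selected feature indices:", selected)
print(f"Held-out accuracy: {accuracy:.2f}")
```

The accuracy printed here reflects only the synthetic data and has no bearing on the 83% reported in the study. Fitting the selector on the training split alone is a deliberate choice to avoid leaking test information into feature selection.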

RESULTS: A minimal set of six significant metabolites was identified in plasma samples. The developed model showed an 83% accuracy in classifying cancerous vs. healthy individuals using the Ridge Classifier.

CONCLUSION: The study provides valuable insights into metabolomic changes associated with breast cancer, identifying potential biomarkers that could enhance early detection and diagnosis.

PMID:40515893 | DOI:10.1007/s11306-025-02265-9

By Nevin Manimala
