Beyond Comparing Machine Learning and Logistic Regression in Clinical Prediction Modelling: Shifting from Model Debate to Data Quality

J Med Internet Res. 2025 Nov 5;27:e77721. doi: 10.2196/77721.

ABSTRACT

The rapid uptake of supervised machine learning (ML) in clinical prediction modelling, particularly for binary outcomes based on tabular data, has sparked debate about its advantage over traditional logistic regression. Although ML has demonstrated superiority in unstructured data domains, its performance gains in structured, tabular clinical datasets remain inconsistent and context dependent. This viewpoint synthesizes recent comparative studies and simulation findings to argue that there is no universally best modelling approach. Model performance depends heavily on dataset characteristics (eg, linearity, sample size, number of candidate predictors, minority class proportion) and data quality (eg, completeness, accuracy). Consequently, efforts to improve data quality, rather than to increase model complexity, are more likely to enhance the reliability and real-world utility of clinical prediction models.
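The data-quality point can be illustrated with a small simulation. The sketch below is not taken from the article; it is a minimal, standard-library-only example under stated assumptions. A binary outcome is generated from a single predictor through a logistic model, the true linear predictor is used as the risk score (that is, a correctly specified model), and a fraction of the recorded outcomes is then flipped at random to mimic inaccurate labels. The function names `auc` and `simulate`, the coefficients (-2.0 and 1.5), the sample size, and the 20% flip rate are all hypothetical choices made for illustration.

```python
import math
import random


def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U); tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def simulate(n=5000, flip_rate=0.0, seed=0):
    """Return the AUC of the true risk score against (possibly mislabeled) outcomes.

    Assumed data-generating model (illustrative only):
        logit P(y = 1 | x) = -2.0 + 1.5 * x,  x ~ N(0, 1)
    Each recorded outcome is flipped independently with probability flip_rate.
    """
    rng = random.Random(seed)
    scores, labels = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(-2.0 + 1.5 * x)))
        y = 1 if rng.random() < p else 0
        # Draw the flip decision every time so that runs with the same seed
        # share the same predictors and the same underlying true outcomes.
        if rng.random() < flip_rate:
            y = 1 - y
        scores.append(x)  # x is monotone in the true risk, so it ranks identically
        labels.append(y)
    return auc(scores, labels)


if __name__ == "__main__":
    print("AUC, accurate labels:", round(simulate(flip_rate=0.0), 3))
    print("AUC, 20% labels flipped:", round(simulate(flip_rate=0.2), 3))
```

Because the score is the same in both runs, any drop in measured discrimination comes from label inaccuracy alone, not from the choice of modelling approach. The sketch covers only one aspect of data quality (outcome accuracy); completeness and predictor measurement error would need their own simulations.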

PMID:41191908 | DOI:10.2196/77721

By Nevin Manimala