
Performance of machine learning algorithms in diffusion tensor imaging of movement disorders: an exploratory meta-analysis

Biomed Eng Online. 2026 Feb 7. doi: 10.1186/s12938-026-01528-3. Online ahead of print.

ABSTRACT

BACKGROUND: Machine learning (ML) applied to diffusion tensor imaging (DTI) has emerged as a promising tool for detecting microstructural brain alterations in movement disorders. However, existing studies vary widely in design, sample size, imaging pipelines, and analytic rigor, resulting in high methodological heterogeneity that limits quantitative comparability.

OBJECTIVES: This exploratory meta-analysis and narrative synthesis aimed to characterize performance trends, methodological diversity, and sources of variability among ML models trained on DTI data for classifying movement disorders. It was designated exploratory because extreme heterogeneity precluded confirmatory inference of a single pooled diagnostic effect; the analysis therefore describes performance distributions and methodological patterns rather than estimating a unified effect.

METHODS: A systematic search of PubMed, Web of Science, and Scopus identified human studies applying ML algorithms to DTI for diagnostic or classification purposes. Accuracy, sensitivity, specificity, and the area under the curve (AUC) were extracted, with multiple imputation applied to incomplete metrics when missingness was below 40%. Random-effects modeling was employed to provide descriptive summaries, and subgroup analyses were conducted to explore trends across disorders, model architectures, and imaging modalities. Study quality was assessed with Joanna Briggs Institute (JBI) tools.
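As an illustration of the kind of random-effects summary and heterogeneity statistic described here, the following is a minimal sketch only. The abstract does not state which estimator or transformation the authors used; the DerSimonian-Laird estimator, the function name `dersimonian_laird`, and all input values below are assumptions chosen for illustration, not data from the included studies. Pooling AUCs on their raw scale is also a simplification.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool per-study effects with a DerSimonian-Laird random-effects model.

    effects:   per-study estimates (e.g. AUCs), one per study
    variances: within-study variances of those estimates
    Returns the pooled estimate, its standard error, tau^2, Cochran's Q, and I^2 (%).
    """
    k = len(effects)
    # Fixed-effect (inverse-variance) weights and pooled estimate
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1

    # Between-study variance (tau^2), truncated at zero
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # I^2: share of total variability attributed to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {"pooled": pooled, "se": se, "tau2": tau2, "q": q, "i2": i2}


# Hypothetical AUCs and variances, invented purely for illustration
example = dersimonian_laird(
    effects=[0.82, 0.91, 0.76, 0.95],
    variances=[0.0009, 0.0004, 0.0016, 0.0004],
)
print(example)
```

When I² is very high, the random-effects weights become nearly equal across studies, which is one reason a pooled value under extreme heterogeneity is better read as a descriptive summary than as a single diagnostic effect.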

RESULTS: Forty-six studies (2016-2024) were included, spanning Parkinson’s disease, Tourette syndrome, and essential tremor. Reported performance was generally high (median AUC ≈ 0.91), but between-study heterogeneity was extreme (I² = 94.7%), indicating that studies were estimating distinct effects. Disorder-specific subgroup AUCs varied markedly: essential tremor (0.95), Parkinson’s disease (0.90), Tourette syndrome (0.88), and other disorders (0.79). Deep learning and radiomics-based models reported higher accuracies, but they were often trained on small, single-center cohorts (37-139 participants), which limits their external validity. Pooled statistics were presented descriptively to illustrate performance ranges despite high heterogeneity, and were not interpreted as confirmatory effect sizes.

CONCLUSIONS: ML models using DTI demonstrate high internal performance across multiple movement disorders, but their generalizability remains limited. Current evidence remains exploratory because of small sample sizes, methodological fragmentation, and a lack of standardized imaging pipelines. Rather than supporting confirmatory inference, these findings provide a descriptive map of emerging trends in ML-DTI diagnostics. Future progress will depend on data harmonization initiatives, multicenter collaborations, and federated learning frameworks that can support reproducible, generalizable, and clinically interpretable models.

PMID:41654900 | DOI:10.1186/s12938-026-01528-3

By Nevin Manimala
