Nevin Manimala Statistics

Error Reduction in Leukemia Machine Learning Classification With Conformal Prediction

JCO Clin Cancer Inform. 2025 May;9:e2400324. doi: 10.1200/CCI-24-00324. Epub 2025 May 28.

ABSTRACT

PURPOSE: Recent advances in machine learning have led to the development of classifiers that predict molecular subtypes of acute lymphoblastic leukemia (ALL) using RNA-sequencing (RNA-seq) data. Although these models have shown promising results, they often lack robust performance guarantees. The aim of this study was three-fold: to quantify the uncertainty of these classifiers, to provide prediction sets that control the false-negative rate (FNR), and to perform implicit error reduction by transforming incorrect predictions into uncertain predictions.

METHODS: Conformal prediction (CP) is a distribution-agnostic framework for generating statistically calibrated prediction sets whose size reflects model uncertainty. In this study, we applied an extension called conformal risk control to three RNA-seq ALL subtype classifiers. Leveraging RNA-seq data from 1,227 patient samples taken at diagnosis, we developed a multiclass conformal predictor ALLCoP, which generates statistically guaranteed FNR-controlled prediction sets.
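As a rough illustration of the idea in METHODS, the sketch below calibrates a single threshold so that the expected false-negative rate of the resulting prediction sets stays at or below a chosen tolerance. It is not the ALLCoP implementation: the abstract does not state the score function, so the score used here (one minus the predicted probability of the true subtype), the single-label simplification, and the names `calibrate_lambda` and `prediction_set` are all assumptions made for illustration.

```python
from typing import List, Sequence


def calibrate_lambda(cal_probs: Sequence[Sequence[float]],
                     cal_labels: Sequence[int],
                     alpha: float) -> float:
    """Pick the smallest threshold lam whose adjusted empirical risk is <= alpha.

    Assumption: each calibration sample has exactly one true subtype, so the
    per-sample false-negative loss is 1 if the true subtype is left out of the
    prediction set and 0 otherwise (loss bounded by B = 1).
    """
    n = len(cal_labels)
    # Hypothetical score: 1 - predicted probability of the true class.
    scores = [1.0 - p[y] for p, y in zip(cal_probs, cal_labels)]
    # Only thresholds equal to an observed score (or 1.0) can change the risk.
    for lam in sorted(set(scores)) + [1.0]:
        misses = sum(s > lam for s in scores)  # true class excluded at lam
        # Conformal risk control bound with B = 1:
        # (n / (n + 1)) * (misses / n) + 1 / (n + 1) == (misses + 1) / (n + 1)
        if (misses + 1) / (n + 1) <= alpha:
            return lam
    # Too few calibration samples for this alpha: fall back to lam = 1.0,
    # which includes every class and therefore never misses.
    return 1.0


def prediction_set(probs: Sequence[float], lam: float) -> List[int]:
    """Return indices of all classes whose score 1 - p is within the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= lam]


# Toy calibration data: 4 samples, 3 hypothetical subtypes (made-up numbers).
cal_probs = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.30, 0.10, 0.60],
    [0.30, 0.50, 0.20],
]
cal_labels = [0, 1, 2, 0]

lam = calibrate_lambda(cal_probs, cal_labels, alpha=0.5)
confident = prediction_set([0.70, 0.25, 0.05], lam)   # one clear candidate
uncertain = prediction_set([0.45, 0.40, 0.15], 1.0)   # lam = 1.0 keeps all classes
```

A larger threshold yields larger prediction sets, which is how set size comes to reflect model uncertainty. A set with several subtypes flags a sample the classifier cannot resolve on its own.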

RESULTS: ALLCoP created prediction sets at specified FNR tolerances ranging from 7.5% to 30%. In a validation cohort, ALLCoP reduced the FNR of the ALLIUM RNA-seq ALL subtype classifier from 8.95% to 3.5%. For patients whose subtype was not previously known, ALLCoP reduced the occurrence of empty predictions from 37% to 17%. Notably, up to 34% of the multiple-class prediction sets included the PAX5alt subtype, suggesting that larger prediction sets may reflect secondary aberrations and biological complexity that contribute to classifier uncertainty. Finally, ALLCoP was validated on two additional RNA-seq ALL subtype classifiers, ALLSorts and ALLCatchR.

CONCLUSION: Our results highlight the potential of CP in enhancing the use of oncologic RNA-seq subtyping classifiers and also in uncovering additional molecular aberrations of potential clinical importance.

PMID: 40435436 | DOI: 10.1200/CCI-24-00324

By Nevin Manimala
