Eur Radiol. 2026 Jan 28. doi: 10.1007/s00330-026-12324-x. Online ahead of print.
ABSTRACT
OBJECTIVE: To evaluate the diagnostic performance of semi-supervised learning models for aggressive prostate cancer detection on MRI compared to fully supervised models trained with additional expert annotations.
MATERIALS AND METHODS: We used 1500 MRI scans from the PI-CAI challenge training subset. Of the positive scans, 220 had human annotations and 205 had AI-generated annotations. The mtU-Net (the proposed teacher-student semi-supervised approach) was compared with a supervised nnU-Net (trained using only the 220 human annotations) and a semi-supervised nnU-Net (trained on both human and AI-generated annotations). The 205 AI-annotated scans were then annotated manually, and a fully supervised model was trained with these additional expert annotations. External validation was performed on a newly annotated dataset from the PROMIS study (n = 574, 403 lesions) and on the Prostate158 dataset (n = 158, 126 lesions). Patient-level performance was evaluated using the area under the curve (AUC), and lesion-level detection (overlap > 0.10) using average precision (AP); 95% confidence intervals are given in brackets. The DeLong test was used to compare AUCs against the supervised and fully supervised models.
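The patient-level AUC comparisons above rely on the DeLong test for two correlated AUCs measured on the same patients. The following is a minimal NumPy sketch of the paired, two-sided DeLong test, assuming each model outputs one suspicion score per patient and each patient has a binary label. `delong_test` and `_structural_components` are hypothetical helper names; this is not the PI-CAI evaluation code or the authors' pipeline.

```python
import math
import numpy as np

def _structural_components(pos, neg):
    # psi(x, y) = 1 if x > y, 0.5 if tied, 0 otherwise, for every (positive, negative) pair.
    diff = pos[:, None] - neg[None, :]
    psi = (diff > 0).astype(float) + 0.5 * (diff == 0)
    return psi.mean(axis=1), psi.mean(axis=0)  # V10 (per positive), V01 (per negative)

def delong_test(y_true, scores_a, scores_b):
    """Paired two-sided DeLong test for two AUCs computed on the same patients.

    Returns (auc_a, auc_b, p_value). Needs at least two positives and two negatives.
    """
    y = np.asarray(y_true).astype(bool)
    m, n = int(y.sum()), int((~y).sum())
    v10, v01 = [], []
    for s in (scores_a, scores_b):
        s = np.asarray(s, dtype=float)
        a, b = _structural_components(s[y], s[~y])
        v10.append(a)
        v01.append(b)
    v10, v01 = np.vstack(v10), np.vstack(v01)  # shapes (2, m) and (2, n)
    aucs = v10.mean(axis=1)  # the AUC equals the mean placement value
    # Covariance matrix of the two AUC estimates (each row is one model).
    cov = np.cov(v10) / m + np.cov(v01) / n
    var_diff = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]
    if var_diff <= 1e-12:
        # Degenerate case (e.g. identical scores): no variance to test against.
        p = 1.0 if aucs[0] == aucs[1] else 0.0
    else:
        z = (aucs[0] - aucs[1]) / math.sqrt(var_diff)
        p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return float(aucs[0]), float(aucs[1]), p
```

A call such as `delong_test(labels, supervised_scores, semi_supervised_scores)` (hypothetical variable names) returns both AUCs and the p-value. The test is paired because both models score the same patients, so their AUC estimates are correlated, and the covariance term `cov[0, 1]` accounts for that.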
RESULTS: The fully supervised nnU-Net showed the highest performance on the internal PI-CAI test set (AUC = 0.89 [0.87-0.91], AP = 0.65 [0.60-0.70]) and on the external validation datasets PROMIS (AUC = 0.68 [0.64-0.72], AP = 0.24 [0.20-0.29]) and Prostate158 (AUC = 0.87 [0.82-0.92], AP = 0.64 [0.56-0.72]), significantly outperforming the supervised baseline (p < 0.05). The proposed semi-supervised mtU-Net performed close to this in external validation on PROMIS (AUC = 0.66 [0.62-0.71], AP = 0.20 [0.16-0.25]) and Prostate158 (AUC = 0.86 [0.81-0.92], AP = 0.58 [0.49-0.67]), significantly outperforming the supervised baseline on both datasets (p = 0.047 and p = 0.014, respectively) and showing no significant difference from the fully supervised model (p = 0.199 and p = 0.702, respectively).
CONCLUSION: In prostate MRI tumor detection, fully supervised learning performed best. However, in external validation, the semi-supervised methods approached the performance of the fully supervised model, making them a valuable approach when expert annotations are limited.
KEY POINTS:
Question: The need for extensive expert voxel-level annotations delays the development of AI-based prostate cancer diagnostic tools and their implementation in clinical practice.
Findings: The combination of pseudo-labeling with consistency regularization achieved performance comparable to that of fully supervised methods, demonstrating that data diversity matches the impact of expert annotation volume.
Clinical relevance: Semi-supervised learning reduces dependence on expert annotations while maintaining detection accuracy, enabling the development of scalable, automated diagnostic tools for prostate cancer amid growing clinical workflow demands.
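The combination of pseudo-labeling and consistency regularization in a teacher-student setup can be illustrated with a deliberately tiny NumPy sketch. It is not the authors' mtU-Net: a logistic "voxel classifier" stands in for the U-Net, and the confidence threshold `tau`, input noise level, EMA rate `alpha`, and loss weight `lam` are illustrative assumptions. The teacher is an exponential moving average (EMA) of the student. Its confident predictions on unannotated data become pseudo-labels, which the student must reproduce on a perturbed version of the same input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ema_update(teacher_w, student_w, alpha=0.99):
    # The teacher is an exponential moving average of the student; it gets no gradient.
    return alpha * teacher_w + (1.0 - alpha) * student_w

def mean_teacher_step(student_w, teacher_w, x_lab, y_lab, x_unlab, rng,
                      lr=0.5, lam=1.0, noise=0.1, tau=0.9, alpha=0.99):
    """One update of a toy logistic voxel classifier (hypothetical stand-in for a U-Net)."""
    # Supervised term: binary cross-entropy gradient on the annotated voxels.
    p_lab = sigmoid(x_lab @ student_w)
    grad = x_lab.T @ (p_lab - y_lab) / len(y_lab)

    # Pseudo-labels: teacher predictions on the clean unlabeled view, kept only
    # where the teacher is confident (max(p, 1 - p) >= tau).
    p_t = sigmoid(x_unlab @ teacher_w)
    pseudo = (p_t >= 0.5).astype(float)
    mask = (np.maximum(p_t, 1.0 - p_t) >= tau).astype(float)

    # Consistency: the student sees a perturbed view and must match the pseudo-labels
    # (binary cross-entropy gradient restricted to confident voxels).
    x_noisy = x_unlab + noise * rng.standard_normal(x_unlab.shape)
    p_s = sigmoid(x_noisy @ student_w)
    grad += lam * x_noisy.T @ (mask * (p_s - pseudo)) / max(mask.sum(), 1.0)

    student_w = student_w - lr * grad
    teacher_w = ema_update(teacher_w, student_w, alpha)
    return student_w, teacher_w
```

Using an EMA teacher rather than the student's own predictions is meant to give smoother, less noisy targets, and masking low-confidence voxels is meant to limit how far teacher mistakes propagate into training. Early in training, when the teacher is near chance, the mask is empty and only the supervised term drives the update.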
PMID:41606246 | DOI:10.1007/s00330-026-12324-x