J Imaging Inform Med. 2025 Dec 2. doi: 10.1007/s10278-025-01756-4. Online ahead of print.
ABSTRACT
Accurate segmentation of upper airway structures, such as the velum and the OTE region (oropharynx, tongue base, and epiglottis), in drug-induced sleep endoscopy (DISE) images is crucial for predicting the degree and location of obstruction, which in turn guide treatment decisions for obstructive sleep apnea (OSA). This study systematically compares centralized learning (CL) and federated learning (FL) approaches for the semantic segmentation of these regions using multi-institutional DISE video data. A convolutional neural network (CNN)-based segmentation model was trained and evaluated under both learning paradigms. The results consistently showed that CL achieved statistically significantly higher segmentation performance than FL across all metrics (precision, recall, accuracy, and Dice similarity coefficient [DSC]) for both the velum and OTE regions. For the velum region, CL achieved a DSC of 85.91 ± 1.01%, compared with 81.78 ± 0.58% for FL. Similarly, for the OTE region, CL achieved an average DSC of 87.04 ± 0.41%, whereas FL achieved 85.20 ± 0.25%. Further analysis revealed that both models struggled with ambiguous boundaries and anatomical variability, particularly for the tongue base, whereas the epiglottis and the lateral oropharyngeal wall were segmented with higher accuracy. These findings underscore the need for advanced FL techniques, such as improved optimization algorithms and methods that address data heterogeneity, to narrow the performance gap with CL. This study provides foundational insights for developing more robust and clinically applicable deep learning models for upper airway analysis and highlights the importance of future research into advanced FL strategies and real-world validation.
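The four metrics reported above can all be derived from a pixel-wise confusion matrix between a predicted mask and a ground-truth mask. The sketch below shows the standard definitions for a single binary mask (1 = target region, 0 = background), assuming NumPy and a hypothetical helper named `segmentation_metrics`. It is an illustration, not the authors' evaluation code. How the study aggregates scores across images, institutions, folds, or runs (the source of the ± values) is not described in the abstract and is not modeled here.

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> dict:
    """Pixel-wise precision, recall, accuracy, and DSC for one binary mask pair.

    Hypothetical helper: `pred` and `target` are same-shaped arrays where
    nonzero marks the region of interest (e.g. velum) and zero is background.
    `eps` guards against division by zero when a mask is empty.
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    tp = np.logical_and(pred, target).sum()    # region predicted and present
    fp = np.logical_and(pred, ~target).sum()   # region predicted but absent
    fn = np.logical_and(~pred, target).sum()   # region missed
    tn = np.logical_and(~pred, ~target).sum()  # background correctly left out
    return {
        "precision": tp / (tp + fp + eps),
        "recall": tp / (tp + fn + eps),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        # DSC = 2|P ∩ T| / (|P| + |T|) = 2TP / (2TP + FP + FN)
        "dsc": 2 * tp / (2 * tp + fp + fn + eps),
    }

# Toy 2x3 example: TP = 2, FP = 2, FN = 1, TN = 1
pred = np.array([[1, 1, 1], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
m = segmentation_metrics(pred, target)
# precision 0.5, recall ~0.667, accuracy 0.5, DSC ~0.571
```

Note that accuracy counts correctly classified background pixels, so it can stay high even when the region itself is poorly segmented. DSC and recall ignore true negatives and are therefore more sensitive to boundary errors of the kind the abstract describes for the tongue base.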
PMID:41331654 | DOI:10.1007/s10278-025-01756-4