Sci Rep. 2026 Mar 21. doi: 10.1038/s41598-026-44603-4. Online ahead of print.
ABSTRACT
With the increasing availability of multimodal remote sensing (RS) data, semantic segmentation that leverages complementary information from true orthophotos (TOP) and digital surface models (DSM) has become essential for urban analysis. Diffusion-based segmentation provides an effective iterative refinement mechanism for modeling complex multimodal distributions; however, conventional pixel-wise supervision emphasizes local accuracy while overlooking global distribution alignment, often leading to inconsistent predictions and blurred object boundaries. Although maximum mean discrepancy (MMD) measures global statistical differences between predicted and ground-truth distributions, its effectiveness in high-dimensional class-probability spaces is limited by directional cancellation effects that reduce sensitivity to complex distribution shifts. To address this issue, we propose a projection-kernel regularized diffusion-based multimodal RS segmentation framework that enforces global statistical alignment through distribution-level regularization rather than modifying the intrinsic diffusion process. The proposed regularization performs multi-directional projections of high-dimensional class-probability vectors onto one-dimensional subspaces and derives a closed-form kernel integration to avoid numerical sampling across projection directions, enabling efficient and stable global distribution matching. In addition, a Cross-Attention Dual-Encoder Fusion (CADEF) module is introduced to alleviate geometry-texture misalignment, and a Hierarchical EMA-Gated Recursive Denoising (HERD) mechanism is designed to stabilize multiscale feature refinement. Experiments on the ISPRS Vaihingen and Potsdam benchmarks demonstrate that the proposed regularization consistently improves segmentation accuracy over state-of-the-art CNN-, Transformer-, and diffusion-based baselines, yielding enhanced global consistency and sharper boundary delineation. 
Code is available at: https://github.com/tonyy127/PKDiff.
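
As an illustration of the projection idea described in the abstract, the sketch below projects class-probability vectors onto random one-dimensional directions and averages a Gaussian-kernel MMD over those directions. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract says the proposed method derives a closed-form kernel integration precisely to avoid this kind of numerical sampling over directions, and that closed form is not given here. The function names, the Gaussian kernel, the bandwidth, and the number of directions are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_mmd2_1d(x, y, bandwidth=0.5):
    """Biased (V-statistic) estimate of squared MMD between two 1-D samples.

    Assumption: a Gaussian kernel with a fixed bandwidth; the abstract does
    not specify the kernel used by the authors.
    """
    def kernel(a, b):
        diff = a[:, None] - b[None, :]  # all pairwise differences
        return np.exp(-diff ** 2 / (2.0 * bandwidth ** 2))

    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()


def projected_mmd2(pred, target, n_dirs=64, bandwidth=0.5, seed=0):
    """Average 1-D squared MMD over random unit directions (Monte Carlo).

    pred, target: arrays of shape (n_pixels, n_classes) holding
    class-probability vectors (target may be one-hot labels).
    This samples directions numerically, which is the step the paper
    reports replacing with a closed-form integration.
    """
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_dirs, pred.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # unit-length directions

    pred_proj = pred @ dirs.T      # shape (n_pixels, n_dirs)
    target_proj = target @ dirs.T  # shape (n_pixels, n_dirs)

    per_dir = [
        gaussian_mmd2_1d(pred_proj[:, j], target_proj[:, j], bandwidth)
        for j in range(n_dirs)
    ]
    return float(np.mean(per_dir))


# Toy usage with 6 classes: every pixel labelled class 0,
# every prediction placed on class 1.
target = np.eye(6)[np.zeros(100, dtype=int)]
pred = np.eye(6)[np.ones(100, dtype=int)]
mismatch = projected_mmd2(pred, target)
match = projected_mmd2(target, target)
```

In this toy setup `match` is zero up to floating-point error and `mismatch` is strictly positive, so a term of this form could in principle be added to a pixel-wise loss as a distribution-level penalty. How the paper weights or integrates its regularizer is not stated in the abstract.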
PMID:41865080 | DOI:10.1038/s41598-026-44603-4