Neural Netw. 2026 May 19;203:109153. doi: 10.1016/j.neunet.2026.109153. Online ahead of print.
ABSTRACT
In safety-critical domains ranging from medical diagnostics to credit scoring, learning algorithms for set-structured data must satisfy strict theoretical axioms: permutation invariance, monotonicity, and computational scalability. However, deep learning architectures and axiomatic aggregation theory remain poorly connected. Building on recent bounds for sum-decomposable set representations, we formalize and specialize a fundamental efficiency limitation in canonical additive architectures (Deep Sets): for exact representation of basic order statistics with an injective sum-decomposable encoder, sum-pooling requires linear feature scaling (m = Ω(N)), which makes these architectures dimension-inefficient for winner-take-all logic. Conversely, standard attention mechanisms violate monotonicity through competitive normalization. To bridge this gap, we introduce the Monotone Set Transformer (MoST), a verifiable neuro-fuzzy architecture. MoST uses a two-pronged design: (1) a learnable non-competitive gating mechanism based on positive kernels to approximate supermodular synergies, and (2) a semantic anchor based on Ordered Weighted Averaging (OWA) that enables explicit rank-dependent aggregation with O(N log N) complexity. We prove that MoST preserves set-inclusion monotonicity by construction. In controlled synthetic settings aligned with the theory, MoST can realize near-exact max aggregation, while supplementary experiments show strong monotonic certification and favorable time and memory scaling relative to attention baselines. Across molecular tasks, the results highlight a task-dependent trade-off between strict monotonic constraints and unconstrained predictive flexibility.
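The rank-dependent aggregation mentioned above can be illustrated with a minimal sketch of a plain OWA operator. This is not the MoST implementation: the function name `owa`, the fixed-length weight vector, and the example scores are assumptions made for illustration, and the sketch does not address set-inclusion monotonicity across sets of different sizes.

```python
from typing import Sequence


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Plain Ordered Weighted Averaging (illustrative, not the paper's code).

    Weights attach to ranks (largest value first), not to input positions,
    so the result does not depend on the order of `values`.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must not all be zero")
    # Sorting is the O(N log N) step; the weighted sum itself is O(N).
    ranked = sorted(values, reverse=True)
    # Normalizing by the weight total keeps the output a weighted average.
    return sum(w * v for w, v in zip(weights, ranked)) / total


# Hypothetical example scores for a three-element set.
scores = [0.2, 0.9, 0.5]
max_like = owa(scores, [1.0, 0.0, 0.0])   # all weight on rank 1: the maximum
min_like = owa(scores, [0.0, 0.0, 1.0])   # all weight on the last rank: the minimum
mean_like = owa(scores, [1.0, 1.0, 1.0])  # uniform weights: the arithmetic mean
```

With non-negative weights, raising any single input can never lower the output, which is the element-wise sense of monotonicity; placing all weight on the first rank recovers the max aggregation discussed in the abstract.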
PMID:42184462 | DOI:10.1016/j.neunet.2026.109153