J Chem Inf Model. 2026 Jun 11. doi: 10.1021/acs.jcim.6c00831. Online ahead of print.
ABSTRACT
A unified atlas of 11073 noncovalent complexes spanning quantum-chemical benchmark dimers, deep eutectic solvent formulations, and large supramolecular host-guest assemblies is constructed using the Kulkarni Universal Interaction Descriptor (K-UID), a hierarchically structured hexadecimal addressing system derived from the nine-dimensional Kulkarni-NCI Fingerprint (KNF). Discretization of the globally normalized KNF feature space yields 1119 distinct topological families, of which 754 contain two or more members with a median within-family coefficient of variation of 6.93% for the Noncovalent Interaction Score (SNCI), establishing that topological address membership exhibits statistically consistent interaction descriptor behavior across all four source datasets. A controlled row-shuffling ablation confirms that the KNF feature vector encodes genuine, molecule-specific physical information: random permutation of feature vectors across complexes increases mean absolute error by a factor of 2.8 and drives the coefficient of determination to R² = -1.251, establishing that predictive performance derives from the physical correspondence between features and molecular interaction fields rather than from exploitable distributional properties of the descriptor. Principal component analysis of the KNF feature space reveals a striking asymmetry in topological coverage: the 27-complex S30L supramolecular dataset occupies 39.3% of the accessible PC1-PC2 interaction manifold, a per-compound coverage 652 times greater than NENCI-2021 dimers, with 91.3% of S30L topological families having no analog in the reference benchmarks.
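The logic of a row-shuffling ablation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the nine features are random numbers standing in for a KNF-like vector, the model is ordinary least squares, and the function names (`row_shuffle_ablation`, `mae`, `r2`) are hypothetical. The abstract does not state which model, data split, or shuffling protocol the authors used.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))


def r2(y_true, y_pred):
    """Coefficient of determination; negative when worse than predicting the mean."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def row_shuffle_ablation(X, y, seed=0):
    """Fit on intact (feature, target) pairs, then score the held-out half
    twice: once with intact features and once with feature rows permuted
    across samples. Assumed protocol, not the authors' documented one."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    order = rng.permutation(n)
    train, test = order[: n // 2], order[n // 2:]

    # Ordinary least squares with an intercept column.
    A_train = np.column_stack([X[train], np.ones(train.size)])
    coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)

    def predict(features):
        return np.column_stack([features, np.ones(features.shape[0])]) @ coef

    X_test, y_test = X[test], y[test]
    # Permuting rows keeps every feature's marginal distribution unchanged
    # but breaks the link between each sample and its own feature vector.
    X_shuffled = X_test[rng.permutation(test.size)]

    return {
        "intact": {"mae": mae(y_test, predict(X_test)),
                   "r2": r2(y_test, predict(X_test))},
        "shuffled": {"mae": mae(y_test, predict(X_shuffled)),
                     "r2": r2(y_test, predict(X_shuffled))},
    }


# Synthetic stand-in data: 400 samples, 9 features, noisy linear target.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(400, 9))
y = X @ data_rng.normal(size=9) + 0.1 * data_rng.normal(size=400)

result = row_shuffle_ablation(X, y)
print(result)
```

Because shuffling preserves the distribution of the features while destroying their pairing with individual samples, a negative R² after shuffling indicates that the model relied on sample-specific feature content rather than on distributional properties alone.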
Nearest-neighbor matching of deep eutectic solvent (DES) and supramolecular complexes to reference-dataset physics twins in the scale-invariant intensive feature space yields a Spearman rank correlation of ρ = 0.283 (p = 1.16 × 10⁻⁵², n = 2789), confirming transferable rank information across molecular scales governed by interaction topology without structural similarity constraints. Family occupancy analysis further reveals that 66.9% of reference topological families (401 of 599) have no experimentally realized DES analog in the current atlas, while 499 DES-exclusive families lie outside the reference benchmark space, establishing that current DES design practice explores a confined, nonrepresentative region of the accessible interaction manifold. Together, these findings establish interaction topology as a transferable classification property and provide a concrete, atlas-defined route to identifying underexplored design space in functional noncovalent materials.
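The nearest-neighbor "physics twin" matching and rank-correlation step can be sketched as follows. This is a hedged illustration only: the feature matrices and scores are synthetic, Euclidean distance is an assumed metric, and the helper names (`nearest_neighbor_indices`, `ordinal_ranks`, `spearman_rho`) are hypothetical. The abstract does not specify the distance metric or which features form the intensive subspace.

```python
import numpy as np


def nearest_neighbor_indices(query, reference):
    """For each query row, return the index of the closest reference row
    under squared Euclidean distance (brute force)."""
    diff = query[:, None, :] - reference[None, :, :]
    return (diff ** 2).sum(axis=2).argmin(axis=1)


def ordinal_ranks(values):
    """Ordinal ranks 0..n-1; adequate for continuous data without ties."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks


def spearman_rho(a, b):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return float(np.corrcoef(ordinal_ranks(a), ordinal_ranks(b))[0, 1])


# Synthetic stand-ins: a reference set and a query set sharing one
# underlying feature-to-score relationship (an assumption of this sketch).
rng = np.random.default_rng(7)
weights = rng.normal(size=9)
reference_features = rng.normal(size=(1000, 9))
query_features = rng.normal(size=(200, 9))
reference_scores = reference_features @ weights + 0.1 * rng.normal(size=1000)
query_scores = query_features @ weights + 0.1 * rng.normal(size=200)

# Assign each query complex the score of its nearest reference "twin",
# then ask whether the twins preserve the rank order of the query scores.
twin_index = nearest_neighbor_indices(query_features, reference_features)
twin_scores = reference_scores[twin_index]
rho = spearman_rho(query_scores, twin_scores)
print(f"Spearman rho between query and twin scores: {rho:.3f}")
```

A rank correlation is the natural statistic here because the twins are only expected to preserve ordering across molecular scales, not absolute magnitudes.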
PMID:42275604 | DOI:10.1021/acs.jcim.6c00831