
Data Efficiency, Dimensionality Reduction, and the Generalized Symmetric Information Bottleneck

Neural Comput. 2024 Apr 17:1-27. doi: 10.1162/neco_a_01667. Online ahead of print.

ABSTRACT

The symmetric information bottleneck (SIB), an extension of the more familiar information bottleneck, is a dimensionality-reduction technique that simultaneously compresses two random variables to preserve information between their compressed versions. We introduce the generalized symmetric information bottleneck (GSIB), which explores different functional forms of the cost of such simultaneous reduction. We then explore the data set size requirements of such simultaneous compression. We do this by deriving bounds and root-mean-squared estimates of statistical fluctuations of the involved loss functions. We show that in typical situations, the simultaneous GSIB compression requires qualitatively less data to achieve the same errors compared to compressing variables one at a time. We suggest that this is an example of a more general principle that simultaneous compression is more data efficient than independent compression of each of the input variables.
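
To make the idea of simultaneous compression concrete, here is a minimal sketch for small discrete variables. The abstract does not state the loss, so the form used here, I(X;Z_X) + I(Y;Z_Y) - beta * I(Z_X;Z_Y), is an assumption based on how the symmetric information bottleneck is commonly written; it is not necessarily any of the generalized forms the paper studies. The function names, the encoder tables, and the toy distribution are all hypothetical and chosen only for illustration.

```python
import math


def mutual_information(joint):
    """Mutual information in bits of a discrete joint distribution (nested list)."""
    p_row = [sum(row) for row in joint]
    p_col = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (p_row[i] * p_col[j]))
    return mi


def sib_loss(p_xy, enc_x, enc_y, beta):
    """Assumed SIB-style loss: I(X;Z_X) + I(Y;Z_Y) - beta * I(Z_X;Z_Y).

    enc_x[i][a] = p(z_x = a | x = i); enc_y[j][b] = p(z_y = b | y = j).
    """
    nx, ny = len(p_xy), len(p_xy[0])
    na, nb = len(enc_x[0]), len(enc_y[0])
    p_x = [sum(row) for row in p_xy]
    p_y = [sum(p_xy[i][j] for i in range(nx)) for j in range(ny)]
    # Joint of each input with its own compressed version.
    p_x_zx = [[p_x[i] * enc_x[i][a] for a in range(na)] for i in range(nx)]
    p_y_zy = [[p_y[j] * enc_y[j][b] for b in range(nb)] for j in range(ny)]
    # Joint of the two compressed variables:
    # p(z_x, z_y) = sum over x, y of p(x, y) p(z_x | x) p(z_y | y).
    p_zx_zy = [
        [
            sum(p_xy[i][j] * enc_x[i][a] * enc_y[j][b]
                for i in range(nx) for j in range(ny))
            for b in range(nb)
        ]
        for a in range(na)
    ]
    compression_cost = mutual_information(p_x_zx) + mutual_information(p_y_zy)
    preserved_info = mutual_information(p_zx_zy)
    return compression_cost - beta * preserved_info


# Toy example (hypothetical): X and Y are perfectly correlated over 4 states.
p_xy = [[0.25 if i == j else 0.0 for j in range(4)] for i in range(4)]
# Deterministic encoders that merge states {0, 1} and {2, 3}.
merge_pairs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

loss = sib_loss(p_xy, merge_pairs, merge_pairs, beta=3.0)
print(loss)
```

In this toy case each encoder keeps 1 bit about its input and the compressed pair shares 1 bit, so with beta = 3 the assumed loss is 1 + 1 - 3 = -1 bit. The sketch evaluates the loss from the exact joint distribution; the paper's question concerns how the same quantities fluctuate when they are estimated from a finite sample, which this example does not attempt to reproduce.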

PMID: 38669695 | DOI: 10.1162/neco_a_01667

By Nevin Manimala
