Nevin Manimala Statistics

Lost in Retraining: Closed-Loop Learning and Model Collapse in Exponential Families

Phys Rev Lett. 2026 May 15;136(19):197301. doi: 10.1103/156q-3ngc.

ABSTRACT

Closed-loop learning is the process of repeatedly estimating a model from data generated by the model itself. It is receiving considerable attention because large neural network models may, in the future, be trained primarily on data generated by artificial neural networks themselves. We study this process for models that belong to exponential families, deriving equations of motion that govern the dynamics of the parameters. We show that maximum likelihood estimation of the parameters endows the sufficient statistics with the martingale property, and that as a result the process converges to absorbing states that amplify initial biases present in the data. However, we show that, for exponential families, this outcome may be prevented in any of three ways: by ensuring that at each closed-loop retraining iteration the data contain at least one data point generated from a ground-truth model, by relying on maximum a posteriori estimation, or by introducing regularization.
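As a rough illustration of the dynamics summarized above, the sketch below simulates closed-loop retraining for the simplest exponential family, a Bernoulli model with parameter p. The Bernoulli choice, the sample size n = 10, the Beta(2, 2) prior behind the MAP estimate, and every function name are assumptions made for illustration; this is not the authors' setup or code, and it does not reproduce their equations of motion. Under plain maximum likelihood the refitted p is the sample mean, so E[p_{t+1} | p_t] = p_t (a martingale). The chain therefore drifts until it hits an absorbing state at 0 or 1, and by optional stopping it ends at 1 with probability equal to its starting value, so whichever outcome was initially more frequent tends to take over.

```python
import random

# Hypothetical toy model: Bernoulli(p), a one-parameter exponential family
# whose sufficient statistic is the number of ones in the sample.

def fit_mle(samples):
    """Maximum likelihood estimate of p: the sample mean."""
    return sum(samples) / len(samples)

def fit_map(samples, a=2.0, b=2.0):
    """MAP estimate (posterior mode) under an assumed Beta(a, b) prior."""
    k, n = sum(samples), len(samples)
    return (k + a - 1.0) / (n + a + b - 2.0)

def draw(p, n, rng):
    """Generate n Bernoulli(p) samples from a model with parameter p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def closed_loop(p0, n, steps, rng, estimator=fit_mle, p_true=None):
    """Refit the model on its own samples `steps` times; return all estimates.

    If p_true is given, one of the n points in every iteration is drawn
    from the ground-truth model instead of the current model.
    """
    p, history = p0, [p0]
    for _ in range(steps):
        if p_true is None:
            data = draw(p, n, rng)
        else:
            data = draw(p, n - 1, rng) + draw(p_true, 1, rng)
        p = estimator(data)
        history.append(p)
    return history

def mean_next_estimate(p0, n, trials, rng):
    """Monte Carlo estimate of E[p_{t+1} | p_t = p0] under plain MLE."""
    return sum(fit_mle(draw(p0, n, rng)) for _ in range(trials)) / trials

def fixation_rate(p0, n, runs, rng):
    """Fraction of plain-MLE runs that end absorbed at p = 1."""
    hits = 0
    for _ in range(runs):
        p = p0
        while 0.0 < p < 1.0:  # p = 0 and p = 1 are absorbing under MLE
            p = fit_mle(draw(p, n, rng))
        hits += p == 1.0
    return hits / runs

if __name__ == "__main__":
    rng = random.Random(0)
    print("E[p1 | p0=0.3]:", mean_next_estimate(0.3, 10, 20000, rng))  # near 0.3
    print("P(absorb at 1):", fixation_rate(0.3, 10, 2000, rng))        # near 0.3
    print("MLE final p:", closed_loop(0.5, 10, 2000, rng)[-1])          # 0.0 or 1.0
    print("with ground truth:", closed_loop(0.5, 10, 2000, rng, p_true=0.5)[-5:])
    print("MAP:", closed_loop(0.5, 10, 2000, rng, estimator=fit_map)[-5:])
```

With one ground-truth point per iteration, E[p_{t+1} | p_t] = ((n - 1) p_t + p_true) / n, which pulls the estimate back toward p_true, so 0 and 1 stop being absorbing unless p_true is itself 0 or 1. The MAP estimate (k + 1)/(n + 2) under the assumed Beta(2, 2) prior can never reach 0 or 1 at all, which mirrors the regularizing effect of a prior.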

PMID:42213932 | DOI:10.1103/156q-3ngc

By Nevin Manimala
