Keeping Elo alive: Evaluating and improving measurement properties of learning systems based on Elo ratings

Br J Math Stat Psychol. 2025 Jun 6. doi: 10.1111/bmsp.12395. Online ahead of print.

ABSTRACT

The Elo Rating System which originates from competitive chess has been widely utilised in large-scale online educational applications where it is used for on-the-fly estimation of ability, item calibration, and adaptivity. In this paper, we aim to critically analyse the shortcomings of the Elo rating system in an educational context, shedding light on its measurement properties and when these may fall short in accurately capturing student abilities and item difficulties. In a simulation study, we look at the asymptotic properties of the Elo rating system. Our results show that the Elo ratings are generally not unbiased and their variances are context-dependent. Furthermore, in scenarios where items are selected adaptively based on the current ratings and the item difficulties are updated alongside the student abilities, the variance of the ratings across items and students artificially increases over time and as a result the ratings do not converge. We propose a solution to this problem which entails using two parallel chains of ratings which remove the dependence of item selection on the current errors in the ratings.

PMID:40476309 | DOI:10.1111/bmsp.12395

By Nevin Manimala