A Statistical Learning-Based Clustering Model With Features Selection to Identify Dyslexia in School-Aged Children

Dyslexia. 2025 Nov;31(4):e70013. doi: 10.1002/dys.70013.

ABSTRACT

The multi-deficit framework employed to identify dyslexia requires statistical learning-based models to account for the complex interplay of cognitive skills. Traditional methods often rely on simplistic statistical techniques, which may fail to capture the heterogeneity inherent in dyslexia. This study introduces a model-based clustering framework, employing finite mixtures of contaminated Gaussian distributions, to better understand and classify dyslexia. Using data from a cohort of 122 children in Poland, including 51 diagnosed with dyslexia, we explore the effectiveness of this method in distinguishing between dyslexic and control groups. Our approach integrates variable selection techniques to identify clinically relevant cognitive skills while addressing issues of outliers and redundant variables. Results demonstrate the superiority of multivariate finite mixture models, achieving high accuracy in clustering and revealing the importance of specific variables such as Reading, Phonology, and Rapid Automatized Naming. This study emphasises the value of the multiple-deficit model and robust statistical techniques in advancing the diagnosis and understanding of dyslexia.
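
To make the phrase "finite mixtures of contaminated Gaussian distributions" concrete, the following is a minimal sketch of the density only, not the study's implementation. It assumes the usual definition of a contaminated Gaussian: a typical component with weight alpha, plus a component with the same mean and a variance inflated by eta > 1 that absorbs outliers. For brevity it is univariate, whereas the study uses multivariate mixtures with variable selection. The function names and all parameter values are hypothetical illustrations rather than estimates from the data, and no model fitting (for example by EM) is shown.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def contaminated_pdf(x, mean, var, alpha, eta):
    """Contaminated Gaussian density.

    alpha (assumed in (0.5, 1)) is the proportion of typical points;
    eta (assumed > 1) inflates the variance of the outlier component.
    """
    typical = alpha * normal_pdf(x, mean, var)
    outlying = (1.0 - alpha) * normal_pdf(x, mean, eta * var)
    return typical + outlying


def cluster_posteriors(x, components):
    """Posterior probability that x belongs to each mixture component."""
    weighted = [
        c["weight"] * contaminated_pdf(x, c["mean"], c["var"], c["alpha"], c["eta"])
        for c in components
    ]
    total = sum(weighted)
    return [w / total for w in weighted]


def prob_typical(x, c):
    """Given membership in component c, probability that x is not an outlier."""
    typical = c["alpha"] * normal_pdf(x, c["mean"], c["var"])
    return typical / contaminated_pdf(x, c["mean"], c["var"], c["alpha"], c["eta"])


# Hypothetical two-cluster mixture on a single standardised score.
# These numbers are made up for illustration only.
components = [
    {"weight": 0.5, "mean": 0.0, "var": 1.0, "alpha": 0.9, "eta": 10.0},
    {"weight": 0.5, "mean": 4.0, "var": 1.0, "alpha": 0.9, "eta": 10.0},
]

posteriors = cluster_posteriors(0.0, components)
typical_at_centre = prob_typical(0.0, components[0])
typical_far_away = prob_typical(5.0, components[0])
```

In this sketch an observation is assigned to the cluster with the largest posterior, and within that cluster a value of `prob_typical` below 0.5 flags it as a likely outlier. This is how a contaminated mixture can down-weight atypical observations instead of letting them distort the cluster estimates.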

PMID: 40954446 | DOI: 10.1002/dys.70013

By Nevin Manimala