J Child Lang. 2026 Apr 29:1-27. doi: 10.1017/S0305000926100646. Online ahead of print.
ABSTRACT
This study investigates whether child-directed speech (CDS) exhibits enhanced segmentability compared to adult-directed speech (ADS) and explores how specific linguistic properties of each register influence computational word segmentation performance in Korean. Employing a speaker-matched corpus of naturalistic Korean CDS and ADS, we observed that Korean CDS features shorter utterances and words, lower lexical diversity, fewer hapax legomena and interjections, a greater proportion of onomatopoeia and word play, a higher frequency of one-word utterances, and lower lexical ambiguity than ADS. Computational algorithms revealed significantly higher word segmentation F-scores for CDS than ADS, suggesting that child-oriented linguistic adaptations in CDS facilitate segmentation. This observation is further supported by statistical modelling, which indicates that the enhanced segmentability in CDS is modulated by the linguistic properties of the register. We discuss the nuanced roles of these properties in shaping the performance of segmentation algorithms.
PMID:42052816 | DOI:10.1017/S0305000926100646
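For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how a word segmentation F-score could be computed. The abstract does not say which F-score variant the study reports, so this assumes the common token-level definition: a predicted word counts as correct only when both of its boundaries match the gold segmentation. The names `word_spans` and `token_f_score` and the toy utterance are hypothetical and are not taken from the study.

```python
def word_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans = set()
    pos = 0
    for word in words:
        spans.add((pos, pos + len(word)))
        pos += len(word)
    return spans


def token_f_score(gold_utterances, predicted_utterances):
    """Return (precision, recall, F) over word tokens.

    Assumption: token-level scoring, where a predicted word is correct only
    if its start and end both coincide with a gold word's start and end.
    Each utterance is a list of words; gold and predicted utterances must
    cover the same underlying character sequence.
    """
    true_pos = n_gold = n_pred = 0
    for gold, pred in zip(gold_utterances, predicted_utterances):
        if "".join(gold) != "".join(pred):
            raise ValueError("gold and predicted utterances differ in content")
        gold_spans = word_spans(gold)
        pred_spans = word_spans(pred)
        true_pos += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)

    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Toy illustration (invented, not corpus data): the segmenter merges two words.
gold = [["the", "dog", "runs"]]
predicted = [["thedog", "runs"]]
precision, recall, f_score = token_f_score(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F={f_score:.2f}")
```

In this toy case only "runs" is recovered with both boundaries correct, so one of two predicted words and one of three gold words match. Scoring CDS and ADS transcripts separately with such a function would give the kind of register comparison the abstract describes, although the study's own scoring procedure may differ.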