
Identification of sentence stems characteristic of Chinese learner English writing

Heliyon. 2024 Aug 30;11(3):e37166. doi: 10.1016/j.heliyon.2024.e37166. eCollection 2025 Feb 15.

ABSTRACT

Phraseological units in academic English texts have been a central focus in recent corpus linguistic research. This paper describes a special category of clause-level phraseological units, namely Characteristic Sentence Stems (CSSs), and sets out the criteria for identifying them and the method for extracting them. CSSs are contiguous lexico-grammatical sequences which contain a subject-predicate structure and which are frame expressions characteristic of academic writing. The extraction method consists of six steps: POS tagging, n-gram segmentation, structure identification, significance of occurrence calculation, text range calculation, and overlapping sequence reduction. The significance of occurrence calculation is the crux of this method. It computes both the internal association and the boundary independence of a CSS, and thus tests the significance of its occurrence from both an internal and an external perspective. Our methods and results suggest that CSSs can be statistically defined and extracted from corpora and can be employed in large-scale studies to more fully account for the phraseological features of non-native English academic writing.
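As an illustration only, the sketch below shows one way three of the steps named in the abstract (n-gram segmentation, significance of occurrence, and text range) might be computed. The abstract does not state which statistics the authors use, so the measures here are assumptions: a pointwise-mutual-information-style score stands in for internal association, and the entropy of the words following a sequence stands in for boundary independence. All function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def internal_association(seq, unigram_counts, seq_count, total_tokens):
    """Assumed measure: log2 of observed vs. expected probability (PMI-style).

    Higher values mean the words co-occur more often than chance predicts.
    """
    p_seq = seq_count / total_tokens
    p_independent = 1.0
    for word in seq:
        p_independent *= unigram_counts[word] / total_tokens
    return math.log2(p_seq / p_independent)


def right_neighbors(tokens, seq):
    """Count the tokens that immediately follow each occurrence of seq."""
    n = len(seq)
    counts = Counter()
    for i in range(len(tokens) - n):
        if tuple(tokens[i:i + n]) == seq:
            counts[tokens[i + n]] += 1
    return counts


def boundary_entropy(neighbor_counts):
    """Assumed measure: Shannon entropy (bits) of the neighboring tokens.

    High entropy means many different words can follow the sequence,
    i.e. the sequence ends at a relatively free boundary.
    """
    total = sum(neighbor_counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in neighbor_counts.values())


def text_range(seq, texts):
    """Number of texts (token lists) in which seq occurs at least once."""
    n = len(seq)
    return sum(1 for tokens in texts if seq in set(ngrams(tokens, n)))
```

In a fuller pipeline, candidates would first be filtered by POS pattern (structure identification) and, after scoring, overlapping candidates would be reduced; both steps are omitted from this sketch.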

PMID:40196792 | PMC:PMC11947701 | DOI:10.1016/j.heliyon.2024.e37166

By Nevin Manimala
