JMIR Form Res. 2023 Jan 13;7:e42792. doi: 10.2196/42792.
BACKGROUND: The rising number of patients with dementia has become a serious social problem worldwide. To help detect dementia at an early stage, many studies have been conducted to detect signs of cognitive decline by prosodic and acoustic features. However, many of these methods are not suitable for everyday use as they focus on cognitive function or conversational speech during the examinations. In contrast, conversational humanoid robots are expected to be used in the care of older people to help reduce the work of care and monitoring through interaction.
OBJECTIVE: This study focuses on early detection of mild cognitive impairment (MCI) through conversations between patients and humanoid robots without a specific examination, such as neuropsychological examination.
METHODS: This was an exploratory study involving patients with MCI and cognitively normal (CN) older people. We collected the conversation data during neuropsychological examination (Mini-Mental State Examination [MMSE]) and everyday conversation between a humanoid robot and 94 participants (n=47, 50%, patients with MCI and n=47, 50%, CN older people). We extracted 17 types of prosodic and acoustic features, such as the duration of response time and jitter, from these conversations. We conducted a statistical significance test for each feature to clarify the speech features that are useful when classifying people into CN people and patients with MCI. Furthermore, we conducted an automatic classification experiment using a support vector machine (SVM) to verify whether it is possible to automatically classify these 2 groups by the features identified in the statistical significance test.
RESULTS: We obtained significant differences in 5 (29%) of 17 types of features obtained from the MMSE conversational speech. The duration of response time, the duration of silent periods, and the proportion of silent periods showed a significant difference (P<.001) and met the reference value r=0.1 (small) of the effect size. Additionally, filler periods (P<.01) and the proportion of fillers (P=.02) showed a significant difference; however, these did not meet the reference value of the effect size. In contrast, we obtained significant differences in 16 (94%) of 17 types of features obtained from the everyday conversations with the humanoid robot. The duration of response time, the duration of speech periods, jitter (local, relative average perturbation [rap], 5-point period perturbation quotient [ppq5], difference of difference of periods [ddp]), shimmer (local, amplitude perturbation quotient [apq]3, apq5, apq11, average absolute differences between the amplitudes of consecutive periods [dda]), and F0cov (coefficient of variation of the fundamental frequency) showed a significant difference (P<.001). In addition, the duration of response time, the duration of silent periods, the filler period, and the proportion of fillers showed significant differences (P<.05). However, only jitter (local) met the reference value r=0.1 (small) of the effect size. In the automatic classification experiment for the classification of participants into CN and MCI groups, the results showed 66.0% accuracy in the MMSE conversational speech and 68.1% accuracy in everyday conversations with the humanoid robot.
CONCLUSIONS: This study shows the possibility of early and simple screening for patients with MCI using prosodic and acoustic features from everyday conversations with a humanoid robot with the same level of accuracy as the MMSE.