Proc Inst Mech Eng H. 2023 Oct 24:9544119231206456. doi: 10.1177/09544119231206456. Online ahead of print.
ABSTRACT
Missing values often limit how fully epidemiological survey data can be used. In this study, we use the cut-off value from the medical diagnostic standard for fasting blood glucose in diabetes to divide the fasting blood glucose test data from the 2009 China Health and Nutrition Survey (CHNS) for Shandong province into two classes: normal and abnormal. For missing fasting blood glucose values, we propose a two-stage prediction filling method based on support vector techniques whose parameters are optimized by two competing optimizers, particle swarm optimization (PSO) and the grey wolf optimizer (GWO). The first stage predicts the class of the missing value with a support vector machine (SVM), and the second stage predicts the missing value itself with support vector regression (SVR) within the predicted class. We use LIBSVM as the gold standard to train both the SVM and the SVR in their respective stages. Comparing the two optimizers at each stage, GWO achieves the highest classification accuracy (91.1%) in the first stage, and PSO achieves the smallest in-class mean absolute error (0.48) in the second stage. GWO-SVM-PSO-SVR is therefore selected as the optimal model, and its predicted value serves as the fill value. The model comparison in the empirical analysis also shows that it outperforms all of the other filling models in terms of mean absolute error and mean absolute percentage error. In addition, the sensitivity analysis shows that the model tolerates changes in sample size well and has good stability.
PMID:37873735 | DOI:10.1177/09544119231206456
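
The two-stage filling logic and the two error measures named in the abstract can be sketched in a few lines of Python. This is only an illustration of the control flow, not the authors' implementation: the nearest-centroid classifier and the class-mean regressors are stand-ins for the GWO-tuned SVM and the PSO-tuned SVR, the 7.0 mmol/L cut-off is an assumption (the abstract does not state the value it used), and the single covariate and the numbers in `TOY_TRAIN` are made up for the example.

```python
from statistics import mean

# Assumed cut-off in mmol/L; the abstract does not state the value it used.
CUTOFF = 7.0

def label(fbg, cutoff=CUTOFF):
    """Return 1 (abnormal) if fasting blood glucose is at or above the cut-off, else 0."""
    return 1 if fbg >= cutoff else 0

def two_stage_fill(x, classify, regressors):
    """Stage 1 predicts a class for x; stage 2 predicts the value with that class's regressor."""
    predicted_class = classify(x)
    return regressors[predicted_class](x)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must not contain zeros)."""
    return 100.0 * mean(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))

def fit_stand_ins(train, cutoff=CUTOFF):
    """Fit hypothetical stand-in models on (covariate, fbg) pairs.

    The classifier assigns the class whose covariate centroid is nearest;
    each regressor returns the mean observed value of its class. A real
    pipeline would train an SVM and per-class SVRs here instead.
    Both classes must be present in the training data.
    """
    by_class = {0: [], 1: []}
    for x, y in train:
        by_class[label(y, cutoff)].append((x, y))
    centroids = {c: mean(x for x, _ in rows) for c, rows in by_class.items()}
    class_means = {c: mean(y for _, y in rows) for c, rows in by_class.items()}

    def classify(x):
        return min(centroids, key=lambda c: abs(x - centroids[c]))

    # Bind each class mean as a default argument so every regressor keeps its own value.
    regressors = {c: (lambda x, m=class_means[c]: m) for c in class_means}
    return classify, regressors

# Made-up (covariate, fasting blood glucose) pairs, for illustration only.
TOY_TRAIN = [(22.0, 5.0), (24.0, 5.4), (23.0, 5.2),
             (30.0, 8.0), (32.0, 9.0), (31.0, 8.4)]

if __name__ == "__main__":
    classify, regressors = fit_stand_ins(TOY_TRAIN)
    for covariate in (23.5, 30.5):
        print(covariate, two_stage_fill(covariate, classify, regressors))
```

Keeping the classifier and the per-class regressors as plain callables means either stage can be swapped out independently, which mirrors how the abstract selects a different optimizer for each stage.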