Assessing imputation techniques for missing data in small and multicollinear datasets: insights from craniofacial morphometry

BMC Med Res Methodol. 2026 Feb 4. doi: 10.1186/s12874-025-02762-4. Online ahead of print.

ABSTRACT

BACKGROUND: Analyses of craniofacial morphology are essential for various medical and research applications, including the study of midfacial development, dysmorphologies, and the planning of surgical interventions. CT scans are often incomplete because of patient movement, imaging artifacts, or obscured landmarks, which can result in missing data. If not properly addressed, such missingness may bias conclusions and weaken statistical power.

OBJECTIVE: This paper evaluates imputation techniques to identify the most suitable method for handling values that are missing completely at random in small, high-dimensional, and highly correlated craniofacial morphometric datasets.

METHODS: Forty-two craniofacial variables were measured from 32 observations. The missing-data structure was set to be at random, with 268 (20%) missing values. Five common imputation techniques were considered: Mean/Median imputation, k-Nearest Neighbors (kNN), Multiple Imputation by Chained Equations (MICE), Random Forest (RF), and Decision Tree. The performance of each imputation technique was quantified using Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Variance Preservation.
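
The evaluation design described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the data are synthetic (a shared latent factor stands in for the multicollinearity), only the simplest of the five techniques (mean imputation) is shown, and variance preservation is assumed here to be the ratio of imputed to original per-variable variance, averaged over variables, since the abstract does not define it. All function names are hypothetical.

```python
import math
import random
import statistics


def make_data(n_rows=32, n_cols=42, seed=0):
    """Synthetic stand-in for the morphometric table (assumption, not real data)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        size = rng.gauss(0.0, 1.0)  # shared latent factor makes columns correlated
        rows.append([50.0 + 5.0 * size + rng.gauss(0.0, 1.0) for _ in range(n_cols)])
    return rows


def mask_at_random(data, n_missing=268, seed=1):
    """Hide n_missing cells chosen uniformly at random; hidden cells become None."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(data)) for j in range(len(data[0]))]
    missing = set(rng.sample(cells, n_missing))
    masked = [
        [None if (i, j) in missing else value for j, value in enumerate(row)]
        for i, row in enumerate(data)
    ]
    return masked, missing


def mean_impute(masked):
    """Fill each hidden cell with the mean of the observed values in its column."""
    col_means = []
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not None]
        col_means.append(statistics.fmean(observed))
    return [
        [col_means[j] if value is None else value for j, value in enumerate(row)]
        for row in masked
    ]


def evaluate(original, imputed, missing):
    """RMSE and MAE over the hidden cells, plus the assumed variance ratio."""
    errors = [imputed[i][j] - original[i][j] for i, j in missing]
    rmse = math.sqrt(statistics.fmean([e * e for e in errors]))
    mae = statistics.fmean([abs(e) for e in errors])
    ratios = []
    for j in range(len(original[0])):
        var_original = statistics.variance([row[j] for row in original])
        var_imputed = statistics.variance([row[j] for row in imputed])
        ratios.append(var_imputed / var_original)
    return rmse, mae, statistics.fmean(ratios)


original = make_data()
masked, missing = mask_at_random(original)
imputed = mean_impute(masked)
rmse, mae, variance_ratio = evaluate(original, imputed, missing)
print(f"RMSE={rmse:.4f}  MAE={mae:.4f}  variance ratio={variance_ratio:.4f}")
```

Under this assumed definition, a ratio below 1 means the imputed data are less variable than the original, which is expected for mean imputation because every filled cell sits exactly at the column mean. The other techniques could be compared by swapping a different imputer in for `mean_impute` while keeping the same mask and metrics.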

RESULTS: RF imputation demonstrated the best overall performance, with the lowest RMSE (1.3987) and MAE (0.4902), indicating a high level of accuracy in imputing missing values. Its variance preservation was also relatively close to 1 (0.8961), suggesting that it retains the original variability of the dataset effectively. MICE was less accurate, with higher RMSE (3.0869) and MAE (1.1246), but had the variance preservation closest to 1 (1.0580).

CONCLUSION: The findings emphasize the importance of choosing suitable imputation techniques for small, high-dimensional, and correlated datasets such as those in craniofacial morphometry. RF emerged as the most effective method, offering a strong balance between accuracy and variance preservation.

PMID:41639629 | DOI:10.1186/s12874-025-02762-4

By Nevin Manimala
