
An explainable and transparent machine learning approach for predicting dental caries: a cross-national validation study

BMC Oral Health. 2026 Jan 17. doi: 10.1186/s12903-026-07660-9. Online ahead of print.

ABSTRACT

BACKGROUND: There has been a notable increase in artificial intelligence (AI) studies in dentistry. However, inadequate validation methods have led to overly optimistic performance metrics for machine learning (ML) models. External validation provides evidence of an ML model's performance on independent datasets and is crucial for establishing generalizability.

METHODS: We developed Extreme Gradient Boosting (XGBoost) models to detect dental caries from easy-to-collect questionnaire data. The models were trained with nested cross-validation and a holdout test set on National Health and Nutrition Examination Survey (NHANES) data (n = 6070). The trained model was then tested on external data from the Northern Finland Birth Cohorts (NFBC1966 and NFBC1986; n = 3616). To enhance interpretability, beeswarm plots were constructed to visualize variable importance.
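The splitting scheme described in METHODS can be sketched as follows. This is a minimal plain-Python illustration under stated assumptions, not the authors' pipeline: the 20% holdout fraction, the 5 outer and 3 inner folds, the seed, and the helper names `k_fold_indices` and `nested_cv_plan` are all hypothetical, folds are not stratified by caries status, and the XGBoost fitting step is omitted. What it shows is the data flow: hyperparameters are tuned only on inner folds, each outer fold scores a model it never saw during tuning, and the holdout set is reserved for a single final evaluation.

```python
import random

def k_fold_indices(indices, k, rng):
    """Shuffle a copy of `indices` and split it into k roughly equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def nested_cv_plan(n, test_frac=0.2, outer_k=5, inner_k=3, seed=0):
    """Hypothetical helper: build a holdout split plus outer/inner CV folds.

    Returns (train_idx, holdout_idx, plan). Each entry of `plan` is one
    outer fold with its training indices, its validation indices, and the
    inner (train, validation) splits used for tuning on that outer
    training set only. Folds are NOT stratified (an assumption).
    """
    rng = random.Random(seed)
    all_idx = list(range(n))
    rng.shuffle(all_idx)
    n_test = int(round(n * test_frac))
    holdout_idx = sorted(all_idx[:n_test])  # untouched until final evaluation
    train_idx = sorted(all_idx[n_test:])

    plan = []
    for outer_val in k_fold_indices(train_idx, outer_k, rng):
        outer_val_set = set(outer_val)
        outer_train = [i for i in train_idx if i not in outer_val_set]
        inner = []
        for inner_val in k_fold_indices(outer_train, inner_k, rng):
            inner_val_set = set(inner_val)
            inner_train = [i for i in outer_train if i not in inner_val_set]
            inner.append((inner_train, sorted(inner_val)))  # tuning splits
        plan.append({"train": outer_train, "val": sorted(outer_val), "inner": inner})
    return train_idx, holdout_idx, plan

train_idx, holdout_idx, plan = nested_cv_plan(100)
print(len(train_idx), len(holdout_idx), len(plan), len(plan[0]["inner"]))  # 80 20 5 3
```

Keeping the holdout and outer validation indices out of every tuning step is what guards against the overly optimistic estimates mentioned in BACKGROUND; an external cohort such as NFBC goes a step further by also changing the population the model is tested on.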

RESULTS: The ML model showed acceptable performance in predicting dental caries on the internal dataset, with an area under the receiver operating characteristic curve (AUC) of 0.785 (95% CI 0.756-0.813). However, it struggled to identify participants with dental caries: sensitivity was poor at 0.391, although specificity was high at 0.919. On the external dataset, performance dropped sharply: the AUC fell to 0.550 (95% CI 0.532-0.569) and sensitivity to 0.053, while specificity rose slightly to 0.974. Important variables identified by the model were self-rated condition of teeth and gums, presence of missing teeth, financial status, and time since the last dental visit.
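The metrics reported in RESULTS can be computed as sketched below. This is a minimal plain-Python illustration on hypothetical toy data, not the study's code; the function names and the 0.5 cutoff are assumptions, since the abstract does not state which classification threshold was used. AUC is computed in its rank (Mann-Whitney) form, the probability that a randomly chosen case scores higher than a randomly chosen non-case, so it does not depend on any threshold, whereas sensitivity and specificity do.

```python
def confusion_counts(y_true, y_score, threshold=0.5):
    """Count TP, FN, TN, FP, calling a score >= threshold positive."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    return tp, fn, tn, fp

def sensitivity_specificity(y_true, y_score, threshold=0.5):
    tp, fn, tn, fp = confusion_counts(y_true, y_score, threshold)
    sensitivity = tp / (tp + fn)  # share of caries cases the model flags
    specificity = tn / (tn + fp)  # share of caries-free participants it clears
    return sensitivity, specificity

def auc(y_true, y_score):
    """ROC AUC as P(random positive outranks random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 1 = caries, 0 = no caries.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.45, 0.4, 0.3, 0.35, 0.2, 0.1, 0.05]
print(auc(y_true, y_score))                            # 0.9375
print(sensitivity_specificity(y_true, y_score, 0.5))   # (0.25, 1.0)
print(sensitivity_specificity(y_true, y_score, 0.25))  # (1.0, 0.75)
```

On the toy data, a model with a high AUC still misses three of four cases at the 0.5 cutoff, and lowering the cutoff trades specificity for sensitivity. This illustrates how a reasonable AUC can coexist with low sensitivity and high specificity, although the abstract does not say whether threshold choice contributed to the study's figures.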

CONCLUSION: Our ML model's performance degraded notably during external validation compared with internal validation. Nevertheless, the explainable AI (XAI) methodology shows considerable potential for future use in individualized dental caries risk assessment.

PMID:41546040 | DOI:10.1186/s12903-026-07660-9

By Nevin Manimala
