Arch Public Health. 2026 May 14. doi: 10.1186/s13690-026-01958-4. Online ahead of print.
ABSTRACT
BACKGROUND: Extensive population testing played a crucial role in mitigating the COVID-19 pandemic. However, scaling up testing capacity requires a considerable workforce and infrastructure. Furthermore, sampling and testing delays can hinder timely interventions. We therefore sought to improve pre-test triage through an ensemble model based on self-reported information.
METHODS: We trained an XGBoost classifier to predict the individual risk of COVID-19 infection among higher education students in Leuven (Belgium), using real-world self-reported social and health data associated with 38,180 test results. Based on two parametrizable probability thresholds, the model could recommend isolation, testing, or release for individuals at high, moderate, or low risk of infection, respectively. We then assessed the epidemiological impact of the ensemble triage tool in silico by simulating its use to control an epidemic over time in this setting.
RESULTS: The predictive model achieved a ROC AUC of [Formula: see text], but its performance varied across rolling retraining windows. The epidemiological simulations highlight the potential of the ensemble-enhanced triage system to control a surge of infections in the student population of Leuven. If implemented rapidly at the onset of an infection surge, the system could reduce the effective reproduction number below 1.0 while lowering testing requirements by [Formula: see text]. The predictions of the ensemble model were strongly influenced by the number of contacts that individuals reported, the reason for testing, and the onset of symptoms.
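Variation in ROC AUC across rolling windows can be made concrete with a small sketch. It computes ROC AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties count as one half), then evaluates it on consecutive, time-ordered windows of predictions. The window and step sizes, and the idea of scoring each window independently, are assumptions for illustration; the abstract does not describe the exact retraining or evaluation scheme, and retraining itself is omitted here.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison (equivalent to the Mann-Whitney U statistic)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


def rolling_auc(labels: Sequence[int], scores: Sequence[float],
                window: int, step: int) -> List[float]:
    """AUC on each time-ordered window; windows with only one class are skipped.

    Hypothetical evaluation scheme: assumes `scores` are the held-out
    predictions of whichever model was trained for that window.
    """
    aucs = []
    for start in range(0, len(labels) - window + 1, step):
        y = labels[start:start + window]
        s = scores[start:start + window]
        if 0 < sum(y) < len(y):
            aucs.append(roc_auc(y, s))
    return aucs


if __name__ == "__main__":
    y = [0, 1, 0, 1, 1, 0]                   # illustrative labels, not study data
    p = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7]       # illustrative scores
    print(rolling_auc(y, p, window=4, step=2))
```

For production use, an established implementation such as scikit-learn's `roc_auc_score` would be the more typical choice; the pairwise loop here favors transparency over speed.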
CONCLUSIONS: Our study suggests that pre-test triage guided by ensemble models could play an important role in allocating testing resources efficiently. Given timely implementation and isolation compliance within the population, it could also help rapidly control a surge of infections. Future research could validate this approach for other pathogens, in other settings, and with deep learning models.
PMID:42135808 | DOI:10.1186/s13690-026-01958-4