Environ Sci Pollut Res Int. 2026 May 26. doi: 10.1007/s11356-026-37867-w. Online ahead of print.
ABSTRACT
Globally, the primary concern affecting the suitability of groundwater for drinking is the presence of numerous chemical contaminants in large-scale aquifer systems. Reliable methods for assessing groundwater quality and determining the origin of groundwater contaminants are therefore essential. This study developed a comprehensive, data-driven method for evaluating groundwater quality at large scale in the State of Bihar, India, using a traditional Water Quality Index (WQI) and a statistically based Root Mean Square Water Quality Index (RMS-WQI). Four state-of-the-art machine learning algorithms, namely Classification and Regression Tree (CART), Light Gradient Boosting Model (LGBM), Random Forest (RF), and Extreme Gradient Boosting Model (XGBoost), were evaluated for their utility in predicting groundwater quality. Of the four models tested, XGBoost showed the best predictive performance, with R² values of 0.984 for the WQI and 0.994 for the RMS-WQI and low error metrics. Spatial diagnostics of the RMS-WQI model using the Nash-Sutcliffe Efficiency (NSE), Model Efficiency Factor (MEF), and Percent Relative Error Index (PREI) revealed heterogeneity in model performance, particularly in the data-volatile Gaya District, where NSE = -0.1. The uncertainty and robustness of the ML model were thoroughly evaluated using Monte Carlo simulations, which showed a reliability of 88.5%. Geochemical analysis indicated that both natural geochemical processes and anthropogenic influences contributed to the variability in groundwater chemistry. Absolute principal component scores-multiple linear regression (APCS-MLR) identified four main contributors to groundwater chemistry: mineral dissolution (32.7%), water-rock interactions (20.1%), mixed sources (16.3%), and anthropogenic inputs (13.2%).
This innovative integrated methodology provides a scalable and cost-effective decision-making tool for predicting the spatial distribution of groundwater quality, and it supports both the development of sustainable hydro-environmental management practices and the achievement of United Nations Sustainable Development Goal 6.
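The negative NSE reported for Gaya District is easier to interpret with the standard definition of the Nash-Sutcliffe Efficiency in hand: NSE = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the observations from their mean. The following is a minimal sketch of that standard formula, not the authors' implementation; the function name and the sample values are hypothetical and are not taken from the study.

```python
def nash_sutcliffe_efficiency(observed, predicted):
    """Standard Nash-Sutcliffe Efficiency: 1 - SS_res / SS_tot.

    Hypothetical helper for illustration; not code from the study.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")

    mean_observed = sum(observed) / len(observed)

    # Sum of squared errors between observations and model predictions.
    residual_ss = sum((o - p) ** 2 for o, p in zip(observed, predicted))

    # Sum of squared deviations of the observations from their own mean.
    total_ss = sum((o - mean_observed) ** 2 for o in observed)

    if total_ss == 0:
        raise ValueError("NSE is undefined when all observed values are identical")

    return 1.0 - residual_ss / total_ss


# Hypothetical index values, chosen only to show the three regimes of NSE.
observed = [40.0, 50.0, 60.0, 70.0]

perfect = nash_sutcliffe_efficiency(observed, observed)            # 1.0
mean_only = nash_sutcliffe_efficiency(observed, [55.0] * 4)        # 0.0
poor = nash_sutcliffe_efficiency(observed, [70.0, 40.0, 75.0, 50.0])  # -2.25

print(perfect, mean_only, poor)
```

An NSE of 1 means the predictions match the observations exactly, 0 means the model does no better than predicting the observed mean everywhere, and a negative value means the observed mean would have been the better predictor for that subset of samples.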
PMID:42189468 | DOI:10.1007/s11356-026-37867-w