Explainable machine learning framework for assessing groundwater quality and trace element contamination in Eastern Saudi Arabia

Research output: Contribution to journal › Article › peer-review

Abstract

Groundwater quality in arid regions such as Al Hassa, Saudi Arabia, is increasingly threatened by trace element contamination driven by both human activity and natural geology. This study addresses the urgent need for data-driven tools to assess groundwater pollution in the region’s multi-aquifer system. Groundwater samples were analyzed for trace elements and main physicochemical parameters. Four supervised machine learning (ML) models, namely Linear Regression (LR), Random Forest (RF), K-Nearest Neighbors (KNN), and Gradient Boosting Machine (GBM), were used to predict the Water Pollution Index (WPI) as a holistic metric of contamination. The GBM model outperformed all others, achieving a coefficient of determination (DC) of 0.9970 and a mean absolute error (MAE) of 0.0017 on the training data. During testing, it maintained a high DC of 0.9372 and an MAE of 0.0063, indicating strong generalization ability. SHapley Additive exPlanations (SHAP) were used to rank feature importance and enhance model transparency. The most influential variables for WPI prediction were chromium (Cr, SHAP = 0.0214), aluminum (Al, SHAP = 0.0136), and strontium (Sr, SHAP = 0.0053), followed by iron (Fe, SHAP = 0.0031), vanadium (V, SHAP = 0.0028), and selenium (Se, SHAP = 0.0017). Although water quality was generally acceptable, Cr and Fe exceeded safe limits in several samples. This study presents a transparent, high-performing framework for groundwater quality assessment in arid conditions. The integration of explainable ML offers clear, actionable insights for sustainable water management and environmental decision-making.
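
To make the two reported evaluation metrics concrete, the sketch below computes a coefficient of determination and a mean absolute error in plain Python, using the standard definitions DC = 1 - SS_res / SS_tot and MAE = mean(|observed - predicted|). The abstract does not show the study's own evaluation code, so the function names and the `observed` and `predicted` WPI values here are hypothetical and serve only as an illustration. The SHAP figures are the six values quoted in the abstract, sorted to reproduce the stated ranking.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def determination_coefficient(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared prediction errors.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("observed values are constant; DC is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical WPI values for illustration only (not data from the study).
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.37]

mae = mean_absolute_error(observed, predicted)
dc = determination_coefficient(observed, predicted)

# SHAP values exactly as quoted in the abstract, ranked from most to least influential.
reported_shap = {"Sr": 0.0053, "Cr": 0.0214, "Se": 0.0017, "Al": 0.0136, "V": 0.0028, "Fe": 0.0031}
ranking = sorted(reported_shap, key=reported_shap.get, reverse=True)

print(f"MAE = {mae:.4f}, DC = {dc:.4f}")
print("SHAP ranking:", ", ".join(ranking))
```

A DC close to 1 means the predictions explain almost all of the variance in the observed index, while the MAE is expressed in the same units as WPI itself. That is why the abstract reports both for the training and testing sets.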

Original language: English
Article number: 45333
Journal: Scientific Reports
Volume: 15
Issue number: 1
State: Published - Dec 2025

Keywords

  • Explainable machine learning
  • Groundwater quality
  • SHAP analysis
  • Trace elements
  • Water pollution index
