TY - JOUR
T1 - Machine learning-driven surface water quality prediction
T2 - an intuitive GUI solution for forecasting TDS and DO levels
AU - Siddiq, Bilal
AU - Javed, Muhammad Faisal
AU - Aldrees, Ali
N1 - Publisher Copyright: © 2025 The Authors.
PY - 2025/11
Y1 - 2025/11
N2 - Industrialization and human activities significantly affect water quality by introducing pollutants into aquatic ecosystems. This study employs machine learning models, specifically two hybrid models (based on the random forest model and the gene expression model) and standalone models, to predict total dissolved solids (TDS) and dissolved oxygen (DO) levels from 11 inputs. Particle swarm optimization (PSO) was used to enhance model performance, with 423 samples analyzed (80% training, 20% testing). k-fold cross-validation assessed model reliability using metrics such as R², RMSE, and MAE. The PSO-RF model outperformed the others in TDS prediction (R² = 0.99, RMSE = 0.0001) and showed positive results for DO (R² = 0.96, RMSE = 0.32). The PSO-GEP model also performed well (R² = 0.99 for TDS and 0.95 for DO). Shapley analysis indicated high correlations of total solids and turbidity with TDS, and of water temperature with DO. New predictive equations and a graphical user interface (GUI) were developed for practical applications.
KW - hybrid wavelet models
KW - k-fold cross-validation
KW - machine learning algorithms
KW - sensitivity analysis
KW - surface water quality
UR - https://www.scopus.com/pages/publications/105023579898
U2 - 10.2166/wqrj.2025.005
DO - 10.2166/wqrj.2025.005
M3 - Article
AN - SCOPUS:105023579898
SN - 2709-8044
VL - 60
SP - 514
EP - 546
JO - Water Quality Research Journal
JF - Water Quality Research Journal
IS - 4
ER -