TY - JOUR
T1 - Evolutionary and ensemble machine learning predictive models for evaluation of water quality
AU - Aldrees, Ali
AU - Javed, Muhammad Faisal
AU - Bakheit Taha, Abubakr Taha
AU - Mustafa Mohamed, Abdeliazim
AU - Jasiński, Michał
AU - Gono, Miroslava
N1 - Publisher Copyright:
© 2023 The Authors
PY - 2023/4
Y1 - 2023/4
N2 - Study region: Bisham Qilla and Doyian stations, Indus River Basin, Pakistan. Study focus: Water pollution is an international concern that harms human health, ecological sustainability, and agricultural output. This study focuses on the distinguishing characteristics of evolutionary and ensemble machine learning (ML) based modeling to provide in-depth insight into escalating water quality problems. A total of 360 temporal readings of electrical conductivity (EC) and total dissolved solids (TDS), together with several input variables, are used to establish a multi-expression programming (MEP) model and a random forest (RF) regression model for assessing water quality in the Indus River. New hydrological insight for the region: The developed models were evaluated using several statistical metrics. The coefficient of determination (R²) in the testing phase (on unseen data) exceeds 0.95 for all the developed models, indicating their accuracy. Furthermore, the errors are small, with the root mean square logarithmic error (RMSLE) nearly equal to zero for each developed model. The mean absolute percentage error (MAPE) of the MEP and RF models falls below 10% and 5%, respectively, in all three phases (training, validation, and testing). The sensitivity analysis of the generated MEP models, which assesses the relevance of the inputs to the predicted EC and TDS, shows that bicarbonate and chlorine content have a significant influence, with a sensitivity score above 0.90, whereas the impact of sodium content is less pronounced. All the models (RF and MEP) show low uncertainty based on the prediction interval coverage probability (PICP) calculated using the quantile regression (QR) approach. The PICP of each model is greater than 85% in all three phases.
Thus, the findings of the study indicate that developing intelligent models for water quality parameters is cost-effective and feasible for monitoring and analyzing Indus River water quality.
KW - Ensemble learning
KW - Evolutionary algorithm
KW - Random forest regression
KW - Water quality assessment
UR - http://www.scopus.com/inward/record.url?scp=85149066333&partnerID=8YFLogxK
U2 - 10.1016/j.ejrh.2023.101331
DO - 10.1016/j.ejrh.2023.101331
M3 - Article
AN - SCOPUS:85149066333
SN - 2214-5818
VL - 46
JO - Journal of Hydrology: Regional Studies
JF - Journal of Hydrology: Regional Studies
M1 - 101331
ER -