Evolutionary and ensemble machine learning predictive models for evaluation of water quality

Ali Aldrees; Muhammad Faisal Javed; Abubakr Taha Bakheit Taha; Abdeliazim Mustafa Mohamed; Michał Jasiński; Miroslava Gono

doi:10.1016/j.ejrh.2023.101331

Civil Engineering

Research output: Contribution to journal › Article › peer-review

43 Scopus citations

Abstract

Study region: Bisham Qilla and Doyian stations, Indus River Basin, Pakistan.

Study focus: Water pollution is an international concern that threatens human health, ecological sustainability, and agricultural output. This study focuses on the distinguishing characteristics of evolutionary and ensemble machine learning (ML) based modeling to provide in-depth insight into escalating water quality problems. A total of 360 temporal readings of electrical conductivity (EC) and total dissolved solids (TDS), together with several input variables, are used to establish a multi-expression programming (MEP) model and a random forest (RF) regression model for the assessment of water quality in the Indus River.

New hydrological insights for the region: The developed models were evaluated using several statistical metrics. The findings reveal that the coefficient of determination (R2) in the testing phase (on unseen data) is more than 0.95 for all the developed models, indicating their accuracy. Furthermore, the errors are small, with the root mean square logarithmic error (RMSLE) nearly equal to zero for each developed model. The mean absolute percentage error (MAPE) of the MEP models and RF models falls below 10% and 5%, respectively, in all three phases (training, validation, and testing). The sensitivity analysis of the generated MEP models, which examines the relevance of the inputs to the predicted EC and TDS, shows that bicarbonate and chlorine content have a significant influence, with a sensitivity score of more than 0.90, whereas the impact of sodium content is less pronounced. All the models (RF and MEP) have low uncertainty based on the prediction interval coverage probability (PICP) calculated using the quartile regression (QR) approach. The PICP of each model is greater than 85% in all three phases. Thus, the findings of the study indicate that developing intelligent models for water quality parameters is cost-effective and feasible for monitoring and analyzing the water quality of the Indus River.
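
The evaluation metrics named in the abstract (R2, RMSLE, MAPE, and PICP) can be illustrated with a minimal Python sketch. The definitions below are the common textbook forms and are an assumption; the paper may define them slightly differently (for example, R2 as a squared correlation coefficient). The function names and the sample EC values are hypothetical and are not taken from the study.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmsle(y_true, y_pred):
    """Root mean square logarithmic error, using log(1 + x) on both series."""
    squared = [(math.log1p(p) - math.log1p(t)) ** 2
               for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


def picp(y_true, lower, upper):
    """Prediction interval coverage probability, in percent:
    the share of observations that fall inside [lower, upper]."""
    inside = sum(1 for t, lo, hi in zip(y_true, lower, upper) if lo <= t <= hi)
    return 100.0 * inside / len(y_true)


# Hypothetical EC observations and predictions (illustration only).
observed = [200.0, 250.0, 300.0, 350.0]
predicted = [210.0, 240.0, 310.0, 340.0]
# Hypothetical lower and upper prediction bounds, e.g. from two regression quantiles.
lower_bound = [195.0, 225.0, 305.0, 325.0]
upper_bound = [225.0, 255.0, 325.0, 355.0]

print("R2    :", round(r2_score(observed, predicted), 3))
print("RMSLE :", round(rmsle(observed, predicted), 4))
print("MAPE %:", round(mape(observed, predicted), 2))
print("PICP %:", picp(observed, lower_bound, upper_bound))
```

Read against the abstract, an R2 above 0.95, an RMSLE close to zero, a MAPE below 10% (MEP) or 5% (RF), and a PICP above 85% are the thresholds the reported models meet.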

Original language	English
Article number	101331
Journal	Journal of Hydrology: Regional Studies
Volume	46
DOIs	https://doi.org/10.1016/j.ejrh.2023.101331
State	Published - Apr 2023

Keywords

Ensemble learning
Evolutionary algorithm
Random forest regression
Water quality assessment

Cite this

@article{aafc2d1fb3f848a9acb0f1e9e1ffaaf7,

title = "Evolutionary and ensemble machine learning predictive models for evaluation of water quality",

abstract = "Study region: Bisham Qilla and Doyian stations, Indus River Basin, Pakistan. Study focus: Water pollution is an international concern that threatens human health, ecological sustainability, and agricultural output. This study focuses on the distinguishing characteristics of evolutionary and ensemble machine learning (ML) based modeling to provide in-depth insight into escalating water quality problems. A total of 360 temporal readings of electrical conductivity (EC) and total dissolved solids (TDS), together with several input variables, are used to establish a multi-expression programming (MEP) model and a random forest (RF) regression model for the assessment of water quality in the Indus River. New hydrological insights for the region: The developed models were evaluated using several statistical metrics. The findings reveal that the coefficient of determination (R2) in the testing phase (on unseen data) is more than 0.95 for all the developed models, indicating their accuracy. Furthermore, the errors are small, with the root mean square logarithmic error (RMSLE) nearly equal to zero for each developed model. The mean absolute percentage error (MAPE) of the MEP models and RF models falls below 10\% and 5\%, respectively, in all three phases (training, validation, and testing). The sensitivity analysis of the generated MEP models, which examines the relevance of the inputs to the predicted EC and TDS, shows that bicarbonate and chlorine content have a significant influence, with a sensitivity score of more than 0.90, whereas the impact of sodium content is less pronounced. All the models (RF and MEP) have low uncertainty based on the prediction interval coverage probability (PICP) calculated using the quartile regression (QR) approach. The PICP of each model is greater than 85\% in all three phases.
Thus, the findings of the study indicate that developing intelligent models for water quality parameters is cost-effective and feasible for monitoring and analyzing the water quality of the Indus River.",

keywords = "Ensemble learning, Evolutionary algorithm, Random forest regression, Water quality assessment",

author = "Ali Aldrees and Javed, {Muhammad Faisal} and {Bakheit Taha}, {Abubakr Taha} and {Mustafa Mohamed}, Abdeliazim and Micha{\l} Jasi{\'n}ski and Miroslava Gono",

note = "Publisher Copyright: {\textcopyright} 2023 The Authors",

year = "2023",

month = apr,

doi = "10.1016/j.ejrh.2023.101331",

language = "English",

volume = "46",

journal = "Journal of Hydrology: Regional Studies",

issn = "2214-5818",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - Evolutionary and ensemble machine learning predictive models for evaluation of water quality

AU - Aldrees, Ali

AU - Javed, Muhammad Faisal

AU - Bakheit Taha, Abubakr Taha

AU - Mustafa Mohamed, Abdeliazim

AU - Jasiński, Michał

AU - Gono, Miroslava

PY - 2023/4

Y1 - 2023/4

N2 - Study region: Bisham Qilla and Doyian stations, Indus River Basin, Pakistan. Study focus: Water pollution is an international concern that threatens human health, ecological sustainability, and agricultural output. This study focuses on the distinguishing characteristics of evolutionary and ensemble machine learning (ML) based modeling to provide in-depth insight into escalating water quality problems. A total of 360 temporal readings of electrical conductivity (EC) and total dissolved solids (TDS), together with several input variables, are used to establish a multi-expression programming (MEP) model and a random forest (RF) regression model for the assessment of water quality in the Indus River. New hydrological insights for the region: The developed models were evaluated using several statistical metrics. The findings reveal that the coefficient of determination (R2) in the testing phase (on unseen data) is more than 0.95 for all the developed models, indicating their accuracy. Furthermore, the errors are small, with the root mean square logarithmic error (RMSLE) nearly equal to zero for each developed model. The mean absolute percentage error (MAPE) of the MEP models and RF models falls below 10% and 5%, respectively, in all three phases (training, validation, and testing). The sensitivity analysis of the generated MEP models, which examines the relevance of the inputs to the predicted EC and TDS, shows that bicarbonate and chlorine content have a significant influence, with a sensitivity score of more than 0.90, whereas the impact of sodium content is less pronounced. All the models (RF and MEP) have low uncertainty based on the prediction interval coverage probability (PICP) calculated using the quartile regression (QR) approach. The PICP of each model is greater than 85% in all three phases.
Thus, the findings of the study indicate that developing intelligent models for water quality parameters is cost-effective and feasible for monitoring and analyzing the water quality of the Indus River.

KW - Ensemble learning

KW - Evolutionary algorithm

KW - Random forest regression

KW - Water quality assessment

UR - http://www.scopus.com/inward/record.url?scp=85149066333&partnerID=8YFLogxK

U2 - 10.1016/j.ejrh.2023.101331

DO - 10.1016/j.ejrh.2023.101331

M3 - Article

AN - SCOPUS:85149066333

SN - 2214-5818

VL - 46

JO - Journal of Hydrology: Regional Studies

JF - Journal of Hydrology: Regional Studies

M1 - 101331

ER -
