Evolutionary and ensemble machine learning predictive models for evaluation of water quality

Ali Aldrees, Muhammad Faisal Javed, Abubakr Taha Bakheit Taha, Abdeliazim Mustafa Mohamed, Michał Jasiński, Miroslava Gono

Research output: Contribution to journal › Article › peer-review

43 Scopus citations

Abstract

Study region: Bisham Qilla and Doyian stations, Indus River Basin, Pakistan.

Study focus: Water pollution is an international concern that harms human health, ecological sustainability, and agricultural output. This study examines evolutionary and ensemble machine learning (ML) modeling to provide in-depth insight into escalating water quality problems. A total of 360 temporal readings of electrical conductivity (EC) and total dissolved solids (TDS), together with several input variables, are used to establish a multi-expression programming (MEP) model and a random forest (RF) regression model for assessing water quality in the Indus River.

New hydrological insights for the region: The developed models were evaluated using several statistical metrics. The coefficient of determination (R2) in the testing phase (on unseen data) exceeds 0.95 for all the developed models, indicating their accuracy. Furthermore, the error measures are low, with the root mean square logarithmic error (RMSLE) nearly equal to zero for each developed model. The mean absolute percent error (MAPE) of the MEP models and RF models falls below 10% and 5%, respectively, in all three phases (training, validation, and testing). A sensitivity analysis of the generated MEP models, examining the relevance of each input to the predicted EC and TDS, shows that bicarbonate and chlorine content have a significant influence, with a sensitivity score above 0.90, whereas the impact of sodium content is less pronounced. All the models (RF and MEP) show low uncertainty based on the prediction interval coverage probability (PICP) calculated using the quantile regression (QR) approach. The PICP of each model exceeds 85% in all three phases. Thus, the findings indicate that developing intelligent models for water quality parameters is cost-effective and feasible for monitoring and analyzing Indus River water quality.
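The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It uses common textbook definitions of R2, RMSLE, MAPE, and PICP; the article's exact formulations may differ (for example, R2 is sometimes reported as a squared correlation). The function names and the sample values are hypothetical and are not taken from the study's data.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmsle(y_true, y_pred):
    """Root mean square logarithmic error, using log(1 + x)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percent error, in percent (y_true must be nonzero)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

def picp(y_true, lower, upper):
    """Prediction interval coverage probability, in percent:
    the share of observations that fall inside [lower, upper]."""
    y_true = np.asarray(y_true, dtype=float)
    inside = (y_true >= np.asarray(lower)) & (y_true <= np.asarray(upper))
    return float(np.mean(inside) * 100.0)

# Hypothetical EC-like readings and predictions, for illustration only.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
# Hypothetical interval bounds, e.g. from lower and upper quantile models.
lower_bound = [90.0, 195.0, 290.0, 380.0]
upper_bound = [120.0, 210.0, 320.0, 385.0]

scores = {
    "R2": r2(observed, predicted),
    "RMSLE": rmsle(observed, predicted),
    "MAPE": mape(observed, predicted),
    "PICP": picp(observed, lower_bound, upper_bound),
}
```

Under these definitions, a PICP above 85% means that more than 85% of the observed values lie within the model's prediction interval.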

Original language: English
Article number: 101331
Journal: Journal of Hydrology: Regional Studies
Volume: 46
State: Published - Apr 2023

Keywords

  • Ensemble learning
  • Evolutionary algorithm
  • Random forest regression
  • Water quality assessment
