TY - JOUR
T1 - Environmental assessment based surface water quality prediction using hyper-parameter optimized machine learning models based on consistent big data
AU - Shah, Muhammad Izhar
AU - Javed, Muhammad Faisal
AU - Alqahtani, Abdulaziz
AU - Aldrees, Ali
N1 - Publisher Copyright: © 2021 Institution of Chemical Engineers
PY - 2021/7
Y1 - 2021/7
N2 - Prediction of dissolved oxygen (DO) and total dissolved solids (TDS) is of paramount importance for water environmental protection and analysis of the ecosystem. Traditional methods for water quality prediction suffer from unadjusted hyper-parameters. To effectively solve the hyper-parameter setting problem, the present study proposes a framework for tuning the hyper-parameters of a feed forward neural network (FFNN) and gene expression programming (GEP) with particle swarm optimization (PSO). Thereafter, the PSO-coupled hybrid feed forward neural network (PSO-FFNN) and hybrid gene expression programming (PSO-GEP) models were used to predict DO and TDS levels in the upper Indus River. Based on a consistent thirty-year dataset, the most influential input parameters for DO and TDS prediction were determined using principal component analysis (PCA). The impact on model performance was evaluated using five statistical evaluation techniques. Modeling results indicated excellent searching efficiency of the PSO algorithm in optimizing the structure and hyper-parameters of the FFNN and GEP. Results of PCA revealed that magnesium, chloride, sulphate, bicarbonates, specific conductivity, and water temperature are appropriate inputs for DO modeling, whereas calcium, magnesium, sodium, chloride, bicarbonates, and specific conductivity remained the influential parameters for TDS. Both proposed hybrid models showed better accuracy in predicting DO and TDS; however, the hybrid PSO-GEP model achieved better accuracy than the PSO-FFNN, with an R value above 0.85, a root mean squared error (RMSE) below 3 mg/l, and a performance index value close to 1. The external validation criteria confirmed that the overfitting issue was resolved and that the models' results generalize. Cross-validation of the model output attained the best statistical metrics, i.e., (R = 0.87, RMSE = 2.67) and (R = 0.895, RMSE = 2.21) for the PSO-FFNN and PSO-GEP models, respectively. The research findings demonstrated that the implementation of artificial intelligence models with an optimization routine can lead to optimized models for accurate prediction of water quality.
KW - Cross-validation
KW - Environmental protection
KW - Machine learning modeling
KW - Particle swarm optimization
KW - Principal component analysis
KW - River water quality
UR - https://www.scopus.com/pages/publications/85106866987
U2 - 10.1016/j.psep.2021.05.026
DO - 10.1016/j.psep.2021.05.026
M3 - Article
AN - SCOPUS:85106866987
SN - 0957-5820
VL - 151
SP - 324
EP - 340
JO - Process Safety and Environmental Protection
JF - Process Safety and Environmental Protection
ER -