TY - JOUR
T1 - Optimized Ensemble Methods for Classifying Imbalanced Water Quality Index Data
AU - Karami Lawal, Zaharaddeen
AU - Aldrees, Ali
AU - Yassin, Hayati
AU - Dan'Azumi, Salisu
AU - Raghavendra Naganna, Sujay
AU - Abba, Sani I.
AU - Sammen, Saad Sh
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2024
Y1 - 2024
N2 - River water pollution has increased due to human activities. River water quality was initially classified with numerical and analytical methods, but machine learning now enables faster and more accurate water quality index (WQI) classification. This study aimed to develop an effective ensemble model for classifying river water as drinkable or polluted using advanced machine learning. The objective was to apply a classification method to predict WQI using data from the Kinta River in Malaysia and to improve on the 70-95% accuracy range of existing models. The dataset comprises 301 records collected from eight monitoring stations along the Kinta River, covering 31 pollution indicators, including hydrological, chemical, physical, and microbiological parameters. Six algorithms were used: decision tree, logistic regression, random forest, support vector machine (SVM), AdaBoost, and XGBoost. Three experiments were conducted, with and without hyperparameter tuning. The dataset was normalized and oversampled to address the class imbalance. In all experiments, XGBoost was the best-performing individual model, while SVM was the worst. The ensemble models outperformed the individual models, with the GridSearchCV ensemble achieving 97.3% accuracy, exceeding the models in the existing literature by 2.3%. The study had limitations, such as the absence of advanced optimization or dimensionality reduction. In conclusion, it demonstrated that an ensemble model with optimized hyperparameters could classify river water quality more effectively than individual models, contributing to the advancement of the Sustainable Development Goals (SDGs) related to water access.
AB - River water pollution has increased due to human activities. River water quality was initially classified with numerical and analytical methods, but machine learning now enables faster and more accurate water quality index (WQI) classification. This study aimed to develop an effective ensemble model for classifying river water as drinkable or polluted using advanced machine learning. The objective was to apply a classification method to predict WQI using data from the Kinta River in Malaysia and to improve on the 70-95% accuracy range of existing models. The dataset comprises 301 records collected from eight monitoring stations along the Kinta River, covering 31 pollution indicators, including hydrological, chemical, physical, and microbiological parameters. Six algorithms were used: decision tree, logistic regression, random forest, support vector machine (SVM), AdaBoost, and XGBoost. Three experiments were conducted, with and without hyperparameter tuning. The dataset was normalized and oversampled to address the class imbalance. In all experiments, XGBoost was the best-performing individual model, while SVM was the worst. The ensemble models outperformed the individual models, with the GridSearchCV ensemble achieving 97.3% accuracy, exceeding the models in the existing literature by 2.3%. The study had limitations, such as the absence of advanced optimization or dimensionality reduction. In conclusion, it demonstrated that an ensemble model with optimized hyperparameters could classify river water quality more effectively than individual models, contributing to the advancement of the Sustainable Development Goals (SDGs) related to water access.
KW - Artificial intelligence
KW - machine learning
KW - pollution
KW - water quality modelling
UR - http://www.scopus.com/inward/record.url?scp=85209988078&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2024.3502361
DO - 10.1109/ACCESS.2024.3502361
M3 - Article
AN - SCOPUS:85209988078
SN - 2169-3536
VL - 12
SP - 178536
EP - 178551
JO - IEEE Access
JF - IEEE Access
ER -