TY - JOUR
T1 - Enhancing Air Quality Index Classification Based on Ensemble Machine Learning Techniques
AU - MAHMOUD ABUALEALA, AHMED
AU - Osman, Ahmed M.
AU - Tarek, Zahraa
AU - Elshewey, Ahmed M.
N1 - Publisher Copyright:
© (2025), (Dr D. Pylarinos). All rights reserved.
PY - 2025/12/8
Y1 - 2025/12/8
N2 - The accurate classification of Air Quality Index (AQI) is critical for environmental monitoring and public health protection. In this paper, we used a publicly available daily air quality dataset from U.S. counties, comprising six classification categories: Good, Moderate, Unhealthy for Sensitive Groups, Unhealthy, Very Unhealthy, and Hazardous. The dataset underwent preprocessing through missing value imputation and class balancing using the Synthetic Minority Over-sampling Technique (SMOTE). Several machine learning and deep learning models were trained and evaluated on the dataset, including Random Forest (RF), Extra Trees (ET), K-Nearest Neighbors (KNN), Naive Bayes (NB), Logistic Regression (LR), and a Multi-Layer Perceptron (MLP) neural network. The models were assessed using cross-validation accuracy, test set accuracy, macro-averaged recall, F1-score, and ROC-AUC metrics. Ensemble methods (RF and ET) and the MLP classifier achieved superior results compared to traditional models. The RF model achieved a test accuracy of 99.3%, while the MLP classifier achieved 99.0%. The stacking ensemble model achieved a test accuracy of 99.99%, a macro-averaged recall of 87.12%, and an ROC-AUC of 1.0000, highlighting the strong potential of ensemble learning techniques in enhancing the performance of AQI multi-class classification.
AB - The accurate classification of Air Quality Index (AQI) is critical for environmental monitoring and public health protection. In this paper, we used a publicly available daily air quality dataset from U.S. counties, comprising six classification categories: Good, Moderate, Unhealthy for Sensitive Groups, Unhealthy, Very Unhealthy, and Hazardous. The dataset underwent preprocessing through missing value imputation and class balancing using the Synthetic Minority Over-sampling Technique (SMOTE). Several machine learning and deep learning models were trained and evaluated on the dataset, including Random Forest (RF), Extra Trees (ET), K-Nearest Neighbors (KNN), Naive Bayes (NB), Logistic Regression (LR), and a Multi-Layer Perceptron (MLP) neural network. The models were assessed using cross-validation accuracy, test set accuracy, macro-averaged recall, F1-score, and ROC-AUC metrics. Ensemble methods (RF and ET) and the MLP classifier achieved superior results compared to traditional models. The RF model achieved a test accuracy of 99.3%, while the MLP classifier achieved 99.0%. The stacking ensemble model achieved a test accuracy of 99.99%, a macro-averaged recall of 87.12%, and an ROC-AUC of 1.0000, highlighting the strong potential of ensemble learning techniques in enhancing the performance of AQI multi-class classification.
KW - air pollution
KW - air quality classification
KW - Air Quality Index (AQI)
KW - ensemble machine learning
KW - environmental monitoring
KW - machine learning
UR - https://www.scopus.com/pages/publications/105027317781
U2 - 10.48084/etasr.13875
DO - 10.48084/etasr.13875
M3 - Article
AN - SCOPUS:105027317781
SN - 2241-4487
VL - 15
SP - 29325
EP - 29333
JO - Engineering, Technology and Applied Science Research
JF - Engineering, Technology and Applied Science Research
IS - 6
ER -