Enhancing Air Quality Index Classification Based on Ensemble Machine Learning Techniques

Research output: Contribution to journal › Article › peer-review

Abstract

Accurate classification of the Air Quality Index (AQI) is critical for environmental monitoring and public health protection. In this paper, we used a publicly available daily air quality dataset from U.S. counties, comprising six classification categories: Good, Moderate, Unhealthy for Sensitive Groups, Unhealthy, Very Unhealthy, and Hazardous. The dataset was preprocessed by imputing missing values and balancing the classes with the Synthetic Minority Over-sampling Technique (SMOTE). Several machine learning and deep learning models were trained and evaluated on the dataset, including Random Forest (RF), Extra Trees (ET), K-Nearest Neighbors (KNN), Naive Bayes (NB), Logistic Regression (LR), and a Multi-Layer Perceptron (MLP) neural network. The models were assessed using cross-validation accuracy, test set accuracy, macro-averaged recall, F1-score, and ROC-AUC. The ensemble methods (RF and ET) and the MLP classifier achieved superior results compared to the traditional models. The RF model achieved a test accuracy of 99.3%, while the MLP classifier achieved 99.0%. The stacking ensemble model achieved a test accuracy of 99.99%, a macro-averaged recall of 87.12%, and an ROC-AUC of 1.0000, highlighting the strong potential of ensemble learning techniques for improving multi-class AQI classification.
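The abstract reports both test accuracy and macro-averaged recall, and the two can differ considerably on a multi-class problem with rare categories. The sketch below shows how each metric is computed and how a few errors in a small class lower macro-averaged recall far more than accuracy. The label counts are invented for illustration and are not taken from the paper's dataset; the function names are likewise hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    support = Counter(y_true)  # number of true samples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    per_class = [hits[c] / support[c] for c in support]
    return sum(per_class) / len(per_class)


# Made-up, heavily imbalanced labels (assumption: NOT the paper's data).
y_true = ["Good"] * 9000 + ["Moderate"] * 990 + ["Hazardous"] * 10
y_pred = list(y_true)
y_pred[-5:] = ["Moderate"] * 5  # misclassify half of the rare class

print(f"accuracy     = {accuracy(y_true, y_pred):.4f}")      # 0.9995
print(f"macro recall = {macro_recall(y_true, y_pred):.4f}")  # 0.8333
```

Because macro averaging gives every class equal weight regardless of its size, five errors out of 10,000 samples barely change accuracy but cut the recall of the rare class to 0.5, which pulls the macro average down to about 0.83.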

Original language: English
Pages (from-to): 29325-29333
Number of pages: 9
Journal: Engineering, Technology and Applied Science Research
Volume: 15
Issue number: 6
State: Published - 8 Dec 2025

Keywords

  • air pollution
  • air quality classification
  • Air Quality Index (AQI)
  • ensemble machine learning
  • environmental monitoring
  • machine learning
