Predicting Chemical Biodegradability for Sustainable Chemical Manufacturing: A Machine Learning Approach Using 3D Molecular Descriptors

  • Alaa M. Elsayad
  • Hassan Yousif Ahmed
  • Khaled A. Elsayad
  • Ammar Elyas Babiker Hassan
  • Mustafa Mohammed Hassan Mustafa
  • Akhtar Nawaz Khan
  • Arif Abdelwhab Ali
  • Sahar A. Mokhtar

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Achieving sustainable cities and promoting responsible consumption require innovative approaches to chemical design and manufacturing. Accurate prediction of chemical biodegradability is crucial for assessing environmental risk and facilitating the transition towards green chemistry. This study investigates how effectively ten distinct groups of three-dimensional (3D) molecular descriptors classify compounds as rapidly biodegradable or not. The Merck molecular force field (MMFF94s) was used to generate 3D conformations for a dataset of chemical compounds, from which the descriptors were computed. The dataset underwent rigorous preprocessing, including feature selection, outlier handling, and scaling. Support Vector Machines (SVMs) were compared with three tree-based ensemble learning algorithms: Extreme Gradient Boosting (XGBoost), Gradient Boosting Machine (GBM), and Random Forest. Bayesian optimization was used to tune model hyperparameters by maximizing the cross-validated Area Under the Receiver Operating Characteristic Curve (AUC). Across all machine learning models, the GETAWAY, 3D autocorrelation, and 3D-MoRSE descriptor groups consistently outperformed the other descriptor groups. On an independent test set, an SVM model trained on 3D autocorrelation descriptors achieved the best results: accuracy 0.88, sensitivity 0.83, specificity 0.91, F1-score 0.82, Cohen's kappa 0.74, and AUC 0.93. Model interpretation techniques, including Permutation Feature Importance (PFI), SHapley Additive exPlanations (SHAP), and partial dependence plots (PDP), were used to identify the most influential 3D autocorrelation descriptors. These findings demonstrate that 3D molecular descriptors, particularly 3D autocorrelations, play a critical role in developing accurate and interpretable models for predicting chemical biodegradability.
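The abstract identifies 3D autocorrelation descriptors as the strongest group but does not define them. As a rough illustration only, the sketch below computes a generic distance-binned spatial autocorrelation: for each distance bin, it sums the products of an atomic property over all atom pairs whose interatomic distance falls in that bin. The helper name `autocorr_3d`, the bin width, the number of bins, and the choice of atomic property are hypothetical assumptions. This is not the authors' descriptor implementation, and published 3D autocorrelation definitions (and the software that computes them) may differ in binning, weighting, and normalization. The coordinates are assumed to come from an already-optimized conformer, such as one produced with MMFF94s.

```python
import math
from itertools import combinations


def autocorr_3d(coords, weights, bin_width=1.0, n_bins=10):
    """Generic distance-binned 3D autocorrelation (hypothetical helper).

    coords:  list of (x, y, z) atom positions, assumed to be in angstroms
             and taken from an optimized 3D conformer.
    weights: per-atom property (assumed, e.g. atomic mass or partial charge).
    Returns a list whose entry k is the sum of w_i * w_j over all atom pairs
    i < j with distance d_ij in [k * bin_width, (k + 1) * bin_width).
    Pairs farther apart than n_bins * bin_width are ignored.
    """
    if len(coords) != len(weights):
        raise ValueError("coords and weights must have the same length")
    profile = [0.0] * n_bins
    for i, j in combinations(range(len(coords)), 2):
        d = math.dist(coords[i], coords[j])  # Euclidean distance (Python 3.8+)
        k = int(d // bin_width)              # index of the distance bin
        if k < n_bins:
            profile[k] += weights[i] * weights[j]
    return profile


if __name__ == "__main__":
    # Toy three-atom "molecule"; coordinates and weights are invented.
    coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.5, 0.0)]
    weights = [1.0, 2.0, 3.0]
    print(autocorr_3d(coords, weights, bin_width=1.0, n_bins=4))
    # -> [0.0, 2.0, 9.0, 0.0]
```

In the toy example the pair at 1.5 Å lands in bin 1 (product 2.0), while the pairs at 2.5 Å and about 2.92 Å both land in bin 2 (products 3.0 and 6.0). Each descriptor value therefore encodes how a property is distributed across spatial distances in the conformer.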
Such models can support green chemical design and the development of effective regulatory policies that advance SDG 11 (Sustainable Cities and Communities) and SDG 12 (Responsible Consumption and Production). By fostering sustainable chemical manufacturing practices, we can create healthier and more resilient urban environments while minimizing the environmental impact of human activities.
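Apart from AUC, which requires ranked prediction scores rather than hard labels, the test-set figures reported in the abstract (accuracy, sensitivity, specificity, F1-score, and Cohen's kappa) all derive from a single binary confusion matrix. The hypothetical helper `binary_metrics` below shows how they relate, assuming the rapidly biodegradable class is treated as positive. The counts in the usage example are invented for illustration and are not the study's data.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes the positive class is "rapidly biodegradable" (an assumption;
    the choice of positive class changes sensitivity, specificity and F1).
    """
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: observed agreement corrected for chance agreement,
    # where chance agreement uses predicted and actual class marginals.
    p_o = accuracy
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (p_o - p_e) / (1 - p_e)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Invented counts for a 150-compound test set (illustration only).
    for name, value in binary_metrics(tp=40, fp=10, tn=90, fn=10).items():
        print(f"{name}: {value:.3f}")
```

With these invented counts the function gives accuracy of about 0.867, sensitivity 0.800, specificity 0.900, F1-score 0.800, and kappa 0.700. Kappa sits well below accuracy because, with imbalanced classes, a large share of agreement (here 5/9) would be expected by chance alone.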

Original language: English
Pages (from-to): 76-86
Number of pages: 11
Journal: Applied Environmental Biotechnology
Volume: 9
Issue number: 2
State: Published - 2024

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 9 - Industry, Innovation, and Infrastructure
  2. SDG 11 - Sustainable Cities and Communities
  3. SDG 15 - Life on Land
  4. SDG 17 - Partnerships for the Goals

Keywords

  • 3D molecular descriptors
  • Biodegradability
  • QSAR
  • SHAP
  • SVM
  • XGBoost
  • environmental risk assessment
  • gradient boosting
  • random forest
  • permutation feature importance
  • sustainable chemistry
