AI-Driven Predictive Modeling for Lung Cancer Detection and Management Using Synthetic Data Augmentation and Random Forest Classifier

Nisreen Innab, Asma Aldrees, Dina Abdulaziz AlHammadi, Abeer Hakeem, Muhammad Umer, Shtwai Alsubai, Silvia Trelova, Imran Ashraf

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Artificial intelligence (AI) is transforming multiple industries, including medical research, where AI-driven developments bring significant advantages. Machine learning algorithms enable medical researchers to examine large amounts of data accurately, which supports the development of precise and effective treatment approaches. Lung cancer remains the world’s most lethal form of cancer and is therefore a critical healthcare issue that demands innovative diagnostic tools for faster and more accurate identification. This study introduces a method called CTGAN-RF, which combines conditional tabular generative adversarial networks (CTGAN) for synthetic data generation with a random forest (RF) classifier to detect lung cancer. The proposed model achieved an accuracy of 0.9893 and a value of 0.99 for precision, recall, and F1 score. The experimental evaluation included testing nine classification algorithms. The classifiers were evaluated with the data balancing methods SMOTE, borderline-SMOTE, and SMOTE-ENN, as well as on unbalanced data. Comparative analysis showed that CTGAN-RF consistently outperformed traditional classifiers in handling class imbalance and improving prediction accuracy. Fivefold cross-validation further validated the reliability of the model. Compared with state-of-the-art approaches for lung cancer diagnosis, the proposed methodology achieved better classification metrics. This in-depth evaluation of synthetic data augmentation with machine learning for lung cancer detection supports the development of personalized treatment strategies against this life-threatening disease.
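The workflow described in the abstract (augment the minority class, train a random forest, validate with fivefold cross-validation) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes scikit-learn and NumPy are available, uses a generated toy dataset instead of the lung cancer data, and replaces CTGAN with a simple stand-in that resamples minority rows with Gaussian jitter. The function names `oversample_minority` and `cross_validate_augmented_rf`, the jitter scale, and all hyperparameters are hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import StratifiedKFold


def oversample_minority(X, y, rng):
    """Stand-in for CTGAN (assumption): resample minority rows with small
    Gaussian jitter until both classes have the same number of rows."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    deficit = counts.max() - counts.min()
    if deficit == 0:
        return X, y
    pool = X[y == minority]
    picks = pool[rng.integers(0, len(pool), size=deficit)]
    synthetic = picks + rng.normal(0.0, 0.05, size=picks.shape)
    X_aug = np.vstack([X, synthetic])
    y_aug = np.concatenate([y, np.full(deficit, minority)])
    return X_aug, y_aug


def cross_validate_augmented_rf(X, y, n_splits=5, seed=0):
    """Stratified k-fold evaluation; augmentation is applied to the training
    fold only, so the test fold contains no synthetic rows."""
    rng = np.random.default_rng(seed)
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in folds.split(X, y):
        X_train, y_train = oversample_minority(X[train_idx], y[train_idx], rng)
        model = RandomForestClassifier(n_estimators=100, random_state=seed)
        model.fit(X_train, y_train)
        pred = model.predict(X[test_idx])
        scores.append((accuracy_score(y[test_idx], pred),
                       f1_score(y[test_idx], pred)))
    return np.array(scores)  # shape (n_splits, 2): accuracy, F1 per fold


# Toy imbalanced binary dataset (assumption: not the data used in the study).
X, y = make_classification(n_samples=400, n_features=10,
                           weights=[0.85, 0.15], random_state=0)
scores = cross_validate_augmented_rf(X, y)
print("mean accuracy, mean F1:", scores.mean(axis=0))
```

Augmenting inside each training fold, rather than before splitting, keeps synthetic rows out of the held-out fold, so the reported scores reflect real samples only. Whether the study applied augmentation this way is not stated in the abstract.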

Original language: English
Article number: 145
Journal: International Journal of Computational Intelligence Systems
Volume: 18
Issue number: 1
State: Published - Dec 2025

Keywords

  • Class imbalance handling
  • CTGAN-RF model
  • Data augmentation
  • Healthcare
  • Lung cancer detection
  • Machine learning in healthcare
