TY - JOUR
T1 - AI-Driven Predictive Modeling for Lung Cancer Detection and Management Using Synthetic Data Augmentation and Random Forest Classifier
AU - Innab, Nisreen
AU - Aldrees, Asma
AU - AlHammadi, Dina Abdulaziz
AU - Hakeem, Abeer
AU - Umer, Muhammad
AU - Alsubai, Shtwai
AU - Trelova, Silvia
AU - Ashraf, Imran
N1 - Publisher Copyright:
© The Author(s) 2025.
PY - 2025/12
Y1 - 2025/12
AB - Artificial intelligence (AI) is transforming many sectors, including medical research, where AI-driven developments bring significant advantages. Machine learning algorithms enable medical researchers to analyze large amounts of data accurately, which supports the development of precise and effective treatment approaches. Lung cancer is among the most critical healthcare issues because it remains the world’s most lethal form of cancer, demanding innovative diagnostic tools for faster and more accurate identification. This study introduces a method called CTGAN-RF, which combines conditional tabular generative adversarial networks (CTGAN) for synthetic data generation with a random forest (RF) classifier to detect lung cancer. The proposed model achieved an accuracy of 0.9893 and a value of 0.99 for precision, recall, and F1 score. The experimental evaluation tested nine classification algorithms, each under several data balancing configurations: SMOTE, borderline-SMOTE, SMOTE-ENN, and the original unbalanced data. Comparative analysis showed that CTGAN-RF consistently and significantly outperformed traditional classifiers in handling class imbalance and improving prediction accuracy. Fivefold cross-validation further validated the reliability of the model. The proposed methodology also outperformed state-of-the-art approaches for lung cancer diagnosis in terms of classification metrics. This in-depth evaluation of synthetic data augmentation with machine learning for lung cancer detection contributes to the development of personalized treatment strategies against this life-threatening disease.
KW - Class imbalance handling
KW - CTGAN-RF model
KW - Data augmentation
KW - Healthcare
KW - Lung cancer detection
KW - Machine learning in healthcare
UR - http://www.scopus.com/inward/record.url?scp=105007634866&partnerID=8YFLogxK
U2 - 10.1007/s44196-025-00879-4
DO - 10.1007/s44196-025-00879-4
M3 - Article
AN - SCOPUS:105007634866
SN - 1875-6891
VL - 18
JO - International Journal of Computational Intelligence Systems
JF - International Journal of Computational Intelligence Systems
IS - 1
M1 - 145
ER -