RDET stacking classifier: a novel machine learning based approach for stroke prediction using imbalance data

Amjad Rehman; Teg Alam; Muhammad Mujahid; Faten S. Alamri; Bayan Al Ghofaily; Tanzila Saba

doi:10.7717/peerj-cs.1684

RDET stacking classifier: a novel machine learning based approach for stroke prediction using imbalance data

Amjad Rehman
, Teg Alam
, Muhammad Mujahid
, Faten S. Alamri
, Bayan Al Ghofaily
, Tanzila Saba

Industrial Engineering

Research output: Contribution to journal › Article › peer-review

15 Scopus citations

Abstract

The main cause of stroke is the unexpected blockage of blood flow to the brain. The brain cells die if blood is not supplied to them, resulting in body disability. The timely identification of medical conditions ensures patients receive the necessary treatments and assistance. This early diagnosis plays a crucial role in managing symptoms effectively and enhancing the overall quality of life for individuals affected by the stroke. The research proposed an ensemble machine learning (ML) model that predicts brain stroke while reducing parameters and computational complexity. The dataset was obtained from an open-source website Kaggle and the total number of participants is 3,254. However, this dataset needs a significant class imbalance problem. To address this issue, we utilized Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADAYSN), a technique for oversampling issues. The primary focus of this study centers around developing a stacking and voting approach that exhibits exceptional performance. We propose a stacking ensemble classifier that is more accurate and effective in predicting stroke disease in order to improve the classifier’s performance and minimize overfitting problems. To create a final stronger classifier, the study used three tree-based ML classifiers. Hyperparameters are used to train and fine-tune the random forest (RF), decision tree (DT), and extra tree classifier (ETC), after which they were combined using a stacking classifier and a k-fold cross-validation technique. The effectiveness of this method is verified through the utilization of metrics such as accuracy, precision, recall, and F1-score. In addition, we utilized nine ML classifiers with Hyper-parameter tuning to predict the stroke and compare the effectiveness of Proposed approach with these classifiers. The experimental outcomes demonstrated the superior performance of the stacking classification method compared to other approaches. The stacking method achieved a remarkable accuracy of 100% as well as exceptional F1-score, precision, and recall score. The proposed approach demonstrates a higher rate of accurate predictions compared to previous techniques.

Original language	English
Article number	e1684
Journal	PeerJ Computer Science
Volume	9
DOIs	https://doi.org/10.7717/peerj-cs.1684
State	Published - 2023

Keywords

Machine learning
SMOTE
Stacking ensemble
Stroke prediction
Voting classifier

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

Access to Document

10.7717/peerj-cs.1684

Cite this

@article{5425ce3d78154c45a443e4a6d132a881,

title = "RDET stacking classifier: a novel machine learning based approach for stroke prediction using imbalance data",

abstract = "The main cause of stroke is the unexpected blockage of blood flow to the brain. The brain cells die if blood is not supplied to them, resulting in body disability. The timely identification of medical conditions ensures patients receive the necessary treatments and assistance. This early diagnosis plays a crucial role in managing symptoms effectively and enhancing the overall quality of life for individuals affected by the stroke. The research proposed an ensemble machine learning (ML) model that predicts brain stroke while reducing parameters and computational complexity. The dataset was obtained from an open-source website Kaggle and the total number of participants is 3,254. However, this dataset needs a significant class imbalance problem. To address this issue, we utilized Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADAYSN), a technique for oversampling issues. The primary focus of this study centers around developing a stacking and voting approach that exhibits exceptional performance. We propose a stacking ensemble classifier that is more accurate and effective in predicting stroke disease in order to improve the classifier{\textquoteright}s performance and minimize overfitting problems. To create a final stronger classifier, the study used three tree-based ML classifiers. Hyperparameters are used to train and fine-tune the random forest (RF), decision tree (DT), and extra tree classifier (ETC), after which they were combined using a stacking classifier and a k-fold cross-validation technique. The effectiveness of this method is verified through the utilization of metrics such as accuracy, precision, recall, and F1-score. In addition, we utilized nine ML classifiers with Hyper-parameter tuning to predict the stroke and compare the effectiveness of Proposed approach with these classifiers. The experimental outcomes demonstrated the superior performance of the stacking classification method compared to other approaches. The stacking method achieved a remarkable accuracy of 100\% as well as exceptional F1-score, precision, and recall score. The proposed approach demonstrates a higher rate of accurate predictions compared to previous techniques.",

keywords = "Machine learning, SMOTE, Stacking ensemble, Stroke prediction, Voting classifier",

author = "Amjad Rehman and Teg Alam and Muhammad Mujahid and Alamri, \{Faten S.\} and \{Al Ghofaily\}, Bayan and Tanzila Saba",

note = "Publisher Copyright: {\textcopyright} 2023 Rehman et al.",

year = "2023",

doi = "10.7717/peerj-cs.1684",

language = "English",

volume = "9",

journal = "PeerJ Computer Science",

issn = "2376-5992",

publisher = "PeerJ Inc.",

}

TY - JOUR

T1 - RDET stacking classifier

T2 - a novel machine learning based approach for stroke prediction using imbalance data

AU - Rehman, Amjad

AU - Alam, Teg

AU - Mujahid, Muhammad

AU - Alamri, Faten S.

AU - Al Ghofaily, Bayan

AU - Saba, Tanzila

PY - 2023

Y1 - 2023

N2 - The main cause of stroke is the unexpected blockage of blood flow to the brain. The brain cells die if blood is not supplied to them, resulting in body disability. The timely identification of medical conditions ensures patients receive the necessary treatments and assistance. This early diagnosis plays a crucial role in managing symptoms effectively and enhancing the overall quality of life for individuals affected by the stroke. The research proposed an ensemble machine learning (ML) model that predicts brain stroke while reducing parameters and computational complexity. The dataset was obtained from an open-source website Kaggle and the total number of participants is 3,254. However, this dataset needs a significant class imbalance problem. To address this issue, we utilized Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADAYSN), a technique for oversampling issues. The primary focus of this study centers around developing a stacking and voting approach that exhibits exceptional performance. We propose a stacking ensemble classifier that is more accurate and effective in predicting stroke disease in order to improve the classifier’s performance and minimize overfitting problems. To create a final stronger classifier, the study used three tree-based ML classifiers. Hyperparameters are used to train and fine-tune the random forest (RF), decision tree (DT), and extra tree classifier (ETC), after which they were combined using a stacking classifier and a k-fold cross-validation technique. The effectiveness of this method is verified through the utilization of metrics such as accuracy, precision, recall, and F1-score. In addition, we utilized nine ML classifiers with Hyper-parameter tuning to predict the stroke and compare the effectiveness of Proposed approach with these classifiers. The experimental outcomes demonstrated the superior performance of the stacking classification method compared to other approaches. The stacking method achieved a remarkable accuracy of 100% as well as exceptional F1-score, precision, and recall score. The proposed approach demonstrates a higher rate of accurate predictions compared to previous techniques.

AB - The main cause of stroke is the unexpected blockage of blood flow to the brain. The brain cells die if blood is not supplied to them, resulting in body disability. The timely identification of medical conditions ensures patients receive the necessary treatments and assistance. This early diagnosis plays a crucial role in managing symptoms effectively and enhancing the overall quality of life for individuals affected by the stroke. The research proposed an ensemble machine learning (ML) model that predicts brain stroke while reducing parameters and computational complexity. The dataset was obtained from an open-source website Kaggle and the total number of participants is 3,254. However, this dataset needs a significant class imbalance problem. To address this issue, we utilized Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADAYSN), a technique for oversampling issues. The primary focus of this study centers around developing a stacking and voting approach that exhibits exceptional performance. We propose a stacking ensemble classifier that is more accurate and effective in predicting stroke disease in order to improve the classifier’s performance and minimize overfitting problems. To create a final stronger classifier, the study used three tree-based ML classifiers. Hyperparameters are used to train and fine-tune the random forest (RF), decision tree (DT), and extra tree classifier (ETC), after which they were combined using a stacking classifier and a k-fold cross-validation technique. The effectiveness of this method is verified through the utilization of metrics such as accuracy, precision, recall, and F1-score. In addition, we utilized nine ML classifiers with Hyper-parameter tuning to predict the stroke and compare the effectiveness of Proposed approach with these classifiers. The experimental outcomes demonstrated the superior performance of the stacking classification method compared to other approaches. The stacking method achieved a remarkable accuracy of 100% as well as exceptional F1-score, precision, and recall score. The proposed approach demonstrates a higher rate of accurate predictions compared to previous techniques.

KW - Machine learning

KW - SMOTE

KW - Stacking ensemble

KW - Stroke prediction

KW - Voting classifier

UR - https://www.scopus.com/pages/publications/85179011627

U2 - 10.7717/peerj-cs.1684

DO - 10.7717/peerj-cs.1684

M3 - Article

AN - SCOPUS:85179011627

SN - 2376-5992

VL - 9

JO - PeerJ Computer Science

JF - PeerJ Computer Science

M1 - e1684

ER -

RDET stacking classifier: a novel machine learning based approach for stroke prediction using imbalance data

Abstract

Keywords

UN SDGs

Access to Document

Other files and links

Fingerprint

Cite this