Towards an Accurate Liver Disease Prediction Based on Two-level Ensemble Stacking Model

Marghny H. Mohamed; Botheina H. Ali; Ahmed I. Taloba; Ahmad O. Aseeri; Mohamed Abd Elaziz; Shaker El-Sappgah; Nora El-Rashidy

doi:10.1109/ACCESS.2024.3459429

Computer Sciences

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

Liver disease is difficult to detect at an early stage because it presents few symptoms. In this study, single and ensemble machine learning (ML) algorithms are applied to the Indian Liver Patient Dataset (ILPD), and their results, without and with feature selection, are compared with each other and with existing studies. In addition, a two-level ensemble stacking model, built on several meta-ensemble classifiers and the feature selection technique, is applied to optimize the accuracy of the ensemble classifiers. Several data preprocessing techniques are employed to optimize the accuracy of the proposed work, including data encoding, data cleaning, data scaling, data skewness transformation, data balancing, and feature selection. The single ML models are logistic regression (LR), K-nearest neighbors (KNN), decision tree (DT), linear discriminant analysis (LDA), and multilayer perceptron (MLP). The ensemble ML models are extra tree classifier, random forest (RF), gradient boosting, AdaBoost, extreme gradient boosting (XGBoost), and an ensemble stacking classifier. Among the ensemble models, the ensemble stacking model achieved the highest accuracies (93.88% and 94.12%) when trained without and with feature selection, respectively, using 10-fold cross-validation. The two-level ensemble stacking model achieved the highest performance when trained with feature selection: accuracy (94.01%), precision (94.44%), recall (94.25%), F1-score (94.01%), and area under the ROC curve (94.25%). These results indicate that the proposed technique yields a high-performing prediction model for liver disease.
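The general idea behind a two-level stacking ensemble can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the architecture in detail, so the layering below is an assumption. The learners `Stump`, `BestStump`, and `MajorityVote` are hypothetical toy stand-ins for the real base and meta classifiers (LR, KNN, RF, XGBoost, and so on), the data is synthetic rather than the ILPD, and 5 folds are used instead of 10 to keep the toy set balanced. The point it shows is that each level is trained on the out-of-fold predictions of the level before it.

```python
from statistics import mean


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k interleaved folds over range(n)."""
    for fold in range(k):
        test = list(range(fold, n, k))
        held_out = set(test)
        yield [i for i in range(n) if i not in held_out], test


def accuracy(y_true, y_pred):
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)


class Stump:
    """Toy learner (assumption): thresholds one feature midway between class means.
    Requires both classes to be present in the training data."""

    def __init__(self, feature):
        self.feature = feature

    def fit(self, X, y):
        pos = [row[self.feature] for row, label in zip(X, y) if label == 1]
        neg = [row[self.feature] for row, label in zip(X, y) if label == 0]
        self.threshold = (mean(pos) + mean(neg)) / 2
        self.positive_above = mean(pos) >= mean(neg)
        return self

    def predict(self, X):
        return [int((row[self.feature] > self.threshold) == self.positive_above)
                for row in X]


class BestStump:
    """Toy meta-learner (assumption): keeps the stump with the best training accuracy."""

    def fit(self, X, y):
        candidates = [Stump(f).fit(X, y) for f in range(len(X[0]))]
        self.best = max(candidates, key=lambda s: accuracy(y, s.predict(X)))
        return self

    def predict(self, X):
        return self.best.predict(X)


class MajorityVote:
    """Toy meta-learner (assumption): predicts 1 when at least half the inputs vote 1."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [int(2 * sum(row) >= len(row)) for row in X]


def out_of_fold(make_learner, X, y, k):
    """Predict every row with a model that never saw that row during training."""
    oof = [None] * len(X)
    for train, test in k_fold_indices(len(X), k):
        model = make_learner().fit([X[i] for i in train], [y[i] for i in train])
        for i, pred in zip(test, model.predict([X[i] for i in test])):
            oof[i] = pred
    return oof


def stack_layer(makers, X, y, k):
    """Turn each learner's out-of-fold predictions into one column of new features."""
    columns = [out_of_fold(make, X, y, k) for make in makers]
    return [list(row) for row in zip(*columns)]


# Synthetic data: feature 0 separates the classes, feature 1 is noise.
X = [[float(i if i < 10 else i + 10), float(i % 3)] for i in range(20)]
y = [0] * 10 + [1] * 10
K = 5

level1 = stack_layer([lambda: Stump(0), lambda: Stump(1)], X, y, K)  # base learners
level2 = stack_layer([BestStump, MajorityVote], level1, y, K)        # meta learners
final_pred = out_of_fold(BestStump, level2, y, K)                    # final classifier
print("out-of-fold accuracy:", accuracy(y, final_pred))
```

Feeding each level only out-of-fold predictions keeps a meta-learner from being trained on outputs that a lower-level model produced for rows it had already seen, which would otherwise inflate the reported accuracy.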

Original language	English
Journal	IEEE Access
DOIs	https://doi.org/10.1109/ACCESS.2024.3459429
State	Accepted/In press - 2024

Keywords

Ensemble stacking
ILPD dataset
feature selection
liver disease prediction
machine learning

Cite this

@article{5630589e31804cc1b9392a409b6511b5,

title = "Towards an Accurate Liver Disease Prediction Based on Two-level Ensemble Stacking Model",

abstract = "The difficulty of detecting liver disease at an early stage goes back to its limited number of symptoms. In this study, single and ensemble machine learning (ML) algorithms are applied to the Indian Liver Patient Dataset (ILPD) dataset, and their results, without and with feature selection techniques, are compared between each other and to the existing studies. Also, a two-level ensemble stacking model is applied based on several meta-ensemble classifiers and the feature selection technique to optimize the accuracy of the ensemble classifiers. Several data preprocessing techniques are employed to optimize the accuracy of the proposed work, including data encoding, data cleaning, data scaling, data skewing transformation, data balancing, and feature selection. The choices of single model ML are logistic regression (LR), K-nearest neighbors (KNN), decision tree (DT), linear discriminant analysis (LDA), and multilayer perceptron (MLP). In contrast, the choices of ensemble ML models are extra tree classifier, random forest (RF), gradient boosting, AdaBoost, extreme gradient boosting (XGBoost), and ensemble stacking classifier. Among the ensemble models, the ensemble stacking model achieved the highest accuracies (93.88\% and 94.12\%) when trained without and with the feature selection technique using the 10-fold cross-validation. The two-level ensemble stacking model achieved the highest performance with the metrics values: accuracy (94.01\%), Precision (94.44\%), Recall (94.25\%), F1-score (94.01\%), and area under the ROC curve (94.25\%) when trained with feature selection technique. These results indicate that our proposed technique achieved a high prediction model for liver disease.",

keywords = "Ensemble stacking, ILPD dataset, feature selection, liver disease prediction, machine learning",

author = "Mohamed, {Marghny H.} and Ali, {Botheina H.} and Taloba, {Ahmed I.} and Aseeri, {Ahmad O.} and Elaziz, {Mohamed Abd} and El-Sappgah, Shaker and El-Rashidy, Nora",

note = "Publisher Copyright: {\textcopyright} 2013 IEEE.",

year = "2024",

doi = "10.1109/ACCESS.2024.3459429",

language = "English",

journal = "IEEE Access",

issn = "2169-3536",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Towards an Accurate Liver Disease Prediction Based on Two-level Ensemble Stacking Model

AU - Mohamed, Marghny H.

AU - Ali, Botheina H.

AU - Taloba, Ahmed I.

AU - Aseeri, Ahmad O.

AU - Elaziz, Mohamed Abd

AU - El-Sappgah, Shaker

AU - El-Rashidy, Nora

PY - 2024

Y1 - 2024

AB - The difficulty of detecting liver disease at an early stage goes back to its limited number of symptoms. In this study, single and ensemble machine learning (ML) algorithms are applied to the Indian Liver Patient Dataset (ILPD) dataset, and their results, without and with feature selection techniques, are compared between each other and to the existing studies. Also, a two-level ensemble stacking model is applied based on several meta-ensemble classifiers and the feature selection technique to optimize the accuracy of the ensemble classifiers. Several data preprocessing techniques are employed to optimize the accuracy of the proposed work, including data encoding, data cleaning, data scaling, data skewing transformation, data balancing, and feature selection. The choices of single model ML are logistic regression (LR), K-nearest neighbors (KNN), decision tree (DT), linear discriminant analysis (LDA), and multilayer perceptron (MLP). In contrast, the choices of ensemble ML models are extra tree classifier, random forest (RF), gradient boosting, AdaBoost, extreme gradient boosting (XGBoost), and ensemble stacking classifier. Among the ensemble models, the ensemble stacking model achieved the highest accuracies (93.88% and 94.12%) when trained without and with the feature selection technique using the 10-fold cross-validation. The two-level ensemble stacking model achieved the highest performance with the metrics values: accuracy (94.01%), Precision (94.44%), Recall (94.25%), F1-score (94.01%), and area under the ROC curve (94.25%) when trained with feature selection technique. These results indicate that our proposed technique achieved a high prediction model for liver disease.

KW - Ensemble stacking

KW - ILPD dataset

KW - feature selection

KW - liver disease prediction

KW - machine learning

UR - https://www.scopus.com/pages/publications/85204518958

U2 - 10.1109/ACCESS.2024.3459429

DO - 10.1109/ACCESS.2024.3459429

M3 - Article

AN - SCOPUS:85204518958

SN - 2169-3536

JO - IEEE Access

JF - IEEE Access

ER -
