A Multimodal Learning Framework to Reduce Misclassification in GI Tract Disease Diagnosis

Sadia Fatima; Fadl Dahan; Jamal Hussain Shah; Refan Almohamedh; Mohammed Aloqaily; Samia Riaz

doi:10.32604/cmes.2025.070272

Information Systems

Research output: Contribution to journal › Article › peer-review

Abstract

The human gastrointestinal (GI) tract is affected by numerous disorders. If not detected early, they may lead to severe consequences such as organ failure or cancer and, in extreme cases, become life-threatening. Endoscopy is a specialised imaging technique used to examine the GI tract. However, physicians may overlook certain irregular morphologies during the examination because it requires continuous monitoring of the video recording. Recent advancements in artificial intelligence have led to high-performance AI-based systems that are well suited to computer-assisted diagnosis. Their accuracy is nevertheless reduced by several limitations of endoscopic image analysis, including visual similarity between infected and healthy areas, retrieval of irrelevant features, and imbalanced training and testing datasets. To address these challenges, we propose a framework for analysing gastrointestinal tract images that provides a more robust and secure model, thereby reducing the chance of misclassification. Compared to single-model solutions, the proposed methodology improves performance by integrating diverse models and optimizing feature fusion using a dual-branch CNN-transformer architecture. In the first branch, features are extracted using Extended BEiT; the second branch uses EfficientNet-B5. Cross-entropy loss measures the prediction error at both branches, followed by model stacking. This multimodal framework outperforms existing approaches across multiple metrics, achieving 94.12% accuracy, recall and F1-score, as well as 94.15% precision on the Kvasir dataset. Furthermore, the model reduced the false negative rate to 5.88%, enhancing its ability to minimize misdiagnosis. These results highlight the adaptability of the proposed work to clinical practice, where it can provide fast and accurate diagnostic assistance crucial for improving the early diagnosis of diseases in the gastrointestinal tract.
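
The reported figures are internally consistent: a false negative rate of 5.88% is the complement of 94.12% recall, and accuracy equals recall whenever recall is averaged over classes weighted by class support. The sketch below illustrates both relationships. It assumes support-weighted averaging, which the abstract does not state, and the confusion matrix is a hypothetical toy example, not data from the paper.

```python
def accuracy(confusion):
    """Fraction of correct predictions; rows are true classes, columns are predictions."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def weighted_recall(confusion):
    """Per-class recall averaged with weights proportional to class support."""
    total = sum(sum(row) for row in confusion)
    result = 0.0
    for i, row in enumerate(confusion):
        support = sum(row)
        if support == 0:
            continue  # a class with no true samples contributes nothing
        result += (row[i] / support) * (support / total)
    return result


def false_negative_rate(recall):
    """False negative rate is the complement of recall."""
    return 1.0 - recall


# Hypothetical 3-class confusion matrix (an assumption for illustration only).
toy = [
    [45, 3, 2],
    [4, 40, 6],
    [1, 2, 47],
]

# Weighted recall and accuracy agree up to floating-point rounding.
print(abs(accuracy(toy) - weighted_recall(toy)) < 1e-9)

# Complement of the recall reported in the abstract, in percent.
print(round(false_negative_rate(0.9412) * 100, 2))  # 5.88
```

If the authors used macro averaging instead, accuracy and recall would coincide only when the classes are balanced.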

Original language	English
Pages (from-to)	971-994
Number of pages	24
Journal	CMES - Computer Modeling in Engineering and Sciences
Volume	145
Issue number	1
DOIs	https://doi.org/10.32604/cmes.2025.070272
State	Published - 2025

Keywords

deep learning
disease diagnosis
gastrointestinal GI
misclassification
Multimodal
transformer

Cite this

@article{d4afa8d97b824438a0a06148b0cc6d89,

title = "A Multimodal Learning Framework to Reduce Misclassification in GI Tract Disease Diagnosis",

abstract = "The human gastrointestinal (GI) tract is influenced by numerous disorders. If not detected in the early stages, they may result in severe consequences such as organ failure or the development of cancer, and in extreme cases, become life-threatening. Endoscopy is a specialised imaging technique used to examine the GI tract. However, physicians might neglect certain irregular morphologies during the examination due to continuous monitoring of the video recording. Recent advancements in artificial intelligence have led to the development of high-performance AI-based systems, which are optimal for computer-assisted diagnosis. Due to numerous limitations in endoscopic image analysis, including visual similarities between infected and healthy areas, retrieval of irrelevant features, and imbalanced testing and training datasets, performance accuracy is reduced. To address these challenges, we proposed a framework for analysing gastrointestinal tract images that provides a more robust and secure model, thereby reducing the chances of misclassification. Compared to single model solutions, the proposed methodology improves performance by integrating diverse models and optimizing feature fusion using a dual-branch CNN transformer architecture. The proposed approach employs a dual-branch feature extraction mechanism, where in the first branch, features are extracted using Extended BEiT, and EfficientNet-B5 is utilized in the second branch. Additionally, cross-entropy loss is used to measure the error of prediction at both branches, followed by model stacking. This multimodal framework outperforms existing approaches across multiple metrics, achieving 94.12\% accuracy, recall and F1-score, as well as 94.15\% precision on the Kvasir dataset. Furthermore, the model successfully reduced the false negative rate to 5.88\%, enhancing its ability to minimize misdiagnosis. 
These results highlight the adaptability of the proposed work in clinical practice, where it can provide fast and accurate diagnostic assistance crucial for improving the early diagnosis of diseases in the gastrointestinal tract.",

keywords = "deep learning, disease diagnosis, gastrointestinal GI, misclassification, Multimodal, transformer",

author = "Sadia Fatima and Fadl Dahan and Shah, {Jamal Hussain} and Refan Almohamedh and Mohammed Aloqaily and Samia Riaz",

note = "Publisher Copyright: Copyright {\textcopyright} 2025 The Authors.",

year = "2025",

doi = "10.32604/cmes.2025.070272",

language = "English",

volume = "145",

pages = "971--994",

journal = "CMES - Computer Modeling in Engineering and Sciences",

issn = "1526-1492",

publisher = "Tech Science Press",

number = "1",

}

TY - JOUR

T1 - A Multimodal Learning Framework to Reduce Misclassification in GI Tract Disease Diagnosis

AU - Fatima, Sadia

AU - Dahan, Fadl

AU - Shah, Jamal Hussain

AU - Almohamedh, Refan

AU - Aloqaily, Mohammed

AU - Riaz, Samia

PY - 2025

Y1 - 2025

AB - The human gastrointestinal (GI) tract is influenced by numerous disorders. If not detected in the early stages, they may result in severe consequences such as organ failure or the development of cancer, and in extreme cases, become life-threatening. Endoscopy is a specialised imaging technique used to examine the GI tract. However, physicians might neglect certain irregular morphologies during the examination due to continuous monitoring of the video recording. Recent advancements in artificial intelligence have led to the development of high-performance AI-based systems, which are optimal for computer-assisted diagnosis. Due to numerous limitations in endoscopic image analysis, including visual similarities between infected and healthy areas, retrieval of irrelevant features, and imbalanced testing and training datasets, performance accuracy is reduced. To address these challenges, we proposed a framework for analysing gastrointestinal tract images that provides a more robust and secure model, thereby reducing the chances of misclassification. Compared to single model solutions, the proposed methodology improves performance by integrating diverse models and optimizing feature fusion using a dual-branch CNN transformer architecture. The proposed approach employs a dual-branch feature extraction mechanism, where in the first branch, features are extracted using Extended BEiT, and EfficientNet-B5 is utilized in the second branch. Additionally, cross-entropy loss is used to measure the error of prediction at both branches, followed by model stacking. This multimodal framework outperforms existing approaches across multiple metrics, achieving 94.12% accuracy, recall and F1-score, as well as 94.15% precision on the Kvasir dataset. Furthermore, the model successfully reduced the false negative rate to 5.88%, enhancing its ability to minimize misdiagnosis. 
These results highlight the adaptability of the proposed work in clinical practice, where it can provide fast and accurate diagnostic assistance crucial for improving the early diagnosis of diseases in the gastrointestinal tract.

KW - deep learning

KW - disease diagnosis

KW - gastrointestinal GI

KW - misclassification

KW - Multimodal

KW - transformer

UR - https://www.scopus.com/pages/publications/105021084467

U2 - 10.32604/cmes.2025.070272

DO - 10.32604/cmes.2025.070272

M3 - Article

AN - SCOPUS:105021084467

SN - 1526-1492

VL - 145

SP - 971

EP - 994

JO - CMES - Computer Modeling in Engineering and Sciences

JF - CMES - Computer Modeling in Engineering and Sciences

IS - 1

ER -
