Exploring CBC Data for Anemia Diagnosis: A Machine Learning and Ontology Perspective

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Background: Anemia is a common health disorder worldwide, and effective treatment depends on timely and accurate diagnosis. This paper aims to detect and classify four types of anemia: hgb, iron-deficiency, folate-deficiency, and B12-deficiency anemia.

Methods: This paper proposes an ontology-enhanced machine learning (ML) framework to classify types of anemia from complete blood count (CBC) data obtained from Kaggle, comprising 15,300 patient records. It compares classical and deep classifiers trained on the original imbalanced data and on oversampled data. The classifiers tested are KNN, SVM, DT, RF, CNN, CNN+SVM, CNN+RF, and XGBoost. A further contribution is the use of ontological reasoning via SPARQL queries to semantically enrich clinical features with categories such as “Low Hemoglobin” or “Macrocytic MCV”. These semantic features were then used in both a classical model (SVM) and a deep hybrid model (CNN+SVM).

Results: Ontology-enhanced and CNN hybrid models perform competitively when paired with ROS or ADASYN oversampling, but their performance degrades significantly on the original dataset. Ontology-enhanced models showed large gains: Onto-CNN+SVM achieved an F1-score of 1.00 for all four types of anemia under ROS sampling, while Onto-SVM improved F1-scores by more than 20% for minority categories such as folate and B12 compared with baseline models other than XGBoost.

Conclusions: Ontology-driven knowledge integration improved classification results; however, XGBoost consistently outperformed all other classifiers across all data conditions, making it the most robust and reliable model for clinically relevant decision-support systems in anemia diagnosis.
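
To make two of the steps named in the abstract concrete, the following is a minimal Python sketch of semantic feature tagging and random oversampling (ROS). It is not the authors' implementation: the paper derives its categories through SPARQL queries over an ontology, whereas this sketch substitutes plain threshold rules. The cutoff values, the field names `hgb` and `mcv`, the class labels, and the toy records are all illustrative assumptions.

```python
import random
from collections import Counter

# Illustrative cutoffs only (assumptions, not taken from the paper's ontology).
LOW_HGB_G_DL = 12.0        # hemoglobin below this is tagged "Low Hemoglobin"
MACROCYTIC_MCV_FL = 100.0  # MCV above this is tagged "Macrocytic MCV"


def semantic_features(record):
    """Return categorical tags derived from raw CBC values."""
    tags = []
    if record["hgb"] < LOW_HGB_G_DL:
        tags.append("Low Hemoglobin")
    if record["mcv"] > MACROCYTIC_MCV_FL:
        tags.append("Macrocytic MCV")
    return tags


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    matches the size of the largest class (random oversampling, ROS)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy records; field names and labels are assumptions.
records = [
    {"hgb": 9.5, "mcv": 112.0},
    {"hgb": 13.8, "mcv": 88.0},
    {"hgb": 10.1, "mcv": 72.0},
    {"hgb": 14.2, "mcv": 90.0},
]
labels = ["b12", "none", "iron", "none"]

# Attach the semantic tags to each record, then balance the classes.
enriched = [dict(r, tags=semantic_features(r)) for r in records]
balanced_rows, balanced_labels = random_oversample(enriched, labels)
```

In practice, oversampling would be applied only to the training split, so that duplicated records do not leak into evaluation data.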

Original language: English
Article number: 35
Journal: BioMedInformatics
Volume: 5
Issue number: 3
State: Published - Sep 2025

Keywords

  • anemia classification
  • balancing data
  • complete blood count
  • diagnostics
  • iron deficiency
  • machine learning
  • ontology
  • SPARQL query
  • XGBoost
