Enhancing lysine 2-hydroxyisobutyrylation site prediction using LightGBM and hybrid sequence features

  • Heba M. Elreify
  • Fathi E. Abd El-Samie
  • Moawad I. Dessouky
  • Hanaa Torkey
  • Said E. El-Khamy
  • Wafaa A. Shalaby

Research output: Contribution to journal › Article › peer-review

Abstract

Lysine 2-hydroxyisobutyrylation (Khib) is a post-translational modification (PTM) that plays a pivotal role in regulating protein structure and function, and emerging evidence highlights its significance in cellular metabolism, transcriptional regulation, and disease pathways. However, experimental identification of Khib sites is hindered by labour-intensive methods and the dynamic nature of the modification. To address this challenge, we propose a computational framework that uses a Light Gradient Boosting Machine (LightGBM) to predict Khib sites. Each candidate site is represented as a 37-amino-acid peptide sequence, encoded with a hybrid feature set that combines Evolutionary Scale Modeling (ESM) embeddings, Composition, Transition, and Distribution (CTD) descriptors, and AAindex descriptors. Together, these features capture both the evolutionary and the physicochemical properties of the protein sequences. Mutual information-based feature selection improves model performance, and LightGBM outperforms alternative classifiers, including Support Vector Machines (SVM) and XGBoost. Validation on Homo sapiens, Toxoplasma gondii, and Oryza sativa datasets yielded area under the ROC curve (AUC) values of 0.846, 0.836, and 0.788, respectively, surpassing existing predictors such as iLys-Khib and KhibPred. Additionally, sequence analysis revealed species-specific amino acid preferences surrounding Khib sites, offering insight into the biological determinants of this modification and advancing cross-species Khib site prediction.
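The abstract outlines a pipeline of fixed-length peptide windows, hand-crafted descriptors such as CTD, mutual-information feature ranking, and AUC-based evaluation. The dependency-free Python sketch below illustrates those steps under stated assumptions; it is not the authors' code. The padding character `X`, the half-window of 18 residues (which yields 37 residues), the single three-group hydrophobicity split used for CTD (composition and transition only, no distribution), the equal-width binning before mutual information, and all function names are illustrative choices. ESM embeddings, AAindex descriptors, and the LightGBM classifier itself are omitted; in practice the selected feature columns could be passed to a gradient-boosting model such as `lightgbm.LGBMClassifier`.

```python
import math
from collections import Counter

PAD = "X"  # assumed filler for window positions beyond a protein terminus
# One common three-group hydrophobicity split (polar / neutral / hydrophobic)
# used for CTD descriptors; other CTD properties use other groupings.
GROUPS = ("RKEDQN", "GASTPHY", "CLVIMFW")


def khib_window(protein, pos, half=18):
    """Return a (2*half + 1)-residue window around the lysine at 0-based `pos`."""
    if protein[pos] != "K":
        raise ValueError("centre residue must be lysine (K)")
    return "".join(
        protein[i] if 0 <= i < len(protein) else PAD
        for i in range(pos - half, pos + half + 1)
    )


def group_of(aa):
    """Map a residue to its group index, or None for padding/non-standard."""
    for g, members in enumerate(GROUPS):
        if aa in members:
            return g
    return None


def ctd_comp_trans(peptide):
    """Composition (3 values) and transition (3 values) for one property."""
    codes = [group_of(aa) for aa in peptide]
    known = [c for c in codes if c is not None]
    comp = [known.count(g) / len(known) if known else 0.0 for g in range(3)]
    # Adjacent pairs where both residues belong to a group.
    pairs = [(a, b) for a, b in zip(codes, codes[1:])
             if a is not None and b is not None]
    trans = []
    for g1, g2 in ((0, 1), (0, 2), (1, 2)):
        hits = sum(1 for a, b in pairs if {a, b} == {g1, g2})
        trans.append(hits / len(pairs) if pairs else 0.0)
    return comp + trans


def discretize(values, bins=4):
    """Equal-width binning so mutual information can be estimated by counting."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys):
    """I(X;Y) in nats for two equal-length discrete sequences."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log((c * n) / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def select_top_k(rows, labels, k):
    """Indices of the k feature columns with the highest mutual information."""
    scores = [mutual_information(discretize([r[j] for r in rows]), labels)
              for j in range(len(rows[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def roc_auc(labels, scores):
    """AUC as the probability a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, since three of the four positive/negative pairs are ranked correctly. Counting-based mutual information is used here only because it needs no external library; an estimator for continuous features would be a reasonable alternative.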

Original language: English
Article number: 139
Journal: Network Modeling Analysis in Health Informatics and Bioinformatics
Volume: 14
Issue number: 1
State: Published, Dec 2025

Keywords

  • ESM
  • LightGBM
  • Lysine 2-hydroxyisobutyrylation
  • Mutual information
  • Post-translational modification
