An efficient machine-learning framework for predicting protein post-translational modification sites

  • Heba M. Elreify
  • Fathi E. Abd El-Samie
  • Moawad I. Dessouky
  • Hanaa Torkey
  • Said E. El-Khamy
  • Wafaa A. Shalaby

Research output: Contribution to journal › Article › peer-review

Abstract

Post-translational modifications (PTMs), particularly lysine 2-hydroxyisobutyrylation (Khib), are critical regulatory mechanisms governing protein structure and function, with mounting evidence of their role in cellular metabolism, transcriptional regulation, and pathological processes. Despite this significance, the experimental identification of Khib sites remains constrained by resource-intensive methodologies and the transient nature of these modifications. To overcome these limitations, we introduce HyLightKhib, a computational framework that uses the Light Gradient Boosting Machine (LightGBM) architecture for accurate Khib site prediction. Our approach relies on a hybrid feature extraction strategy that integrates Evolutionary Scale Modeling (ESM-2) embeddings with comprehensive Composition, Transition, and Distribution (CTD) descriptors and curated amino acid physicochemical properties, computed for fixed-length peptides of 43 amino acids. With mutual information-based feature selection, the proposed classifier outperformed contemporary algorithms, including XGBoost and CatBoost implementations. Cross-species validation on diverse organisms, including human, parasite, and rice, achieved Area Under the Receiver Operating Characteristic Curve (AUC-ROC) scores of 0.893, 0.876, and 0.847, respectively, outperforming existing predictors such as DeepKhib and ResNetKhib. HyLightKhib represents an advancement in computational PTM prediction, providing enhanced predictive performance and valuable biological insights with direct implications for functional proteomics and PTM-targeted therapies.
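As an illustration of the fixed-length peptide representation mentioned in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes that each 43-residue peptide is centered on a candidate lysine with 21 flanking residues on each side, that sequence ends are padded with a placeholder symbol `X`, and that the composition part of CTD uses one commonly cited three-class hydrophobicity grouping. None of these details are stated in the abstract, and the function names `lysine_windows` and `ctd_composition` are hypothetical.

```python
from typing import Dict, List, Tuple

WINDOW = 43          # peptide length stated in the abstract
FLANK = WINDOW // 2  # 21 residues per side (assumption: lysine is centered)
PAD = "X"            # hypothetical padding symbol for sequence ends

# One commonly used three-class hydrophobicity grouping for CTD descriptors.
# The groupings actually used by HyLightKhib are not given in the abstract.
HYDROPHOBICITY_GROUPS = {
    "polar": set("RKEDQN"),
    "neutral": set("GASTPHY"),
    "hydrophobic": set("CLVIMFW"),
}


def lysine_windows(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based position, 43-residue window) for every lysine."""
    seq = sequence.upper()
    padded = PAD * FLANK + seq + PAD * FLANK
    windows = []
    for i, residue in enumerate(seq):
        if residue == "K":
            # Residue i sits at index i + FLANK in the padded string,
            # so a window centered on it starts at index i.
            windows.append((i + 1, padded[i:i + WINDOW]))
    return windows


def ctd_composition(peptide: str) -> Dict[str, float]:
    """Fraction of non-padding residues in each hydrophobicity class."""
    residues = [r for r in peptide if r != PAD]
    total = len(residues) or 1  # avoid division by zero on empty input
    return {
        name: sum(r in group for r in residues) / total
        for name, group in HYDROPHOBICITY_GROUPS.items()
    }
```

In a pipeline of the kind the abstract describes, each window would be passed to the feature extractors (language-model embeddings, CTD descriptors, physicochemical properties), and the resulting vectors would be filtered by feature selection before training the classifier. Those later stages are omitted here.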

Original language: English
Article number: 31179
Journal: Scientific Reports
Volume: 15
Issue number: 1
State: Published - Dec 2025
Externally published: Yes

Keywords

  • ESM
  • LightGBM
  • Lysine 2-hydroxyisobutyrylation
  • Machine learning
  • Mutual information
  • Post-translational modification
  • Protein language models
