A hybrid framework for malware determination: generative pre-trained transformer-inspired approach

Research output: Contribution to journal › Article › peer-review

Abstract

Cyber attacks and malware incidents have surged in prevalence, posing significant risks across various system domains. As a result, automated machine learning techniques for robust malware defense have become increasingly vital. Among the prominent methodologies, two deep learning architectures for malware detection stand out: Generative Pre-trained Transformer 3 (GPT-3) and Stacked Bidirectional Long Short-Term Memory (SBiLSTM). The language models are built by parsing both malicious and benign Portable Executable (PE) files, focusing specifically on the .text section, which contains the assembly instructions. Each instruction is treated as a sentence, and each .text section as a document. Each sentence is labeled as either safe or vulnerable according to the file it was extracted from. This approach led to the creation of three distinct datasets. The first dataset comprises complete documents, which are analyzed using an SBiLSTM-based Document Level Analysis framework. The second dataset is constructed from individual sentences, which are processed through SBiLSTM Sentence Level Analysis. Additionally, both a Domain-Specific Language (DSL) model and a General Language Model (GLM) based on GPT-3 are employed for enhanced contextual understanding. Finally, a pre-trained model is proposed that leverages a dataset enriched with unlabeled assembly instructions. The malware detection framework is benchmarked against state-of-the-art research in the field. The GPT-3-integrated mechanism shows a substantial improvement in detection performance, underscoring its potential as a tool against malware threats.
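The dataset construction described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the .text sections have already been disassembled into instruction strings, and the `TextSection` container, the `build_datasets` function, the `"benign"`/`"malicious"` label strings, and the choice to treat the unlabeled instruction pool as the third dataset are all hypothetical names and assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TextSection:
    """Disassembled .text section of one PE file (hypothetical container)."""
    instructions: Tuple[str, ...]  # one assembly instruction per entry
    label: str                     # assumed labels: "benign" or "malicious"


def build_datasets(sections: List[TextSection]) -> Dict[str, list]:
    """Build document-level, sentence-level, and unlabeled views of the data.

    Assumption: each instruction inherits the label of the file it came from,
    as the abstract describes for sentence-level labeling.
    """
    documents: List[Tuple[str, str]] = []  # (whole .text section, label)
    sentences: List[Tuple[str, str]] = []  # (single instruction, label)
    unlabeled: List[str] = []              # instructions only, for pre-training

    for section in sections:
        # Document level: the whole section is one sample.
        documents.append(("\n".join(section.instructions), section.label))
        for instruction in section.instructions:
            # Sentence level: every instruction is its own sample.
            sentences.append((instruction, section.label))
            # Unlabeled pool: the label is deliberately dropped.
            unlabeled.append(instruction)

    return {"documents": documents, "sentences": sentences, "unlabeled": unlabeled}


if __name__ == "__main__":
    demo = [
        TextSection(("push ebp", "mov ebp, esp"), "benign"),
        TextSection(("xor eax, eax",), "malicious"),
    ]
    for name, rows in build_datasets(demo).items():
        print(name, len(rows))
```

In this sketch the document-level and sentence-level lists would feed the two SBiLSTM analyses, while the unlabeled list stands in for the corpus used to pre-train a language model. How the paper actually tokenizes instructions or assembles its pre-training corpus is not stated in the abstract.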

Original language: English
Article number: 371
Journal: Cluster Computing
Volume: 28
Issue number: 6
State: Published - Oct 2025

Keywords

  • GPT3
  • Malware identification
  • Stacked LSTM
  • Static assessment
