Aliasing black box adversarial attack with joint self-attention distribution and confidence probability

Jun Liu, Haoyu Jin, Guangxia Xu, Mingwei Lin, Tao Wu, Majid Nour, Fayadh Alenezi, Adi Alhudhaif, Kemal Polat

Research output: Contribution to journal › Article › peer-review

34 Scopus citations

Abstract

Deep neural networks (DNNs) are vulnerable to adversarial attacks, in which a small perturbation of the input can cause misclassification. However, selecting the important words to perturb remains a major challenge for textual attack models. This paper therefore proposes an innovative score-based attack model that addresses the important-word selection problem. The model generates semantically consistent adversarial examples to mislead a text classification model, and it selects important words by jointly exploiting the self-attention distribution and the confidence probabilities of the victim model. Moreover, a substitute model, in the spirit of transfer attacks, is introduced to capture the degree of correlation among words within a text. Finally, experimental results, including adversarial training, demonstrate the superiority of the proposed model.
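The abstract does not spell out the scoring rule, but the core idea, ranking words by a joint signal from the self-attention distribution and the confidence probability, can be illustrated. The following Python sketch is a minimal, hypothetical reading of that idea: it scores each word by a weighted combination of its attention weight and the confidence drop observed when the word is removed. All identifiers (rank_important_words, predict_proba, alpha) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of joint word-importance scoring:
# combine a word's self-attention weight with the victim model's
# confidence drop when that word is deleted. Names and the exact
# combination rule are assumptions, not the paper's formulation.
from typing import Callable, List, Sequence, Tuple

def rank_important_words(
    words: List[str],
    attention: Sequence[float],  # per-word self-attention weights
    predict_proba: Callable[[List[str]], Sequence[float]],  # text -> class probabilities
    target_class: int,
    alpha: float = 0.5,  # assumed trade-off between the two signals
) -> List[Tuple[str, float]]:
    """Rank words by alpha * attention + (1 - alpha) * confidence drop."""
    base_conf = predict_proba(words)[target_class]
    scores = []
    for i, word in enumerate(words):
        masked = words[:i] + words[i + 1:]  # remove word i
        conf_drop = base_conf - predict_proba(masked)[target_class]
        score = alpha * attention[i] + (1 - alpha) * conf_drop
        scores.append((word, score))
    # Most important words first: these are the candidates to perturb.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

In a score-based (black-box) setting such as this, predict_proba would be the only access the attacker has to the victim classifier, which is why the confidence drop is queried rather than computed from gradients.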

Original language: English
Article number: 119110
Journal: Expert Systems with Applications
Volume: 214
State: Published - 15 Mar 2023

Keywords

  • Adversarial attack
  • Self-attention distribution
  • Text classification
