Aliasing black box adversarial attack with joint self-attention distribution and confidence probability

Jun Liu, Haoyu Jin, Guangxia Xu, Mingwei Lin, Tao Wu, Majid Nour, Fayadh Alenezi, Adi Alhudhaif, Kemal Polat

Research output: Contribution to journal › Article › peer-review

34 Scopus citations

Abstract

Deep neural networks (DNNs) are vulnerable to adversarial attacks, in which a small perturbation of the input can cause misclassification. However, selecting the important words to perturb remains a major challenge for textual attack models. This paper therefore proposes an innovative score-based attack model that addresses the important-word selection problem. The model generates semantically consistent adversarial examples to mislead a text classification model, and it selects important words by jointly exploiting the self-attention distribution and the confidence probabilities of the victim model. Moreover, a substitute model, in the spirit of transfer attacks, is introduced to capture the degree of correlation among words within a text. Finally, experimental results, including adversarial training, demonstrate the superiority of the proposed model.
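The abstract does not spell out the scoring rule, but the core idea, ranking words by a joint signal from the self-attention distribution and the confidence probability, can be illustrated. The following Python sketch is a minimal, hypothetical reading of that idea: it scores each word by a weighted combination of its attention weight and the confidence drop observed when the word is removed. All identifiers (rank_important_words, predict_proba, alpha) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of joint word-importance scoring:
# combine a word's self-attention weight with the victim model's
# confidence drop when that word is deleted. Names and the exact
# combination rule are assumptions, not the paper's formulation.
from typing import Callable, List, Sequence, Tuple

def rank_important_words(
    words: List[str],
    attention: Sequence[float],  # per-word self-attention weights
    predict_proba: Callable[[List[str]], Sequence[float]],  # text -> class probabilities
    target_class: int,
    alpha: float = 0.5,  # assumed trade-off between the two signals
) -> List[Tuple[str, float]]:
    """Rank words by alpha * attention + (1 - alpha) * confidence drop."""
    base_conf = predict_proba(words)[target_class]
    scores = []
    for i, word in enumerate(words):
        masked = words[:i] + words[i + 1:]  # remove word i
        conf_drop = base_conf - predict_proba(masked)[target_class]
        score = alpha * attention[i] + (1 - alpha) * conf_drop
        scores.append((word, score))
    # Most important words first: these are the candidates to perturb.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

In a score-based (black-box) setting such as this, predict_proba would be the only access the attacker has to the victim classifier, which is why the confidence drop is queried rather than computed from gradients.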

Original language: English
Article number: 119110
Journal: Expert Systems with Applications
Volume: 214
State: Published - 15 Mar 2023

Keywords

  • Adversarial attack
  • Self-attention distribution
  • Text classification
