DropAttack: A Random Dropped Weight Attack Adversarial Training for Natural Language Understanding

Shiwen Ni, Jiawen Li, Min Yang, Hung Yu Kao

Research output: Contribution to journal › Article › peer-review

Abstract

Adversarial training has been proven to be a powerful regularization technique for improving language models. In this work, we propose DropAttack, a novel random dropped weight attack adversarial training method for natural language understanding. DropAttack improves model generalization by minimizing the internal adversarial risk induced by a multitude of attack combinations. Specifically, it enlarges the adversarial attack space by intentionally adding worst-case adversarial perturbations to the weight parameters and randomly dropping a proportion of the attack perturbations. To extensively validate the effectiveness of DropAttack, we evaluate it on 12 public English natural language understanding datasets. Experiments on the GLUE benchmark show that when DropAttack is applied only during fine-tuning, it improves the overall test score of the BERT-base pre-trained model from 78.3 to 79.7 and that of the RoBERTa-large pre-trained model from 88.1 to 88.8. DropAttack also significantly improves models trained from scratch. Theoretical analysis reveals that DropAttack performs an implicit gradient regularization on the input and weight parameters of the model. Moreover, visualization experiments show that DropAttack pushes the minimum risk of the neural network toward a lower and flatter loss landscape.
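As a rough illustration of the procedure the abstract describes, the sketch below implements a single-step, FGSM-style weight attack in PyTorch: gradients from a clean pass define the worst-case perturbation direction, a random mask drops a proportion of the attack entries, and a second pass on the perturbed weights contributes the adversarial loss. This is a minimal reading of the abstract, not the authors' reference implementation; the names dropattack_step, epsilon, and p_drop are illustrative only.

    import torch

    def dropattack_step(model, criterion, optimizer, x, y,
                        epsilon=1e-2, p_drop=0.5):
        # Clean pass: gradients w.r.t. the current weights define the
        # worst-case (FGSM-style) attack direction.
        optimizer.zero_grad()
        loss_clean = criterion(model(x), y)
        loss_clean.backward()

        # Build the weight-space perturbation, then randomly drop a
        # proportion p_drop of its entries before applying it.
        deltas = {}
        with torch.no_grad():
            for name, w in model.named_parameters():
                if w.grad is None:
                    continue
                delta = epsilon * w.grad / (w.grad.norm() + 1e-12)
                delta *= (torch.rand_like(delta) > p_drop).float()
                w.add_(delta)          # attack the weights in place
                deltas[name] = delta

        # Adversarial pass on the perturbed weights; its gradients
        # accumulate on top of the clean gradients (joint objective).
        loss_adv = criterion(model(x), y)
        loss_adv.backward()

        # Undo the perturbation, then update with the combined gradients.
        with torch.no_grad():
            for name, w in model.named_parameters():
                if name in deltas:
                    w.sub_(deltas[name])
        optimizer.step()
        return loss_clean.item(), loss_adv.item()

In practice this step would run once per mini-batch inside the usual training loop; the paper's exact attack norm, forward/backward scheduling, and choice of which parameter groups to attack may differ from this sketch.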

Original language: English
Pages (from-to): 364-373
Number of pages: 10
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 32
Publication status: Published - 2024

All Science Journal Classification (ASJC) codes

  • Computer Science (miscellaneous)
  • Acoustics and Ultrasonics
  • Computational Mathematics
  • Electrical and Electronic Engineering
