DropAttack: A Random Dropped Weight Attack Adversarial Training for Natural Language Understanding

Shiwen Ni, Jiawen Li, Min Yang, Hung Yu Kao

Research output: Article, peer-reviewed

Abstract

Adversarial training has been proven to be a powerful regularization technique for improving language models. In this work, we propose a novel random dropped weight attack adversarial training method (DropAttack) for natural language understanding. DropAttack improves the generalization of models by minimizing the internal adversarial risk caused by a multitude of attack combinations. Specifically, DropAttack enlarges the adversarial attack space by intentionally adding worst-case adversarial perturbations to the weight parameters and randomly dropping a specific proportion of the attack perturbations. To extensively validate the effectiveness of DropAttack, 12 public English natural language understanding datasets were used. Experiments on the GLUE benchmark show that when DropAttack is applied only to the fine-tuning stage, it improves the overall test score of the BERT-base pre-trained model from 78.3 to 79.7 and that of the RoBERTa-large pre-trained model from 88.1 to 88.8. Further, DropAttack also significantly improves models trained from scratch. Theoretical analysis reveals that DropAttack performs potential gradient regularization on the input and weight parameters of the model. Moreover, visualization experiments show that DropAttack can push the minimum risk of the neural network to a lower and flatter loss landscape.
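
The weight-attack step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: a tiny logistic-regression model stands in for a language model, the "worst-case" perturbation is approximated to first order by the normalized loss gradient, the dropping is done with an independent Bernoulli mask per weight, and the clean and adversarial gradients are simply summed. The names `loss_and_grad`, `drop_attack_step`, `epsilon`, `drop_prob` and `lr` are hypothetical, and only the weights are attacked here.

```python
import math
import random


def loss_and_grad(w, data):
    """Return the mean logistic loss and its gradient with respect to w."""
    loss = 0.0
    grad = [0.0] * len(w)
    for x, y in data:
        z = sum(wi * xi for wi, xi in zip(w, x))
        p = 1.0 / (1.0 + math.exp(-z))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
        for j, xj in enumerate(x):
            grad[j] += (p - y) * xj
    n = len(data)
    return loss / n, [g / n for g in grad]


def drop_attack_step(w, data, rng, epsilon=0.1, drop_prob=0.3, lr=0.5):
    """One training step with a randomly dropped weight attack (sketch)."""
    # 1. Loss and gradient at the current (clean) weights.
    clean_loss, clean_grad = loss_and_grad(w, data)

    # 2. Assumed first-order worst case: step of size epsilon along the
    #    normalized gradient, i.e. the direction that increases the loss.
    norm = math.sqrt(sum(g * g for g in clean_grad)) or 1.0
    perturbation = [epsilon * g / norm for g in clean_grad]

    # 3. Randomly drop a proportion of the perturbation entries
    #    (assumed Bernoulli mask: 0.0 = dropped, 1.0 = kept).
    mask = [0.0 if rng.random() < drop_prob else 1.0 for _ in w]
    w_adv = [wi + m * r for wi, m, r in zip(w, mask, perturbation)]

    # 4. Loss and gradient at the attacked weights.
    adv_loss, adv_grad = loss_and_grad(w_adv, data)

    # 5. Update the original weights with both gradients; the attacked
    #    weights w_adv are discarded after this step.
    new_w = [wi - lr * (gc + ga) for wi, gc, ga in zip(w, clean_grad, adv_grad)]
    return new_w, clean_loss, adv_loss, mask


# Toy, linearly separable data: two features plus a constant bias input.
toy_data = [
    ([1.0, 0.5, 1.0], 1), ([0.8, 1.0, 1.0], 1), ([0.5, 0.9, 1.0], 1),
    ([-1.0, -0.5, 1.0], 0), ([-0.7, -1.0, 1.0], 0), ([-0.4, -0.8, 1.0], 0),
]
rng = random.Random(0)
w = [0.0, 0.0, 0.0]
for _ in range(50):
    w, clean_loss, adv_loss, mask = drop_attack_step(w, toy_data, rng)
```

Because each step draws a fresh mask, the model is trained against a different subset of the perturbation every time, which is one way to read the abstract's "multitude of attack combinations". In a real language model the same pattern would be applied per parameter tensor using the framework's automatic differentiation rather than hand-written gradients.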

Original language: English
Pages (from-to): 364-373
Number of pages: 10
Journal: IEEE/ACM Transactions on Audio Speech and Language Processing
Volume: 32
Publication status: Published - 2024

All Science Journal Classification (ASJC) codes

  • Computer Science (miscellaneous)
  • Acoustics and Ultrasonics
  • Computational Mathematics
  • Electrical and Electronic Engineering
