TY - JOUR
T1 - DropAttack
T2 - A Random Dropped Weight Attack Adversarial Training for Natural Language Understanding
AU - Ni, Shiwen
AU - Li, Jiawen
AU - Yang, Min
AU - Kao, Hung-Yu
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2024
Y1 - 2024
AB - Adversarial training has proven to be a powerful regularization technique for improving language models. In this work, we propose a novel random dropped weight attack adversarial training method (DropAttack) for natural language understanding. DropAttack improves model generalization by minimizing the internal adversarial risk caused by a multitude of attack combinations. Specifically, DropAttack enlarges the adversarial attack space by intentionally adding worst-case adversarial perturbations to the weight parameters and randomly dropping a specific proportion of the attack perturbations. To extensively validate the effectiveness of DropAttack, 12 public English natural language understanding datasets were used. Experiments on the GLUE benchmark show that when DropAttack is applied only to the fine-tuning stage, it improves the overall test score of the BERT-base pre-trained model from 78.3 to 79.7 and that of the RoBERTa-large pre-trained model from 88.1 to 88.8. DropAttack also significantly improves models trained from scratch. Theoretical analysis reveals that DropAttack implicitly performs gradient regularization on the input and weight parameters of the model. Moreover, visualization experiments show that DropAttack can push the minimum risk of the neural network to a lower and flatter loss landscape.
UR - http://www.scopus.com/inward/record.url?scp=85177043799&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85177043799&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2023.3330613
DO - 10.1109/TASLP.2023.3330613
M3 - Article
AN - SCOPUS:85177043799
SN - 2329-9290
VL - 32
SP - 364
EP - 373
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
ER -