Masked Siamese Prompt Tuning for Few-Shot Natural Language Understanding

Shiwen Ni, Hung Yu Kao

Research output: Contribution to journal, Article, peer-review



Recently, prompt-based learning has shown excellent performance in few-shot scenarios. Tuning trainable continuous prompt embeddings while keeping the language model frozen has become a popular and powerful methodology. For few-shot natural language understanding, however, even when the parameters of the pre-trained language model are frozen, the learned pseudo-prompts can still overfit. In this paper, we propose a novel masked siamese prompt tuning (MSP-tuning) method to improve few-shot natural language understanding. Concretely, MSP-tuning randomly masks out part of the prompt tokens to obtain a pair of masked siamese prompts for each sample. Each training sample is then fed to the model twice, once with each masked siamese prompt. Finally, MSP-tuning minimizes the JS-divergence between the two output probability distributions of the pre-trained language model as an additional regularizer. Experimental results on the few-shot GLUE and SuperGLUE benchmarks show that MSP-tuning outperforms previous approaches. Numerically, MSP-tuning achieves average improvements of 1.79% (BERT-base) and 1.39% (BERT-large) on the GLUE benchmark, and 1.90% (RoBERTa-base) and 1.71% (RoBERTa-large) on the SuperGLUE benchmark, compared to the state-of-the-art method P-tuning. Our method facilitates applying large pre-trained language models to natural language understanding.
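The consistency objective described in the abstract (two independently masked copies of the same prompt, with a JS-divergence penalty between the two resulting output distributions) can be sketched as follows. This is a toy NumPy illustration only: the masking rate, the "model" (a linear readout over a pooled prompt plus input), and all names here are assumptions for exposition, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_prompt(prompt, mask_prob, rng):
    """Randomly zero out whole prompt-token embeddings (one member of the siamese pair)."""
    keep = rng.random(prompt.shape[0]) >= mask_prob
    return prompt * keep[:, None]

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete probability distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def toy_model(prompt, x, W):
    """Stand-in for the frozen LM: softmax over a linear readout of pooled prompt + input."""
    h = prompt.mean(axis=0) + x
    logits = h @ W
    e = np.exp(logits - logits.max())
    return e / e.sum()

dim, n_prompt, n_classes = 8, 4, 3
prompt = rng.normal(size=(n_prompt, dim))   # trainable continuous prompt embeddings
W = rng.normal(size=(dim, n_classes))       # frozen readout weights (illustrative)
x = rng.normal(size=dim)                    # one training sample's representation

# Siamese pair: the same sample passes through the model twice,
# each time with an independently masked copy of the prompt.
p1 = toy_model(mask_prompt(prompt, 0.3, rng), x, W)
p2 = toy_model(mask_prompt(prompt, 0.3, rng), x, W)

# JS-divergence consistency term, added to the task loss during training.
consistency_loss = js_divergence(p1, p2)
```

In training, this term would be minimized alongside the usual classification loss so that the model's predictions become insensitive to which prompt tokens are masked, regularizing the learned pseudo-prompts.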

Original language: English
Pages (from-to): 1-10
Number of pages: 10
Journal: IEEE Transactions on Artificial Intelligence
Publication status: Accepted/In press - 2023

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Artificial Intelligence

