Natural language generation tasks such as machine translation, summarization, and chatbots usually rely on a given training corpus as the ground-truth answer. In recent years, there has been research on improving the generalization ability of natural language models so that they can express the meaning of a sentence in different ways. These approaches are usually based on the sequence-to-sequence model: an English sentence is given as input, and another English sentence with a different syntactic structure but the same meaning is generated according to a given control condition. This task is usually called syntactically controllable text generation. The difficulty of this problem lies in how to describe and utilize syntactic structural information effectively. In this paper, we propose a new hypothesis about sentences and their syntactic structures, together with two self-attention pre-training tasks based on this hypothesis. The hypothesis treats syntactic structures as analogous to the language embeddings used in cross-lingual translation, so that syntactically controllable text generation can be regarded as translation between different language embeddings. The two pre-training tasks are the Mono-syntax pre-training task, which allows the model to understand a single syntactic structure, and the Cross-syntax pre-training task, which learns the differences between syntaxes on a set of back-translated sentences. We evaluate the performance of our model on a manually written dataset. The results show that our model obtains higher BLEU, ROUGE, and METEOR scores under automatic evaluation, with smaller syntactic structure differences, compared to other related studies. In addition, we further analyze the effectiveness of pre-training, the effect of different noise factors on the syntactic and semantic interactions, an ablation study, and the sentences generated by our model.
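To make the central hypothesis concrete, the sketch below shows one way syntax could be injected like a language embedding: every token receives, in addition to its token and position embeddings, a learned vector identifying the target syntactic structure, in the same way cross-lingual models such as XLM add a language embedding per token. This is a minimal illustrative sketch, not the thesis's actual architecture; all class names, sizes, and the mapping from syntactic templates to integer IDs are assumptions.

```python
import torch
import torch.nn as nn

class SyntaxAwareEmbedding(nn.Module):
    """Token embedding augmented with a syntax-ID embedding (hypothetical).

    Analogous to the per-token language embeddings in cross-lingual models:
    every token in the sequence receives the same learned vector identifying
    a syntactic structure, so the model can treat "generate in syntax B"
    like "translate into language B".
    """

    def __init__(self, vocab_size, num_syntax_ids, d_model, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.syn = nn.Embedding(num_syntax_ids, d_model)  # one ID per syntactic template
        self.pos = nn.Embedding(max_len, d_model)

    def forward(self, token_ids, syntax_id):
        # token_ids: (batch, seq_len); syntax_id: (batch,)
        batch, seq_len = token_ids.shape
        positions = torch.arange(seq_len, device=token_ids.device).unsqueeze(0)
        syn = self.syn(syntax_id).unsqueeze(1)  # broadcast over the sequence
        return self.tok(token_ids) + self.pos(positions) + syn

# Usage: embed the same sentence under two different target syntax IDs.
emb = SyntaxAwareEmbedding(vocab_size=30000, num_syntax_ids=50, d_model=256)
tokens = torch.randint(0, 30000, (2, 12))
out_a = emb(tokens, torch.tensor([3, 3]))  # e.g. an active-voice template
out_b = emb(tokens, torch.tensor([7, 7]))  # e.g. a passive-voice template
```

Under this framing, changing only the syntax ID while keeping the token sequence fixed is what lets a downstream sequence-to-sequence decoder produce meaning-preserving paraphrases with different structures.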
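The abstract reports BLEU, ROUGE, and METEOR under automatic evaluation. The thesis's exact evaluation scripts are not given here; the following is a minimal per-sentence sketch assuming the `nltk` and `rouge-score` packages (METEOR additionally needs NLTK's WordNet data).

```python
# Per-sentence automatic evaluation with BLEU, ROUGE-L, and METEOR.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score
from rouge_score import rouge_scorer

reference = "the cake was eaten by the children".split()
candidate = "the children ate the cake".split()

bleu = sentence_bleu([reference], candidate,
                     smoothing_function=SmoothingFunction().method1)
meteor = meteor_score([reference], candidate)  # expects tokenized inputs
rouge_l = rouge_scorer.RougeScorer(["rougeL"]).score(
    " ".join(reference), " ".join(candidate))["rougeL"].fmeasure

print(f"BLEU={bleu:.3f}  METEOR={meteor:.3f}  ROUGE-L={rouge_l:.3f}")
```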
Date of Award: 2021
Original language: English
Supervisor: Hung-Yu Kao (Supervisor)
Pre-training of Cross-syntax Language Model for Syntactically Controllable Text Generation
文傑, 蔡. (Author). 2021
Student thesis: Doctoral Thesis