Automatic text summarization is one of the classic tasks in natural language processing. The goal is to generate a shorter version of an original text while maintaining a balance between text length and the information it carries. Automatic summarization can help people browse and extract useful information, and it has broad application prospects given the explosion of information on the Internet. The technology also draws on research in sentence representation, text compression, language modeling, and other related areas.

In this thesis, two methods are used to improve existing models: (1) This thesis proposes a method based on a topic model, in which the topic model guides the attention mechanism. The topic model is considered a useful tool for shallow semantic modeling. Building on previous work, this thesis studies the role of the topic model in supervised automatic summarization: the words of the original text that are selected for the summary are likely to be those most relevant to the text's topic. We compute the current attention distribution in the decoder based on the topic embedding of each text, so that the generated summary is more relevant to the topic of the text. (2) This thesis proposes extracting features with a pre-trained BERT model used as the encoder. This two-stage transfer-learning approach has recently been widely adopted in NLP tasks. Because BERT has learned general linguistic knowledge from massive amounts of unsupervised text, downstream tasks can benefit from the prior information in pre-trained language models. Experiments show that using BERT as the encoder yields better text representations than a traditional LSTM encoder.
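The abstract does not give the exact formulation of the topic-guided attention, so the following is only a minimal sketch of one plausible reading: additive attention over encoder states whose scores receive an extra bias term from a per-document topic embedding. All names (`TopicGuidedAttention`, `topic_dim`, `W_t`, etc.) are illustrative assumptions, not the thesis code.

```python
# Sketch (assumed formulation): additive attention with a topic-embedding bias,
# so decoding favors source positions related to the document's topic.
import torch
import torch.nn as nn


class TopicGuidedAttention(nn.Module):
    def __init__(self, enc_dim: int, dec_dim: int, topic_dim: int, attn_dim: int = 256):
        super().__init__()
        self.W_h = nn.Linear(enc_dim, attn_dim, bias=False)    # encoder states
        self.W_s = nn.Linear(dec_dim, attn_dim, bias=False)    # current decoder state
        self.W_t = nn.Linear(topic_dim, attn_dim, bias=False)  # document topic embedding
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, enc_states, dec_state, topic_emb, src_mask):
        # enc_states: (B, L, enc_dim), dec_state: (B, dec_dim), topic_emb: (B, topic_dim)
        scores = self.v(torch.tanh(
            self.W_h(enc_states)                    # per-token term
            + self.W_s(dec_state).unsqueeze(1)      # decoder-state term
            + self.W_t(topic_emb).unsqueeze(1)      # topic bias shared across positions
        )).squeeze(-1)                              # (B, L)
        scores = scores.masked_fill(src_mask == 0, float("-inf"))
        attn = torch.softmax(scores, dim=-1)        # topic-aware attention distribution
        context = torch.bmm(attn.unsqueeze(1), enc_states).squeeze(1)
        return attn, context
```

The design choice illustrated here is simply that the topic term shifts the attention scores rather than replacing them, so the decoder still attends to locally relevant tokens while being nudged toward topic-relevant ones.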
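For the second contribution, the sketch below shows one generic way to use a pre-trained BERT encoder in place of an LSTM encoder for abstractive summarization, using the Hugging Face `transformers` library. The plain Transformer decoder, layer counts, and the dummy target prefix are stand-in assumptions for illustration, not the architecture reported in the thesis.

```python
# Sketch (assumed architecture): pre-trained BERT as the document encoder,
# with a generic Transformer decoder generating the summary tokens.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast


class BertEncoderSummarizer(nn.Module):
    def __init__(self, vocab_size: int, hidden: int = 768):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        layer = nn.TransformerDecoderLayer(d_model=hidden, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.embed = nn.Embedding(vocab_size, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src_ids, src_mask, tgt_ids):
        # Contextual token representations from the pre-trained encoder.
        memory = self.encoder(input_ids=src_ids, attention_mask=src_mask).last_hidden_state
        tgt = self.embed(tgt_ids)
        T = tgt_ids.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)  # no peeking ahead
        dec = self.decoder(tgt, memory, tgt_mask=causal,
                           memory_key_padding_mask=(src_mask == 0))
        return self.out(dec)  # (B, T, vocab_size) logits


tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertEncoderSummarizer(vocab_size=tokenizer.vocab_size)
batch = tokenizer(["Automatic summarization generates a shorter version of a document."],
                  return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"],
               tgt_ids=batch["input_ids"][:, :8])  # dummy target prefix for illustration
```

The point of the sketch is only the encoder swap: the decoder consumes BERT's contextual hidden states instead of LSTM states, which is how downstream summarization can inherit the pre-trained language knowledge described in the abstract.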
| Date of Award | 2019 |
| --- | --- |
| Original language | English |
| Supervisor | Hung-Yu Kao (Supervisor) |
Topic-Aware Abstractive Summarization with Joint Training Topic Model
夢昀, 王. (Author). 2019
Student thesis: Doctoral Thesis