An efficient training process for recurrent neural network based on subsequence information

  • 黃 建勛

Student thesis: Doctoral Thesis

Abstract

A recurrent neural network (RNN) is a neural network architecture suited to analyzing sequence data. Its defining characteristic is that the tokens of a sequence are fed in order while a hidden state is computed and kept inside the model, allowing the model to learn correlations between tokens. Because each step must wait for the computation of the previous one, training cannot be parallelized across time steps, so improving the training speed of recurrent neural networks has long been an important research topic. Beyond this inherently sequential computation, the sequences in a dataset usually differ in length; for example, some sentences may contain only three words while others contain dozens. A special character is typically used to pad every sequence in a dataset to the same length, so shorter sequences carry a large amount of useless information, which leads to a waste of computing resources. This study proposes a method for training recurrent neural networks on various datasets using subsequences. We collected eight datasets from three major areas, namely images, text, and biological sequences, for the experiments. By feeding a different subsequence of each sequence in every epoch, we achieve the same test scores as training on the full sequences while using less training time. We then identify the best sampling method, which performs well in both training time and test scores. Finally, we show that the method is robust by applying this training procedure to different recurrent neural network units.
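The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of one plausible reading, written in PyTorch: in each epoch, a random contiguous subsequence (of a hypothetical fixed length `sub_len`) is drawn from every training sequence before it is fed to the RNN, so short sequences need little or no padding. The model, helper names, and hyperparameters are illustrative assumptions, not the thesis's actual implementation.

```python
import torch
import torch.nn as nn

# Sketch only (not the thesis's exact method): each epoch trains on a
# random contiguous subsequence of every sequence instead of on the
# full zero-padded sequence.

def sample_subsequence(seq, sub_len):
    """Return a random contiguous subsequence of length <= sub_len."""
    if len(seq) <= sub_len:
        return seq  # short sequences are used whole, no padding needed
    start = torch.randint(0, len(seq) - sub_len + 1, (1,)).item()
    return seq[start:start + sub_len]

class RNNClassifier(nn.Module):
    def __init__(self, vocab_size, hidden_size, num_classes):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, num_classes)

    def forward(self, x):                # x: (batch, time) token ids
        _, h = self.rnn(self.embed(x))   # h: (1, batch, hidden)
        return self.out(h.squeeze(0))    # (batch, num_classes)

def train_epoch(model, sequences, labels, optimizer, sub_len=32):
    """One epoch over (sequence, label) pairs, one sample at a time."""
    loss_fn = nn.CrossEntropyLoss()
    for seq, label in zip(sequences, labels):
        sub = sample_subsequence(seq, sub_len).unsqueeze(0)  # batch of 1
        optimizer.zero_grad()
        loss = loss_fn(model(sub), label.unsqueeze(0))
        loss.backward()
        optimizer.step()
```

Under this reading, each epoch is cheaper because the RNN unrolls over at most `sub_len` steps, while different random windows across epochs still expose the model to the whole sequence over the course of training.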
Date of Award: 2020
Original language: English
Supervisor: Tien-Hao Chang
