Ensemble of One Model: Creating Model Variations for Transformer with Layer Permutation

Andrew Liaw, Jia Hao Hsu, Chung Hsien Wu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Ensembling combines the outputs of multiple models to increase performance, and the technique has enjoyed great success across many fields of machine learning. This study focuses on a novel approach that increases the performance of a model without any increase in the number of parameters. The proposed approach trains a single model that admits different variations, each of which performs well and is different enough from the others to be useful in an ensemble. The variations are created by changing the order of the model's layers. Moreover, the method can be combined with existing ensemble techniques to further improve performance. The task chosen for evaluation is machine translation with the Transformer, as the Transformer is the current state-of-the-art model for this task as well as for many other natural language processing tasks. On the IWSLT 2014 German-to-English and French-to-English datasets, the approach yields an increase of at least 0.7 BLEU over a single-model baseline of the same model size. When combined with a multiple-model ensemble, a minimum increase of 0.3 BLEU is observed with no increase in parameters.
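The core idea in the abstract, one set of layers applied in several orders with the resulting predictions combined, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the layers are small random residual maps standing in for Transformer layers, every permutation is used as a variation, and the variations are combined by averaging their output probabilities. The names `make_layer`, `run_variation`, and `permutation_ensemble` are hypothetical.

```python
import itertools
import math
import random


def make_layer(dim, rng):
    """Return a toy residual layer x -> x + tanh(W x).

    Assumption: this only stands in for a Transformer layer.
    """
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]

    def layer(x):
        return [
            xi + math.tanh(sum(w * xj for w, xj in zip(row, x)))
            for xi, row in zip(x, weights)
        ]

    return layer


def softmax(logits):
    """Convert a vector of scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def run_variation(layers, order, x):
    """Apply the shared layers in the given order, then normalize the output."""
    for index in order:
        x = layers[index](x)
    return softmax(x)


def permutation_ensemble(layers, orders, x):
    """Average the distributions from several layer orders.

    Assumption: probability averaging is a common ensemble rule; the
    abstract does not state which combination rule the authors use.
    """
    outputs = [run_variation(layers, order, x) for order in orders]
    return [sum(column) / len(outputs) for column in zip(*outputs)]


rng = random.Random(0)
DIM, NUM_LAYERS = 4, 3
shared_layers = [make_layer(DIM, rng) for _ in range(NUM_LAYERS)]
# All 3! = 6 orderings reuse the same three layers, so no parameters are added.
orders = list(itertools.permutations(range(NUM_LAYERS)))
example_input = [0.1, -0.2, 0.3, 0.05]
ensemble_probs = permutation_ensemble(shared_layers, orders, example_input)
```

Every variation reuses `shared_layers`, so the ensemble has the parameter count of a single model. In practice the model would presumably need to be trained so that it performs well under each chosen order, which this sketch does not attempt.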

Original language: English
Title of host publication: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1026-1030
Number of pages: 5
ISBN (Electronic): 9789881476890
Publication status: Published - 2021
Event: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Tokyo, Japan
Duration: 2021 Dec 14 → 2021 Dec 17

Publication series

Name: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings

Conference

Conference: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021
Country/Territory: Japan
City: Tokyo
Period: 21-12-14 → 21-12-17

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Instrumentation
