Ensemble of One Model: Creating Model Variations for Transformer with Layer Permutation

Andrew Liaw, Jia Hao Hsu, Chung Hsien Wu

Research output: Conference contribution

Abstract

Ensembling combines the outputs of multiple models to improve performance, and the technique has enjoyed great success across many fields of machine learning. This study proposes a novel approach that improves the performance of a model without increasing its number of parameters. The approach trains a single model that can take on several variations, each of which performs well and differs enough from the others to be useful in an ensemble. The variations are created by changing the order of the model's layers. Moreover, the method can be combined with existing ensemble techniques to further improve performance. The approach is evaluated on machine translation with the Transformer, which is the current state-of-the-art model for this task and for many other natural language processing tasks. On the IWSLT 2014 German-to-English and French-to-English datasets, it yields an increase of at least 0.7 BLEU over a single-model baseline of the same size. When combined with a multiple-model ensemble, a minimum increase of 0.3 BLEU is observed with no increase in parameters.
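The layer-permutation idea can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say which orderings are used, how the model is trained to tolerate reordering, or how the outputs are combined. The sketch assumes simple residual layers in place of Transformer layers, untrained random weights, every possible ordering, and plain averaging of the output distributions. All names (`make_layer`, `run_variation`, `permutation_ensemble`) are hypothetical.

```python
import itertools
import math
import random


def make_layer(dim, rng):
    """Toy residual layer x -> x + tanh(W x); a stand-in for a Transformer layer."""
    w = [[rng.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(dim)]

    def layer(x):
        return [
            xi + math.tanh(sum(wij * xj for wij, xj in zip(row, x)))
            for xi, row in zip(x, w)
        ]

    return layer


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def run_variation(layers, order, x):
    """Apply the shared layers in the given order and return a distribution."""
    for idx in order:
        x = layers[idx](x)
    return softmax(x)


def permutation_ensemble(layers, orders, x):
    """Average the output distributions of several layer orderings (assumed combiner)."""
    outputs = [run_variation(layers, order, x) for order in orders]
    return [sum(col) / len(outputs) for col in zip(*outputs)]


rng = random.Random(0)
layers = [make_layer(4, rng) for _ in range(3)]  # one set of parameters
orders = list(itertools.permutations(range(len(layers))))  # 3! = 6 variations
x = [0.1, -0.2, 0.3, 0.05]
ensemble_probs = permutation_ensemble(layers, orders, x)
```

Every variation reuses the same three layers, so the ensemble adds no parameters. The sketch omits the training step that, according to the abstract, makes each variation perform well on its own.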

Original language: English
Title of host publication: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1026-1030
Number of pages: 5
ISBN (Electronic): 9789881476890
Publication status: Published - 2021
Event: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Tokyo, Japan
Duration: 14 Dec 2021 to 17 Dec 2021

Publication series

Name: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings

Conference

Conference: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021
Country/Territory: Japan
City: Tokyo
Period: 2021-12-14 to 2021-12-17

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Instrumentation
