TY - GEN
T1 - Knowledge Distillation on Extractive Summarization
AU - Lin, Ying Jia
AU - Tan, Daniel
AU - Chou, Tzu Hsuan
AU - Kao, Hung Yu
AU - Wang, Hsin Yang
N1 - Funding Information:
This work was partially supported by NCKU-B109-K003, which is a collaboration between National Cheng Kung University, Taiwan, and SoftBank Corp., Tokyo.
Publisher Copyright:
© 2020 IEEE.
PY - 2020/12
Y1 - 2020/12
N2 - Large-scale pre-trained frameworks have shown state-of-the-art performance on several natural language processing tasks. However, their costly training and inference times pose great challenges when deploying such models in real-world applications. In this work, we conduct an empirical study of knowledge distillation on an extractive text summarization task. We first use a pre-trained model as the teacher model for extractive summarization and extract the learned knowledge from it as soft targets. Then, we leverage both the hard targets and the soft targets as the objective for training a much smaller student model to perform extractive summarization. Our results show that the student model scores only 1 point lower on the three ROUGE metrics on the CNN/DM extractive summarization dataset, while being 40% smaller than the teacher model and 50% faster at inference.
AB - Large-scale pre-trained frameworks have shown state-of-the-art performance on several natural language processing tasks. However, their costly training and inference times pose great challenges when deploying such models in real-world applications. In this work, we conduct an empirical study of knowledge distillation on an extractive text summarization task. We first use a pre-trained model as the teacher model for extractive summarization and extract the learned knowledge from it as soft targets. Then, we leverage both the hard targets and the soft targets as the objective for training a much smaller student model to perform extractive summarization. Our results show that the student model scores only 1 point lower on the three ROUGE metrics on the CNN/DM extractive summarization dataset, while being 40% smaller than the teacher model and 50% faster at inference.
UR - http://www.scopus.com/inward/record.url?scp=85102397375&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85102397375&partnerID=8YFLogxK
U2 - 10.1109/AIKE48582.2020.00019
DO - 10.1109/AIKE48582.2020.00019
M3 - Conference contribution
AN - SCOPUS:85102397375
T3 - Proceedings - 2020 IEEE 3rd International Conference on Artificial Intelligence and Knowledge Engineering, AIKE 2020
SP - 71
EP - 76
BT - Proceedings - 2020 IEEE 3rd International Conference on Artificial Intelligence and Knowledge Engineering, AIKE 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 3rd IEEE International Conference on Artificial Intelligence and Knowledge Engineering, AIKE 2020
Y2 - 9 December 2020 through 11 December 2020
ER -