TY - JOUR
T1 - A Tucker decomposition based knowledge distillation for intelligent edge applications
AU - Dai, Cheng
AU - Liu, Xingang
AU - Li, Zhuolin
AU - Chen, Mu Yen
N1 - Funding Information:
This work is supported by the National Key Research and Development Project under grant 2018YFB17002402, National Natural Science Foundation of China under grant 61872404, and the Applied Basic Research Key Programs of Science and Technology Department of Sichuan Province under grant 2018JY0023.
PY - 2021/3
Y1 - 2021/3
N2 - Knowledge distillation (KD) has been proven an effective method in intelligent edge computing and has been extensively studied in recent deep learning research. However, when the teacher network is much stronger than the student network, the effect of knowledge distillation is not ideal. To resolve this problem, an improved knowledge distillation method based on Tucker decomposition (TDKD) is proposed, which transfers the complex mapping functions learned by cumbersome models to relatively simpler models. First, Tucker-2 decomposition is performed on the convolutional layers of the original teacher model to reduce the capacity gap between the teacher and student networks. Then, the decomposed model is used as a new teacher in knowledge distillation for the student model. Experimental results show that TDKD effectively alleviates poor distillation performance: it not only improves results when standard KD is already effective, but can also, to some extent, reactivate KD in cases where it fails.
AB - Knowledge distillation (KD) has been proven an effective method in intelligent edge computing and has been extensively studied in recent deep learning research. However, when the teacher network is much stronger than the student network, the effect of knowledge distillation is not ideal. To resolve this problem, an improved knowledge distillation method based on Tucker decomposition (TDKD) is proposed, which transfers the complex mapping functions learned by cumbersome models to relatively simpler models. First, Tucker-2 decomposition is performed on the convolutional layers of the original teacher model to reduce the capacity gap between the teacher and student networks. Then, the decomposed model is used as a new teacher in knowledge distillation for the student model. Experimental results show that TDKD effectively alleviates poor distillation performance: it not only improves results when standard KD is already effective, but can also, to some extent, reactivate KD in cases where it fails.
UR - http://www.scopus.com/inward/record.url?scp=85098721174&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85098721174&partnerID=8YFLogxK
U2 - 10.1016/j.asoc.2020.107051
DO - 10.1016/j.asoc.2020.107051
M3 - Article
AN - SCOPUS:85098721174
VL - 101
JO - Applied Soft Computing
JF - Applied Soft Computing
SN - 1568-4946
M1 - 107051
ER -