Knowledge distillation(KD) has been proven an effective method in intelligent edge computing and have achieved extensive study in recent deep learning research. However, when the teacher network is too stronger compared to the student network, the effect of knowledge distillation is not ideal. Aiming at resolving this problem, an improved method of knowledge distillation (TDKD) is proposed, which enables to transfer the complex mapping functions learned by cumbersome models to relatively simpler models. Firstly, the tucker-2 decomposition was performed on the convolutional layers of the original teacher model to reduce the capacity variance between the teacher network and student network. Then, the decomposed model will be used as a new teacher to participate in knowledge distillation for the student model. The experimental results show that the TDKD method can effectively solve the problem of poor distillation performance, which not only get better results if the KD method is effective, but also can reactivate the invalid KD method to some extents.
All Science Journal Classification (ASJC) codes