Speech Enhancement Using Dynamic Learning in Knowledge Distillation via Reinforcement Learning

Shih Chuan Chu, Chung Hsien Wu, Tsai Wei Su

Research output: Article › peer-review

Abstract

In recent years, most research on speech enhancement (SE) has applied different strategies to improve performance through deep neural network models. However, as performance improves, the memory and computational requirements of the models also increase, making them difficult to apply directly to edge computing. Therefore, various model compression and acceleration techniques are desired. This paper proposes a dynamic learning method for Knowledge Distillation (KD) that trains a small student model from a large teacher model, using reinforcement learning (RL) to estimate the learning ratio between the teacher's output and the real target. During KD training, RL estimates this ratio from a reward that favors either the hard target (clean speech) or the soft target (the output of the teacher model). The proposed method yields a more stable training process for the smaller SE model and improved performance. In the experiments, we used the TIMIT and CSTR VCTK datasets and evaluated two representative SE models that employ different loss functions. On the TIMIT dataset, when the number of parameters in the Wave-U-Net student model was reduced from 10.3 million to 2.6 million, our method outperformed non-KD models with improvements of 0.05 in PESQ, 0.1 in STOI, and 0.47 in the scale-invariant signal-to-distortion ratio. Moreover, by utilizing prior knowledge from the pre-trained teacher model, our method effectively guided the learning process of the student model, achieving strong performance even under low-SNR conditions. Furthermore, we used Conv-TasNet to further validate the proposed method. Finally, for ease of comparison with other work, we also report results on the VCTK dataset.
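
The abstract describes a KD objective that blends a hard-target loss (against clean speech) with a soft-target loss (against the teacher's enhanced output), weighted by a learning ratio chosen by an RL policy. The following is a minimal sketch of that blending step only, assuming a PyTorch waveform-based setup; the function name dynamic_kd_loss, the use of an L1 distance, and the variable alpha for the learning ratio are illustrative assumptions, not the authors' exact implementation or reward design.

    import torch
    import torch.nn.functional as F

    def dynamic_kd_loss(student_out: torch.Tensor,
                        teacher_out: torch.Tensor,
                        clean: torch.Tensor,
                        alpha: float) -> torch.Tensor:
        """Blend hard-target and soft-target losses with a dynamic learning ratio.

        student_out: enhanced waveform from the student model, shape (B, T)
        teacher_out: enhanced waveform from the frozen teacher model, shape (B, T)
        clean:       clean reference speech (hard target), shape (B, T)
        alpha:       learning ratio in [0, 1] supplied by the RL policy;
                     alpha -> 1 favours the hard target, alpha -> 0 the teacher.
        """
        hard_loss = F.l1_loss(student_out, clean)        # learn from clean speech
        soft_loss = F.l1_loss(student_out, teacher_out)  # learn from the teacher's output
        return alpha * hard_loss + (1.0 - alpha) * soft_loss

In this sketch, the RL component would update alpha per training step from its reward signal, while the student is optimized on the combined loss; the teacher's parameters stay fixed throughout.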

Original language: English
Pages (from-to): 144421-144434
Number of pages: 14
Journal: IEEE Access
Volume: 11
DOIs
Publication status: Published - 2023

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • General Materials Science
  • General Engineering

