TY - JOUR
T1 - Sound Events Recognition and Retrieval Using Multi-Convolutional-Channel Sparse Coding Convolutional Neural Networks
AU - Wang, Chien Yao
AU - Tai, Tzu Chiang
AU - Wang, Jia Ching
AU - Santoso, Andri
AU - Mathulaprangsan, Seksan
AU - Chiang, Chin Chin
AU - Wu, Chung Hsien
N1 - Funding Information:
Manuscript received March 18, 2019; revised July 25, 2019, October 9, 2019, November 25, 2019, and December 21, 2019; accepted December 23, 2019. Date of publication January 8, 2020; date of current version June 26, 2020. This work was supported in part by the Ministry of Science and Technology under Grants 108-2218-E-009-056, 108-2321-B-075-004-MY2, and 108-2634-F-008-004 through NCU and Pervasive Artificial Intelligence Research (PAIR) Labs, Taiwan. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Federico Fontana. (Corresponding author: Jia-Ching Wang.) C.-Y. Wang is with the Institute of Information Science, Academia Sinica, Taipei City 11529, Taiwan (e-mail: x102432003@yahoo.com.tw).
Publisher Copyright:
© 2014 IEEE.
PY - 2020
Y1 - 2020
N2 - This article proposes two novel deep convolutional neural networks (CNNs), called the sparse coding convolutional neural network (SC-CNN) and the multi-convolutional-channel SC-CNN (MSC-CNN), to address the sound event recognition and retrieval problem. Unlike the general CNN framework, in which feature learning is performed hierarchically, the proposed framework models the whole memorization process in the human brain, including encoding, storage, and recollection. In particular, the MSC-CNN is designed to recognize multiple sound events that occur simultaneously. The experimental results indicate that the proposed SC-CNN and MSC-CNN outperform state-of-the-art systems in sound event recognition and retrieval.
AB - This article proposes two novel deep convolutional neural networks (CNNs), called the sparse coding convolutional neural network (SC-CNN) and the multi-convolutional-channel SC-CNN (MSC-CNN), to address the sound event recognition and retrieval problem. Unlike the general CNN framework, in which feature learning is performed hierarchically, the proposed framework models the whole memorization process in the human brain, including encoding, storage, and recollection. In particular, the MSC-CNN is designed to recognize multiple sound events that occur simultaneously. The experimental results indicate that the proposed SC-CNN and MSC-CNN outperform state-of-the-art systems in sound event recognition and retrieval.
UR - http://www.scopus.com/inward/record.url?scp=85087459713&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85087459713&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2020.2964959
DO - 10.1109/TASLP.2020.2964959
M3 - Article
AN - SCOPUS:85087459713
SN - 2329-9290
VL - 28
SP - 1875
EP - 1887
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
M1 - 8952659
ER -