Fully complex deep neural network for phase-incorporating monaural source separation

Yuan Shan Lee, Chien Yao Wang, Shu Fan Wang, Jia Ching Wang, Chung-Hsien Wu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Citations (Scopus)

Abstract

Deep neural networks (DNNs) have become a popular means of separating a target source from a mixed signal. Most DNN-based methods modify only the magnitude spectrum of the mixture. The phase spectrum, which is inherent in the short-time Fourier transform (STFT) coefficients of the input signal, is left unchanged. However, recent studies have revealed that incorporating phase information can improve the quality of separated sources. To estimate the magnitude and the phase of the STFT coefficients simultaneously, this work developed a fully complex-valued deep neural network (FCDNN) that learns the nonlinear mapping from the complex-valued STFT coefficients of a mixture to those of its sources. In addition, to reinforce the sparsity of the estimated spectra, a sparse penalty term is incorporated into the objective function of the FCDNN. Finally, the proposed method is applied to singing source separation. Experimental results indicate that the proposed method outperforms state-of-the-art DNN-based methods.
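The record does not include the network details, so the following is only a rough NumPy sketch of the general idea described in the abstract: a feedforward network with complex-valued weights that maps complex STFT frames of the mixture to complex source estimates, trained with a reconstruction term plus a sparsity penalty on the estimated spectra. The layer sizes, the magnitude-tanh activation, and the penalty weight `lam` are illustrative assumptions, not the paper's actual architecture or objective.

```python
import numpy as np

# Sketch only: a fully complex-valued feedforward mapping and an objective
# with an L1 sparsity penalty. The activation and shapes are assumptions,
# not taken from the paper.

rng = np.random.default_rng(0)

def complex_dense(x, W, b):
    """Affine map with complex-valued weights W and bias b."""
    return x @ W + b

def mod_tanh(z):
    """Magnitude-tanh activation: squash |z| with tanh, keep the phase."""
    return np.tanh(np.abs(z)) * np.exp(1j * np.angle(z))

def fcdnn_forward(x_mix, weights, biases):
    """Map complex STFT frames of the mixture to stacked source estimates."""
    h = x_mix
    for W, b in zip(weights[:-1], biases[:-1]):
        h = mod_tanh(complex_dense(h, W, b))
    return complex_dense(h, weights[-1], biases[-1])  # linear output layer

def objective(y_pred, y_true, lam=0.01):
    """Squared error on complex spectra plus an L1 penalty on magnitudes."""
    recon = np.mean(np.abs(y_pred - y_true) ** 2)
    sparsity = lam * np.mean(np.abs(y_pred))
    return recon + sparsity

# Toy shapes: 513 frequency bins per frame, two sources stacked at the output.
n_bins, hidden = 513, 256
weights = [
    0.01 * (rng.standard_normal((n_bins, hidden)) + 1j * rng.standard_normal((n_bins, hidden))),
    0.01 * (rng.standard_normal((hidden, 2 * n_bins)) + 1j * rng.standard_normal((hidden, 2 * n_bins))),
]
biases = [np.zeros(hidden, dtype=complex), np.zeros(2 * n_bins, dtype=complex)]

x_mix = rng.standard_normal((4, n_bins)) + 1j * rng.standard_normal((4, n_bins))
y_true = rng.standard_normal((4, 2 * n_bins)) + 1j * rng.standard_normal((4, 2 * n_bins))
print(objective(fcdnn_forward(x_mix, weights, biases), y_true))
```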

Original language: English
Title of host publication: 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 281-285
Number of pages: 5
ISBN (Electronic): 9781509041176
DOIs: 10.1109/ICASSP.2017.7952162
Publication status: Published - 2017 Jun 16
Event: 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - New Orleans, United States
Duration: 2017 Mar 5 - 2017 Mar 9

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Other

Other: 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017
Country: United States
City: New Orleans
Period: 17-03-05 - 17-03-09

Fingerprint

Source separation
Fourier transforms
Deep neural networks

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Lee, Y. S., Wang, C. Y., Wang, S. F., Wang, J. C., & Wu, C-H. (2017). Fully complex deep neural network for phase-incorporating monaural source separation. In 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - Proceedings (pp. 281-285). [7952162] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2017.7952162