TY - GEN
T1 - Accurate audio-to-score alignment for expressive violin recordings
AU - Syue, Jia Ling
AU - Su, Li
AU - Lin, Yi Ju
AU - Li, Pei Ching
AU - Lu, Yen Kuang
AU - Wang, Yu Lin
AU - Su, Alvin W.Y.
PY - 2017/1/1
Y1 - 2017/1/1
N2 - An audio-to-score alignment system adaptive to various playing styles and techniques, and also with high accuracy for onset/offset annotation is the key step toward advanced research on automatic music expression analysis. Technical barriers include the processing of overlapped notes, repeated note sequences, and silence. Most of these characteristics vary with expressions. In this paper, the audio-to-score alignment problem of expressive violin performance is addressed. We propose a two-stage alignment system composed of the dynamic time warping (DTW) algorithm, simulation of overlapped sustain notes, background noise model, silence detection, and refinement process, to better capture the onset. More importantly, we utilize the nonnegative matrix factorization (NMF) method for synthesis of the reference signal in order to deal with highly diverse timbre in real-world performance. A dataset of annotated expressive violin recordings in which each piece is played with various expressive musical terms is used. The optimal choice of basic parameters considered in conventional alignment systems, such as features, distance functions in DTW, synthesis methods for the reference signal, and energy ratios, is analyzed. Different settings on different expressions are compared and discussed. Results show that the proposed methods notably improve the conventional DTW-based alignment method.
AB - An audio-to-score alignment system adaptive to various playing styles and techniques, and also with high accuracy for onset/offset annotation is the key step toward advanced research on automatic music expression analysis. Technical barriers include the processing of overlapped notes, repeated note sequences, and silence. Most of these characteristics vary with expressions. In this paper, the audio-to-score alignment problem of expressive violin performance is addressed. We propose a two-stage alignment system composed of the dynamic time warping (DTW) algorithm, simulation of overlapped sustain notes, background noise model, silence detection, and refinement process, to better capture the onset. More importantly, we utilize the nonnegative matrix factorization (NMF) method for synthesis of the reference signal in order to deal with highly diverse timbre in real-world performance. A dataset of annotated expressive violin recordings in which each piece is played with various expressive musical terms is used. The optimal choice of basic parameters considered in conventional alignment systems, such as features, distance functions in DTW, synthesis methods for the reference signal, and energy ratios, is analyzed. Different settings on different expressions are compared and discussed. Results show that the proposed methods notably improve the conventional DTW-based alignment method.
UR - http://www.scopus.com/inward/record.url?scp=85069915547&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85069915547&partnerID=8YFLogxK
M3 - Conference contribution
T3 - Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017
SP - 250
EP - 256
BT - Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017
A2 - Cunningham, Sally Jo
A2 - Duan, Zhiyao
A2 - Hu, Xiao
A2 - Turnbull, Douglas
PB - International Society for Music Information Retrieval
T2 - 18th International Society for Music Information Retrieval Conference, ISMIR 2017
Y2 - 23 October 2017 through 27 October 2017
ER -