TY - JOUR
T1 - Multiple change-point audio segmentation and classification using an MDL-based Gaussian model
AU - Wu, Chung Hsien
AU - Hsieh, Chia Hsin
N1 - Funding Information:
Manuscript received December 3, 2003; revised September 7, 2004. This work was supported by the National Science Council, Taiwan, R.O.C., under Contract NSC90-2213-E-006-088. The Associate Editor coordinating the review of this manuscript and approving it for publication was Dr. Geoffrey Zweig.
PY - 2006
Y1 - 2006
AB - This study presents an approach for segmenting and classifying an audio stream by audio type. First, a silence deletion procedure removes silent segments from the audio stream. A minimum description length (MDL)-based Gaussian model is then proposed to statistically characterize the audio features. Using this MDL-based Gaussian model, the audio stream is segmented into a sequence of homogeneous subsegments. A hierarchical threshold-based classifier then classifies each subsegment into one of several audio types. Finally, a heuristic method smooths the subsegment sequence to produce the final segmentation and classification results. Experimental results on TDT-3 broadcast news indicate that a missed detection rate (MDR) of 0.1 and a false alarm rate (FAR) of 0.14 were achieved for audio segmentation. Given the same MDR and FAR values, segment-based audio classification achieved a higher classification accuracy (88%) than a clip-based approach.
UR - http://www.scopus.com/inward/record.url?scp=33947127409&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33947127409&partnerID=8YFLogxK
U2 - 10.1109/TSA.2005.852988
DO - 10.1109/TSA.2005.852988
M3 - Article
AN - SCOPUS:33947127409
SN - 1558-7916
VL - 14
SP - 647
EP - 657
JO - IEEE Transactions on Audio, Speech and Language Processing
JF - IEEE Transactions on Audio, Speech and Language Processing
IS - 2
ER -