TY - GEN
T1 - Generative and discriminative modeling toward semantic context detection in audio tracks
AU - Chu, Wei Ta
AU - Cheng, Wen Huang
AU - Wu, Ja Ling
PY - 2005
Y1 - 2005
N2 - Semantic-level content analysis is crucial to achieving efficient content retrieval and management. We propose a hierarchical approach that models the statistical characteristics of several audio events over a time series to accomplish semantic context detection. Two stages, audio event modeling/testing and semantic context modeling/testing, are devised to bridge the semantic gap between physical audio features and semantic concepts. For the action movies that are the focus of this work, hidden Markov models (HMMs) are used to model four representative audio events: gunshot, explosion, car-braking, and engine sounds. At the semantic context level, generative (ergodic hidden Markov model) and discriminative (support vector machine, SVM) approaches are investigated to fuse the characteristics and correlations among the audio events, which provide cues for detecting gunplay and car-chasing scenes. The experimental results demonstrate the effectiveness of the proposed approaches and sketch a path toward semantic indexing and retrieval. Moreover, the differences between the two fusion schemes are discussed as a reference for future research.
AB - Semantic-level content analysis is crucial to achieving efficient content retrieval and management. We propose a hierarchical approach that models the statistical characteristics of several audio events over a time series to accomplish semantic context detection. Two stages, audio event modeling/testing and semantic context modeling/testing, are devised to bridge the semantic gap between physical audio features and semantic concepts. For the action movies that are the focus of this work, hidden Markov models (HMMs) are used to model four representative audio events: gunshot, explosion, car-braking, and engine sounds. At the semantic context level, generative (ergodic hidden Markov model) and discriminative (support vector machine, SVM) approaches are investigated to fuse the characteristics and correlations among the audio events, which provide cues for detecting gunplay and car-chasing scenes. The experimental results demonstrate the effectiveness of the proposed approaches and sketch a path toward semantic indexing and retrieval. Moreover, the differences between the two fusion schemes are discussed as a reference for future research.
UR - http://www.scopus.com/inward/record.url?scp=84455205988&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84455205988&partnerID=8YFLogxK
U2 - 10.1109/MMMC.2005.42
DO - 10.1109/MMMC.2005.42
M3 - Conference contribution
AN - SCOPUS:84455205988
SN - 0769521649
SN - 9780769521640
T3 - Proceedings of the 11th International Multimedia Modelling Conference, MMM 2005
SP - 38
EP - 45
BT - Proceedings of the 11th International Multimedia Modelling Conference, MMM 2005
T2 - 11th International Multimedia Modelling Conference, MMM 2005
Y2 - 12 January 2005 through 14 January 2005
ER -