Video reasoning for conflict events through feature extraction

Sheng Tzong Cheng, Chih Wei Hsu, Gwo Jiun Horng, Ci Ruei Jiang

Research output: Article, peer-reviewed


The rapid growth of multimedia data and improvements in deep learning technology have allowed high-accuracy models to be trained in various fields. Video tools such as video classification, temporal action detection, and video summarization are now available for understanding videos. In daily life, many social incidents start with a small conflict event. If conflicts and the dangers that follow can be detected from a video, social incidents can be prevented at an early stage. This research presents a video and audio reasoning network that infers possible conflict events from video and audio features. To make the model more generalizable to other tasks, we also add a predictive network that predicts the risk of conflict events. We use multitask learning to make the video and audio features more generalizable to similar tasks. We also propose several methods for integrating video and audio features, which improve the reasoning performance of the model. The proposed model, the video and audio reasoning network (VARN), is more accurate than other models. Compared with RandomNet, it achieves 2.9 times greater accuracy.
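The abstract does not say which fusion methods or network layers VARN uses, so the following is only a minimal sketch of the general idea: fuse a video feature vector with an audio feature vector, then feed the fused vector to two task heads, one for the conflict event and one for its risk. Concatenation as the fusion step, the feature sizes, the three event classes, and all function names are assumptions made for illustration, and the weights are random rather than trained.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(rng, n_out, n_in):
    """Random (untrained) weights and zero bias, for illustration only."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def fuse_and_predict(video_feat, audio_feat, event_head, risk_head):
    """Concatenate both modalities, then run two task heads on the result."""
    fused = list(video_feat) + list(audio_feat)  # assumed fusion: concatenation
    event_probs = softmax(linear(fused, *event_head))  # task 1: event class
    risk = sigmoid(linear(fused, *risk_head)[0])       # task 2: risk score
    return event_probs, risk


rng = random.Random(0)
video_feat = [rng.random() for _ in range(8)]  # hypothetical 8-d video feature
audio_feat = [rng.random() for _ in range(4)]  # hypothetical 4-d audio feature
event_head = init_layer(rng, 3, 12)            # 3 hypothetical event classes
risk_head = init_layer(rng, 1, 12)             # single risk output

event_probs, risk = fuse_and_predict(video_feat, audio_feat,
                                     event_head, risk_head)
print(event_probs, risk)
```

Because both heads read the same fused vector, training them jointly would push the shared features to serve both tasks, which is the usual motivation for a multitask setup of this kind.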

Pages (from - to): 6435-6455
Journal: Journal of Supercomputing
Publication status: Published - June 2021

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Information Systems
  • Hardware and Architecture

