Video reasoning for conflict events through feature extraction

Sheng Tzong Cheng, Chih Wei Hsu, Gwo Jiun Horng, Ci Ruei Jiang

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)


The rapid growth of multimedia data and improvements in deep learning technology have allowed high-accuracy models to be trained in various fields. Tools such as video classification, temporal action detection, and video summarization are now available for understanding videos. In daily life, many social incidents start with a small conflict event. If conflicts and the dangers that follow can be recognized from video, social incidents can be prevented at an early stage. This research presents a video and audio reasoning network that infers possible conflict events from video and audio features. To make the model more generalizable to other tasks, we also add a predictive network that predicts the risk of conflict events. We use multitask learning so that the learned video and audio features transfer more readily to similar tasks. We also propose several methods for integrating video and audio features, which improve the reasoning performance of the model. The proposed model, called the Video and Audio Reasoning Network (VARN), is more accurate than the other models compared. Compared with RandomNet, it achieves 2.9 times greater accuracy.

Original language: English
Pages (from-to): 6435-6455
Number of pages: 21
Journal: Journal of Supercomputing
Issue number: 6
Publication status: Published - 2021 Jun

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Software
  • Information Systems
  • Hardware and Architecture

