GViG: Generative Visual Grounding Using Prompt-Based Language Modeling for Visual Question Answering

  • Yi Ting Li
  • Ying Jia Lin
  • Chia Jen Yeh
  • Chun Yi Lin
  • Hung Yu Kao

Research output: Conference contribution

Abstract

The WSDM 2023 Toloka VQA challenge introduces a new Grounding-based Visual Question Answering (GVQA) dataset, elevating multimodal task complexity. This challenge diverges from traditional VQA by requiring models to identify a bounding box in response to an image-question pair, aligning with Visual Grounding tasks. Existing VG approaches, when applied to GVQA, often necessitate external data or larger models for satisfactory results, leading to high computational demands. We approach this as a language modeling problem, utilizing prompt tuning with multiple state-of-the-art VQA models. Our method, operating solely on an NVIDIA RTX3090 GPU without external data, secured third place in the challenge, achieving an Intersection over Union (IoU) of 75.658. Our model notably provides explainability between textual and visual data through its attention mechanism, offering insights into its decision-making process. This research demonstrates that high performance in GVQA can be achieved with minimal resources, enhancing understanding of model dynamics and paving the way for improved interpretability and efficiency. Our code is available here: https://github.com/IKMLab/GViG.git
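
The score quoted in the abstract is an Intersection over Union (IoU) between predicted and ground-truth bounding boxes. The following is a minimal sketch of how that metric is commonly computed for axis-aligned boxes. The `(left, top, right, bottom)` box layout and the names `iou` and `mean_iou` are illustrative assumptions and are not taken from the authors' repository; the challenge's exact evaluation script may differ, and the reported 75.658 presumably corresponds to a mean IoU expressed on a 0 to 100 scale.

```python
from typing import Sequence, Tuple

# Assumed box layout: (left, top, right, bottom) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(pred: Box, truth: Box) -> float:
    """Intersection over Union of two axis-aligned boxes, in [0, 1]."""
    # Corners of the overlapping rectangle (if the boxes overlap at all).
    left = max(pred[0], truth[0])
    top = max(pred[1], truth[1])
    right = min(pred[2], truth[2])
    bottom = min(pred[3], truth[3])

    # Clamp width and height to zero when the boxes do not overlap.
    inter = max(0.0, right - left) * max(0.0, bottom - top)

    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_truth = (truth[2] - truth[0]) * (truth[3] - truth[1])
    union = area_pred + area_truth - inter

    # Degenerate (zero-area) boxes yield a union of zero; report 0.0 then.
    return inter / union if union > 0 else 0.0


def mean_iou(preds: Sequence[Box], truths: Sequence[Box]) -> float:
    """Average IoU over paired predictions and ground-truth boxes."""
    if len(preds) != len(truths):
        raise ValueError("preds and truths must have the same length")
    if not preds:
        return 0.0
    return sum(iou(p, t) for p, t in zip(preds, truths)) / len(preds)
```

For example, `iou((0, 0, 10, 10), (5, 5, 15, 15))` gives an intersection of 25 and a union of 175, so the result is 25/175; multiplying a mean IoU by 100 puts it on the same scale as the figure in the abstract.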

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining - 28th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2024, Taipei, Taiwan, May 7–10, 2024, Proceedings
Editors: De-Nian Yang, Xing Xie, Vincent S. Tseng, Jian Pei, Jen-Wei Huang, Jerry Chun-Wei Lin
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 83-94
Number of pages: 12
ISBN (Print): 9789819722655
Publication status: Published - 2024
Event: 28th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2024 - Taipei, Taiwan
Duration: 7 May 2024 to 10 May 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14650 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 28th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2024
Country/Territory: Taiwan
City: Taipei
Period: 2024-05-07 to 2024-05-10

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
