Geometrically-Aware Dual Transformer Encoding Visual and Textual Features for Image Captioning

Yu Ling Chang, Hao Shang Ma, Shiou Chi Li, Jen Wei Huang

Research output: Conference contribution

Abstract

When describing pictures, human observers tend to prioritize eye-catching objects, link them to corresponding labels, and then integrate the results with background information (i.e., nearby objects or locations) to provide context. Most caption generation schemes consider the visual information of objects, while ignoring the corresponding labels, the setting, and/or the spatial relationship between the object and setting. This fails to exploit most of the useful information that the image might otherwise provide. In the current study, we developed a model that adds object tags to supplement the insufficient information in visual object features, and that establishes relationships between objects and background features based on relative and absolute coordinate information. We also proposed an attention architecture to account for all of the features in generating an image description. The effectiveness of the proposed Geometrically-Aware Dual Transformer Encoding Visual and Textual Features (GDVT) is demonstrated in experimental settings with and without pre-training.
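The abstract does not say how the relative and absolute coordinate information is encoded. As a rough illustration only, the sketch below shows one common way to turn bounding boxes into geometry features: an absolute descriptor normalized by image size, and a scale-invariant relative descriptor between two boxes based on log-ratios of center offsets and sizes. The function names, the `(x1, y1, x2, y2)` box layout, and the `eps` clamp are assumptions made for this example, not details taken from GDVT.

```python
import math


def absolute_geometry(box, img_w, img_h):
    """Absolute descriptor: corners normalized by image size, plus area ratio.

    `box` is assumed to be (x1, y1, x2, y2) in pixels.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return [
        x1 / img_w,
        y1 / img_h,
        x2 / img_w,
        y2 / img_h,
        (w * h) / (img_w * img_h),
    ]


def relative_geometry(box_a, box_b, eps=1e-3):
    """Relative descriptor of box_b with respect to box_a.

    Uses log-ratios of center offsets and sizes, so the result does not
    change if both boxes are scaled by the same factor. `eps` (hypothetical
    default) keeps the logarithm finite when the centers coincide.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    aw, ah = ax2 - ax1, ay2 - ay1
    bw, bh = bx2 - bx1, by2 - by1
    acx, acy = ax1 + aw / 2, ay1 + ah / 2
    bcx, bcy = bx1 + bw / 2, by1 + bh / 2
    return [
        math.log(max(abs(bcx - acx) / aw, eps)),
        math.log(max(abs(bcy - acy) / ah, eps)),
        math.log(bw / aw),
        math.log(bh / ah),
    ]
```

In a transformer encoder, descriptors of this kind are typically projected and added to (or used to bias) the attention scores between region features; whether GDVT does this in the same way is not stated in the abstract.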

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining - 28th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2024, Proceedings
Editors: De-Nian Yang, Xing Xie, Vincent S. Tseng, Jian Pei, Jen-Wei Huang, Jerry Chun-Wei Lin
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 15-27
Number of pages: 13
ISBN (Print): 9789819722648
Publication status: Published - 2024
Event: 28th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2024 - Taipei, Taiwan
Duration: 7 May 2024 to 10 May 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14649 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 28th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2024
Country/Territory: Taiwan
City: Taipei
Period: 7 May 2024 to 10 May 2024

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
