
IP-Prompter: Training-Free Theme-Specific Image Generation via Dynamic Visual Prompting

  • Yuxin Zhang
  • Minyan Luo
  • Weiming Dong
  • Xiao Yang
  • Haibin Huang
  • Chongyang Ma
  • Oliver Deussen
  • Tong Yee Lee
  • Changsheng Xu

Research output: Conference contribution

Citations (Scopus): 1

Abstract

The stories and characters that captivate us as we grow up shape unique fantasy worlds, with images serving as the primary medium for visually experiencing these realms. Personalizing generative models through finetuning with theme-specific data has become a prevalent approach in text-to-image generation. However, unlike object customization, which focuses on learning specific objects, theme-specific generation encompasses diverse elements such as characters, scenes, and objects. Such diversity also introduces a key challenge: how to adaptively generate multi-character, multi-concept, and continuous theme-specific images (TSI). Moreover, finetuning approaches often come with significant computational overhead, time costs, and risks of overfitting. This paper explores a fundamental question: Can image generation models directly leverage images as contextual input, similarly to how large language models use text as context? To address this, we present IP-Prompter, a novel training-free TSI generation method. IP-Prompter introduces visual prompting, a mechanism that integrates reference images into generative models, allowing users to seamlessly specify the target theme without requiring additional training. To further enhance this process, we propose a Dynamic Visual Prompting (DVP) mechanism, which iteratively optimizes visual prompts to improve the accuracy and quality of generated images. Our approach enables diverse applications, including consistent story generation, character design, realistic character generation, and style-guided image generation. Comparative evaluations against state-of-the-art personalization methods demonstrate that IP-Prompter achieves significantly better results and excels at preserving character identity, style consistency, and text alignment, offering a robust and flexible solution for theme-specific image generation. Our project page: https://ip-prompter.github.io/.

Original language: English
Title of host publication: Proceedings - SIGGRAPH 2025 Conference Papers
Editor: Stephen N. Spencer
Publisher: Association for Computing Machinery, Inc
ISBN (electronic): 9798400715402
DOIs
Publication status: Published - 27 Jul 2025
Event: SIGGRAPH 2025 Conference Papers - Vancouver, Canada
Duration: 10 Aug 2025 to 14 Oct 2025

Publication series

Name: Proceedings - SIGGRAPH 2025 Conference Papers

Conference

Conference: SIGGRAPH 2025 Conference Papers
Country/Territory: Canada
City: Vancouver
Period: 2025-08-10 to 2025-10-14

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Computational Theory and Mathematics
  • Artificial Intelligence
  • Mathematical Physics
