Data source: HuggingFace Papers

Latest Papers

1. HOComp: Interaction-Aware Human-Object Composition

While existing image-guided composition methods may help insert a foreground object onto a user-specified region of a background image, achieving natural blending inside the region with the rest of the image unchanged, we observe that these existing methods often struggle in synthesizing seamless interaction-aware compositions when the task involves human-object interactions. In this paper, we first propose HOComp, a novel approach for compositing a foreground object onto a human-centric background image, while ensuring harmonious interactions between the foreground object and the background person and their consistent appearances. Our approach includes two key designs: (1) MLLMs-driven Region-based Pose Guidance (MRPG), which utilizes MLLMs to identify the interaction region as well as the interaction type (e.g., holding and lifting) to provide coarse-to-fine constraints to the generated pose for the interaction while incorporating human pose landmarks to track action variations and enforcing fine-grained pose constraints; and (2) Detail-Consistent Appearance Preservation (DCAP), which unifies a shape-aware attention modulation mechanism, a multi-view appearance loss, and a background consistency loss to ensure consistent shapes/textures of the foreground and faithful reproduction of the background human. We then propose the first dataset, named Interaction-aware Human-Object Composition (IHOC), for the task. Experimental results on our dataset show that HOComp effectively generates harmonious human-object interactions with consistent appearances, and outperforms relevant methods qualitatively and quantitatively.
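The abstract names a background consistency loss but does not give its form. As a rough illustration only, one common way to express such a constraint is a masked mean squared error that compares the generated image with the original background outside the region allowed to change. The function name `masked_background_mse`, the grayscale 2D-list representation, and the choice of MSE are all assumptions made here for illustration, not the paper's definition.

```python
def masked_background_mse(generated, background, region_mask):
    """Mean squared error over pixels OUTSIDE the editable region.

    Hypothetical sketch, not the HOComp loss: the paper's abstract does
    not specify the formula.

    generated, background: 2D lists of floats with identical shape.
    region_mask: 2D list of 0/1 values; 1 marks the interaction region
                 that is allowed to change, 0 marks pixels to preserve.
    """
    total = 0.0
    count = 0
    for gen_row, bg_row, mask_row in zip(generated, background, region_mask):
        for gen_px, bg_px, in_region in zip(gen_row, bg_row, mask_row):
            if in_region == 0:  # only penalize changes outside the region
                total += (gen_px - bg_px) ** 2
                count += 1
    # If every pixel is inside the region, nothing needs to be preserved.
    return total / count if count else 0.0


background = [[0.0, 1.0], [0.5, 0.5]]
generated = [[0.0, 0.0], [0.5, 1.0]]
region_mask = [[0, 1], [0, 0]]  # top-right pixel may change freely
loss = masked_background_mse(generated, background, region_mask)
```

In this toy input the top-right pixel changes but lies inside the region, so it contributes nothing; only the bottom-right pixel, which lies outside the region, is penalized.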

Authors: Dong Liang, Jinyuan Jia, Yuhao Liu, Rynson W. H. Lau

Categories: cs.CV

PDF URL: https://arxiv.org/pdf/2507.16813.pdf

Arxiv URL: https://arxiv.org/abs/2507.16813

Arxiv ID: 2507.16813

CoolPaper URL: https://papers.cool/arxiv/2507.16813

Published: 2025-07-22T17:59:21Z

Updated: 2025-07-22T17:59:21Z