What is it about?
Cross-domain image composition aims to seamlessly place one or more user-specified objects into a different visual scene, even when the objects and the scene come from different domains. We introduce a new framework, dubbed TALE, that leverages pretrained text-to-image diffusion models to tackle this challenge without any training.
Why is it important?
Traditional methods often require training additional modules or fine-tuning diffusion models on specialized datasets, which is costly and may not fully exploit the strengths of pretrained diffusion models. Some recent approaches avoid these issues by operating training-free, using attention maps to guide the image generation process indirectly. However, relying solely on attention maps for composition does not always yield the desired results: these methods often fail to preserve the identity of the input objects, or exhibit limited background-to-object style adaptation in the generated images.

TALE, in contrast, operates directly on the latent space to provide explicit and effective guidance during the composition process. It incorporates two key mechanisms: Adaptive Latent Manipulation and Energy-guided Latent Optimization. The former composes noisy latents from inverted background and foreground latents at a selectively chosen timestep, initiating and steering the composition process. The latter complements it by optimizing intermediate latents with specially designed energy functions, refining the style of the final results while remaining consistent with the input prompts. Our experiments demonstrate that TALE surpasses prior baselines and attains state-of-the-art performance in image composition across various photorealistic and artistic domains.
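To make the two mechanisms concrete, here is a minimal sketch of a TALE-style training-free composition loop. It is illustrative only: `ddim_invert`, `denoise_step`, the quadratic form of `composition_energy`, and hyperparameters such as `inject_t` and `guidance_lr` are placeholder assumptions, not the paper's exact formulation. In practice the placeholders would wrap a pretrained text-to-image diffusion model such as Stable Diffusion.

```python
import torch

# --- Placeholder stand-ins for a pretrained diffusion model's operations. ---
# In practice these would wrap e.g. Stable Diffusion's UNet and a DDIM scheduler;
# here they are dummies so the sketch runs end to end.

def ddim_invert(latent, num_steps):
    """Invert a clean latent into a trajectory of noisy latents (placeholder)."""
    return [latent + 0.1 * t * torch.randn_like(latent) for t in range(num_steps)]

def denoise_step(latent, t, prompt_embedding):
    """One reverse-diffusion step conditioned on the prompt (placeholder)."""
    return latent - 0.05 * latent  # stand-in for the model's predicted update

def composition_energy(latent, bg_latent, fg_latent, mask):
    """Energy encouraging foreground identity inside the mask and background
    consistency outside it (a guessed quadratic form, not TALE's exact one)."""
    fg_term = ((latent - fg_latent) * mask).pow(2).mean()
    bg_term = ((latent - bg_latent) * (1 - mask)).pow(2).mean()
    return fg_term + bg_term

# --- TALE-style training-free composition loop (schematic). ---
def compose(bg_latent, fg_latent, mask, prompt_embedding,
            num_steps=50, inject_t=30, guidance_lr=0.1):
    # 1) Adaptive Latent Manipulation: build the starting noisy latent from the
    #    inverted background and foreground at a selectively chosen timestep.
    bg_traj = ddim_invert(bg_latent, num_steps)
    fg_traj = ddim_invert(fg_latent, num_steps)
    latent = mask * fg_traj[inject_t] + (1 - mask) * bg_traj[inject_t]

    # 2) Denoise from that timestep onward, with Energy-guided Latent
    #    Optimization: nudge each intermediate latent down the energy gradient.
    for t in reversed(range(inject_t)):
        latent = denoise_step(latent, t, prompt_embedding)
        latent = latent.detach().requires_grad_(True)
        energy = composition_energy(latent, bg_traj[t], fg_traj[t], mask)
        grad, = torch.autograd.grad(energy, latent)
        latent = (latent - guidance_lr * grad).detach()
    return latent

# Toy usage with random latents (shapes follow Stable Diffusion's 4x64x64 latents).
bg = torch.randn(1, 4, 64, 64)
fg = torch.randn(1, 4, 64, 64)
mask = torch.zeros(1, 1, 64, 64)
mask[..., 16:48, 16:48] = 1.0
prompt = torch.randn(1, 77, 768)  # placeholder text embedding
print(compose(bg, fg, mask, prompt).shape)
```

The point the sketch captures is that all guidance acts on latents directly: the starting latent is stitched from the inverted inputs at a selective timestep, and every intermediate latent is adjusted by an energy gradient, so no module is trained or fine-tuned.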
Read the Original
This page is a summary of: TALE: Training-free Cross-domain Image Composition via Adaptive Latent Manipulation and Energy-guided Optimization, ACM (Association for Computing Machinery), October 2024. DOI: 10.1145/3664647.3681079.