This repository contains the implementation of our method for estimating correspondences with Stable Diffusion in an unsupervised manner. The code for extracting the attention maps has been adapted from Prompt-to-Prompt, and the dataloader for the SPair-71k and PF-Willow datasets has been adapted from CATs++. Our method surpasses weakly supervised methods and closes the gap to strongly supervised methods.
Here are instructions on how to run the repository:
- Install dependencies: This project uses a conda environment for managing dependencies. You can create the environment and install all dependencies with the following command:
conda env create -f environment.yml
- Run the evaluation script:
conda activate LDM_correspondences
python3 -m eval.eval
- More options can be found with
python3 -m eval.eval --help
The project includes an interactive local website for visualizing attention maps associated with identified correspondences. Follow the steps below to launch the visualization:
- Activate the conda environment and run the evaluation script with visualization:
conda activate LDM_correspondences
python3 -m eval.eval --visualize
- Launch the interactive website by running the visualization script:
python3 -m clickable_lines.app
This will display the estimated correspondences. Click on any of them to visualize its attention maps.
We supervise the attention map of a randomly initialized text embedding so that it activates in a source region. The same text embedding can then be applied to any target image, where the predicted correspondence is simply the argmax of its attention map.
We are motivated by the fact that the attention maps for specific words act as pseudo-segmentations of the regions those words describe. By inputting an image instead of random noise, we can use Stable Diffusion for inference tasks.
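The argmax lookup described above can be illustrated with a minimal NumPy sketch. This is not the repository's code: the function names `gaussian_target` and `predict_correspondence`, the Gaussian shape of the stand-in map, and the 16x16 map resolution are assumptions made only for illustration. The sketch shows how a peak in a low-resolution attention map could be converted to a pixel location in the target image.

```python
import numpy as np


def gaussian_target(height, width, center_yx, sigma=2.0):
    """Hypothetical stand-in for an attention map: a normalized Gaussian
    bump centered on a keypoint given in map (row, col) coordinates."""
    ys, xs = np.mgrid[0:height, 0:width]
    cy, cx = center_yx
    g = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()


def predict_correspondence(attention_map, image_size):
    """Return the (y, x) pixel location of the attention map's argmax.

    attention_map: 2D array of shape (h, w).
    image_size: (image_height, image_width) of the target image.
    """
    h, w = attention_map.shape
    row, col = np.unravel_index(np.argmax(attention_map), (h, w))
    img_h, img_w = image_size
    # Map the center of the winning cell back to image pixel coordinates.
    y = (row + 0.5) * img_h / h
    x = (col + 0.5) * img_w / w
    return float(y), float(x)


# Example: a 16x16 map peaking at cell (4, 10), for a 512x512 target image.
attn = gaussian_target(16, 16, (4, 10))
print(predict_correspondence(attn, (512, 512)))  # (144.0, 336.0)
```

In practice the attention map would come from the diffusion model's cross-attention layers rather than from a synthetic Gaussian; the synthetic map is used here only so that the example runs on its own.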
We find that even when our method predicts incorrect correspondences, the regions it predicts are still reasonable. Notably, in the bottom-right example, although all points are meant to correspond to the wine bottle, the points occluded by the wine glass map to the wine glass instead.
Our method outperforms weakly supervised methods and, in the case of PF-Willow, is on par with strongly supervised methods.
We also find that our method still estimates plausible correspondences when the source and target images belong to different classes.
If you find this code useful for your research please consider citing the following paper:
@article{hedlin2023unsupervised,
title={Unsupervised Semantic Correspondence Using Stable Diffusion},
author={Eric Hedlin and Gopal Sharma and Shweta Mahajan and Hossam Isack and Abhishek Kar and Andrea Tagliasacchi and Kwang Moo Yi},
year={2023},
eprint={2305.15581},
archivePrefix={arXiv},
primaryClass={cs.CV}
}