
text and control input align #46

Open
hongsukchoi opened this issue Dec 31, 2023 · 3 comments

Comments

@hongsukchoi

Thank you for your great work!

I have a question about the ControlNet extension. It seems the text is spatially aligned with the latent embeddings originally from SD, but how is the spatial alignment between text and geometric control (e.g. scribble) done?

Reading through the code here, I think there is no alignment between the text embeddings and the geometric control embeddings. Am I right?

Thank you!

@lwchen6309
Collaborator

Yes, you're right that there is no explicit alignment in ControlNet. What it does is simply encode the geometric control input into features and add them to the SD intermediate features.
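For reference, a minimal sketch of that feature-addition path (module and variable names here are illustrative, not the actual repo code):

```python
import torch
import torch.nn as nn

class TinyControlAdapter(nn.Module):
    # Hypothetical stand-in for a ControlNet-style branch: it encodes the
    # geometric control image (e.g. a scribble) into feature maps, and the
    # result is simply added to the UNet's intermediate features.
    def __init__(self, in_channels=3, feat_channels=320):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, feat_channels, kernel_size=3, padding=1),
            nn.SiLU(),
            # zero-initialized projection (as in ControlNet) so the branch
            # starts out with no effect on the pretrained SD features
            nn.Conv2d(feat_channels, feat_channels, kernel_size=1),
        )
        nn.init.zeros_(self.encoder[-1].weight)
        nn.init.zeros_(self.encoder[-1].bias)

    def forward(self, unet_hidden, control_image):
        control_feat = self.encoder(control_image)
        # No spatial alignment against the text embeddings happens here;
        # the control features are just summed onto the SD features.
        return unet_hidden + control_feat
```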

@t00350320

t00350320 commented Mar 11, 2024

Hi @lwchen6309,
By the way, I have another question.
Your test code in runner_inpait.py uses
"input_prompt": "A digital painting of a half-frozen lake near mountains under a full moon and aurora. A boat is in the middle of the lake. Highly detailed.",
I have printed the value of the color cross_attention_weight_64 corresponding to token="aurora", which looks like this:

        [0.0000, 0.0000, 0.0000, 0.0000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]]

So we guess the aurora's cross-attention location will be near the upper right, the same as for token="full moon".
But then why do we also need to put another mask image file, pointing out the moon's real position, into the latent space like this:

latent_model_input = torch.cat([latent_model_input, mask, masked_image_latents], dim=1)

Won't this duplicate the previous color cross_attention_weight?
PTAL!
Thank you!

@lwchen6309
Collaborator

Hi, I think the image_mask is just there to specify the region for inpainting. The object segmentation is still controlled by the cross-attention weights.
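A rough sketch of how the two mechanisms stay separate (assuming the usual SD inpainting input layout; names here are illustrative):

```python
import torch

# --- Mechanism 1: per-token spatial weighting inside cross-attention ---
# attn_scores:   [batch*heads, query_pixels, text_tokens]
# region_weight: [query_pixels], e.g. the 8x8 cross_attention_weight_64 map
#                printed above, flattened, for one token index
def bias_attention_for_token(attn_scores, region_weight, token_idx, scale=1.0):
    attn_scores = attn_scores.clone()
    # Boost the chosen token's score wherever its region weight is positive,
    # which is what steers the object (e.g. "aurora") to that area.
    attn_scores[:, :, token_idx] += scale * region_weight
    return attn_scores.softmax(dim=-1)

# --- Mechanism 2: inpainting mask concatenated onto the UNet input ---
# The mask only tells the inpainting UNet *where* it may repaint;
# it does not decide *which* object goes where.
latents = torch.randn(1, 4, 64, 64)
mask = torch.zeros(1, 1, 64, 64)               # 1 = region to inpaint
masked_image_latents = torch.randn(1, 4, 64, 64)
latent_model_input = torch.cat([latents, mask, masked_image_latents], dim=1)  # 9 channels
```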
