[Feature Request]: Paint with words #4406
Comments
That looks very interesting. It basically allows you to compose an entire image with colored masks.
The examples look like absolute dogshit. I'm going to be very disappointed if that turns out to be a good demo of a bad system rather than the opposite. Interesting, though.
The results from the paper look much better.
Yes, so drastically better and more cohesive that it makes me doubt a version against SD is going to be effective.
Even if those two results look slightly weird, it still works somewhat (even comparably to Make-A-Scene). However, there's the question of how you would draw a mask assigned to words for this in Gradio.
There's a colour canvas drawing facility in Gradio; it's just disabled by default, as it was breaking layouts and generally misbehaving. From there you can get the unique colours and ask for text tagging. The required new callback hooks into the model are going to need very convincing and powerful results, though.
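Extracting the distinct mask colours from a painted canvas image is the easy part. Here is a minimal sketch of that step; the function name and the anti-aliasing guard are my own, not part of Gradio or any existing extension:

```python
import numpy as np
from PIL import Image

def unique_mask_colors(mask_img, max_colors=32):
    """Return the distinct RGB colours painted on a mask canvas.

    A hard cap guards against anti-aliased brushes, which would
    otherwise produce hundreds of near-duplicate colours.
    """
    arr = np.asarray(mask_img.convert("RGB")).reshape(-1, 3)
    colors = np.unique(arr, axis=0)
    if len(colors) > max_colors:
        raise ValueError("Too many distinct colours; was the brush anti-aliased?")
    return [tuple(int(c) for c in rgb) for rgb in colors]
```

Each returned colour could then be shown in the UI next to a text box asking which prompt words it should be tagged with.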
The current implementation is definitely not working how we'd expect it to (check "Comparison" section at the bottom of this comment). I've uploaded some results if anyone wants to experiment with me.
The implementation in the paper was found empirically, so it's likely we can also find a good configuration by simply playing around.
Comparison (input image colors/tokens):
EXAMPLE_SETTING_1 = {
"color_context": {
( 48, 167, 26): "purple trees,1.0",
(115, 232, 103): "abandoned city,1.0",
(100, 121, 135): "road,1.0",
(133, 94, 253): "grass,1.0",
( 1, 47, 71): "magical portal,1.0",
( 38, 192, 212): "starry night,1.0",
},
"color_map_img_path": "benchmark/unlabeled/A dramatic oil painting of a road.png",
"input_prompt": "A dramatic oil painting of a road from a magical portal to an abandoned city with purple trees and grass in a starry night.",
"output_dir_path": "benchmark/example_1",
}
EXAMPLE_SETTING_2 = {
"color_context": {
(161, 160, 173): "A large red moon,1.0",
( 79, 18, 96): "Bats,1.0",
( 82, 170, 20): "sky,1.0",
( 0, 232, 126): "an evil pumpkin,1.0",
(180, 0, 137): "zombies,1.0",
(129, 65, 0): "tombs,1.0",
},
"color_map_img_path": "benchmark/unlabeled/A Halloween scene of an evil pumpkin.png",
"input_prompt": "A Halloween scene of an evil pumpkin. A large red moon in the sky. Bats are flying and zombies are walking out of tombs. Highly detailed fantasy art.",
"output_dir_path": "benchmark/example_2",
}
EXAMPLE_SETTING_3 = {
"color_context": {
( 7, 192, 152): "dark cellar,1.0",
( 81, 31, 97): "monster,1.0",
( 71, 132, 2): "teddy bear,1.0",
( 32, 115, 189): "table,1.0",
( 70, 53, 108): "dungeons and dragons,1.0",
},
"color_map_img_path": "benchmark/unlabeled/A monster and a teddy brear playing dungeons and dragons.png",
"input_prompt": "A monster and a teddy bear playing dungeons and dragons around a table in a dark cellar. High quality fantasy art.",
"output_dir_path": "benchmark/example_3",
}
EXAMPLE_SETTING_4 = {
"color_context": {
(138, 48, 39): "rabbit mage,1.0",
( 50, 32, 211): "fire ball,1.0",
(126, 200, 100): "clouds,1.0",
},
"color_map_img_path": "benchmark/unlabeled/A rabbit mage standing on clouds casting a fireball.png",
"input_prompt": "A highly detailed digital art of a rabbit mage standing on clouds casting a fire ball.",
"output_dir_path": "benchmark/example_4",
}
EXAMPLE_SETTING_5 = {
"color_context": {
(157, 187, 242): "rainbow beams,1.0",
( 27, 165, 234): "forest,1.0",
( 57, 244, 30): "A red Ferrari car,1.0",
(151, 138, 41): "gravel road,1.0",
},
"color_map_img_path": "benchmark/unlabeled/A red Ferrari car driving on a gravel road.png",
"input_prompt": "A red Ferrari car driving on a gravel road in a forest with rainbow beams in the distance.",
"output_dir_path": "benchmark/example_5",
}
EXAMPLE_SETTING_6 = {
"color_context": {
(123, 141, 146): "bar,1.0",
( 90, 119, 35): "red boxing gloves,1.0",
( 48, 167, 26): "blue boxing gloves,1.0",
( 10, 216, 129): "A squirrel,1.0",
( 72, 38, 31): "a squirrel,1.0",
},
"color_map_img_path": "benchmark/unlabeled/A squirrel and a squirrel with boxing gloves fighting in a bar.png",
"input_prompt": "A squirrel with red boxing gloves and a squirrel with blue boxing gloves fighting in a bar.",
"output_dir_path": "benchmark/example_6",
}
Here's a more direct comparison. (3rd column is stable-diffusion+paint_with_words, 4th column is stable-diffusion) I can definitely see merit in adding this feature.
(Also, these samples were done without any weighting at all for prompt tokens or paint-with-words tokens. If you reduce the weight of background elements and increase the weight of unique elements, I'm sure it'll work even better.)
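For anyone who wants to experiment with the mechanism itself: the core idea from the eDiffi paper is to add a mask-dependent bias to the cross-attention logits before the softmax. The sketch below is my own paraphrase, not the code from any of the linked repos; in particular, the scaling term (`log(1 + sigma)` times the largest logit magnitude) is an assumption and exactly the kind of knob the weighting discussion above is about:

```python
import torch

def paint_with_words_attention(scores, region_masks, w=0.4, sigma=1.0):
    """Bias cross-attention toward painted regions.

    scores:       (pixels, tokens) raw attention logits, i.e. Q.K^T / sqrt(d)
    region_masks: (pixels, tokens) 1.0 where the token's painted region
                  covers the pixel, 0.0 elsewhere
    """
    # Additive bias, scaled by the noise level and the logit magnitude,
    # so its influence fades as denoising progresses (sigma shrinks).
    scale = w * torch.log(torch.tensor(1.0 + sigma)) * scores.abs().max()
    return torch.softmax(scores + scale * region_masks, dim=-1)
```

Plugging this into SD means hooking the U-Net's cross-attention modules and feeding them per-token masks downsampled to each attention resolution.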
Here is a solution, but you would need to rewrite it to integrate it or turn it into an extension.
@mykeehu It was linked in the initial issue description.
Wow @CookiePPP, thank you so much for the comparisons! I would be so happy if you let me use your benchmarks in my paint-with-words repo as well. Do you mind if I do?
@cloneofsimo |
Thank you so much! I will certainly credit you when I add some of these materials!
@cloneofsimo |
Awesome! Thank you so much for sharing that information as well! I agree 100% that a much better configuration is certainly possible, since the model structure differs from eDiffi's.
@CookiePPP Based on your findings, I've added a user-defined weight scaling function, as well as some findings, to my repo. I hope this feature gets added to A1111's repo as well.
I would also be happy if someone wrote it as an extension, because with so many models, label sets, and styles, it would really expand the possibilities of SD. I suppose the interesting part is how to do the colour label table, because it has to be passed to the generator; a simple prompt would not be enough. I also wonder how much can be solved with scripts only, as in the case of multi-prompt scripts. Unfortunately I'm not a programmer, so I can't rewrite it; I'm just looking forward to it and brainstorming.
The eDiffi ones are painfully good. Nvidia simply has an advantage: they have near-unlimited GPU processing power, and it only costs them a bit of energy to use it.
One idea for improvement: if the D&D table or the blue gloves come out wrong, reroll only the noise in that area and keep the other noise zones identical.
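That per-region reroll idea is easy to prototype at the latent-noise level. A minimal sketch (the function name is mine; `region_mask` would come from the same painted mask, downsampled to latent resolution):

```python
import torch

def reroll_region_noise(noise, region_mask, seed):
    """Replace the initial latent noise only inside region_mask.

    noise:       (B, C, H, W) initial latent noise from a previous run
    region_mask: (B, 1, H, W) 1.0 inside the area to reroll, 0.0 elsewhere
    """
    gen = torch.Generator(device=noise.device).manual_seed(seed)
    fresh = torch.randn(noise.shape, generator=gen,
                        device=noise.device, dtype=noise.dtype)
    mask = region_mask.to(noise.dtype)
    # Keep the noise (and thus the composition) outside the mask identical.
    return noise * (1.0 - mask) + fresh * mask
```

One caveat: because attention mixes information globally during denoising, the untouched regions can still drift a little even with identical noise.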
Well, eDiffi does use literally three different conditionings and something like 5 times more parameters, so... Just in case anyone is interested, there is a LOT of room for improvement in my implementation.
If you are going to rebuild this feature into A1111's, there are probably better ways to construct cross-attention weights than mine. (I will be working to get it right eventually, though; my repo got way more attention than it deserves lol)
I think this is a very good idea. It might work and be easy to implement. If this works, I'll add this feature in the future as well.
I can help with implementing an extension; I just need instructions on which methods would need to be changed. I have some ML literacy, so just knowing where the interaction occurs would be enough for me to try an implementation with LDM.
I'd love to know what the status of the development of this extension is, because I'm really looking forward to it!
I've tried to collect what needs to be done here, but I'm still trying to understand how the webUI and all the related tech work, so this is probably just a vague draft. Can someone fix this or extend upon it, please?
@nistvan86 An alternative solution is what the openOutpaint extension does: it uses a standalone interface through the API, so it is not tied to Gradio.
@mykeehu An even better solution would be to avoid typing the same prompt elements multiple times and instead attach the colour codes to sections of the prompt with a special syntax, like what you can currently do to weight parts of the prompt. But I'm not sure how well that could be implemented UX-wise. One solution could be, for example, to select a part of the prompt and then drag & drop it onto a coloured shape on the canvas.
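To make the idea concrete, here is a sketch of what parsing such an inline syntax could look like. The `{phrase@#rrggbb}` notation is entirely hypothetical, just one possible design:

```python
import re

# Hypothetical syntax: "A {large red moon@#a1a0ad} in the {sky@#52aa14}"
PWW_TAG = re.compile(r"\{([^}@]+)@#([0-9a-fA-F]{6})\}")

def parse_pww_prompt(prompt):
    """Split a tagged prompt into (clean_prompt, color_context)."""
    color_context = {}

    def strip_tag(m):
        phrase, hexcode = m.group(1), m.group(2)
        rgb = tuple(int(hexcode[i:i + 2], 16) for i in (0, 2, 4))
        color_context[rgb] = phrase
        return phrase  # leave only the plain phrase in the prompt

    return PWW_TAG.sub(strip_tag, prompt), color_context
```

The clean prompt would go to the text encoder as usual, while `color_context` maps each painted colour to its phrase, which is exactly the structure the example settings above use.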
Or when you choose a colour and start painting with it, the colour is added to the list. But how could the colour be changed afterwards? Or deleted, if it is not needed? That way you could manage layers.
The more I think about it, a vector-graphics-based editor would probably serve the needs better here. An SVG, for example, can be shown in the browser easily, its DOM can be interacted with, and the SVG can be saved next to the image at a rather small size (or even embedded into the resulting PNG, so it can be restored the same way you can load back config from an output).
Maybe worth noting there's a fork of the paint-with-words repository which uses transformer pipelines: paint-with-words-pipelines. I've created a minimal example on top of it which can be run on Windows in a venv (see the included steps.txt).
I got this to work with 6 GB of VRAM.
I updated the paint-with-words extension at Paint with Word, combining ControlNet and Paint with Words (PwW). One can also use pure PwW by setting the weight of ControlNet to 0.
@lwchen6309 Please check your version; I have a conflict with ControlNet. My bug is here.
Closing as the extension mentioned above has been available for quite some time. https://github.com/lwchen6309/paint-with-words-sd |
Is there an existing issue for this?
What would your feature do?
Implement Paint with words
Proposed workflow
Additional information
No response