[TODO] 🔧 Could you please consider including our approach in this repository? Faster and Better #56

Open
HolmesShuan opened this issue Dec 11, 2024 · 7 comments

@HolmesShuan

Hi, nice work! Your research has been a significant inspiration for our subsequent approach.

We propose a new solver for inversion-based editing with the FLUX-dev model. It demonstrates improved performance on PIE-Bench, greater robustness to hyperparameters, and an approximately 3x speedup. Our code is publicly available here: https://github.com/HolmesShuan/FireFlow-Fast-Inversion-of-Rectified-Flow-for-Image-Semantic-Editing.

The key modification compared to the RF-Solver lies in the sampling process: this URL.
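
To make the idea concrete, here is a minimal sketch of the sampling loop (the names below are illustrative, not our actual repository code): each step reuses the midpoint velocity cached from the previous step, so only one fresh model evaluation is needed per step.

import torch

@torch.no_grad()
def fireflow_step_loop(model, x, sigmas, extra_args=None):
    # Minimal illustrative sketch: a second-order midpoint step that reuses the
    # previous step's midpoint velocity as the current step's starting velocity.
    extra_args = extra_args or {}
    prev_pred = None  # cached midpoint velocity from the previous step
    for i in range(len(sigmas) - 1):
        sigma, sigma_next = sigmas[i], sigmas[i + 1]
        if prev_pred is not None:
            pred = prev_pred                      # reuse the cache: no model call here
        else:
            pred = model(x, sigma, **extra_args)  # evaluated only on the very first step
        sigma_mid = sigma + (sigma_next - sigma) / 2
        x_mid = x + (sigma_mid - sigma) * pred            # half step with the reused velocity
        pred_mid = model(x_mid, sigma_mid, **extra_args)  # the single fresh call of this step
        x = x + (sigma_next - sigma) * pred_mid           # full step with the midpoint velocity
        prev_pred = pred_mid                              # cache for the next step
    return x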

If you’re interested in our approach, our paper is available here: https://arxiv.org/pdf/2412.07517v1.pdf.

Thank you again for your remarkable work, which has greatly inspired us. We sincerely appreciate your contributions to this field.

@logtd
Owner

logtd commented Dec 11, 2024

Hey, I saw that this morning and a fast version of inversion sounds very useful!

I'm planning on adding this in very soon.

@logtd
Owner

logtd commented Dec 12, 2024

Hi @HolmesShuan, I've added FireFlow, but want to make sure I made all the necessary changes.

# FireFlow: reuse the midpoint velocity cached from the previous step instead of a fresh model call
if order == 'fireflow' and prev_pred is not None:
    pred = prev_pred
else:
    pred = model(x, s_in * sigma, **extra_args)
    ....

# midpoint evaluation; cache it so the next step can skip its first model call
pred_mid = model(img_mid, s_in * sigma_mid, **extra_args)
if order == 'fireflow':
    prev_pred = pred_mid
    x = x + (sigma_next - sigma) * pred_mid
    return x

Does this cover all of it? I noticed you also had code for Q/K injection, but I didn't see any of your examples using these features.

If you have any tips on how to better control the image at low step counts, that would be appreciated too -- it seemed like injection_steps=1 can be too little, but injection_steps=2 is too much when using 8-10 steps.

I'll add a section to the README after you confirm.

@HolmesShuan
Author

HolmesShuan commented Dec 12, 2024

@logtd Thank you for your quick responses!

Q1. Does this cover all of it?
A1. LGTM. A minor concern: Will return x yield the result after only one iteration? Please forgive me if I’m mistaken.

Q2. I noticed you also had code for Q/K injection as well, but I didn’t see any of your examples using these features.
A2. I found that my approach occasionally fails to follow instructions in certain cases, such as changing the color or removing large objects/backgrounds from the source image. (Similar issues also exist in the RF-Solver.) To address this, we use the Q/K injection strategy to accomplish these edits, albeit at the cost of losing some structural integrity of the original image. The demo code for this is as follows:

# --reuse_v: change 1 -> 0 to disable the default editing strategy
# --editing_strategy: change 'replace_v' -> 'add_q' / 'add_k' / 'add_v'
python edit.py  --source_prompt [describe the content of your image or leave it as null] \
                --target_prompt [describe your editing requirements] \
                --guidance 2 \
                --source_img_dir [the path of your source image] \
                --num_steps 8 \
                --inject 1 \
                --start_layer_index 0 \
                --end_layer_index 37 \
                --name 'flux-dev' \
                --sampling_strategy 'fireflow' \
                --output_prefix 'fireflow' \
                --reuse_v 0 \
                --editing_strategy 'add_q' \
                --offload \
                --output_dir [output path]

This code can also be found in the “Edit your own image” section of the README as a helpful tip.
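
Conceptually, the editing strategies amount to caching attention features during inversion and mixing them back in during editing. A rough, illustrative sketch of the idea (the wrapped-attention interface below is hypothetical, not our actual implementation):

import torch

class InjectedAttention(torch.nn.Module):
    # Illustrative sketch of the feature-injection idea behind --reuse_v /
    # --editing_strategy; the attn interface used here is hypothetical.
    def __init__(self, attn, strategy='replace_v'):
        super().__init__()
        self.attn = attn          # wrapped attention module (assumed interface)
        self.strategy = strategy  # 'replace_v', 'add_q', 'add_k', or 'add_v'
        self.cache = {}           # source-image features recorded during inversion

    def forward(self, x, mode='edit'):
        q, k, v = self.attn.to_qkv(x)              # hypothetical projection helper
        if mode == 'invert':
            self.cache = {'q': q, 'k': k, 'v': v}  # record the source image's features
        elif self.cache:
            if self.strategy == 'replace_v':
                v = self.cache['v']                # default strategy: reuse source values
            elif self.strategy == 'add_q':
                q = q + self.cache['q']            # blend source queries into the edit pass
            elif self.strategy == 'add_k':
                k = k + self.cache['k']
            elif self.strategy == 'add_v':
                v = v + self.cache['v']
        return self.attn.compute(q, k, v)          # hypothetical attention core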

Q3. How to better control the image at low steps?
A3. We did notice that in some cases, 8 steps may not be sufficient. In such scenarios, we use 10 steps with 1 injection or 15 steps with 2 injections for better control. However, to ensure a fair comparison, we used a fixed 8 steps to report results in our paper on the PIE-Bench.

Please feel free to reach out if you have further questions or if my answers didn’t fully address your concerns. Thanks again for your engagement!

@HolmesShuan
Author

BTW:

...
--start_layer_index 0 \
--end_layer_index 37 \
...

These settings are not strictly necessary. They lead to slightly better preservation of the original image but slightly poorer instruction following.

...
--start_layer_index 20 \
--end_layer_index 37 \
...

These settings, as recommended in RF-Solver, give slightly poorer preservation of the original image but slightly better instruction following.

It’s entirely up to you to decide whether to implement this adjustment based on your specific requirements.
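
In other words, these two indices simply bound which transformer blocks receive injection. Roughly (illustrative only, not our exact code):

def select_injected_blocks(transformer_blocks, start_layer_index, end_layer_index):
    # Illustrative only: injection is applied to the blocks whose index
    # falls within [start_layer_index, end_layer_index].
    return [
        block for i, block in enumerate(transformer_blocks)
        if start_layer_index <= i <= end_layer_index
    ]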

@logtd
Owner

logtd commented Dec 12, 2024

A1. LGTM. A minor concern: Will return x yield the result after only one iteration? Please forgive me if I’m mistaken.

This was just the inside of the for loop iteration, where x is reused each step. I accidentally copied the return x part in my comment.

To address this, we use the Q/K injection strategy to accomplish these edits

Ah, yeah, I've run into that issue with RF-Edit. I'll add the Q/K injections at a later time, since I'll need to make sure everything in the system works with them. I also have Q/K injections in my LTX nodes, where I've noticed that injecting Q can help carry over motion without carrying over visuals.

It’s entirely up to you to decide whether to implement this adjustment based on your specific requirements.

There's an option for people to change this when using the nodes. I've left it at the RF-Edit default for now though.

Thanks for your research and reaching out to me about it.

I'll be making an entry in the README when I get some time to make a workflow and examples.

@HolmesShuan
Author

Cool! Thanks~

@HolmesShuan HolmesShuan changed the title Could you please consider including our approach in this repository? Faster and Better [TODO] Could you please consider including our approach in this repository? Faster and Better Dec 19, 2024
@HolmesShuan HolmesShuan changed the title [TODO] Could you please consider including our approach in this repository? Faster and Better [TODO] 🔧 Could you please consider including our approach in this repository? Faster and Better Dec 19, 2024
@HolmesShuan
Author

@logtd Hello, I hope this message finds you well. I understand that you have many commitments, and I truly appreciate your time. It's been a couple of weeks since our last communication, and I wanted to inquire about the possibility of our work being featured in your project. While I recognize the advancements made by recent works like FlowEdit, I believe our approach offers notable advantages in runtime speed, which could be beneficial. It would be an honor to have our method considered for inclusion in the ComfyUI repository, potentially enhancing the experience for its users.
