[TODO] 🔧 Could you please consider including our approach in this repository? Faster and Better #56

Open
HolmesShuan opened this issue Dec 11, 2024 · 7 comments

@HolmesShuan

Hi, nice work! Your research has been a significant inspiration for our subsequent approach.

We propose a new solver for inversion-based editing with the FLUX-dev model. It demonstrates improved performance on PIE-Bench, greater robustness to hyperparameters, and an approximately 3x speedup. Our code is publicly available here: https://github.com/HolmesShuan/FireFlow-Fast-Inversion-of-Rectified-Flow-for-Image-Semantic-Editing.

The key modification compared to the RF-Solver lies in the sampling process: this URL.
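
To make the idea concrete, here is a minimal sketch of the sampling loop (the names below are illustrative, not our actual repository code): each step reuses the midpoint velocity cached from the previous step, so only one fresh model evaluation is needed per step.

import torch

@torch.no_grad()
def fireflow_step_loop(model, x, sigmas, extra_args=None):
    # Minimal illustrative sketch: a second-order midpoint step that reuses the
    # previous step's midpoint velocity as the current step's starting velocity.
    extra_args = extra_args or {}
    prev_pred = None  # cached midpoint velocity from the previous step
    for i in range(len(sigmas) - 1):
        sigma, sigma_next = sigmas[i], sigmas[i + 1]
        if prev_pred is not None:
            pred = prev_pred                      # reuse the cache: no model call here
        else:
            pred = model(x, sigma, **extra_args)  # evaluated only on the very first step
        sigma_mid = sigma + (sigma_next - sigma) / 2
        x_mid = x + (sigma_mid - sigma) * pred            # half step with the reused velocity
        pred_mid = model(x_mid, sigma_mid, **extra_args)  # the single fresh call of this step
        x = x + (sigma_next - sigma) * pred_mid           # full step with the midpoint velocity
        prev_pred = pred_mid                              # cache for the next step
    return x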

If you’re interested in our approach, our paper is available here: https://arxiv.org/pdf/2412.07517v1.pdf.

Thank you again for your remarkable work, which has greatly inspired us. We sincerely appreciate your contributions to this field.

@logtd
Owner

logtd commented Dec 11, 2024

Hey, I saw that this morning and a fast version of inversion sounds very useful!

I'm planning on adding this in very soon.

@logtd
Owner

logtd commented Dec 12, 2024

Hi @HolmesShuan, I've added FireFlow, but want to make sure I made all the necessary changes.

# FireFlow: reuse the midpoint velocity cached from the previous step instead of a fresh model call
if order == 'fireflow' and prev_pred is not None:
    pred = prev_pred
else:
    pred = model(x, s_in * sigma, **extra_args)
    ....

# midpoint evaluation; cache it so the next step can skip its first model call
pred_mid = model(img_mid, s_in * sigma_mid, **extra_args)
if order == 'fireflow':
    prev_pred = pred_mid
    x = x + (sigma_next - sigma) * pred_mid
    return x

Does this cover all of it? I noticed you also had code for Q/K injection, but I didn't see any of your examples using these features.

If you have any tips on how to better control the image at low step counts, that would be appreciated too -- it seemed like injection_steps=1 can be too little, but injection_steps=2 is too much when using 8-10 steps.

I'll add a section to the README after you confirm.

@HolmesShuan
Author

HolmesShuan commented Dec 12, 2024

@logtd Thank you for your quick responses!

Q1. Does this cover all of it?
A1. LGTM. A minor concern: Will return x yield the result after only one iteration? Please forgive me if I’m mistaken.

Q2. I noticed you also had code for Q/K injection as well, but I didn’t see any of your examples using these features.
A2. I found that my approach occasionally fails to follow instructions in certain cases, such as changing the color or removing large objects/backgrounds from the source image. (Similar issues also exist in the RF-Solver.) To address this, we use the Q/K injection strategy to accomplish these edits, albeit at the cost of losing some structural integrity of the original image. The demo code for this is as follows:

# --reuse_v: change 1 -> 0 to disable the default editing strategy
# --editing_strategy: change 'replace_v' -> 'add_q' / 'add_k' / 'add_v'
python edit.py  --source_prompt [describe the content of your image or leave it as null] \
                --target_prompt [describe your editing requirements] \
                --guidance 2 \
                --source_img_dir [the path of your source image] \
                --num_steps 8 \
                --inject 1 \
                --start_layer_index 0 \
                --end_layer_index 37 \
                --name 'flux-dev' \
                --sampling_strategy 'fireflow' \
                --output_prefix 'fireflow' \
                --reuse_v 0 \
                --editing_strategy 'add_q' \
                --offload \
                --output_dir [output path]

This code can also be found in the “Edit your own image” section of the README as a helpful tip.
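
Conceptually, the editing strategies amount to caching attention features during inversion and mixing them back in during editing. A rough, illustrative sketch of the idea (the wrapped-attention interface below is hypothetical, not our actual implementation):

import torch

class InjectedAttention(torch.nn.Module):
    # Illustrative sketch of the feature-injection idea behind --reuse_v /
    # --editing_strategy; the attn interface used here is hypothetical.
    def __init__(self, attn, strategy='replace_v'):
        super().__init__()
        self.attn = attn          # wrapped attention module (assumed interface)
        self.strategy = strategy  # 'replace_v', 'add_q', 'add_k', or 'add_v'
        self.cache = {}           # source-image features recorded during inversion

    def forward(self, x, mode='edit'):
        q, k, v = self.attn.to_qkv(x)              # hypothetical projection helper
        if mode == 'invert':
            self.cache = {'q': q, 'k': k, 'v': v}  # record the source image's features
        elif self.cache:
            if self.strategy == 'replace_v':
                v = self.cache['v']                # default strategy: reuse source values
            elif self.strategy == 'add_q':
                q = q + self.cache['q']            # blend source queries into the edit pass
            elif self.strategy == 'add_k':
                k = k + self.cache['k']
            elif self.strategy == 'add_v':
                v = v + self.cache['v']
        return self.attn.compute(q, k, v)          # hypothetical attention core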

Q3. How to better control the image at low steps?
A3. We did notice that in some cases, 8 steps may not be sufficient. In such scenarios, we use 10 steps with 1 injection or 15 steps with 2 injections for better control. However, to ensure a fair comparison, we used a fixed 8 steps to report results in our paper on the PIE-Bench.

Please feel free to reach out if you have further questions or if my answers didn’t fully address your concerns. Thanks again for your engagement!

@HolmesShuan
Author

BTW:

...
--start_layer_index 0 \
--end_layer_index 37 \
...

These settings are not strictly necessary. They lead to slightly better preservation of the original image but slightly poorer instruction following.

...
--start_layer_index 20 \
--end_layer_index 37 \
...

These settings, as recommended in RF-Solver, give slightly poorer preservation of the original image but slightly better instruction following.

It’s entirely up to you to decide whether to implement this adjustment based on your specific requirements.
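
In other words, these two indices simply bound which transformer blocks receive injection. Roughly (illustrative only, not our exact code):

def select_injected_blocks(transformer_blocks, start_layer_index, end_layer_index):
    # Illustrative only: injection is applied to the blocks whose index
    # falls within [start_layer_index, end_layer_index].
    return [
        block for i, block in enumerate(transformer_blocks)
        if start_layer_index <= i <= end_layer_index
    ]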

@logtd
Owner

logtd commented Dec 12, 2024

A1. LGTM. A minor concern: Will return x yield the result after only one iteration? Please forgive me if I’m mistaken.

This was just the inside of the for loop iteration, where x is reused each step. I accidentally copied the return x part in my comment.

To address this, we use the Q/K injection strategy to accomplish these edits

Ah, yeah, I've run into that issue with RF-Edit. I'll add the Q/K injections at a later time, since I'll need to make sure everything in the system works with them. I also have Q/K injections in my LTX nodes, where I've noticed that injecting Q can help carry over motion without carrying over visuals.

It’s entirely up to you to decide whether to implement this adjustment based on your specific requirements.

There's an option for people to change this when using the nodes. I've left it at the RF-Edit default for now though.

Thanks for your research and reaching out to me about it.

I'll be making an entry in the README when I get some time to make a workflow and examples.

@HolmesShuan
Author

Cool! Thanks~

@HolmesShuan HolmesShuan changed the title Could you please consider including our approach in this repository? Faster and Better [TODO] Could you please consider including our approach in this repository? Faster and Better Dec 19, 2024
@HolmesShuan HolmesShuan changed the title [TODO] Could you please consider including our approach in this repository? Faster and Better [TODO] 🔧 Could you please consider including our approach in this repository? Faster and Better Dec 19, 2024
@HolmesShuan
Author

@logtd Hello, I hope this message finds you well. I understand that you have many commitments, and I truly appreciate your time. It's been a couple of weeks since our last communication, and I wanted to inquire about the possibility of our work being featured in your project. While I recognize the advancements made by recent works like FlowEdit, I believe our approach offers notable advantages in runtime speed, which could be beneficial. It would be an honor to have our method considered for inclusion in the ComfyUI repository, potentially enhancing the experience for its users.
