
Can we get more schedulers for flow based models such as SD3, SD3.5, and flux #9924

Open
linjiapro opened this issue Nov 14, 2024 · 27 comments


@linjiapro
Contributor

It seems advanced schedulers such as DDIM and DPM++ 2M do not work with flow-based models such as SD3, SD3.5, and Flux.
However, I only see 2 flow-based schedulers in the diffusers codebase:

FlowMatchEulerDiscreteScheduler, and
FlowMatchHeunDiscreteScheduler

I tried to use DPMSolverMultistepScheduler, but it does not generate correct images with flow-based models. Help?
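
For reference, the swap I tried looks roughly like this (Flux shown; the model id and dtype are just what I happened to use):

    import torch
    from diffusers import FluxPipeline, DPMSolverMultistepScheduler

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
    ).to("cuda")

    # Swap in the non-flow-match scheduler, reusing the pipeline's scheduler config.
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

    # The call runs, but the images come out wrong - presumably because the DPM++
    # schedulers expect a diffusion-style noise schedule rather than the
    # flow-matching formulation these models were trained with.
    image = pipe("a cat holding a sign that says hello world", num_inference_steps=28).images[0]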

@sayakpaul
Member

Cc: @asomoza @hlky @yiyixuxu

@ukaprch

ukaprch commented Nov 15, 2024

I'm glad this shortcoming has been brought to light. With that in mind, I have been testing a new diffusers scheduler adapted from the current schedulers library (scheduling_dpmsolver_multistep.py) together with this: https://github.com/crowsonkb/k-diffusion/blob/master/k_diffusion/sampling.py
The net result is a new scheduler containing 7 different samplers ("dpmsolver2", "dpmsolver2A", "dpmsolver++2M", "dpmsolver++2S", "dpmsolver++sde", "dpmsolver++2Msde", "dpmsolver++3Msde").
This new scheduler works with both the existing FLUX and SD3 pipelines in the diffusers library, although SD3 is somewhat temperamental with the karras, exponential, and lambdas sigmas at times.
I have done a lot of empirical testing with it and it appears to function as intended. You are all most welcome to try it.
I have attached the scheduler and the parameters I use. Some of the new parameters will become apparent as you familiarize yourself with the scheduler. As things stand now, SD3 benefits somewhat from using the SDXL approach to creating the sigmas vs the new FLUX approach; see the scheduler for details. I found that SD3 can also run with dynamic shifting as FLUX does, but again, results can vary.

Note: I have configured my diffusers library with the updated scheduler and am importing it normally for my testing.

Create the pipeline as you normally would, then swap in the scheduler with the required parameters.
Note: you can also modify parameters such as s_noise, use_noise_sampler, etc.
scheduling_flow_match_dpmsolver_multistep.txt

SD3:
pipe.scheduler = FlowMatchDPMSolverMultistepScheduler.from_config(pipe.scheduler.config,solver_order=2,sigma_schedule='karras',algorithm_type="dpmsolver2")

FLUX:
pipe.scheduler = FlowMatchDPMSolverMultistepScheduler.from_config(pipe.scheduler.config,solver_order=2,sigma_schedule='karras',algorithm_type="dpmsolver2")
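
For completeness, a rough end-to-end sketch with SD3 (assuming the attached file is saved as scheduling_flow_match_dpmsolver_multistep.py and is importable, either as a local module or after being dropped into your diffusers install; the model id is just an example):

    import torch
    from diffusers import StableDiffusion3Pipeline
    from scheduling_flow_match_dpmsolver_multistep import FlowMatchDPMSolverMultistepScheduler

    pipe = StableDiffusion3Pipeline.from_pretrained(
        "stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16
    ).to("cuda")

    # Replace the default FlowMatchEulerDiscreteScheduler with the proposed scheduler.
    pipe.scheduler = FlowMatchDPMSolverMultistepScheduler.from_config(
        pipe.scheduler.config,
        solver_order=2,
        sigma_schedule='karras',
        algorithm_type="dpmsolver2",
    )

    image = pipe(
        "a cat holding a sign that says hello world",
        num_inference_steps=30,
        guidance_scale=7.0,
    ).images[0]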

@linjiapro
Contributor Author

@ukaprch

From the import clauses in the txt file, it seems the classes already live under the diffusers repo for your test runs. Would it make sense to include the file in the official diffusers repo?

@ukaprch

ukaprch commented Nov 15, 2024

Some of you can update your diffusers library to incorporate the scheduler I proposed (which is a first start) for independent testing, as I have. The scheduler as written can be incorporated into the diffusers library without change and will work with the current FLUX and SD3 pipelines.

@linjiapro
Contributor Author

@sayakpaul @ukaprch

I have a pending PR which has not been merged:
#9758

Once the above merges, I can work on sending another PR related to schedulers. In the meantime, anyone else is welcome to raise a pull request.

@vladmandic
Contributor

vladmandic commented Nov 17, 2024

also related #9607

@vladmandic
Contributor

vladmandic commented Nov 17, 2024

@ukaprch why not create a PR to include this in diffusers?
i've integrated your code in sdnext; dpm flowmatch is working fine, except any kind of karras/exponential results in full blur.

anyhow, here are some sample grids (including original euler and heun flowmatch for reference)

sd35:
[image: dpm-flowmatch-sd35 grid]

flux.1:
[image: dpm-flowmatch-flux grid]

auraflow: also tried with this model since it's also flow-match based and it works
[image: dpm-flowmatch-aura grid]

@ukaprch

ukaprch commented Nov 18, 2024

Thanks for testing and for your feedback. Please note that I had no such problems using the FLUX pipeline with the new scheduler and these algorithms (karras, exponential, lambdas); see attached. I am using the quantized (QINT8) version of the transformer and T5, FWIW.

        image = pipe(
            prompt='a cat holding a sign that says hello world',
            width=1024,
            height=1024,
            guidance_scale=3.5,
            generator=generator,  # seed = 114747598
            max_sequence_length=512,
            num_inference_steps=30).images[0]

I ran the above pipeline using the basic FLUX dev prompt from Replicate and generated the following locally on my PC:
[image]

@vladmandic
Contributor

a bit more testing: karras and exponential are working fine for flux, but not for sd35

flux:
[image]

sd35:
[image]

actual params used:

10:50:38-967529 DEBUG    Sampler: sampler="DPM2++ 2M FlowMatch" class="FlowMatchDPMSolverMultistepScheduler" config={'num_train_timesteps': 1000, 'shift': 1, 'use_dynamic_shifting': False, 'solver_order': 2, 'sigma_schedule': 'karras', 'use_SD35_sigmas': True, 'algorithm_type': 'dpmsolver++2M', 'use_noise_sampler': True}

@ukaprch

ukaprch commented Nov 18, 2024

OK, I noticed you are using shift: 1. This should be shift: 3, which is the standard for SD3. I'll look into it some more.
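
For reference, forcing the SD3-style shift when building the scheduler would look roughly like this (the other arguments are just the ones from your log; illustrative only):

    # Use the SD3/SD3.5 static shift of 3 instead of the Flux value of 1.
    # Assumes `pipe` is an SD3/SD3.5 pipeline that is already loaded.
    pipe.scheduler = FlowMatchDPMSolverMultistepScheduler.from_config(
        pipe.scheduler.config,
        shift=3.0,
        use_dynamic_shifting=False,
        sigma_schedule='karras',
        algorithm_type="dpmsolver++2M",
    )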

After running my test, the only noticeable thing I see is a more pronounced 'bokeh' effect, which I wouldn't necessarily characterize as blur per se. I do think the sigmas / timesteps for karras, exponential, and lambdas with SD 3.5 could possibly use some further refinement.

image = pipe(
    prompt='photorealistic image of a robot walking the streets of a city meeting passersby, 8K, dramatic lighting, textured skin, film grain, RAW photo, highly detailed',
    negative_prompt=negative_prompt,
    width=1024,
    height=1024,
    guidance_scale=3.5,
    generator=generator,  # seed = 580198207
    max_sequence_length=512,
    num_inference_steps=30).images[0]

scheduler config:

config (FrozenDict): OrderedDict([('num_train_timesteps', 1000), ('solver_order', 2), ('thresholding', False), ('dynamic_thresholding_ratio', 0.995), ('sample_max_value', 1.0), ('algorithm_type', 'dpmsolver++2Msde'), ('solver_type', 'midpoint'), ('sigma_schedule', None), ('shift', 3.0), ('midpoint_ratio', 0.5), ('s_noise', 1), ('use_noise_sampler', True), ('use_SD35_sigmas', True), ('use_dynamic_shifting', False), ('base_shift', 0.5), ('max_shift', 1.15), ('base_image_seq_len', 256), ('max_image_seq_len', 4096), ('_use_default_values', ['dynamic_thresholding_ratio', 'base_image_seq_len', 'midpoint_ratio', 'solver_type', 'thresholding', 'base_shift', 'max_image_seq_len', 'sample_max_value'])])

  | algorithm_type | 'dpmsolver++2Msde' | str
  | base_image_seq_len | 256 | int
  | base_shift | 0.5 | float
  | dynamic_thresholding_ratio | 0.995 | float
  | max_image_seq_len | 4096 | int
  | max_shift | 1.15 | float
  | midpoint_ratio | 0.5 | float
  | num_train_timesteps | 1000 | int
  | s_noise | 1 | int
  | sample_max_value | 1.0 | float
  | shift | 3.0 | float
  | sigma_schedule | None | NoneType
  | solver_order | 2 | int
  | solver_type | 'midpoint' | str
  | thresholding | False | bool
  | use_SD35_sigmas | True | bool
  | use_dynamic_shifting | False | bool
  | use_noise_sampler | True | bool

[image: spx7239]

@vladmandic
Contributor

ahhh, found it. for testing purposes i was running with steps=15 only.
and that's fine for beta, but definitely not fine for other sigma methods.

[image]

btw, good catch on the shift value, that's a user error on my side - i had it left like that from flux, but it doesn't impact this.

@ukaprch

ukaprch commented Nov 18, 2024

I found with FLUX that you generally need 25 or more steps to get good images. Remember, we're not using the 3-step method as previously with SDXL. Your images look good, BTW. It really comes into its own between 35 and 50 steps. Also, if you increase the shift factor from 3 to 3.5 you get a bit more contrast in the image, which may or may not be to everyone's liking.
So what are you thinking? Are we almost good to proceed with this as a scheduler?

@vladmandic
Contributor

I found with FLUX that you need generally 25 or more steps to get good images.

actually, the worst-case scenario here is sd35-medium ;)
flux is behaving better in this regard.

re: shift - i've noticed. i have it as a user-configurable item

So what are you thinking? Are we almost good to proceed with this as a scheduler?

definitely!

from a quick look at the code, 99% is clean.
del sigmas1 looks unsafe as it may not be defined in all code paths. and in general, does it help at all to have all those del statements in set_timesteps()?

i wish use_SD35_sigmas could be auto-determined, but i don't see a clean way.
it does need a better name though, definitely not something with mixed capitalization :)

@ukaprch

ukaprch commented Nov 20, 2024

vladmandic: As noted, this scheduler was not designed for flow-match derived sampling.

I've attached a new FlowMatch version of the original scheduling_dpmsolver_multistep.py (scheduling_flow_match_dpmsolver_multistep_orig.py)
and updated scheduling_flow_match_dpmsolver_multistep.py per some of your suggestions.
The powers that be (TPTB), if they decide to move forward on these, can obviously rename, make changes, etc.

  1. sigmas1 is gone. It was a holdover from my previous testing.
  2. use_SD35_sigmas is now use_beta_sigmas, and you'll see why when you look at the code.
  3. Both new schedulers have a consistent variable interface, so you'll be able to use both without much effort.
  4. sigma_schedule (which replaces all those use_????_sigmas=True flags in the original version) includes the following:
    [None, "karras", "exponential", "lambdas", "betas"].
  5. Betas play an important role in determining sigmas and image rendering. Because of this, beta_start and beta_end become important variables and should be configurable. Currently I am using the default beta_start = 0.0001, beta_end = 0.02 and the custom beta_start = 0.00085, beta_end = 0.012. Values in between would also work well. Again, this is all subjective (see the sketch after this list).
  6. When using the sigma_schedules "karras", "exponential" and "lambdas" you will almost certainly have better outcomes when using more inference steps (i.e. 40+ vs 25).
  7. It's unclear whether the current versions of FLUX DEV and SD 3.5 Large can run most images with these schedulers and get good results in under 20 steps; the current (open) models were not designed for this. What is clear is that the dpm++ class of solvers is better at rendering realistic images (in my testing) than the current scheduling_flow_match_euler_discrete.py scheduler.
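
For illustration, a configuration exercising points 4-6 could look like this (values taken from the notes above; treat them as starting points, not tested defaults):

    # Assumes `pipe` is an SD3.5 pipeline and `prompt` is already defined.
    pipe.scheduler = FlowMatchDPMSolverMultistepScheduler.from_config(
        pipe.scheduler.config,
        sigma_schedule="betas",         # one of: None, "karras", "exponential", "lambdas", "betas"
        beta_start=0.00085,             # custom range from point 5
        beta_end=0.012,
        use_beta_sigmas=True,           # renamed from use_SD35_sigmas (point 2)
        algorithm_type="dpmsolver++2M",
        solver_order=2,
    )
    image = pipe(prompt, num_inference_steps=40).images[0]  # 40+ steps for karras/exponential/lambdas (point 6)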

scheduling_flow_match_dpmsolver_multistep.txt
scheduling_flow_match_dpmsolver_multistep_orig.txt

Please do some more testing on both and raise any concerns, etc. I think we are very close on this, and the community would definitely benefit from both.

Example images using the same generation data. Left is the new proposed version of scheduling_flow_match_dpmsolver_multistep.py and right is the current scheduling_flow_match_euler_discrete.py; both used 25 steps:

[image: spx201F]

@linjiapro
Contributor Author

@ukaprch

Let me know if it is OK for me to send out a PR that includes the script you offered; if you would rather do it yourself, I will not send out the PR.

@ukaprch

ukaprch commented Nov 22, 2024

Please do go ahead and send out the PR. I believe we're ready to go. As I said previously, the folks in charge can go over them and make any adjustments they feel are necessary, but I do think we are ready to proceed. Thanks again.

@vladmandic
Contributor

vladmandic commented Nov 22, 2024

@ukaprch i agree with all of your comments, but i feel like something went wrong in the last version?
comparing the code from last week to the current one:
(the old code does not have betas, so it's the same as the default, but otherwise the grid is correct)

old: [image]
new: [image]

generated using sd35-medium with 50 steps; the example config i'm using:

{'num_train_timesteps': 1000, 'beta_start': 0.0001, 'beta_end': 0.02, 'beta_schedule': 'linear', 'shift': 3, 'use_dynamic_shifting': False, 'solver_order': 2, 'sigma_schedule': 'lambdas', 'use_beta_sigmas': True, 'algorithm_type': 'dpmsolver2', 'use_noise_sampler': True}

@vladmandic
Contributor

also, with flux and sigma_method=betas, i'm getting index-out-of-range at

sigma_next = self.sigmas[self.step_index + 1]

@ukaprch

ukaprch commented Nov 22, 2024

also, with flux and sigma_method=betas, i'm getting index-out-of-range at

sigma_next = self.sigmas[self.step_index + 1]

I was unable to replicate your error (Flux using the "betas" sigma_schedule) with scheduling_flow_match_dpmsolver_multistep.py and the config below.
This condition should never occur because we append a zero sigma at the end in all cases, save for the use of "sigma_min" in the original scheduler (see the sketch after the config). My config:

config (FrozenDict): OrderedDict([('num_train_timesteps', 1000), ('beta_start', 0.0001), ('beta_end', 0.02), ('beta_schedule', 'linear'), ('trained_betas', None), ('solver_order', 2), ('algorithm_type', 'dpmsolver2'), ('solver_type', 'midpoint'), ('sigma_schedule', 'betas'), ('shift', 3.0), ('midpoint_ratio', 0.5), ('s_noise', 1), ('use_noise_sampler', True), ('use_beta_sigmas', True), ('use_dynamic_shifting', True), ('base_shift', 0.5), ('max_shift', 1.15), ('base_image_seq_len', 256), ('max_image_seq_len', 4096), ('_use_default_values', ['trained_betas', 'solver_type', 'solver_order'])])
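
For context, a minimal sketch of the terminal-sigma convention described above (not the exact scheduler code):

    import torch

    num_inference_steps = 30
    sigmas = torch.linspace(1.0, 0.05, num_inference_steps)  # placeholder schedule
    sigmas = torch.cat([sigmas, torch.zeros(1)])              # trailing zero sigma

    # step_index runs 0 .. num_inference_steps - 1, so sigmas[step_index + 1]
    # stays in range and ends exactly at 0.0 on the final step.
    assert len(sigmas) == num_inference_steps + 1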

Also just so we're on the same page:

Maybe some confusion on this but sigma_schedule = "betas" is not the same thing as
use_beta_sigmas=True.

self.config.sigma_schedule == "betas"
use_beta_sigmas True

sigmas: tensor([1.0000, 0.9979, 0.9932, 0.9865, 0.9779, 0.9676, 0.9554, 0.9413, 0.9253,
        0.9072, 0.8869, 0.8642, 0.8389, 0.8109, 0.7799, 0.7456, 0.7079, 0.6665,
        0.6211, 0.5718, 0.5184, 0.4611, 0.4003, 0.3368, 0.2718, 0.2074, 0.1462,
        0.0923, 0.0509, 0.0309, 0.0000], device='cuda:0', dtype=torch.float64)
    

self.config.sigma_schedule == None
use_beta_sigmas True

sigmas: tensor([1.0000, 0.9889, 0.9773, 0.9651, 0.9523, 0.9388, 0.9246, 0.9096, 0.8937,
        0.8769, 0.8590, 0.8401, 0.8199, 0.7983, 0.7753, 0.7506, 0.7241, 0.6955,
        0.6646, 0.6311, 0.5947, 0.5550, 0.5115, 0.4635, 0.4106, 0.3516, 0.2857,
        0.2115, 0.1273, 0.0309, 0.0000], device='cuda:0', dtype=torch.float64)
    

What did I miss?

@vladmandic
Contributor

vladmandic commented Nov 22, 2024

Maybe some confusion on this but sigma_schedule = "betas" is not the same thing as use_beta_sigmas=True.

i know - use_beta_sigmas is to make sd35 happy. sigma_schedule=betas is a new sigma method that you didn't have before.
i haven't gone through the math, but looking at the sd35 outputs, there is definite degradation for karras/exponential/lambdas.
in flux i don't see those problems as much.

@ukaprch

ukaprch commented Nov 22, 2024

Yes. The degradation problem is most probably due to the way FlowMatch approaches the problem: the sigmas / timesteps are vastly compressed under FlowMatch, which negates their usability. More steps alleviate the problem, at a cost. This cannot be stressed enough for new and existing users of these types of schedulers. I'm very happy with most of the results I've achieved using them. As for SD3.5 Large, it more resembles SDXL than it does FLUX; I'm not sure why they chose three text encoders. Without the beta sigmas it would be far worse. I've also played around with using dynamic shifting with SD3.5 Large; it requires you to add the required functionality to the pipeline to make it work. Something to think about, I guess.
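
For reference, a minimal sketch of the Flux-style dynamic-shift ("mu") calculation that such a pipeline change would need, using the base_shift / max_shift / sequence-length values from the configs above (illustrative, not the exact pipeline helper):

    # Linearly interpolate the shift from the image token sequence length, then
    # pass it to scheduler.set_timesteps(..., mu=mu).
    def calculate_shift(image_seq_len, base_seq_len=256, max_seq_len=4096,
                        base_shift=0.5, max_shift=1.15):
        m = (max_shift - base_shift) / (max_seq_len - base_seq_len)
        b = base_shift - m * base_seq_len
        return image_seq_len * m + b

    # e.g. a 1024x1024 image -> 64 x 64 packed latent tokens -> seq_len = 4096
    mu = calculate_shift(64 * 64)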

@linjiapro
Contributor Author

linjiapro commented Nov 22, 2024

@ukaprch,

I believe you have studied this extensively.

Can we add some recommended usage patterns to the code? They could be comments at the top of those schedulers, along these lines (an illustrative filled-in version follows the template below):

"""
if you use this for SD35, here is the recommended way:

scheduler = .... (such as with what beta setting, etc)

"""

@linjiapro
Contributor Author

Once the recommended usage patterns are added, I plan to send out a PR.

@vladmandic
Contributor

@ukaprch i agree with your thoughts on sd35, especially on their choice of both clip-l and clip-g for no reason.
but what i was referring to as "degradation" is that with sd35 your latest code performs worse than your earlier code - that is especially visible with karras.
old vs new:
[image: old]
[image: new]

@ukaprch

ukaprch commented Nov 23, 2024

Can you provide me with all the parameters you used for the above images so I can take a look?

@vladmandic
Contributor

vladmandic commented Nov 23, 2024

sd35-medium, steps=50, nothing out of the ordinary with the rest of the generation params.
sampler params were as follows:

{'num_train_timesteps': 1000, 'beta_start': 0.0001, 'beta_end': 0.02, 'beta_schedule': 'linear', 'shift': 3, 'use_dynamic_shifting': False, 'solver_order': 2, 'sigma_schedule': 'karras', 'use_beta_sigmas': True, 'algorithm_type': 'dpmsolver2', 'use_noise_sampler': True}

@ukaprch

ukaprch commented Nov 23, 2024

I needed to set up my environment for sd35 Medium, which I had never tested. Having done that, using your same parameters I ran both sd35 Medium & Large for 30 and 40 steps respectively and encountered no problems:

[image: SD35 MEDIUM]
[image: SD35 LARGE]

Did something change in your environment when you updated? Did any code perhaps get misaligned in the upgrade so that it is no longer running the same?

As an aside, those using SD 3.5 should be aware that these models will not run well using FLUX aspect ratios and sizes. Case in point: I ran SD 3.5 Large with a 2:3 aspect ratio at 1152 x 1728. As you can see in the image below, both the top and bottom of the image contain artifacts reflecting problems generating an image of this size. I ran into the same problem with a 1:1 aspect ratio at 1408 x 1408, which FLUX can easily handle.

[image: SD35 2:3, 1152 x 1728 - aspect ratio problem]
