
Can we get more schedulers for flow based models such as SD3, SD3.5, and flux #9924

Open
linjiapro opened this issue Nov 14, 2024 · 27 comments


@linjiapro
Contributor

It seems advanced schedulers such as DDIM and DPM++ 2M do not work with flow-based models such as SD3, SD3.5, and Flux.
However, I only see 2 flow-based schedulers in the diffusers codebase:

FlowMatchEulerDiscreteScheduler, and
FlowMatchHeunDiscreteScheduler

I tried to use DPMSolverMultistepScheduler, but it does not generate correct images with flow-based models. Help?
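
For reference, the swap I tried looks roughly like this (Flux shown; the model id and dtype are just what I happened to use):

    import torch
    from diffusers import FluxPipeline, DPMSolverMultistepScheduler

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
    ).to("cuda")

    # Swap in the non-flow-match scheduler, reusing the pipeline's scheduler config.
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

    # The call runs, but the images come out wrong - presumably because the DPM++
    # schedulers expect a diffusion-style noise schedule rather than the
    # flow-matching formulation these models were trained with.
    image = pipe("a cat holding a sign that says hello world", num_inference_steps=28).images[0]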

@sayakpaul
Member

Cc: @asomoza @hlky @yiyixuxu

@ukaprch

ukaprch commented Nov 15, 2024

I'm glad this shortcoming has been brought to light. With that in mind, I have been testing a new diffusers scheduler adapted from the current schedulers library (scheduling_dpmsolver_multistep.py) together with this: https://github.com/crowsonkb/k-diffusion/blob/master/k_diffusion/sampling.py
The net result is a new scheduler containing 7 different samplers ("dpmsolver2", "dpmsolver2A", "dpmsolver++2M", "dpmsolver++2S", "dpmsolver++sde", "dpmsolver++2Msde", "dpmsolver++3Msde").
This new scheduler works with both the existing FLUX and SD3 pipelines in the diffusers library, although SD3 is somewhat temperamental with the karras, exponential, and lambdas sigmas at times.
I have done a lot of empirical testing with it and it appears to function as intended. You are all most welcome to try it.
I have attached the scheduler and the parameters I use. Some of the new parameters will become apparent as you familiarize yourself with the scheduler. As things stand now, SD3 benefits somewhat from using the SDXL approach to creating the sigmas vs the new FLUX approach; see the scheduler for details. I found that SD3 can also run with dynamic shifting as FLUX does, but again, results can vary.

Note: I have configured my diffusers library with the updated scheduler and am importing it normally for my testing.

Create the pipeline as you normally would, then swap in the scheduler with the required parameters.
Note: you can also modify parameters such as s_noise, use_noise_sampler, etc.
scheduling_flow_match_dpmsolver_multistep.txt

SD3:
pipe.scheduler = FlowMatchDPMSolverMultistepScheduler.from_config(pipe.scheduler.config,solver_order=2,sigma_schedule='karras',algorithm_type="dpmsolver2")

FLUX:
pipe.scheduler = FlowMatchDPMSolverMultistepScheduler.from_config(pipe.scheduler.config,solver_order=2,sigma_schedule='karras',algorithm_type="dpmsolver2")
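
For completeness, a rough end-to-end sketch with SD3 (assuming the attached file is saved as scheduling_flow_match_dpmsolver_multistep.py and is importable, either as a local module or after being dropped into your diffusers install; the model id is just an example):

    import torch
    from diffusers import StableDiffusion3Pipeline
    from scheduling_flow_match_dpmsolver_multistep import FlowMatchDPMSolverMultistepScheduler

    pipe = StableDiffusion3Pipeline.from_pretrained(
        "stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16
    ).to("cuda")

    # Replace the default FlowMatchEulerDiscreteScheduler with the proposed scheduler.
    pipe.scheduler = FlowMatchDPMSolverMultistepScheduler.from_config(
        pipe.scheduler.config,
        solver_order=2,
        sigma_schedule='karras',
        algorithm_type="dpmsolver2",
    )

    image = pipe(
        "a cat holding a sign that says hello world",
        num_inference_steps=30,
        guidance_scale=7.0,
    ).images[0]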

@linjiapro
Contributor Author

@ukaprch

From the import clauses in the txt file, it seems the classes already live under the diffusers repo for your test runs. Would it make sense to include the file in the official diffusers repo?

@ukaprch

ukaprch commented Nov 15, 2024

Some of you can update your diffusers library to incorporate the scheduler I proposed (which is a first start) for independent testing, as I have. The scheduler as written can be incorporated into the diffusers library without change and will work with the current FLUX and SD3 pipelines.

@linjiapro
Contributor Author

@sayakpaul @ukaprch

I have a pending PR which has not been merged:
#9758

Once the above merges, I can work on sending another PR related to schedulers. In the meantime, anyone else is welcome to raise a pull request.

@vladmandic
Contributor

vladmandic commented Nov 17, 2024

also related #9607

@vladmandic
Contributor

vladmandic commented Nov 17, 2024

@ukaprch why not create a PR to include this in diffusers?
i've integrated your code in sdnext; dpm flowmatch is working fine, except any kind of karras/exponential results in full blur.

anyhow, here are some sample grids (including original euler and heun flowmatch for reference)

sd35:
[image: dpm-flowmatch-sd35 grid]

flux.1:
[image: dpm-flowmatch-flux grid]

auraflow: also tried with this model since it's also flow-match based and it works
[image: dpm-flowmatch-aura grid]

@ukaprch

ukaprch commented Nov 18, 2024

Thanks for testing and for your feedback. Please note that I had no such problems using the FLUX pipeline with the new scheduler and these algorithms (karras, exponential, lambdas); see attached. I am using the quantized (QINT8) version of the transformer and T5, FWIW.

        image = pipe(
            prompt='a cat holding a sign that says hello world',
            width=1024,
            height=1024,
            guidance_scale=3.5,
            generator=generator,  # seed = 114747598
            max_sequence_length=512,
            num_inference_steps=30).images[0]

I ran the above pipeline using the basic FLUX dev prompt from Replicate and generated the following locally on my PC:
[image]

@vladmandic
Contributor

a bit more testing: karras and exponential are working fine for flux, but not for sd35

flux:
[image]

sd35:
[image]

actual params used:

10:50:38-967529 DEBUG    Sampler: sampler="DPM2++ 2M FlowMatch" class="FlowMatchDPMSolverMultistepScheduler" config={'num_train_timesteps': 1000, 'shift': 1, 'use_dynamic_shifting': False, 'solver_order': 2, 'sigma_schedule': 'karras', 'use_SD35_sigmas': True, 'algorithm_type': 'dpmsolver++2M', 'use_noise_sampler': True}

@ukaprch

ukaprch commented Nov 18, 2024

OK, I noticed you are using shift: 1. This should be shift: 3, which is the standard for SD3. I'll look into it some more.
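
For reference, forcing the SD3-style shift when building the scheduler would look roughly like this (the other arguments are just the ones from your log; illustrative only):

    # Use the SD3/SD3.5 static shift of 3 instead of the Flux value of 1.
    # Assumes `pipe` is an SD3/SD3.5 pipeline that is already loaded.
    pipe.scheduler = FlowMatchDPMSolverMultistepScheduler.from_config(
        pipe.scheduler.config,
        shift=3.0,
        use_dynamic_shifting=False,
        sigma_schedule='karras',
        algorithm_type="dpmsolver++2M",
    )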

After running my test, the only noticeable thing I see is a more pronounced 'bokeh' effect, which I wouldn't necessarily characterize as blur per se. I do think the sigmas / timesteps for karras, exponential, and lambdas with SD 3.5 could possibly use some further refinement.

image = pipe(
    prompt='photorealistic image of a robot walking the streets of a city meeting passersby, 8K, dramatic lighting, textured skin, film grain, RAW photo, highly detailed',
    negative_prompt=negative_prompt,
    width=1024,
    height=1024,
    guidance_scale=3.5,
    generator=generator,  # seed = 580198207
    max_sequence_length=512,
    num_inference_steps=30).images[0]

scheduler config:

config (FrozenDict): OrderedDict([('num_train_timesteps', 1000), ('solver_order', 2), ('thresholding', False), ('dynamic_thresholding_ratio', 0.995), ('sample_max_value', 1.0), ('algorithm_type', 'dpmsolver++2Msde'), ('solver_type', 'midpoint'), ('sigma_schedule', None), ('shift', 3.0), ('midpoint_ratio', 0.5), ('s_noise', 1), ('use_noise_sampler', True), ('use_SD35_sigmas', True), ('use_dynamic_shifting', False), ('base_shift', 0.5), ('max_shift', 1.15), ('base_image_seq_len', 256), ('max_image_seq_len', 4096), ('_use_default_values', ['dynamic_thresholding_ratio', 'base_image_seq_len', 'midpoint_ratio', 'solver_type', 'thresholding', 'base_shift', 'max_image_seq_len', 'sample_max_value'])])

  | algorithm_type | 'dpmsolver++2Msde' | str
  | base_image_seq_len | 256 | int
  | base_shift | 0.5 | float
  | dynamic_thresholding_ratio | 0.995 | float
  | max_image_seq_len | 4096 | int
  | max_shift | 1.15 | float
  | midpoint_ratio | 0.5 | float
  | num_train_timesteps | 1000 | int
  | s_noise | 1 | int
  | sample_max_value | 1.0 | float
  | shift | 3.0 | float
  | sigma_schedule | None | NoneType
  | solver_order | 2 | int
  | solver_type | 'midpoint' | str
  | thresholding | False | bool
  | use_SD35_sigmas | True | bool
  | use_dynamic_shifting | False | bool
  | use_noise_sampler | True | bool

[image: spx7239]

@vladmandic
Contributor

ahhh, found it. for testing purposes i was running with steps=15 only.
and that's fine for beta, but definitely not fine for other sigma methods.

[image]

btw, good catch on the shift value, that's a user error on my side - i had it left like that from flux, but it doesn't impact this.

@ukaprch

ukaprch commented Nov 18, 2024

I found with FLUX that you generally need 25 or more steps to get good images. Remember, we're not using the 3-step method as previously with SDXL. Your images look good, BTW. It really comes into its own between 35 and 50 steps. Also, if you increase the shift factor from 3 to 3.5 you get a bit more contrast in the image, which may or may not be to everyone's liking.
So what are you thinking? Are we almost good to proceed with this as a scheduler?

@vladmandic
Contributor

I found with FLUX that you need generally 25 or more steps to get good images.

actually, the worst-case scenario here is sd35-medium ;)
flux is behaving better in this regard.

re: shift - i've noticed. i have it as a user-configurable item

So what are you thinking? Are we almost good to proceed with this as a scheduler?

definitely!

from a quick look at the code, 99% is clean.
del sigmas1 looks unsafe as it may not be defined in all code paths. and in general, does it help at all to have all those del statements in set_timesteps()?

i wish use_SD35_sigmas could be auto-determined, but i don't see a clean way.
it does need a better name though, definitely not something with mixed capitalization :)

@ukaprch

ukaprch commented Nov 20, 2024

vladmandic: As noted, this scheduler was not designed for flow-match derived sampling.

I've attached a new FlowMatch version of the original scheduling_dpmsolver_multistep.py (scheduling_flow_match_dpmsolver_multistep_orig.py)
and updated scheduling_flow_match_dpmsolver_multistep.py per some of your suggestions.
The powers that be (TPTB), if they decide to move forward on these, can obviously rename, make changes, etc.

  1. sigmas1 is gone. It was a holdover from my previous testing.
  2. use_SD35_sigmas is now use_beta_sigmas, and you'll see why when you look at the code.
  3. Both new schedulers have a consistent variable interface, so you'll be able to use both without much effort.
  4. sigma_schedule (which replaces all those use_????_sigmas=True flags in the original version) includes the following:
    [None, "karras", "exponential", "lambdas", "betas"].
  5. Betas play an important role in determining sigmas and image rendering. Because of this, beta_start and beta_end become important variables and should be configurable. Currently I am using the default beta_start = 0.0001, beta_end = 0.02 and the custom beta_start = 0.00085, beta_end = 0.012. Values in between would also work well. Again, this is all subjective (see the sketch after this list).
  6. When using the sigma_schedules "karras", "exponential" and "lambdas" you will almost certainly have better outcomes when using more inference steps (i.e. 40+ vs 25).
  7. It's unclear whether the current versions of FLUX DEV and SD 3.5 Large can run most images with these schedulers and get good results in under 20 steps; the current (open) models were not designed for this. What is clear is that the dpm++ class of solvers is better at rendering realistic images (in my testing) than the current scheduling_flow_match_euler_discrete.py scheduler.
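
For illustration, a configuration exercising points 4-6 could look like this (values taken from the notes above; treat them as starting points, not tested defaults):

    # Assumes `pipe` is an SD3.5 pipeline and `prompt` is already defined.
    pipe.scheduler = FlowMatchDPMSolverMultistepScheduler.from_config(
        pipe.scheduler.config,
        sigma_schedule="betas",         # one of: None, "karras", "exponential", "lambdas", "betas"
        beta_start=0.00085,             # custom range from point 5
        beta_end=0.012,
        use_beta_sigmas=True,           # renamed from use_SD35_sigmas (point 2)
        algorithm_type="dpmsolver++2M",
        solver_order=2,
    )
    image = pipe(prompt, num_inference_steps=40).images[0]  # 40+ steps for karras/exponential/lambdas (point 6)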

scheduling_flow_match_dpmsolver_multistep.txt
scheduling_flow_match_dpmsolver_multistep_orig.txt

Please do some more testing on both and raise any concerns, etc. I think we are very close on this, and the community would definitely benefit from both.

Example images using the same generation data. Left is the new proposed version of scheduling_flow_match_dpmsolver_multistep.py and right is the current scheduling_flow_match_euler_discrete.py; both used 25 steps:

[image: spx201F]

@linjiapro
Contributor Author

@ukaprch

Let me know if it is OK for me to send out a PR that includes the script you offered; if you would rather do it yourself, I will not send out the PR.

@ukaprch

ukaprch commented Nov 22, 2024

Please do go ahead and send out the PR. I believe we're ready to go. As I said previously, the folks in charge can go over them and make any adjustments they feel are necessary, but I do think we are ready to proceed. Thanks again.

@vladmandic
Contributor

vladmandic commented Nov 22, 2024

@ukaprch i agree with all of your comments, but i feel like something went wrong in the last version?
comparing the code from last week to the current one:
(the old code does not have betas, so it's the same as the default, but otherwise the grid is correct)

old: [image]
new: [image]

generated using sd35-medium with 50 steps; the example config i'm using:

{'num_train_timesteps': 1000, 'beta_start': 0.0001, 'beta_end': 0.02, 'beta_schedule': 'linear', 'shift': 3, 'use_dynamic_shifting': False, 'solver_order': 2, 'sigma_schedule': 'lambdas', 'use_beta_sigmas': True, 'algorithm_type': 'dpmsolver2', 'use_noise_sampler': True}

@vladmandic
Contributor

also, with flux and sigma_method=betas, i'm getting index-out-of-range at

sigma_next = self.sigmas[self.step_index + 1]

@ukaprch

ukaprch commented Nov 22, 2024

also, with flux and sigma_method=betas, i'm getting index-out-of-range at

sigma_next = self.sigmas[self.step_index + 1]

I was unable to replicate your error (Flux using the "betas" sigma_schedule) with scheduling_flow_match_dpmsolver_multistep.py and the config below.
This condition should never occur because we append a zero sigma at the end in all cases, save for the use of "sigma_min" in the original scheduler (see the sketch after the config). My config:

config (FrozenDict): OrderedDict([('num_train_timesteps', 1000), ('beta_start', 0.0001), ('beta_end', 0.02), ('beta_schedule', 'linear'), ('trained_betas', None), ('solver_order', 2), ('algorithm_type', 'dpmsolver2'), ('solver_type', 'midpoint'), ('sigma_schedule', 'betas'), ('shift', 3.0), ('midpoint_ratio', 0.5), ('s_noise', 1), ('use_noise_sampler', True), ('use_beta_sigmas', True), ('use_dynamic_shifting', True), ('base_shift', 0.5), ('max_shift', 1.15), ('base_image_seq_len', 256), ('max_image_seq_len', 4096), ('_use_default_values', ['trained_betas', 'solver_type', 'solver_order'])])
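
For context, a minimal sketch of the terminal-sigma convention described above (not the exact scheduler code):

    import torch

    num_inference_steps = 30
    sigmas = torch.linspace(1.0, 0.05, num_inference_steps)  # placeholder schedule
    sigmas = torch.cat([sigmas, torch.zeros(1)])              # trailing zero sigma

    # step_index runs 0 .. num_inference_steps - 1, so sigmas[step_index + 1]
    # stays in range and ends exactly at 0.0 on the final step.
    assert len(sigmas) == num_inference_steps + 1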

Also just so we're on the same page:

Maybe some confusion on this but sigma_schedule = "betas" is not the same thing as
use_beta_sigmas=True.

self.config.sigma_schedule == "betas"
use_beta_sigmas True

sigmas: tensor([1.0000, 0.9979, 0.9932, 0.9865, 0.9779, 0.9676, 0.9554, 0.9413, 0.9253,
        0.9072, 0.8869, 0.8642, 0.8389, 0.8109, 0.7799, 0.7456, 0.7079, 0.6665,
        0.6211, 0.5718, 0.5184, 0.4611, 0.4003, 0.3368, 0.2718, 0.2074, 0.1462,
        0.0923, 0.0509, 0.0309, 0.0000], device='cuda:0', dtype=torch.float64)
    

self.config.sigma_schedule == None
use_beta_sigmas True

sigmas: tensor([1.0000, 0.9889, 0.9773, 0.9651, 0.9523, 0.9388, 0.9246, 0.9096, 0.8937,
        0.8769, 0.8590, 0.8401, 0.8199, 0.7983, 0.7753, 0.7506, 0.7241, 0.6955,
        0.6646, 0.6311, 0.5947, 0.5550, 0.5115, 0.4635, 0.4106, 0.3516, 0.2857,
        0.2115, 0.1273, 0.0309, 0.0000], device='cuda:0', dtype=torch.float64)
    

What did I miss?

@vladmandic
Contributor

vladmandic commented Nov 22, 2024

Maybe some confusion on this but sigma_schedule = "betas" is not the same thing as use_beta_sigmas=True.

i know - use_beta_sigmas is to make sd35 happy. sigma_schedule=betas is a new sigma method that you didn't have before.
i haven't gone through the math, but looking at the sd35 outputs, there is definite degradation for karras/exponential/lambdas.
in flux i don't see those problems as much.

@ukaprch

ukaprch commented Nov 22, 2024

Yes. The degradation problem is most probably due to the way FlowMatch approaches the problem: the sigmas / timesteps are vastly compressed under FlowMatch, which negates their usability. More steps alleviate the problem, at a cost. This cannot be stressed enough for new and existing users of these types of schedulers. I'm very happy with most of the results I've achieved using them. As for SD3.5 Large, it more resembles SDXL than it does FLUX; I'm not sure why they chose three text encoders. Without the beta sigmas it would be far worse. I've also played around with using dynamic shifting with SD3.5 Large; it requires you to add the required functionality to the pipeline to make it work. Something to think about, I guess.
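
For reference, a minimal sketch of the Flux-style dynamic-shift ("mu") calculation that such a pipeline change would need, using the base_shift / max_shift / sequence-length values from the configs above (illustrative, not the exact pipeline helper):

    # Linearly interpolate the shift from the image token sequence length, then
    # pass it to scheduler.set_timesteps(..., mu=mu).
    def calculate_shift(image_seq_len, base_seq_len=256, max_seq_len=4096,
                        base_shift=0.5, max_shift=1.15):
        m = (max_shift - base_shift) / (max_seq_len - base_seq_len)
        b = base_shift - m * base_seq_len
        return image_seq_len * m + b

    # e.g. a 1024x1024 image -> 64 x 64 packed latent tokens -> seq_len = 4096
    mu = calculate_shift(64 * 64)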

@linjiapro
Contributor Author

linjiapro commented Nov 22, 2024

@ukaprch,

I believe you have studied this extensively.

Can we add some recommended usage patterns to the code? They could be comments at the top of those schedulers, along these lines (an illustrative filled-in version follows the template below):

"""
if you use this for SD35, here is the recommended way:

scheduler = .... (such as with what beta setting, etc)

"""

@linjiapro
Contributor Author

Once the recommended usage patterns are added, I plan to send out a PR.

@vladmandic
Contributor

@ukaprch i agree with your thoughts on sd35, especially on their choice of both clip-l and clip-g for no reason.
but what i was referring to as "degradation" is that with sd35 your latest code performs worse than your earlier code - that is especially visible with karras.
old vs new:
[image: old]
[image: new]

@ukaprch

ukaprch commented Nov 23, 2024

Can you provide me with all the parameters you used for the above images so I can take a look?

@vladmandic
Contributor

vladmandic commented Nov 23, 2024

sd35-medium, steps=50, nothing out of the ordinary with the rest of the generation params.
sampler params were as follows:

{'num_train_timesteps': 1000, 'beta_start': 0.0001, 'beta_end': 0.02, 'beta_schedule': 'linear', 'shift': 3, 'use_dynamic_shifting': False, 'solver_order': 2, 'sigma_schedule': 'karras', 'use_beta_sigmas': True, 'algorithm_type': 'dpmsolver2', 'use_noise_sampler': True}

@ukaprch

ukaprch commented Nov 23, 2024

I needed to set up my environment for sd35 Medium, which I had never tested. Having done that, using your same parameters I ran both sd35 Medium & Large for 30 and 40 steps respectively and encountered no problems:

[image: SD35 MEDIUM]
[image: SD35 LARGE]

Did something change in your environment when you updated? Did any code perhaps get misaligned in the upgrade so that it is no longer running the same?

As an aside, those using SD 3.5 should be aware that these models will not run well using FLUX aspect ratios and sizes. Case in point: I ran SD 3.5 Large with a 2:3 aspect ratio at 1152 x 1728. As you can see in the image below, both the top and bottom of the image contain artifacts reflecting problems generating an image of this size. I ran into the same problem with a 1:1 aspect ratio at 1408 x 1408, which FLUX can easily handle.

[image: SD35 2:3, 1152 x 1728 - aspect ratio problem]
