
Flux.1 cannot load standard transformer in nf4 #9996

Open · vladmandic opened this issue Nov 22, 2024 · 12 comments
Labels: bug Something isn't working

@vladmandic (Contributor)

Describe the bug

Loading different Flux transformer models is fine except for NF4:
it works for the 1% of fine-tunes provided on Hugging Face, but it doesn't work for the 99% of standard fine-tunes available on CivitAI.

Example of such a model: https://civitai.com/models/118111?modelVersionId=1009051

Note: I'm using FluxTransformer2DModel directly as it's easiest for reproduction, plus the majority of Flux fine-tunes are provided as transformer-only, not full models. But where a full model does exist, it's exactly the same problem using FluxPipeline.

Reproduction

import torch
import bitsandbytes as bnb
import diffusers

print(f'torch=={torch.__version__} diffusers=={diffusers.__version__} bnb=={bnb.__version__}')
kwargs = { 'low_cpu_mem_usage': True, 'torch_dtype': torch.bfloat16, 'cache_dir': '/mnt/models/huggingface' }
files = [
    'flux-c4pacitor_v2alpha-f1s-bf16.safetensors',
    'flux-iniverse_v2-f1d-fp8.safetensors',
    'flux-copax_timeless_xplus_mix2-nf4.safetensors',
]

for f in files:
    print(f)
    try:
        transformer = diffusers.FluxTransformer2DModel.from_single_file(f, **kwargs)
        print(transformer.__class__)
    except Exception as e:
        print(e)
    transformer = None
    torch.cuda.empty_cache()

Logs

in `diffusers/loaders/single_file_utils.py:convert_flux_transformer_checkpoint_to_diffusers`


q, k, v, mlp = torch.split(checkpoint.pop(f"single_blocks.{i}.linear1.weight"), split_size, dim=0)


> RuntimeError: split_with_sizes expects split_sizes to sum exactly to 33030144 (input tensor's size at dimension 0), but got split_sizes=[3072, 3072, 3072, 12288]
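
For context, the number in the error is consistent with NF4 packing (a back-of-the-envelope check, assuming the file was quantized with bitsandbytes' default 4-bit packing; this is not diffusers code):

# single_blocks.{i}.linear1.weight is (3*3072 + 12288, 3072) = (21504, 3072) in bf16.
# bitsandbytes NF4 packs two 4-bit values per byte and stores the result as a
# flattened (N, 1) uint8 tensor, so dimension 0 of the stored tensor becomes:
rows = 3 * 3072 + 12288         # 21504 -> what the converter tries to split
cols = 3072
packed_dim0 = rows * cols // 2  # 33030144 -> what split_with_sizes actually sees
print(packed_dim0)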

System Info

torch==2.5.1+cu124 diffusers==0.32.0.dev0 bnb==0.44.1

Who can help?

@yiyixuxu @sayakpaul @DN6 @asomoza

vladmandic added the bug label on Nov 22, 2024
@sayakpaul (Member) commented Nov 22, 2024

I don't think we support loading single-file NF4 checkpoints yet. Loading a pre-quantized NF4 (or, more generally, bnb) checkpoint is only supported via from_pretrained() as of now.
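
For reference, the from_pretrained() path that does work today looks roughly like this (a minimal sketch; the repo id is a placeholder for any NF4 transformer exported in diffusers format with save_pretrained(), and bitsandbytes needs to be installed):

import torch
from diffusers import FluxTransformer2DModel

# Hypothetical repo id; the quantization_config saved in the model's config is
# what lets from_pretrained() restore the bnb 4-bit weights.
transformer = FluxTransformer2DModel.from_pretrained(
    "your-namespace/flux-finetune-nf4",  # placeholder, not a real repo
    subfolder="transformer",             # depends on how the repo is laid out
    torch_dtype=torch.bfloat16,
)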

Some reasoning as to why that is the case (from #9165 (comment)):

For the diffusion model as in keys prefixed with model.diffusion_model, we suggest following the saving and loading approach in the OP because we cannot define a clear mechanism to load the quantization stats for the attention modules from those keys and associated tensors.

But maybe there's a way now.

@DN6 would you be able to check this? If not, I will find time.

@vladmandic (Contributor, Author)

> I don't think we support loading single-file NF4 checkpoints yet

I'm specifically loading transformer-only checkpoints, although it's the same issue with full single-file checkpoints (they are much less frequent).
If this needs to be converted from an issue to a feature request, fine by me, but it's a high-priority item nonetheless, since right now diffusers cannot work with the majority of models uploaded on CivitAI - and that is pretty much the standard nowadays.

@sayakpaul (Member)

> I'm specifically loading transformer-only checkpoints, although it's the same issue with full single-file checkpoints (they are much less frequent).

Clearing some confusion. I think what you mean is the following: loading a standard transformer checkpoint (the original BFL format) with from_single_file() almost always succeeds, but loading a pre-quantized (NF4) checkpoint (the same BFL format with NF4-specific keys added) with from_single_file() fails. Correct?

Issue / feature request is fine and high-prio is fine too.

@vladmandic (Contributor, Author)

Yes, that is correct. It works for transformers in fp32/fp16/fp8 safetensors, but fails for NF4.
(Using my GGUF code, it also works for .gguf in different quants, which leaves NF4 as the only one that fails - and unfortunately, that is the highly desired one.)

@sayakpaul (Member)

> It works for transformers in fp32/fp16/fp8 safetensors, but fails for NF4.

Thanks for confirming! The reason it fails is that NF4 state dicts have special quantization keys, and they also compress the original dimensionality, which is why you hit the error you reported. See, for example: https://huggingface.co/hf-internal-testing/flux.1-dev-nf4-pkg/tree/main/transformer?show_file_info=transformer%2Fdiffusion_pytorch_model.safetensors (quant_map, for example).
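
Those extra keys are easy to see directly in the single file; a quick sketch with safetensors, using the NF4 filename from the reproduction above:

from safetensors import safe_open

path = "flux-copax_timeless_xplus_mix2-nf4.safetensors"  # any bnb-NF4 single-file checkpoint
with safe_open(path, framework="pt") as f:
    for key in f.keys():
        # bitsandbytes stores absmax / quant_map / quant_state companions next to
        # each packed weight, which plain BF16/FP8 checkpoints don't have.
        if "absmax" in key or "quant" in key:
            print(key, f.get_slice(key).get_shape())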

> and unfortunately, that is the highly desired one

Yes, not denying it. This will be supported :)

@sayakpaul (Member)

> (Using my GGUF code, it also works for .gguf in different quants, which leaves NF4 as the only one that fails - and unfortunately, that is the highly desired one.)

@vladmandic do you have a reference for this? Would be quite helpful!

@vladmandic (Contributor, Author)

> @vladmandic do you have a reference for this? Would be quite helpful!

GGUF? I shared it in #9487 (comment).

@bghira (Contributor) commented Nov 22, 2024

The problem with CivitAI models is that they are not standard; the state dicts of most released models are rather ad hoc. It's unfortunate that this became the more prevalent/common means of distribution right now, but I'm working directly with CivitAI to improve the situation down the line, to allow model creators to provide configuration details for a given model and even to directly support Diffusers to this effect.

@vladmandic (Contributor, Author)

> The problem with CivitAI models is that they are not standard; the state dicts of most released models are rather ad hoc.

I totally agree with that statement - it's the Wild West!
That is one reason why I'm pushing OMI to create standards before eventually releasing a model.

@bghira (Contributor) commented Nov 22, 2024

Yes, let OMI create a new standard to solve the unity issue among standards 🤣
[image: the xkcd "Standards" comic]

Sorry for the noise, I'll see myself out.

@vladmandic (Contributor, Author)

I've used that xkcd many times myself!
For OMI, I actually don't care what the standard is, as long as there is one. Right now, the differences in implementation for the same model across different formats are the real killer.

@DN6 (Collaborator) commented Nov 25, 2024

The issue is caused by the conversion step. If you have a pre-quantized BnB checkpoint, the tensors are all flattened, so any tensor manipulation we run when converting the checkpoint won't work. We'll probably have to update our conversion functions to account for quant shapes. It's on my list after GGUF support (#9964) is merged.
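
To illustrate the flattening, a small sketch (assuming bitsandbytes' quantize_4bit and a CUDA device; this is not what diffusers does internally):

import torch
import bitsandbytes.functional as bnbf

# Quantize a tensor with the single_blocks.{i}.linear1.weight shape from the error above.
w = torch.randn(21504, 3072, dtype=torch.bfloat16, device="cuda")
packed, quant_state = bnbf.quantize_4bit(w, quant_type="nf4")

print(packed.shape)       # torch.Size([33030144, 1]) -- flattened, two 4-bit values per byte
print(quant_state.shape)  # torch.Size([21504, 3072]) -- original shape kept in the quant state
# Splitting `packed` along dim 0 with the bf16 row sizes [3072, 3072, 3072, 12288] cannot
# work; the conversion would need to use quant_state.shape (or dequantize) first.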
