[Single File] Add GGUF support #9964

DN6 · 2024-11-19T13:55:17Z

What does this PR do?

Adds support for loading GGUF checkpoints via from_single_file.

import torch

from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

ckpt_path = (
    "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q2_K.gguf"
)
transformer = FluxTransformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
prompt = "A cat holding a sign that says hello world"
# The generator belongs to the pipeline call, not to from_pretrained
image = pipe(prompt, generator=torch.manual_seed(0)).images[0]
image.save("flux-gguf.png")

Notes:

The API for loading GGUF is a bit overkill, but it is consistent with quantized loading in from_pretrained. GGUF files carry enough metadata that we can infer everything we need from the file itself, so a quantization config is not strictly required. It does become necessary as we expand single-file loading to support other quantization methods (BnB, TorchAO, etc.).

TODOS:

Benchmark loading and inference speed.
Verify output quality
Add tests

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you read our philosophy doc (important for complex PRs)?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

DN6 · 2024-11-19T13:58:54Z

src/diffusers/quantizers/bitsandbytes/bnb_quantizer.py

@@ -204,7 +204,10 @@ def create_quantized_param(

        module._parameters[tensor_name] = new_value

-    def check_quantized_param_shape(self, param_name, current_param_shape, loaded_param_shape):
+    def check_quantized_param_shape(self, param_name, current_param, loaded_param):


GGUF needs to access the tensor quant type to run a shape check. So this needs to change from passing in shapes to passing in params directly.

Why not add this method to the gguf_quantizer.py file instead of modifying this one? This would be a breaking change, no?

I see you're already adding this to the GGUF quantizer class. So, maybe okay to not modify this?
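
To make the reason for the signature change concrete: a GGUF-quantized weight is stored as packed byte blocks, so its stored shape differs from the logical shape the model expects, and the mapping depends on the quant type. The sketch below is illustrative only and is not the PR's implementation. The names QUANT_SIZES, infer_logical_shape, and shapes_match are hypothetical, and the two (block size, bytes per block) entries are assumed from the ggml block layouts for Q4_0 and Q8_0.

```python
# Hypothetical table: quant type -> (elements per block, bytes per block).
# Assumed from the ggml layouts: Q4_0 packs 32 weights into 18 bytes,
# Q8_0 packs 32 weights into 34 bytes.
QUANT_SIZES = {
    "Q4_0": (32, 18),
    "Q8_0": (32, 34),
}


def infer_logical_shape(stored_shape, quant_type):
    """Map the byte-level shape of a packed tensor back to its logical shape."""
    block_size, type_size = QUANT_SIZES[quant_type]
    *leading, n_bytes = stored_shape
    if n_bytes % type_size != 0:
        raise ValueError(
            f"last dim {n_bytes} is not a multiple of {type_size} bytes for {quant_type}"
        )
    return (*leading, n_bytes // type_size * block_size)


def shapes_match(current_shape, loaded_shape, quant_type=None):
    """Compare shapes directly for plain tensors, via the quant type otherwise."""
    if quant_type is None:
        return tuple(current_shape) == tuple(loaded_shape)
    return tuple(current_shape) == infer_logical_shape(loaded_shape, quant_type)
```

With shapes alone, a (3072, 3072) model weight and a (3072, 1728) packed Q4_0 tensor look mismatched. The check only passes once the quant type is known, which is why the method needs the loaded parameter rather than just its shape.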

DN6 · 2024-11-19T14:01:47Z

src/diffusers/quantizers/gguf/utils.py

+import torch.nn as nn
+
+
+def _replace_with_gguf_linear(model, compute_dtype):


GGUF files contain a mix of quantized linear and unquantized linear layers. It's not trivial to selectively replace layers. We can replace all of them and then check the parameter type when running forward instead.
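
A minimal sketch of that approach, under stated assumptions: every nn.Linear is swapped for a single layer class, and the layer decides in forward whether its weight needs dequantizing. SketchQuantLinear, replace_linear_layers, and the quant_type / scale attributes are hypothetical stand-ins, and the "dequantization" is a toy int8 scale multiply rather than real GGUF dequantization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SketchQuantLinear(nn.Linear):
    """Hypothetical stand-in for a GGUF-aware linear layer."""

    def forward(self, x):
        weight = self.weight
        # Assumption: quantized weights carry a `quant_type` attribute,
        # plain (unquantized) weights do not.
        if getattr(weight, "quant_type", None) is not None:
            # Toy dequantization: int8 values times a per-tensor scale.
            weight = weight.to(x.dtype) * weight.scale
        return F.linear(x, weight, self.bias)


def replace_linear_layers(model):
    """Recursively swap every nn.Linear for SketchQuantLinear."""
    for name, module in list(model.named_children()):
        if isinstance(module, nn.Linear):
            new_layer = SketchQuantLinear(
                module.in_features,
                module.out_features,
                bias=module.bias is not None,
            )
            # Inference only: no gradients for (possibly quantized) weights.
            new_layer.requires_grad_(False)
            setattr(model, name, new_layer)
        else:
            replace_linear_layers(module)
    return model
```

Because the branch is taken per parameter at call time, the replacement step does not need to know in advance which layers in the checkpoint are quantized.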

HuggingFaceDocBuilderDev · 2024-11-21T08:16:01Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

sayakpaul · 2024-11-21T09:21:07Z

src/diffusers/quantizers/gguf/utils.py

+                compute_dtype=compute_dtype,
+            )
+            model._modules[name].source_cls = type(module)
+            # Force requires grad to False to avoid unexpected errors


Suggested change

-            # Force requires grad to False to avoid unexpected errors
+            # Force requires_grad to False to avoid unexpected errors

DN6 added 14 commits October 21, 2024 11:37

update

b5eeaa4

update

71897b1

update

89ea1ee

update

f0bcd94

update

60d1385

update

22ed0b0

update

2e6d340

update

b5f927c

Merge branch 'main' into gguf-support

b9666c7

update

6dc5d22

update

428e44b

update

d7f09f2

update

1649936

update

28d3a64

DN6 commented Nov 19, 2024

bghira reviewed Nov 19, 2024

update

c34a451

update

84493db

sayakpaul reviewed Nov 21, 2024

update

50bd784
