Merge pull request #195 from huggingface/main

Merge changes

Skquark authored Dec 20, 2024
2 parents 9d5a5e0 + b64ca6c commit d18e6ae
Showing 117 changed files with 10,246 additions and 498 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/nightly_tests.yml
@@ -357,6 +357,8 @@ jobs:
config:
- backend: "bitsandbytes"
test_location: "bnb"
- backend: "gguf"
test_location: "gguf"
runs-on:
group: aws-g6e-xlarge-plus
container:
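The new `gguf` backend entry wires the GGUF quantization tests into the nightly matrix. For context, a minimal sketch of the user-facing API those tests target, assuming the `GGUFQuantizationConfig` entry point documented in the new quantization guide (the community checkpoint URL is illustrative):

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# Load a pre-quantized GGUF transformer checkpoint.
ckpt_path = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q2_K.gguf"
transformer = FluxTransformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
image = pipe("A cat holding a sign that says hello world").images[0]
image.save("gguf_flux.png")
```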
8 changes: 8 additions & 0 deletions docs/source/en/_toctree.yml
@@ -157,6 +157,10 @@
title: Getting Started
- local: quantization/bitsandbytes
title: bitsandbytes
- local: quantization/gguf
title: gguf
- local: quantization/torchao
title: torchao
title: Quantization Methods
- sections:
- local: optimization/fp16
@@ -234,6 +238,8 @@
title: Textual Inversion
- local: api/loaders/unet
title: UNet
- local: api/loaders/transformer_sd3
title: SD3Transformer2D
- local: api/loaders/peft
title: PEFT
title: Loaders
@@ -396,6 +402,8 @@
title: DiT
- local: api/pipelines/flux
title: Flux
- local: api/pipelines/control_flux_inpaint
title: FluxControlInpaint
- local: api/pipelines/hunyuandit
title: Hunyuan-DiT
- local: api/pipelines/hunyuan_video
117 changes: 106 additions & 11 deletions docs/source/en/api/attnprocessor.md
@@ -15,40 +15,135 @@ specific language governing permissions and limitations under the License.
An attention processor is a class for applying different types of attention mechanisms.
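For instance, a minimal sketch of inspecting and swapping a pipeline's processors (the checkpoint id is illustrative):

```python
import torch
from diffusers import StableDiffusionPipeline
from diffusers.models.attention_processor import AttnProcessor2_0

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
)
# Replace every attention module's processor with the PyTorch 2.0
# scaled-dot-product-attention implementation.
pipe.unet.set_attn_processor(AttnProcessor2_0())
# Mapping of module path -> currently attached processor.
print(pipe.unet.attn_processors)
```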

## AttnProcessor

[[autodoc]] models.attention_processor.AttnProcessor

## AttnProcessor2_0
[[autodoc]] models.attention_processor.AttnProcessor2_0

## AttnAddedKVProcessor
[[autodoc]] models.attention_processor.AttnAddedKVProcessor

## AttnAddedKVProcessor2_0
[[autodoc]] models.attention_processor.AttnAddedKVProcessor2_0

[[autodoc]] models.attention_processor.AttnProcessorNPU

[[autodoc]] models.attention_processor.FusedAttnProcessor2_0

## Allegro

[[autodoc]] models.attention_processor.AllegroAttnProcessor2_0

## AuraFlow

[[autodoc]] models.attention_processor.AuraFlowAttnProcessor2_0

[[autodoc]] models.attention_processor.FusedAuraFlowAttnProcessor2_0

## CogVideoX

[[autodoc]] models.attention_processor.CogVideoXAttnProcessor2_0

[[autodoc]] models.attention_processor.FusedCogVideoXAttnProcessor2_0

## CrossFrameAttnProcessor

[[autodoc]] pipelines.text_to_video_synthesis.pipeline_text_to_video_zero.CrossFrameAttnProcessor

## CustomDiffusionAttnProcessor
## Custom Diffusion

[[autodoc]] models.attention_processor.CustomDiffusionAttnProcessor

## CustomDiffusionAttnProcessor2_0
[[autodoc]] models.attention_processor.CustomDiffusionAttnProcessor2_0

## CustomDiffusionXFormersAttnProcessor
[[autodoc]] models.attention_processor.CustomDiffusionXFormersAttnProcessor

## FusedAttnProcessor2_0
[[autodoc]] models.attention_processor.FusedAttnProcessor2_0
## Flux

[[autodoc]] models.attention_processor.FluxAttnProcessor2_0

[[autodoc]] models.attention_processor.FusedFluxAttnProcessor2_0

[[autodoc]] models.attention_processor.FluxSingleAttnProcessor2_0

## Hunyuan

[[autodoc]] models.attention_processor.HunyuanAttnProcessor2_0

[[autodoc]] models.attention_processor.FusedHunyuanAttnProcessor2_0

[[autodoc]] models.attention_processor.PAGHunyuanAttnProcessor2_0

[[autodoc]] models.attention_processor.PAGCFGHunyuanAttnProcessor2_0

## IdentitySelfAttnProcessor2_0

[[autodoc]] models.attention_processor.PAGIdentitySelfAttnProcessor2_0

[[autodoc]] models.attention_processor.PAGCFGIdentitySelfAttnProcessor2_0

## IP-Adapter

[[autodoc]] models.attention_processor.IPAdapterAttnProcessor

[[autodoc]] models.attention_processor.IPAdapterAttnProcessor2_0

[[autodoc]] models.attention_processor.SD3IPAdapterJointAttnProcessor2_0

## JointAttnProcessor2_0

[[autodoc]] models.attention_processor.JointAttnProcessor2_0

[[autodoc]] models.attention_processor.PAGJointAttnProcessor2_0

[[autodoc]] models.attention_processor.PAGCFGJointAttnProcessor2_0

[[autodoc]] models.attention_processor.FusedJointAttnProcessor2_0

## LoRA

[[autodoc]] models.attention_processor.LoRAAttnProcessor

[[autodoc]] models.attention_processor.LoRAAttnProcessor2_0

[[autodoc]] models.attention_processor.LoRAAttnAddedKVProcessor

[[autodoc]] models.attention_processor.LoRAXFormersAttnProcessor

## Lumina-T2X

[[autodoc]] models.attention_processor.LuminaAttnProcessor2_0

## Mochi

[[autodoc]] models.attention_processor.MochiAttnProcessor2_0

[[autodoc]] models.attention_processor.MochiVaeAttnProcessor2_0

## Sana

[[autodoc]] models.attention_processor.SanaLinearAttnProcessor2_0

[[autodoc]] models.attention_processor.SanaMultiscaleAttnProcessor2_0

[[autodoc]] models.attention_processor.PAGCFGSanaLinearAttnProcessor2_0

[[autodoc]] models.attention_processor.PAGIdentitySanaLinearAttnProcessor2_0

## Stable Audio

[[autodoc]] models.attention_processor.StableAudioAttnProcessor2_0

## SlicedAttnProcessor

[[autodoc]] models.attention_processor.SlicedAttnProcessor

## SlicedAttnAddedKVProcessor
[[autodoc]] models.attention_processor.SlicedAttnAddedKVProcessor

## XFormersAttnProcessor

[[autodoc]] models.attention_processor.XFormersAttnProcessor

## AttnProcessorNPU
[[autodoc]] models.attention_processor.AttnProcessorNPU
[[autodoc]] models.attention_processor.XFormersAttnAddedKVProcessor

## XLAFlashAttnProcessor2_0

[[autodoc]] models.attention_processor.XLAFlashAttnProcessor2_0
6 changes: 6 additions & 0 deletions docs/source/en/api/loaders/ip_adapter.md
@@ -24,6 +24,12 @@

[[autodoc]] loaders.ip_adapter.IPAdapterMixin

## SD3IPAdapterMixin

[[autodoc]] loaders.ip_adapter.SD3IPAdapterMixin
- all
- is_ip_adapter_active

## IPAdapterMaskProcessor

[[autodoc]] image_processor.IPAdapterMaskProcessor
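As a rough usage sketch of the mixin surface above (checkpoint names are the standard community ones, shown for illustration):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
)
# IPAdapterMixin adds load_ip_adapter/set_ip_adapter_scale to supported pipelines.
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.6)  # blend strength of the image prompt
```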
15 changes: 15 additions & 0 deletions docs/source/en/api/loaders/lora.md
Expand Up @@ -17,6 +17,9 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi
- [`StableDiffusionLoraLoaderMixin`] provides functions for loading and unloading, fusing and unfusing, enabling and disabling, and otherwise managing LoRA weights. This class can be used with any model.
- [`StableDiffusionXLLoraLoaderMixin`] is a [Stable Diffusion (SDXL)](../../api/pipelines/stable_diffusion/stable_diffusion_xl) version of the [`StableDiffusionLoraLoaderMixin`] class for loading and saving LoRA weights. It can only be used with the SDXL model.
- [`SD3LoraLoaderMixin`] provides similar functions for [Stable Diffusion 3](https://huggingface.co/blog/sd3).
- [`FluxLoraLoaderMixin`] provides similar functions for [Flux](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux).
- [`CogVideoXLoraLoaderMixin`] provides similar functions for [CogVideoX](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogvideox).
- [`Mochi1LoraLoaderMixin`] provides similar functions for [Mochi](https://huggingface.co/docs/diffusers/main/en/api/pipelines/mochi).
- [`AmusedLoraLoaderMixin`] is for the [`AmusedPipeline`].
- [`LoraBaseMixin`] provides a base class with several utility methods to fuse, unfuse, and unload LoRAs, and more.

@@ -38,6 +41,18 @@

[[autodoc]] loaders.lora_pipeline.SD3LoraLoaderMixin

## FluxLoraLoaderMixin

[[autodoc]] loaders.lora_pipeline.FluxLoraLoaderMixin

## CogVideoXLoraLoaderMixin

[[autodoc]] loaders.lora_pipeline.CogVideoXLoraLoaderMixin

## Mochi1LoraLoaderMixin

[[autodoc]] loaders.lora_pipeline.Mochi1LoraLoaderMixin

## AmusedLoraLoaderMixin

[[autodoc]] loaders.lora_pipeline.AmusedLoraLoaderMixin
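A minimal sketch of the shared loader surface these mixins expose (the LoRA repo id is a placeholder):

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
# FluxLoraLoaderMixin methods are available directly on the pipeline.
pipe.load_lora_weights("your-username/your-flux-lora", adapter_name="style")  # placeholder repo id
pipe.fuse_lora(lora_scale=0.9)  # LoraBaseMixin: bake the LoRA into the base weights
pipe.unfuse_lora()              # ...and restore the original weights
```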
29 changes: 29 additions & 0 deletions docs/source/en/api/loaders/transformer_sd3.md
@@ -0,0 +1,29 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# SD3Transformer2D

This class is useful when *only* loading weights into an [`SD3Transformer2DModel`]. If you need to load weights into the text encoder, or into both a text encoder and the SD3Transformer2DModel, use the [`SD3LoraLoaderMixin`](lora#diffusers.loaders.SD3LoraLoaderMixin) class instead.

The [`SD3Transformer2DLoadersMixin`] class currently only loads IP-Adapter weights, but will be used in the future to save weights and load LoRAs.

<Tip>

To learn more about how to load LoRA weights, see the [LoRA](../../using-diffusers/loading_adapters#lora) loading guide.

</Tip>

## SD3Transformer2DLoadersMixin

[[autodoc]] loaders.transformer_sd3.SD3Transformer2DLoadersMixin
- all
- _load_ip_adapter_weights
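In practice this mixin is driven through the pipeline-level [`SD3IPAdapterMixin.load_ip_adapter`] rather than called directly. A rough sketch, where the IP-Adapter repo id and its default weight layout are assumptions:

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.float16
)
# load_ip_adapter routes the image-projection weights into the transformer
# via SD3Transformer2DLoadersMixin._load_ip_adapter_weights.
pipe.load_ip_adapter("InstantX/SD3.5-Large-IP-Adapter")  # repo id is an assumption
print(pipe.is_ip_adapter_active)
```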
2 changes: 2 additions & 0 deletions docs/source/en/api/models/autoencoder_dc.md
@@ -29,6 +29,8 @@ The following DCAE models are released and supported in Diffusers.
| [`mit-han-lab/dc-ae-f128c512-in-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-in-1.0-diffusers) | [`mit-han-lab/dc-ae-f128c512-in-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-in-1.0)
| [`mit-han-lab/dc-ae-f128c512-mix-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-mix-1.0-diffusers) | [`mit-han-lab/dc-ae-f128c512-mix-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-mix-1.0)

This model was contributed by [lawrence-cj](https://github.com/lawrence-cj).

Load a model in Diffusers format with [`~ModelMixin.from_pretrained`].

```python
import torch
from diffusers import AutoencoderDC

# Any of the Diffusers-format checkpoints from the table above works here.
ae = AutoencoderDC.from_pretrained(
    "mit-han-lab/dc-ae-f128c512-in-1.0-diffusers", torch_dtype=torch.float32
)
```
89 changes: 89 additions & 0 deletions docs/source/en/api/pipelines/control_flux_inpaint.md
@@ -0,0 +1,89 @@
<!--Copyright 2024 The HuggingFace Team, The Black Forest Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# FluxControlInpaint

[`FluxControlInpaintPipeline`] implements inpainting for the FLUX.1 Depth and Canny models. It takes a prompt, an image, and a mask as input, and returns the inpainted image while following the structure extracted from a control image.

FLUX.1 Depth and Canny [dev] are 12-billion-parameter rectified flow transformers capable of generating an image from a text description while following the structure of a given input image. **These are not ControlNet models**.

| Control type | Developer | Link |
| -------- | ---------- | ---- |
| Depth | [Black Forest Labs](https://huggingface.co/black-forest-labs) | [Link](https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev) |
| Canny | [Black Forest Labs](https://huggingface.co/black-forest-labs) | [Link](https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev) |


<Tip>

Flux can be quite expensive to run on consumer hardware devices. However, you can perform a suite of optimizations to run it faster and in a more memory-friendly manner. Check out [this section](https://huggingface.co/blog/sd3#memory-optimizations-for-sd3) for more details. Additionally, Flux can benefit from quantization for memory efficiency with a trade-off in inference latency. Refer to [this blog post](https://huggingface.co/blog/quanto-diffusers) to learn more. For an exhaustive list of resources, check out [this gist](https://gist.github.com/sayakpaul/b664605caf0aa3bf8585ab109dd5ac9c).

</Tip>

```python
import torch
from diffusers import FluxControlInpaintPipeline
from diffusers.models.transformers import FluxTransformer2DModel
from transformers import T5EncoderModel
from diffusers.utils import load_image, make_image_grid
from image_gen_aux import DepthPreprocessor # https://github.com/huggingface/image_gen_aux
from PIL import Image
import numpy as np

pipe = FluxControlInpaintPipeline.from_pretrained(
"black-forest-labs/FLUX.1-Depth-dev",
torch_dtype=torch.bfloat16,
)
# Uncomment the following block if you are GPU-memory constrained: it swaps in
# NF4-quantized weights for the two largest modules and offloads them to the
# CPU between forward passes. When enabled, skip `pipe.to("cuda")` below, since
# moving an offloaded pipeline to CUDA raises an error.
# ---------------------------------------------------------------
# transformer = FluxTransformer2DModel.from_pretrained(
#     "sayakpaul/FLUX.1-Depth-dev-nf4", subfolder="transformer", torch_dtype=torch.bfloat16
# )
# text_encoder_2 = T5EncoderModel.from_pretrained(
#     "sayakpaul/FLUX.1-Depth-dev-nf4", subfolder="text_encoder_2", torch_dtype=torch.bfloat16
# )
# pipe.transformer = transformer
# pipe.text_encoder_2 = text_encoder_2
# pipe.enable_model_cpu_offload()
# ---------------------------------------------------------------
pipe.to("cuda")

prompt = "a blue robot singing opera with human-like expressions"
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

# Build a rectangular mask over the robot's head region.
head_mask = np.zeros_like(image)
head_mask[65:580, 300:642] = 255
mask_image = Image.fromarray(head_mask)

processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
control_image = processor(image)[0].convert("RGB")

output = pipe(
prompt=prompt,
image=image,
control_image=control_image,
mask_image=mask_image,
num_inference_steps=30,
strength=0.9,
guidance_scale=10.0,
generator=torch.Generator().manual_seed(42),
).images[0]
make_image_grid([image, control_image, mask_image, output.resize(image.size)], rows=1, cols=4).save("output.png")
```

## FluxControlInpaintPipeline
[[autodoc]] FluxControlInpaintPipeline
- all
- __call__


## FluxPipelineOutput
[[autodoc]] pipelines.flux.pipeline_output.FluxPipelineOutput