Flux pipeline (huggingface#9043)

add flux!

Signed-off-by: Adrien
Co-authored-by: Adrien
Co-authored-by: Anatoly Belikov
Co-authored-by: Dhruv Nair
Co-authored-by: yiyixuxu
Skquark · Aug 1, 2024 · 27637a5
1 parent 2ea22e1

Showing 21 changed files with 2,270 additions and 30 deletions.
diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml
@@ -253,6 +253,8 @@
       title: HunyuanDiT2DModel
     - local: api/models/aura_flow_transformer2d
       title: AuraFlowTransformer2DModel
+    - local: api/models/flux_transformer
+      title: FluxTransformer2DModel
     - local: api/models/latte_transformer3d
       title: LatteTransformer3DModel
     - local: api/models/lumina_nextdit2d
@@ -320,6 +322,8 @@
       title: DiffEdit
     - local: api/pipelines/dit
       title: DiT
+    - local: api/pipelines/flux
+      title: Flux
     - local: api/pipelines/hunyuandit
       title: Hunyuan-DiT
     - local: api/pipelines/i2vgenxl

diff --git a/docs/source/en/api/models/flux_transformer.md b/docs/source/en/api/models/flux_transformer.md
@@ -0,0 +1,19 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# FluxTransformer2DModel
+
+A Transformer model for image-like data from [Flux](https://blackforestlabs.ai/announcing-black-forest-labs/).
+
+## FluxTransformer2DModel
+
+[[autodoc]] FluxTransformer2DModel
diff --git a/docs/source/en/api/pipelines/flux.md b/docs/source/en/api/pipelines/flux.md
@@ -0,0 +1,84 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# Flux
+
+Flux is a series of text-to-image generation models based on diffusion transformers. To learn more about Flux, check out the original [blog post](https://blackforestlabs.ai/announcing-black-forest-labs/) by Black Forest Labs, the creators of Flux.
+
+Original model checkpoints for Flux can be found [here](https://huggingface.co/black-forest-labs). Original inference code can be found [here](https://github.com/black-forest-labs/flux).
+
+<Tip>
+
+Flux can be expensive to run on consumer hardware. However, you can apply a suite of optimizations to make it run faster and use less memory. Check out [this section](https://huggingface.co/blog/sd3#memory-optimizations-for-sd3) for more details. Flux can also benefit from quantization, which reduces memory usage at the cost of some inference latency. Refer to [this blog post](https://huggingface.co/blog/quanto-diffusers) to learn more.
+
+</Tip>
+
+Flux comes in two variants:
+
+* Timestep-distilled (`black-forest-labs/FLUX.1-schnell`)
+* Guidance-distilled (`black-forest-labs/FLUX.1-dev`)
+
+The two checkpoints differ slightly in usage, as detailed below.
+
+### Timestep-distilled
+
+* `max_sequence_length` cannot be more than 256.
+* `guidance_scale` needs to be 0.
+* As a timestep-distilled model, it needs only a few sampling steps; the example below uses 4.
+
+```python
+import torch
+from diffusers import FluxPipeline
+
+pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)
+pipe.enable_model_cpu_offload()
+
+prompt = "A cat holding a sign that says hello world"
+image = pipe(
+    prompt=prompt,
+    guidance_scale=0.0,
+    height=768,
+    width=1360,
+    num_inference_steps=4,
+    max_sequence_length=256,
+).images[0]
+image.save("image.png")
+```
+
+### Guidance-distilled
+
+* The guidance-distilled variant needs about 50 sampling steps for good-quality generation.
+* It does not share the timestep-distilled checkpoint's limit of 256 on `max_sequence_length`.
+
+```python
+import torch
+from diffusers import FluxPipeline
+
+pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
+pipe.enable_model_cpu_offload()
+
+prompt = "a tiny astronaut hatching from an egg on the moon"
+image = pipe(
+    prompt=prompt,
+    guidance_scale=3.5,
+    height=768,
+    width=1360,
+    num_inference_steps=50,
+).images[0]
+image.save("image.png")
+```
+
+## FluxPipeline
+
+[[autodoc]] FluxPipeline
+	- all
+	- __call__
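
The per-checkpoint differences described in the Flux pipeline doc can be summarized as a small pre-flight check. This is a minimal sketch, not part of `diffusers`: the helper name `flux_call_kwargs` and the `FLUX_DEFAULTS` table are hypothetical, the default values are copied from the two usage examples (4 steps and `guidance_scale=0.0` for `FLUX.1-schnell`, 50 steps and `guidance_scale=3.5` for `FLUX.1-dev`), and the `ValueError`s it raises are its own choice rather than how `FluxPipeline` reports invalid arguments. It only builds the keyword arguments you would pass to `FluxPipeline.__call__`; it loads no weights.

```python
from typing import Any, Dict

SCHNELL = "black-forest-labs/FLUX.1-schnell"  # timestep-distilled
DEV = "black-forest-labs/FLUX.1-dev"  # guidance-distilled
SCHNELL_MAX_SEQUENCE_LENGTH = 256

# Hypothetical defaults, copied from the usage examples in the Flux doc.
FLUX_DEFAULTS: Dict[str, Dict[str, Any]] = {
    SCHNELL: {"guidance_scale": 0.0, "num_inference_steps": 4, "max_sequence_length": 256},
    DEV: {"guidance_scale": 3.5, "num_inference_steps": 50},
}


def flux_call_kwargs(repo_id: str, prompt: str, **overrides: Any) -> Dict[str, Any]:
    """Build kwargs for FluxPipeline.__call__ (hypothetical helper, no diffusers import)."""
    if repo_id not in FLUX_DEFAULTS:
        raise KeyError(f"unknown Flux checkpoint: {repo_id}")
    # Later entries win: caller overrides take precedence over the defaults.
    kwargs: Dict[str, Any] = {"prompt": prompt, **FLUX_DEFAULTS[repo_id], **overrides}

    if repo_id == SCHNELL:
        # Documented limit: max_sequence_length cannot be more than 256.
        if kwargs["max_sequence_length"] > SCHNELL_MAX_SEQUENCE_LENGTH:
            raise ValueError("FLUX.1-schnell: max_sequence_length cannot be more than 256")
        # Documented requirement: guidance_scale needs to be 0.
        if kwargs["guidance_scale"] != 0:
            raise ValueError("FLUX.1-schnell: guidance_scale needs to be 0")
    return kwargs


# Usage: same settings as the timestep-distilled example.
kwargs = flux_call_kwargs(SCHNELL, "A cat holding a sign that says hello world", height=768, width=1360)
print(kwargs)
# With `pipe` loaded as in the examples: image = pipe(**kwargs).images[0]
```

Keeping the defaults in a plain dictionary keyed by repository id makes the two variants easy to compare side by side; caller-supplied overrides such as `height` and `width` pass through unchanged.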