Merge changes #129

Merged 23 commits on Nov 27, 2023

Commits
0eeee61
Adds an advanced version of the SD-XL DreamBooth LoRA training script…
linoytsaban Nov 22, 2023
5ffa603
[bug fix] fix small bug in readme template of sdxl lora training scri…
linoytsaban Nov 23, 2023
3003ff4
[bug fix] fix small bug in readme template of sdxl lora training scri…
linoytsaban Nov 23, 2023
e5f232f
[Docs] add: 8bit inference with pixart alpha (#5814)
sayakpaul Nov 24, 2023
b978334
[@cene555][Kandinsky 3.0] Add Kandinsky 3.0 (#5913)
patrickvonplaten Nov 24, 2023
2a7f43a
correct num inference steps
patrickvonplaten Nov 24, 2023
6d2e19f
[Examples] Allow downloading variant model files (#5531)
patrickvonplaten Nov 27, 2023
7d6f30e
[Fix: pixart-alpha] random 512px resolution bug (#5842)
lawrence-cj Nov 27, 2023
3f7c351
[Core] add support for gradient checkpointing in transformer_2d (#5943)
sayakpaul Nov 27, 2023
9c357bd
Deprecate KarrasVeScheduler and ScoreSdeVpScheduler (#5269)
a-r-r-o-w Nov 27, 2023
67d0707
Add Custom Timesteps Support to LCMScheduler and Supported Pipelines …
dg845 Nov 27, 2023
c7bfb8b
set the model to train state before accelerator prepare (#5099)
sywangyi Nov 27, 2023
c079cae
Avoid computing min() that is expensive when do_normalize is False in…
ivanprado Nov 27, 2023
07eac4d
Fix LCM Stable Diffusion distillation bug related to parsing unet_tim…
dg845 Nov 27, 2023
d3cda80
add LoRA weights load and fuse support for IPEX pipeline (#5920)
linlifan Nov 27, 2023
d72a24b
Replace multiple variables with one variable. (#5715)
hi-sushanta Nov 27, 2023
20f0cbc
fix: error on device for `lpw_stable_diffusion_xl` pipeline if `pipe.…
VicGrygorchyk Nov 27, 2023
e550163
[Vae] Make sure all vae's work with latent diffusion models (#5880)
patrickvonplaten Nov 27, 2023
ebf581e
[Tests] Make sure that we don't run tests multiple times (#5949)
patrickvonplaten Nov 27, 2023
14a0d21
[Community Pipeline] Diffusion Posterior Sampling for General Noisy I…
tongdaxu Nov 27, 2023
b135b6e
[From_pretrained] Fix warning (#5948)
patrickvonplaten Nov 27, 2023
d9075be
[load_textual_inversion]: allow multiple tokens (#5837)
yiyixuxu Nov 27, 2023
50a749e
[docs] Fix space (#5898)
stevhliu Nov 27, 2023
6 changes: 5 additions & 1 deletion .github/workflows/pr_test_fetcher.yml
@@ -1,4 +1,4 @@
name: Fast tests for PRs
name: Fast tests for PRs - Test Fetcher

on:
pull_request:
@@ -14,6 +14,10 @@ env:
  MKL_NUM_THREADS: 4
  PYTEST_TIMEOUT: 60

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

jobs:
  setup_pr_tests:
    name: Setup PR Tests
4 changes: 4 additions & 0 deletions .github/workflows/push_tests_fast.yml
@@ -5,6 +5,10 @@ on:
    branches:
      - main

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

env:
  DIFFUSERS_IS_CI: yes
  HF_HOME: /mnt/cache
4 changes: 4 additions & 0 deletions .github/workflows/push_tests_mps.yml
@@ -13,6 +13,10 @@ env:
  PYTEST_TIMEOUT: 600
  RUN_SLOW: no

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

jobs:
  run_fast_tests_apple_m1:
    name: Fast PyTorch MPS tests on MacOS
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -278,6 +278,8 @@
  title: Kandinsky 2.1
- local: api/pipelines/kandinsky_v22
  title: Kandinsky 2.2
- local: api/pipelines/kandinsky3
  title: Kandinsky 3
- local: api/pipelines/latent_consistency_models
  title: Latent Consistency Models
- local: api/pipelines/latent_diffusion
24 changes: 24 additions & 0 deletions docs/source/en/api/pipelines/kandinsky3.md
@@ -0,0 +1,24 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Kandinsky 3

TODO

## Kandinsky3Pipeline

[[autodoc]] Kandinsky3Pipeline
- all
- __call__

## Kandinsky3Img2ImgPipeline

[[autodoc]] Kandinsky3Img2ImgPipeline
- all
- __call__
106 changes: 106 additions & 0 deletions docs/source/en/api/pipelines/pixart.md
@@ -35,6 +35,112 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers)

</Tip>

## Inference with under 8GB GPU VRAM

Run the [`PixArtAlphaPipeline`] with under 8GB GPU VRAM by loading the text encoder in 8-bit precision. Let's walk through a full-fledged example.

First, install the [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) library:

```bash
pip install -U bitsandbytes
```

Then load the text encoder in 8-bit:

```python
from transformers import T5EncoderModel
from diffusers import PixArtAlphaPipeline
import torch

text_encoder = T5EncoderModel.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-1024-MS",
    subfolder="text_encoder",
    load_in_8bit=True,
    device_map="auto",
)
pipe = PixArtAlphaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-1024-MS",
    text_encoder=text_encoder,
    transformer=None,
    device_map="auto"
)
```

Now, use the `pipe` to encode a prompt:

```python
with torch.no_grad():
    prompt = "cute cat"
    prompt_embeds, prompt_attention_mask, negative_embeds, negative_prompt_attention_mask = pipe.encode_prompt(prompt)
```

Now that the text embeddings have been computed, remove the `text_encoder` and `pipe` from memory to free up some GPU VRAM:

```python
import gc

def flush():
    gc.collect()
    torch.cuda.empty_cache()

del text_encoder
del pipe
flush()
```

Then compute the latents with the prompt embeddings as inputs:

```python
pipe = PixArtAlphaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-1024-MS",
    text_encoder=None,
    torch_dtype=torch.float16,
).to("cuda")

latents = pipe(
    negative_prompt=None,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    prompt_attention_mask=prompt_attention_mask,
    negative_prompt_attention_mask=negative_prompt_attention_mask,
    num_images_per_prompt=1,
    output_type="latent",
).images

del pipe.transformer
flush()
```

<Tip>

Notice that while initializing `pipe`, you're setting `text_encoder` to `None` so that it's not loaded.

</Tip>

Once the latents are computed, pass them to the VAE to decode into a real image:

```python
with torch.no_grad():
    image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor, return_dict=False)[0]
    image = pipe.image_processor.postprocess(image, output_type="pil")[0]
    image.save("cat.png")
```

By deleting components you aren't using and flushing the GPU VRAM, you should be able to run [`PixArtAlphaPipeline`] with under 8GB GPU VRAM.

![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/pixart/8bits_cat.png)

If you want a report of your memory usage, run this [script](https://gist.github.com/sayakpaul/3ae0f847001d342af27018a96f467e4e).
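
If you only need the peak number rather than a full report, a minimal sketch along these lines (using PyTorch's built-in memory counters; where you place the reset call is up to you) prints the peak allocated GPU memory:

```python
import torch

# Reset the peak-memory counter before running the steps above,
# then read it back once the image has been saved.
torch.cuda.reset_peak_memory_stats()

# ... run the prompt encoding, denoising, and VAE decoding steps here ...

peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU memory allocated: {peak_gb:.2f} GB")
```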

<Tip warning={true}>

Text embeddings computed in 8-bit can impact the quality of the generated images because of the information loss in the representation space caused by the reduced precision. It's recommended to compare the outputs with and without 8-bit.

</Tip>

While loading the `text_encoder`, you set `load_in_8bit` to `True`. You could also specify `load_in_4bit` to bring your memory requirements down even further to under 7GB.
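
A minimal sketch of the 4-bit variant (assuming your installed bitsandbytes and transformers versions support `load_in_4bit`) only changes the flag passed to `from_pretrained`:

```python
from transformers import T5EncoderModel

# Same checkpoint as the 8-bit example above, quantized to 4-bit instead.
text_encoder = T5EncoderModel.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-1024-MS",
    subfolder="text_encoder",
    load_in_4bit=True,
    device_map="auto",
)
```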

## PixArtAlphaPipeline

[[autodoc]] PixArtAlphaPipeline
2 changes: 1 addition & 1 deletion docs/source/en/api/schedulers/score_sde_vp.md
@@ -25,4 +25,4 @@ The abstract from the paper is:
</Tip>

## ScoreSdeVpScheduler
[[autodoc]] schedulers.scheduling_sde_vp.ScoreSdeVpScheduler
[[autodoc]] schedulers.deprecated.scheduling_sde_vp.ScoreSdeVpScheduler
2 changes: 1 addition & 1 deletion docs/source/en/api/schedulers/stochastic_karras_ve.md
@@ -18,4 +18,4 @@ specific language governing permissions and limitations under the License.
[[autodoc]] KarrasVeScheduler

## KarrasVeOutput
[[autodoc]] schedulers.scheduling_karras_ve.KarrasVeOutput
[[autodoc]] schedulers.deprecated.scheduling_karras_ve.KarrasVeOutput
49 changes: 18 additions & 31 deletions docs/source/en/using-diffusers/unconditional_image_generation.md
@@ -14,54 +14,41 @@ specific language governing permissions and limitations under the License.

[[open-in-colab]]

Unconditional image generation is a relatively straightforward task. The model only generates images - without any additional context like text or an image - resembling the training data it was trained on.
Unconditional image generation generates images that look like a random sample from the training data the model was trained on because the denoising process is not guided by any additional context like text or image.

The [`DiffusionPipeline`] is the easiest way to use a pre-trained diffusion system for inference.
To get started, use the [`DiffusionPipeline`] to load the [anton-l/ddpm-butterflies-128](https://huggingface.co/anton-l/ddpm-butterflies-128) checkpoint to generate images of butterflies. The [`DiffusionPipeline`] downloads and caches all the model components required to generate an image.

Start by creating an instance of [`DiffusionPipeline`] and specify which pipeline checkpoint you would like to download.
You can use any of the 🧨 Diffusers [checkpoints](https://huggingface.co/models?library=diffusers&sort=downloads) from the Hub (the checkpoint you'll use generates images of butterflies).
```py
from diffusers import DiffusionPipeline

generator = DiffusionPipeline.from_pretrained("anton-l/ddpm-butterflies-128").to("cuda")
image = generator().images[0]
image
```

<Tip>

💡 Want to train your own unconditional image generation model? Take a look at the training [guide](../training/unconditional_training) to learn how to generate your own images.
Want to generate images of something else? Take a look at the training [guide](../training/unconditional_training) to learn how to train a model to generate your own images.

</Tip>

In this guide, you'll use [`DiffusionPipeline`] for unconditional image generation with [DDPM](https://arxiv.org/abs/2006.11239):

```python
from diffusers import DiffusionPipeline

generator = DiffusionPipeline.from_pretrained("anton-l/ddpm-butterflies-128", use_safetensors=True)
```
The output image is a [`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html?highlight=image#the-image-class) object that can be saved:

The [`DiffusionPipeline`] downloads and caches all modeling, tokenization, and scheduling components.
Because the model consists of roughly 1.4 billion parameters, we strongly recommend running it on a GPU.
You can move the generator object to a GPU, just like you would in PyTorch:

```python
generator.to("cuda")
```py
image.save("generated_image.png")
```

Now you can use the `generator` to generate an image:
You can also try experimenting with the `num_inference_steps` parameter, which controls the number of denoising steps. More denoising steps typically produce higher quality images, but it'll take longer to generate. Feel free to play around with this parameter to see how it affects the image quality.

```python
image = generator().images[0]
```py
image = generator(num_inference_steps=100).images[0]
image
```

The output is by default wrapped into a [`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html?highlight=image#the-image-class) object.

You can save the image by calling:

```python
image.save("generated_image.png")
```

Try out the Spaces below, and feel free to play around with the inference steps parameter to see how it affects the image quality!
Try out the Space below to generate an image of a butterfly!

<iframe
src="https://stevhliu-ddpm-butterflies-128.hf.space"
src="https://stevhliu-unconditional-image-generation.hf.space"
frameborder="0"
width="850"
height="500"