Merge changes #123

Merged: 31 commits on Nov 10, 2023

Commits (31)

6a89a6c
Update custom diffusion attn processor (#5663)
DN6 Nov 7, 2023
71f56c7
Model tests xformers fixes (#5679)
DN6 Nov 7, 2023
8ca179a
Update free model hooks (#5680)
DN6 Nov 7, 2023
414d7c4
Fix Basic Transformer Block (#5683)
DN6 Nov 7, 2023
97c8199
Explicit torch/flax dependency check (#5673)
DN6 Nov 7, 2023
a8523bf
[PixArt-Alpha] fix `mask_feature` so that precomputed embeddings work…
sayakpaul Nov 7, 2023
84cd9e8
Make sure DDPM and `diffusers` can be used without Transformers (#5668)
sayakpaul Nov 7, 2023
1dc231d
[PixArt-Alpha] Support non-square images (#5672)
sayakpaul Nov 7, 2023
aab6de2
Improve LCMScheduler (#5681)
dg845 Nov 7, 2023
7942bb8
[`Docs`] Fix typos, improve, update at Using Diffusers' Task page (#5…
tolgacangoz Nov 7, 2023
9ae9059
Replacing the nn.Mish activation function with a get_activation funct…
hi-sushanta Nov 8, 2023
6999693
speed up Shap-E fast test (#5686)
yiyixuxu Nov 8, 2023
11c1256
Fix the misaligned pipeline usage in dreamshaper docstrings (#5700)
kirill-fedyanin Nov 8, 2023
d384265
Fixed is_safetensors_compatible() handling of windows path separators…
PhilLab Nov 8, 2023
c803a8f
[LCM] Fix img2img (#5698)
patrickvonplaten Nov 8, 2023
78be400
[PixArt-Alpha] fix mask feature condition. (#5695)
sayakpaul Nov 8, 2023
17528af
Fix styling issues (#5699)
patrickvonplaten Nov 8, 2023
6e68c71
Add adapter fusing + PEFT to the docs (#5662)
apolinario Nov 8, 2023
65ef7a0
Fix prompt bug in AnimateDiff (#5702)
DN6 Nov 8, 2023
6110d7c
[Bugfix] fix error of peft lora when xformers enabled (#5697)
okotaku Nov 8, 2023
43346ad
Install accelerate from PyPI in PR test runner (#5721)
DN6 Nov 9, 2023
2fd4640
consistency decoder (#5694)
williamberman Nov 9, 2023
bf406ea
Correct consist dec (#5722)
patrickvonplaten Nov 9, 2023
3d7eaf8
LCM Add Tests (#5707)
patrickvonplaten Nov 9, 2023
bc2ba00
[LCM] add: locm docs. (#5723)
sayakpaul Nov 9, 2023
db2d8e7
Add LCM Scripts (#5727)
patil-suraj Nov 9, 2023
53a8439
[`Docs`] Fix typos and update files at Optimization Page (#5674)
tolgacangoz Nov 9, 2023
1328aeb
[Docs] Clarify that these are two separate examples (#5734)
up_the_irons Nov 9, 2023
77ba494
[ConsistencyDecoder] fix: doc type (#5745)
sayakpaul Nov 10, 2023
1f87f83
add load_datasete data_dir parameter (#5747)
aihao2000 Nov 10, 2023
1477865
post release v0.23.0 (#5730)
sayakpaul Nov 10, 2023
34 changes: 34 additions & 0 deletions .github/workflows/pr_flax_dependency_test.yml
@@ -0,0 +1,34 @@
name: Run Flax dependency tests

on:
  pull_request:
    branches:
      - main
  push:
    branches:
      - main

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

jobs:
  check_flax_dependencies:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: "3.8"
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -e .
        pip install "jax[cpu]>=0.2.16,!=0.3.2"
        pip install "flax>=0.4.1"
        pip install "jaxlib>=0.1.65"
        pip install pytest
    - name: Check for soft dependencies
      run: |
        pytest tests/others/test_dependencies.py
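
Both this workflow and the Torch variant added below run `tests/others/test_dependencies.py`. As a rough sketch of what a soft-dependency check of this kind can look like (hypothetical, not the actual test file):

```python
# Hypothetical sketch, not the actual tests/others/test_dependencies.py:
# importing diffusers should succeed even when optional backends
# (e.g. transformers or flax) are not installed.
import importlib


def test_diffusers_imports_without_optional_backends():
    diffusers = importlib.import_module("diffusers")
    # Objects backed by a missing backend are replaced by dummy classes
    # that raise an informative error only when they are actually used.
    assert hasattr(diffusers, "__version__")
```
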
4 changes: 2 additions & 2 deletions .github/workflows/pr_tests.yml
@@ -72,7 +72,7 @@ jobs:
      run: |
        apt-get update && apt-get install libsndfile1-dev libgl1 -y
        python -m pip install -e .[quality,test]
-       python -m pip install git+https://github.com/huggingface/accelerate.git
+       python -m pip install accelerate

    - name: Environment
      run: |
@@ -115,7 +115,7 @@ jobs:
      run: |
        python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
          --make-reports=tests_${{ matrix.config.report }} \
          examples/test_examples.py

    - name: Failure short reports
      if: ${{ failure() }}
32 changes: 32 additions & 0 deletions .github/workflows/pr_torch_dependency_test.yml
@@ -0,0 +1,32 @@
name: Run Torch dependency tests

on:
  pull_request:
    branches:
      - main
  push:
    branches:
      - main

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

jobs:
  check_torch_dependencies:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: "3.8"
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -e .
        pip install torch torchvision torchaudio
        pip install pytest
    - name: Check for soft dependencies
      run: |
        pytest tests/others/test_dependencies.py
6 changes: 3 additions & 3 deletions docs/README.md
@@ -16,7 +16,7 @@ limitations under the License.

# Generating the documentation

To generate the documentation, you first have to build it. Several packages are necessary to build the doc,
you can install them with the following command, at the root of the code repository:

```bash
@@ -142,7 +142,7 @@ This will include every public method of the pipeline that is documented, as wel
- __call__
- enable_attention_slicing
- disable_attention_slicing
- enable_xformers_memory_efficient_attention
- disable_xformers_memory_efficient_attention
```

@@ -154,7 +154,7 @@ Values that should be put in `code` should either be surrounded by backticks: \`
and objects like True, None, or any strings should usually be put in `code`.

When mentioning a class, function, or method, it is recommended to use our syntax for internal links so that our tool
adds a link to its documentation with this syntax: \[\`XXXClass\`\] or \[\`function\`\]. This requires the class or
function to be in the main package.

If you want to create a link to some internal class or function, you need to
2 changes: 1 addition & 1 deletion docs/TRANSLATING.md
@@ -38,7 +38,7 @@ Here, `LANG-ID` should be one of the ISO 639-1 or ISO 639-2 language codes -- se

The fun part comes - translating the text!

The first thing we recommend is translating the part of the `_toctree.yml` file that corresponds to your doc chapter. This file is used to render the table of contents on the website.

> 🙋 If the `_toctree.yml` file doesn't yet exist for your language, you can create one by copy-pasting from the English version and deleting the sections unrelated to your chapter. Just make sure it exists in the `docs/source/LANG-ID/` directory!

8 changes: 7 additions & 1 deletion docs/source/en/_toctree.yml
@@ -72,6 +72,8 @@
    title: Overview
  - local: using-diffusers/sdxl
    title: Stable Diffusion XL
+ - local: using-diffusers/lcm
+   title: Latent Consistency Models
  - local: using-diffusers/kandinsky
    title: Kandinsky
  - local: using-diffusers/controlnet
@@ -133,7 +135,7 @@
  - local: optimization/memory
    title: Reduce memory usage
  - local: optimization/torch2.0
-   title: Torch 2.0
+   title: PyTorch 2.0
  - local: optimization/xformers
    title: xFormers
  - local: optimization/tome
@@ -200,6 +202,8 @@
    title: AsymmetricAutoencoderKL
  - local: api/models/autoencoder_tiny
    title: Tiny AutoEncoder
+ - local: api/models/consistency_decoder_vae
+   title: ConsistencyDecoderVAE
  - local: api/models/transformer2d
    title: Transformer2D
  - local: api/models/transformer_temporal
@@ -344,6 +348,8 @@
    title: Overview
  - local: api/schedulers/cm_stochastic_iterative
    title: CMStochasticIterativeScheduler
+ - local: api/schedulers/consistency_decoder
+   title: ConsistencyDecoderScheduler
  - local: api/schedulers/ddim_inverse
    title: DDIMInverseScheduler
  - local: api/schedulers/ddim
18 changes: 18 additions & 0 deletions docs/source/en/api/models/consistency_decoder_vae.md
@@ -0,0 +1,18 @@
# Consistency Decoder

The consistency decoder can be used to decode the latents from the denoising UNet in the [`StableDiffusionPipeline`]. It was introduced in the [DALL-E 3 technical report](https://openai.com/dall-e-3).

The original codebase can be found at [openai/consistencydecoder](https://github.com/openai/consistencydecoder).

<Tip warning={true}>

Inference is currently supported for only 2 iterations.

</Tip>

The pipeline could not have been contributed without the help of [madebyollin](https://github.com/madebyollin) and [mrsteyk](https://github.com/mrsteyk) from [this issue](https://github.com/openai/consistencydecoder/issues/1).

## ConsistencyDecoderVAE
[[autodoc]] ConsistencyDecoderVAE
- all
- decode
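
For orientation, a short usage sketch; the `openai/consistency-decoder` checkpoint name is an assumption based on the linked OpenAI release:

```python
import torch
from diffusers import ConsistencyDecoderVAE, DiffusionPipeline

# Swap the default Stable Diffusion VAE for the consistency decoder
# (checkpoint name assumed, see lead-in above).
vae = ConsistencyDecoderVAE.from_pretrained(
    "openai/consistency-decoder", torch_dtype=torch.float16
)
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", vae=vae, torch_dtype=torch.float16
).to("cuda")

image = pipe("horse", generator=torch.manual_seed(0)).images[0]
image.save("horse.png")
```
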
9 changes: 9 additions & 0 deletions docs/source/en/api/schedulers/consistency_decoder.md
@@ -0,0 +1,9 @@
# ConsistencyDecoderScheduler

This scheduler is part of the [`ConsistencyDecoderPipeline`] and was introduced in [DALL-E 3](https://openai.com/dall-e-3).

The original codebase can be found at [openai/consistency_models](https://github.com/openai/consistency_models).


## ConsistencyDecoderScheduler
[[autodoc]] schedulers.scheduling_consistency_decoder.ConsistencyDecoderScheduler
6 changes: 3 additions & 3 deletions docs/source/en/conceptual/ethical_guidelines.md
@@ -14,7 +14,7 @@ specific language governing permissions and limitations under the License.

## Preamble

[Diffusers](https://huggingface.co/docs/diffusers/index) provides pre-trained diffusion models and serves as a modular toolbox for inference and training.

Given its real-world applications and potential negative impacts on society, we think it is important to provide the project with ethical guidelines to guide the development, users’ contributions, and usage of the Diffusers library.

@@ -46,7 +46,7 @@ The following ethical guidelines apply generally, but we will primarily implemen

## Examples of implementations: Safety features and Mechanisms

The team works daily to make the technical and non-technical tools available to deal with the potential ethical and social risks associated with diffusion technology. Moreover, the community's input is invaluable in ensuring these features' implementation and raising awareness with us.

- [**Community tab**](https://huggingface.co/docs/hub/repositories-pull-requests-discussions): it enables the community to discuss and better collaborate on a project.

@@ -60,4 +60,4 @@ The team works daily to make the technical and non-technical tools available to

- **Staged released on the Hub**: in particularly sensitive situations, access to some repositories should be restricted. This staged release is an intermediary step that allows the repository’s authors to have more control over its use.

- **Licensing**: [OpenRAILs](https://huggingface.co/blog/open_rail), a new type of licensing, allows us to ensure free access while having a set of restrictions that ensure more responsible use.
36 changes: 18 additions & 18 deletions docs/source/en/conceptual/evaluation.md
@@ -12,9 +12,9 @@ specific language governing permissions and limitations under the License.

# Evaluating Diffusion Models

<a target="_blank" href="https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/evaluation.ipynb">
    <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

Evaluation of generative models like [Stable Diffusion](https://huggingface.co/docs/diffusers/stable_diffusion) is subjective in nature. But as practitioners and researchers, we often have to make careful choices amongst many different possibilities. So, when working with different generative models (like GANs, Diffusion, etc.), how do we choose one over the other?

@@ -23,7 +23,7 @@ However, quantitative metrics don't necessarily correspond to image quality. So,
of both qualitative and quantitative evaluations provides a stronger signal when choosing one model
over the other.

In this document, we provide a non-exhaustive overview of qualitative and quantitative methods to evaluate Diffusion models. For quantitative methods, we specifically focus on how to implement them alongside `diffusers`.

The methods shown in this document can also be used to evaluate different [noise schedulers](https://huggingface.co/docs/diffusers/main/en/api/schedulers/overview) keeping the underlying generation model fixed.
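
In practice, swapping the scheduler while keeping the model fixed is a small change; a minimal sketch (checkpoint name assumed for illustration):

```python
from diffusers import DDIMScheduler, StableDiffusionPipeline

# Same generation model, different noise scheduler: rebuild the scheduler
# from the pipeline's existing scheduler config.
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
```
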

@@ -38,9 +38,9 @@ We cover Diffusion models with the following pipelines:
## Qualitative Evaluation

Qualitative evaluation typically involves human assessment of generated images. Quality is measured across aspects such as compositionality, image-text alignment, and spatial relations. Common prompts provide a degree of uniformity for subjective metrics.
DrawBench and PartiPrompts are prompt datasets used for qualitative benchmarking; they were introduced by [Imagen](https://imagen.research.google/) and [Parti](https://parti.research.google/) respectively.

From the [official Parti website](https://parti.research.google/):

> PartiPrompts (P2) is a rich set of over 1600 prompts in English that we release as part of this work. P2 can be used to measure model capabilities across various categories and challenge aspects.

Expand All @@ -52,13 +52,13 @@ PartiPrompts has the following columns:
- Category of the prompt (such as “Abstract”, “World Knowledge”, etc.)
- Challenge reflecting the difficulty (such as “Basic”, “Complex”, “Writing & Symbols”, etc.)

These benchmarks allow for side-by-side human evaluation of different image generation models.

For this, the 🧨 Diffusers team has built **Open Parti Prompts**, which is a community-driven qualitative benchmark based on Parti Prompts to compare state-of-the-art open-source diffusion models:
- [Open Parti Prompts Game](https://huggingface.co/spaces/OpenGenAI/open-parti-prompts): For 10 parti prompts, 4 generated images are shown and the user selects the image that suits the prompt best.
- [Open Parti Prompts Leaderboard](https://huggingface.co/spaces/OpenGenAI/parti-prompts-leaderboard): The leaderboard comparing the currently best open-sourced diffusion models to each other.

To manually compare images, let’s see how we can use `diffusers` on a couple of PartiPrompts.

Below we show some prompts sampled across different challenges: Basic, Complex, Linguistic Structures, Imagination, and Writing & Symbols. Here we are using PartiPrompts as a [dataset](https://huggingface.co/datasets/nateraw/parti-prompts).
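
A minimal sketch of that sampling step, assuming the dataset exposes a `Prompt` column (as its Hub card describes):

```python
from datasets import load_dataset

# Sample a handful of prompts from the community PartiPrompts dataset.
parti_prompts = load_dataset("nateraw/parti-prompts", split="train")
sample_prompts = parti_prompts.shuffle(seed=0)["Prompt"][:5]
print(sample_prompts)
```
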

@@ -92,16 +92,16 @@ images = sd_pipeline(sample_prompts, num_images_per_prompt=1, generator=generato

![parti-prompts-14](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/parti-prompts-14.png)

We can also set `num_images_per_prompt` accordingly to compare different images for the same prompt. Running the same pipeline with a different checkpoint ([v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5)) yields:

![parti-prompts-15](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/parti-prompts-15.png)

Once several images are generated from all the prompts using multiple models (under evaluation), these results are presented to human evaluators for scoring. For
more details on the DrawBench and PartiPrompts benchmarks, refer to their respective papers.

<Tip>

It is useful to look at some inference samples while a model is training to measure the
training progress. In our [training scripts](https://github.com/huggingface/diffusers/tree/main/examples/), we support this utility with additional support for
logging to TensorBoard and Weights & Biases.

@@ -177,7 +177,7 @@ generator = torch.manual_seed(seed)
images = sd_pipeline(prompts, num_images_per_prompt=1, generator=generator, output_type="np").images
```

Then we load the [v1-5 checkpoint](https://huggingface.co/runwayml/stable-diffusion-v1-5) to generate images:

```python
model_ckpt_1_5 = "runwayml/stable-diffusion-v1-5"
```
@@ -205,7 +205,7 @@ It seems like the [v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5)
By construction, there are some limitations in this score. The captions in the training dataset
were crawled from the web and extracted from `alt` and similar tags associated with an image on the internet.
They are not necessarily representative of what a human being would use to describe an image. Hence we
had to "engineer" some prompts here.

</Tip>
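
For completeness, CLIP score itself can be computed with `torchmetrics`; a minimal sketch, assuming the images come from the pipeline as float arrays in `[0, 1]` (via `output_type="np"`) and assuming the `openai/clip-vit-base-patch16` CLIP checkpoint:

```python
from functools import partial

import torch
from torchmetrics.functional.multimodal import clip_score

clip_score_fn = partial(clip_score, model_name_or_path="openai/clip-vit-base-patch16")


def calculate_clip_score(images, prompts):
    # Convert (N, H, W, C) float arrays in [0, 1] to uint8 (N, C, H, W) tensors.
    images_int = (images * 255).astype("uint8")
    score = clip_score_fn(torch.from_numpy(images_int).permute(0, 3, 1, 2), prompts)
    return round(float(score.detach()), 4)
```
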

@@ -551,15 +551,15 @@ FID results tend to be fragile as they depend on a lot of factors:
* The implementation accuracy of the computation.
* The image format (not the same if we start from PNGs vs JPGs).

Keeping that in mind, FID is often most useful when comparing similar runs, but it is
hard to reproduce paper results unless the authors carefully disclose the FID
measurement code.

These points apply to other related metrics too, such as KID and IS.

</Tip>
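
For reference, a minimal FID computation with `torchmetrics`, assuming `real_images` and `fake_images` are `(N, 3, H, W)` float tensors in `[0, 1]`:

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

# normalize=True tells torchmetrics to expect float images in [0, 1].
fid = FrechetInceptionDistance(normalize=True)
fid.update(real_images, real=True)
fid.update(fake_images, real=False)
print(f"FID: {float(fid.compute())}")
```
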

As a final step, let's visually inspect the `fake_images`.

<p align="center">
<img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/fake-images.png" alt="fake-images"><br>