
Merge changes #175

Merged Aug 23, 2024

Commits (41):
1a92bc0
Add Learned PE selection for Auraflow (#9182)
cloneofsimo Aug 15, 2024
3e46043
Small improvements for video loading (#9183)
DN6 Aug 16, 2024
39b87b1
feat: allow flux transformer to be sharded during inference (#9159)
sayakpaul Aug 16, 2024
e649678
[Flux] Optimize guidance creation in flux pipeline by moving it outsi…
chengzeyi Aug 16, 2024
e780c05
[Chore] add set_default_attn_processor to pixart. (#9196)
sayakpaul Aug 16, 2024
db829a4
[IP Adapter] Fix object has no attribute with image encoder (#9194)
asomoza Aug 17, 2024
cba548d
fix(pipeline): k sampler sigmas device (#9189)
Jannchie Aug 17, 2024
b382550
Add Lumina T2I Auto Pipe Mapping (#8962)
Beinsezii Aug 17, 2024
f848feb
feat: allow sharding for auraflow. (#8853)
sayakpaul Aug 18, 2024
7ef8a46
[`Docs`] Fix CPU offloading usage (#9207)
tolgacangoz Aug 18, 2024
d25eb5d
fix(sd3): fix deletion of text_encoders etc (#8951)
townwish4git Aug 19, 2024
ba4348d
[Tests] Improve transformers model test suite coverage - Lumina (#8987)
saqlain2204 Aug 19, 2024
815d882
Add loading text inversion (#9130)
waylongo Aug 19, 2024
b2add10
Update `is_safetensors_compatible` check (#8991)
DN6 Aug 19, 2024
940b8e0
[CI] Multiple Slow Test fixes. (#9198)
DN6 Aug 19, 2024
9ab80a9
[CI] Add `fail-fast=False` to CUDA nightly and slow tests (#9214)
DN6 Aug 19, 2024
d72bbc6
Reflect few contributions on `contribution.md` that were not reflecte…
mreraser Aug 19, 2024
67f5cce
fix autopipeline for kolors img2img (#9212)
yiyixuxu Aug 19, 2024
803e817
Add vae slicing and tiling to flux pipeline (#9122)
iamzoltan Aug 19, 2024
eda36c4
Fix dtype error for StableDiffusionXL (#9217)
leisuzz Aug 20, 2024
cf2c49b
Remove M1 runner from Nightly Test (#9193)
DN6 Aug 20, 2024
214990e
Fix ```from_single_file``` for xl_inpaint (#9054)
Gothos Aug 20, 2024
21682ba
Custom sampler support for Stable Cascade Decoder (#9132)
Disty0 Aug 20, 2024
16a3dad
Fix StableDiffusionXLPAGInpaintPipeline (#9128)
gumgood Aug 20, 2024
867e0c9
StableDiffusionLatentUpscalePipeline - positive/negative prompt embed…
rootonchair Aug 21, 2024
214372a
fix a regression in `is_safetensors_compatible` (#9234)
yiyixuxu Aug 21, 2024
750bd79
[Single File] Fix configuring scheduler via legacy kwargs (#9229)
DN6 Aug 21, 2024
9003d75
Add StableDiffusionXLControlNetPAGImg2ImgPipeline (#8990)
satani99 Aug 21, 2024
c291617
Flux followup (#9074)
yiyixuxu Aug 21, 2024
43f1090
[docs] Network alpha docstring (#9238)
stevhliu Aug 22, 2024
32d6492
[Core] Tear apart `from_pretrained()` of `DiffusionPipeline` (#8967)
sayakpaul Aug 22, 2024
5090b09
[Flux LoRA] support parsing alpha from a flux lora state dict. (#9236)
sayakpaul Aug 22, 2024
0ec64fe
[tests] fix broken xformers tests (#9206)
a-r-r-o-w Aug 22, 2024
805bf33
Docs fix spelling issues (#9219)
nnsW3 Aug 22, 2024
dc07fc2
fix _identify_model_variants (#9247)
yiyixuxu Aug 22, 2024
960c149
Cogvideox-5B Model adapter change (#9203)
zRzRzRzRzRzRzR Aug 23, 2024
2d9ccf3
[Core] fuse_qkv_projection() to Flux (#9185)
sayakpaul Aug 23, 2024
255ac59
[Single File] Support loading Comfy UI Flux checkpoints (#9243)
DN6 Aug 23, 2024
4e74206
[Single File] Add Flux Pipeline Support (#9244)
DN6 Aug 23, 2024
4e66513
[CI] Run Fast + Fast GPU Tests on release branches. (#9255)
DN6 Aug 23, 2024
77b2162
Bugfix in `pipeline_kandinsky2_2_combined.py`: Image type check misma…
yangpei-comp Aug 23, 2024
177 changes: 116 additions & 61 deletions .github/workflows/nightly_tests.yml
@@ -116,6 +116,7 @@ jobs:
      run:
        shell: bash
    strategy:
+     fail-fast: false
      max-parallel: 2
      matrix:
        module: [models, schedulers, lora, others, single_file, examples]
@@ -290,64 +291,118 @@ jobs:
          pip install slack_sdk tabulate
          python utils/log_reports.py >> $GITHUB_STEP_SUMMARY

- run_nightly_tests_apple_m1:
-   name: Nightly PyTorch MPS tests on MacOS
-   runs-on: [ self-hosted, apple-m1 ]
-   if: github.event_name == 'schedule'
-
-   steps:
-     - name: Checkout diffusers
-       uses: actions/checkout@v3
-       with:
-         fetch-depth: 2
-
-     - name: Clean checkout
-       shell: arch -arch arm64 bash {0}
-       run: |
-         git clean -fxd
-
-     - name: Setup miniconda
-       uses: ./.github/actions/setup-miniconda
-       with:
-         python-version: 3.9
-
-     - name: Install dependencies
-       shell: arch -arch arm64 bash {0}
-       run: |
-         ${CONDA_RUN} python -m pip install --upgrade pip uv
-         ${CONDA_RUN} python -m uv pip install -e [quality,test]
-         ${CONDA_RUN} python -m uv pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
-         ${CONDA_RUN} python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate
-         ${CONDA_RUN} python -m uv pip install pytest-reportlog
-
-     - name: Environment
-       shell: arch -arch arm64 bash {0}
-       run: |
-         ${CONDA_RUN} python utils/print_env.py
-
-     - name: Run nightly PyTorch tests on M1 (MPS)
-       shell: arch -arch arm64 bash {0}
-       env:
-         HF_HOME: /System/Volumes/Data/mnt/cache
-         HF_TOKEN: ${{ secrets.HF_TOKEN }}
-       run: |
-         ${CONDA_RUN} python -m pytest -n 1 -s -v --make-reports=tests_torch_mps \
-           --report-log=tests_torch_mps.log \
-           tests/
-
-     - name: Failure short reports
-       if: ${{ failure() }}
-       run: cat reports/tests_torch_mps_failures_short.txt
-
-     - name: Test suite reports artifacts
-       if: ${{ always() }}
-       uses: actions/upload-artifact@v2
-       with:
-         name: torch_mps_test_reports
-         path: reports
-
-     - name: Generate Report and Notify Channel
-       if: always()
-       run: |
-         pip install slack_sdk tabulate
-         python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
+# M1 runner currently not well supported
+# TODO: (Dhruv) add these back when we setup better testing for Apple Silicon
+# run_nightly_tests_apple_m1:
+# name: Nightly PyTorch MPS tests on MacOS
+# runs-on: [ self-hosted, apple-m1 ]
+# if: github.event_name == 'schedule'
+#
+# steps:
+# - name: Checkout diffusers
+# uses: actions/checkout@v3
+# with:
+# fetch-depth: 2
+#
+# - name: Clean checkout
+# shell: arch -arch arm64 bash {0}
+# run: |
+# git clean -fxd
+# - name: Setup miniconda
+# uses: ./.github/actions/setup-miniconda
+# with:
+# python-version: 3.9
+#
+# - name: Install dependencies
+# shell: arch -arch arm64 bash {0}
+# run: |
+# ${CONDA_RUN} python -m pip install --upgrade pip uv
+# ${CONDA_RUN} python -m uv pip install -e [quality,test]
+# ${CONDA_RUN} python -m uv pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
+# ${CONDA_RUN} python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate
+# ${CONDA_RUN} python -m uv pip install pytest-reportlog
+# - name: Environment
+# shell: arch -arch arm64 bash {0}
+# run: |
+# ${CONDA_RUN} python utils/print_env.py
+# - name: Run nightly PyTorch tests on M1 (MPS)
+# shell: arch -arch arm64 bash {0}
+# env:
+# HF_HOME: /System/Volumes/Data/mnt/cache
+# HF_TOKEN: ${{ secrets.HF_TOKEN }}
+# run: |
+# ${CONDA_RUN} python -m pytest -n 1 -s -v --make-reports=tests_torch_mps \
+# --report-log=tests_torch_mps.log \
+# tests/
+# - name: Failure short reports
+# if: ${{ failure() }}
+# run: cat reports/tests_torch_mps_failures_short.txt
+#
+# - name: Test suite reports artifacts
+# if: ${{ always() }}
+# uses: actions/upload-artifact@v2
+# with:
+# name: torch_mps_test_reports
+# path: reports
+#
+# - name: Generate Report and Notify Channel
+# if: always()
+# run: |
+# pip install slack_sdk tabulate
+# python utils/log_reports.py >> $GITHUB_STEP_SUMMARY run_nightly_tests_apple_m1:
+# name: Nightly PyTorch MPS tests on MacOS
+# runs-on: [ self-hosted, apple-m1 ]
+# if: github.event_name == 'schedule'
+#
+# steps:
+# - name: Checkout diffusers
+# uses: actions/checkout@v3
+# with:
+# fetch-depth: 2
+#
+# - name: Clean checkout
+# shell: arch -arch arm64 bash {0}
+# run: |
+# git clean -fxd
+# - name: Setup miniconda
+# uses: ./.github/actions/setup-miniconda
+# with:
+# python-version: 3.9
+#
+# - name: Install dependencies
+# shell: arch -arch arm64 bash {0}
+# run: |
+# ${CONDA_RUN} python -m pip install --upgrade pip uv
+# ${CONDA_RUN} python -m uv pip install -e [quality,test]
+# ${CONDA_RUN} python -m uv pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
+# ${CONDA_RUN} python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate
+# ${CONDA_RUN} python -m uv pip install pytest-reportlog
+# - name: Environment
+# shell: arch -arch arm64 bash {0}
+# run: |
+# ${CONDA_RUN} python utils/print_env.py
+# - name: Run nightly PyTorch tests on M1 (MPS)
+# shell: arch -arch arm64 bash {0}
+# env:
+# HF_HOME: /System/Volumes/Data/mnt/cache
+# HF_TOKEN: ${{ secrets.HF_TOKEN }}
+# run: |
+# ${CONDA_RUN} python -m pytest -n 1 -s -v --make-reports=tests_torch_mps \
+# --report-log=tests_torch_mps.log \
+# tests/
+# - name: Failure short reports
+# if: ${{ failure() }}
+# run: cat reports/tests_torch_mps_failures_short.txt
+#
+# - name: Test suite reports artifacts
+# if: ${{ always() }}
+# uses: actions/upload-artifact@v2
+# with:
+# name: torch_mps_test_reports
+# path: reports
+#
+# - name: Generate Report and Notify Channel
+# if: always()
+# run: |
+# pip install slack_sdk tabulate
+# python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
6 changes: 5 additions & 1 deletion .github/workflows/push_tests.yml
@@ -1,9 +1,11 @@
-name: Slow Tests on main
+name: Fast GPU Tests on main

on:
  push:
    branches:
      - main
+     - "v*.*.*-release"
+     - "v*.*.*-patch"
    paths:
      - "src/diffusers/**.py"
      - "examples/**.py"
@@ -112,6 +114,8 @@ jobs:
      run:
        shell: bash
    strategy:
+     fail-fast: false
+     max-parallel: 2
      matrix:
        module: [models, schedulers, lora, others, single_file]
    steps:
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -57,7 +57,7 @@ Any question or comment related to the Diffusers library can be asked on the [di
- ...

Every question that is asked on the forum or on Discord actively encourages the community to publicly
-share knowledge and might very well help a beginner in the future that has the same question you're
+share knowledge and might very well help a beginner in the future who has the same question you're
having. Please do pose any questions you might have.
In the same spirit, you are of immense help to the community by answering such questions because this way you are publicly documenting knowledge for everybody to learn from.

@@ -503,4 +503,4 @@ $ git push --set-upstream origin your-branch-for-syncing

### Style guide

-For documentation strings, 🧨 Diffusers follows the [Google style](https://google.github.io/styleguide/pyguide.html).
+For documentation strings, 🧨 Diffusers follows the [Google style](https://google.github.io/styleguide/pyguide.html).
4 changes: 2 additions & 2 deletions PHILOSOPHY.md
@@ -15,7 +15,7 @@ specific language governing permissions and limitations under the License.
🧨 Diffusers provides **state-of-the-art** pretrained diffusion models across multiple modalities.
Its purpose is to serve as a **modular toolbox** for both inference and training.

-We aim at building a library that stands the test of time and therefore take API design very seriously.
+We aim to build a library that stands the test of time and therefore take API design very seriously.

In a nutshell, Diffusers is built to be a natural extension of PyTorch. Therefore, most of our design choices are based on [PyTorch's Design Principles](https://pytorch.org/docs/stable/community/design.html#pytorch-design-philosophy). Let's go over the most important ones:

@@ -107,4 +107,4 @@ The following design principles are followed:
- Every scheduler exposes the timesteps to be "looped over" via a `timesteps` attribute, which is an array of timesteps the model will be called upon.
- The `step(...)` function takes a predicted model output and the "current" sample (x_t) and returns the "previous", slightly more denoised sample (x_t-1).
- Given the complexity of diffusion schedulers, the `step` function does not expose all the complexity and can be a bit of a "black box".
-In almost all cases, novel schedulers shall be implemented in a new scheduling file.
+In almost all cases, novel schedulers shall be implemented in a new scheduling file.
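To make the scheduler contract these principles describe concrete, here is a minimal hedged sketch: a tiny, randomly initialized model with illustrative sizes (not a checkpoint from the docs), looping over `timesteps` and calling `step(...)`.

```python
import torch
from diffusers import DDPMScheduler, UNet2DModel

# Tiny, randomly initialized UNet purely to exercise the scheduler contract.
model = UNet2DModel(
    sample_size=32, in_channels=3, out_channels=3, layers_per_block=1,
    block_out_channels=(32, 64),
    down_block_types=("DownBlock2D", "AttnDownBlock2D"),
    up_block_types=("AttnUpBlock2D", "UpBlock2D"),
)
scheduler = DDPMScheduler(num_train_timesteps=1000)
scheduler.set_timesteps(50)

sample = torch.randn(1, 3, 32, 32)
for t in scheduler.timesteps:  # the `timesteps` attribute to be "looped over"
    with torch.no_grad():
        noise_pred = model(sample, t).sample
    # step() takes the predicted model output and the current sample x_t
    # and returns the slightly more denoised x_{t-1}.
    sample = scheduler.step(noise_pred, t, sample).prev_sample
```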
6 changes: 5 additions & 1 deletion docs/source/en/api/pipelines/cogvideox.md
@@ -29,6 +29,10 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.m

This pipeline was contributed by [zRzRzRzRzRzRzR](https://github.com/zRzRzRzRzRzRzR). The original codebase can be found [here](https://huggingface.co/THUDM). The original weights can be found under [hf.co/THUDM](https://huggingface.co/THUDM).

+There are two models available that can be used with the CogVideoX pipeline:
+- [`THUDM/CogVideoX-2b`](https://huggingface.co/THUDM/CogVideoX-2b)
+- [`THUDM/CogVideoX-5b`](https://huggingface.co/THUDM/CogVideoX-5b)

## Inference

Use [`torch.compile`](https://huggingface.co/docs/diffusers/main/en/tutorials/fast_diffusion#torchcompile) to reduce the inference latency.
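As a concrete starting point, a minimal hedged sketch of loading one of the two checkpoints listed above (the prompt and generation settings are illustrative, not taken from this PR):

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# The 5B checkpoint is published in bfloat16; swap in "THUDM/CogVideoX-2b"
# (float16) for the smaller model.
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.to("cuda")

video = pipe(
    prompt="A panda playing an acoustic guitar in a bamboo forest",  # illustrative
    num_frames=49,            # 6 seconds at 8 FPS, matching the docs below
    num_inference_steps=50,
).frames[0]
export_to_video(video, "output.mp4", fps=8)
```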
@@ -68,7 +72,7 @@ With torch.compile(): Average inference time: 76.27 seconds.

### Memory optimization

-CogVideoX requires about 19 GB of GPU memory to decode 49 frames (6 seconds of video at 8 FPS) with output resolution 720x480 (W x H), which makes it not possible to run on consumer GPUs or free-tier T4 Colab. The following memory optimizations could be used to reduce the memory footprint. For replication, you can refer to [this](https://gist.github.com/a-r-r-o-w/3959a03f15be5c9bd1fe545b09dfcc93) script.
+CogVideoX-2b requires about 19 GB of GPU memory to decode 49 frames (6 seconds of video at 8 FPS) with output resolution 720x480 (W x H), which makes it not possible to run on consumer GPUs or free-tier T4 Colab. The following memory optimizations could be used to reduce the memory footprint. For replication, you can refer to [this](https://gist.github.com/a-r-r-o-w/3959a03f15be5c9bd1fe545b09dfcc93) script.

- `pipe.enable_model_cpu_offload()`:
- Without enabling cpu offloading, memory usage is `33 GB`
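A hedged sketch of the first optimization in this list, model CPU offloading, applied to the 2B checkpoint (the memory figures quoted in this section come from the linked gist, not from running this snippet):

```python
import torch
from diffusers import CogVideoXPipeline

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16)

# Keep submodules on the CPU and move each one to the GPU only while it runs,
# instead of holding the whole pipeline in GPU memory at once. Note there is
# deliberately no pipe.to("cuda") here.
pipe.enable_model_cpu_offload()

video = pipe(prompt="a panda playing guitar", num_frames=49).frames[0]
```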
2 changes: 1 addition & 1 deletion docs/source/en/api/pipelines/controlnet_sd3.md
@@ -22,7 +22,7 @@ The abstract from the paper is:

*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*

-This controlnet code is mainly implemented by [The InstantX Team](https://huggingface.co/InstantX). The inpainting-related code was developed by [The Alimama Creative Team](https://huggingface.co/alimama-creative). You can find pre-trained checkpoints for SD3-ControlNet in the table below:
+This controlnet code is mainly implemented by [The InstantX Team](https://huggingface.co/InstantX). The inpainting-related code was developed by [The Alimama Creative Team](https://huggingface.co/alimama-creative). You can find pre-trained checkpoints for SD3-ControlNet in the table below:


| ControlNet type | Developer | Link |
4 changes: 2 additions & 2 deletions docs/source/en/api/pipelines/kolors.md
@@ -14,7 +14,7 @@ specific language governing permissions and limitations under the License.

![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/kolors/kolors_header_collage.png)

-Kolors is a large-scale text-to-image generation model based on latent diffusion, developed by [the Kuaishou Kolors team](kwai-kolors@kuaishou.com). Trained on billions of text-image pairs, Kolors exhibits significant advantages over both open-source and closed-source models in visual quality, complex semantic accuracy, and text rendering for both Chinese and English characters. Furthermore, Kolors supports both Chinese and English inputs, demonstrating strong performance in understanding and generating Chinese-specific content. For more details, please refer to this [technical report](https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf).
+Kolors is a large-scale text-to-image generation model based on latent diffusion, developed by [the Kuaishou Kolors team](https://github.com/Kwai-Kolors/Kolors). Trained on billions of text-image pairs, Kolors exhibits significant advantages over both open-source and closed-source models in visual quality, complex semantic accuracy, and text rendering for both Chinese and English characters. Furthermore, Kolors supports both Chinese and English inputs, demonstrating strong performance in understanding and generating Chinese-specific content. For more details, please refer to this [technical report](https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf).

The abstract from the technical report is:

@@ -74,7 +74,7 @@ image_encoder = CLIPVisionModelWithProjection.from_pretrained(

pipe = KolorsPipeline.from_pretrained(
    "Kwai-Kolors/Kolors-diffusers", image_encoder=image_encoder, torch_dtype=torch.float16, variant="fp16"
-).to("cuda")
+)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, use_karras_sigmas=True)

pipe.load_ip_adapter(
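Our reading of this hunk, hedged: dropping `.to("cuda")` matches the CPU-offloading pattern fixed elsewhere in this PR ("[`Docs`] Fix CPU offloading usage", #9207). A pipeline that will later call `enable_model_cpu_offload()` must stay on the CPU at load time; that the surrounding docs example enables offloading further down is our assumption here.

```python
import torch
from diffusers import KolorsPipeline

pipe = KolorsPipeline.from_pretrained(
    "Kwai-Kolors/Kolors-diffusers", torch_dtype=torch.float16, variant="fp16"
)  # note: no .to("cuda") at load time

# Offloading manages device placement itself; moving the whole pipeline to
# CUDA first would defeat it and, per the commits above, can break components
# such as the IP-Adapter image encoder.
pipe.enable_model_cpu_offload()
```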
8 changes: 6 additions & 2 deletions docs/source/en/api/pipelines/pag.md
@@ -20,7 +20,7 @@ The abstract from the paper is:

*Recent studies have demonstrated that diffusion models are capable of generating high-quality samples, but their quality heavily depends on sampling guidance techniques, such as classifier guidance (CG) and classifier-free guidance (CFG). These techniques are often not applicable in unconditional generation or in various downstream tasks such as image restoration. In this paper, we propose a novel sampling guidance, called Perturbed-Attention Guidance (PAG), which improves diffusion sample quality across both unconditional and conditional settings, achieving this without requiring additional training or the integration of external modules. PAG is designed to progressively enhance the structure of samples throughout the denoising process. It involves generating intermediate samples with degraded structure by substituting selected self-attention maps in diffusion U-Net with an identity matrix, by considering the self-attention mechanisms' ability to capture structural information, and guiding the denoising process away from these degraded samples. In both ADM and Stable Diffusion, PAG surprisingly improves sample quality in conditional and even unconditional scenarios. Moreover, PAG significantly improves the baseline performance in various downstream tasks where existing guidances such as CG or CFG cannot be fully utilized, including ControlNet with empty prompts and image restoration such as inpainting and deblurring.*

-PAG can be used by specifying the `pag_applied_layers` as a parameter when instantiating a PAG pipeline. It can be a single string or a list of strings. Each string can be a unique layer identifier or a regular expression to identify one or more layers.
+PAG can be used by specifying the `pag_applied_layers` as a parameter when instantiating a PAG pipeline. It can be a single string or a list of strings. Each string can be a unique layer identifier or a regular expression to identify one or more layers.

- Full identifier as a normal string: `down_blocks.2.attentions.0.transformer_blocks.0.attn1.processor`
- Full identifier as a RegEx: `down_blocks.2.(attentions|motion_modules).0.transformer_blocks.0.attn1.processor`
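A hedged sketch of the instantiation described above (the checkpoint, layer identifier, and scale are illustrative; `enable_pag` and `pag_applied_layers` follow the PAG pipeline API this section documents):

```python
import torch
from diffusers import AutoPipelineForText2Image

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    enable_pag=True,
    pag_applied_layers=["mid"],  # a unique identifier or a RegEx, per the list above
    torch_dtype=torch.float16,
).to("cuda")

image = pipeline(
    prompt="an insect robot preparing a delicious meal",
    pag_scale=3.0,  # strength of perturbed-attention guidance at call time
).images[0]
```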
@@ -46,7 +46,7 @@ Since RegEx is supported as a way for matching layer identifiers, it is crucial
## KolorsPAGPipeline
[[autodoc]] KolorsPAGPipeline
- all
-  - __call__
+  - __call__

## StableDiffusionPAGPipeline
[[autodoc]] StableDiffusionPAGPipeline
@@ -78,6 +78,10 @@ Since RegEx is supported as a way for matching layer identifiers, it is crucial
- all
- __call__

+## StableDiffusionXLControlNetPAGImg2ImgPipeline
+[[autodoc]] StableDiffusionXLControlNetPAGImg2ImgPipeline
+  - all
+  - __call__

## StableDiffusion3PAGPipeline
[[autodoc]] StableDiffusion3PAGPipeline
2 changes: 1 addition & 1 deletion docs/source/en/stable_diffusion.md
@@ -238,7 +238,7 @@ Pretty impressive! Let's tweak the second image - corresponding to the `Generato
```python
prompts = [
    "portrait photo of the oldest warrior chief, tribal panther make up, blue on red, side profile, looking away, serious eyes 50mm portrait photography, hard rim lighting photography--beta --ar 2:3 --beta --upbeta",
-    "portrait photo of a old warrior chief, tribal panther make up, blue on red, side profile, looking away, serious eyes 50mm portrait photography, hard rim lighting photography--beta --ar 2:3 --beta --upbeta",
+    "portrait photo of an old warrior chief, tribal panther make up, blue on red, side profile, looking away, serious eyes 50mm portrait photography, hard rim lighting photography--beta --ar 2:3 --beta --upbeta",
    "portrait photo of a warrior chief, tribal panther make up, blue on red, side profile, looking away, serious eyes 50mm portrait photography, hard rim lighting photography--beta --ar 2:3 --beta --upbeta",
    "portrait photo of a young warrior chief, tribal panther make up, blue on red, side profile, looking away, serious eyes 50mm portrait photography, hard rim lighting photography--beta --ar 2:3 --beta --upbeta",
]