diff --git a/.github/ISSUE_TEMPLATE/error-report.md b/.github/ISSUE_TEMPLATE/error-report.md index 70b8d68b7..48667714d 100644 --- a/.github/ISSUE_TEMPLATE/error-report.md +++ b/.github/ISSUE_TEMPLATE/error-report.md @@ -4,7 +4,6 @@ about: Create a report to help us improve title: '' labels: '' assignees: '' - --- Thanks for your error report; we appreciate it a lot. @@ -33,8 +32,8 @@ A placeholder for the command. 1. Please run `python mmgen/utils/collect_env.py` to collect necessary environment information and paste it here. 2. You may add additional information that may be helpful for locating the problem, such as - - How you installed PyTorch [e.g., pip, conda, source] - - Other environment variables that may be related (such as `$PATH`, `$LD_LIBRARY_PATH`, `$PYTHONPATH`, etc.) + - How you installed PyTorch \[e.g., pip, conda, source\] + - Other environment variables that may be related (such as `$PATH`, `$LD_LIBRARY_PATH`, `$PYTHONPATH`, etc.)

**Error traceback** If applicable, paste the error traceback here. diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md index 33f9d5f23..7bf92e8c9 100644 --- a/.github/ISSUE_TEMPLATE/feature_request.md +++ b/.github/ISSUE_TEMPLATE/feature_request.md @@ -4,15 +4,14 @@ about: Suggest an idea for this project title: '' labels: '' assignees: '' - ---

**Describe the feature**

**Motivation** A clear and concise description of the motivation of the feature. -Ex1. It is inconvenient when [....]. -Ex2. There is a recent paper [....], which is very helpful for [....]. +Ex1. It is inconvenient when \[....\]. +Ex2. There is a recent paper \[....\], which is very helpful for \[....\].

**Related resources** If there is an official code release or third-party implementations, please also provide the information here, which would be very helpful. diff --git a/.github/ISSUE_TEMPLATE/general_questions.md b/.github/ISSUE_TEMPLATE/general_questions.md index b5a6451a6..f02dd63a8 100644 --- a/.github/ISSUE_TEMPLATE/general_questions.md +++ b/.github/ISSUE_TEMPLATE/general_questions.md @@ -4,5 +4,4 @@ about: Ask general questions to get help title: '' labels: '' assignees: '' - --- diff --git a/.github/ISSUE_TEMPLATE/reimplementation_questions.md b/.github/ISSUE_TEMPLATE/reimplementation_questions.md index 316273d50..d1d00b509 100644 --- a/.github/ISSUE_TEMPLATE/reimplementation_questions.md +++ b/.github/ISSUE_TEMPLATE/reimplementation_questions.md @@ -2,9 +2,8 @@ name: Reimplementation Questions about: Ask questions about model reimplementation title: '' -labels: 'reimplementation' +labels: reimplementation assignees: '' - ---

**Notice**

@@ -52,7 +51,7 @@ A placeholder for the config. 1. Please run `python mmgen/utils/collect_env.py` to collect necessary environment information and paste it here. 2. You may add additional information that may be helpful for locating the problem, such as - 1. How you installed PyTorch [e.g., pip, conda, source] + 1. How you installed PyTorch \[e.g., pip, conda, source\] 2. Other environment variables that may be related (such as `$PATH`, `$LD_LIBRARY_PATH`, `$PYTHONPATH`, etc.) 
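As a concrete illustration of the environment details requested above, a minimal sketch follows. The variable names come from the list above; the script itself is only a hypothetical convenience, since the canonical tool is `python mmgen/utils/collect_env.py`:

```python
# Hypothetical helper: prints the interpreter version and the environment
# variables the template mentions. Not part of MMGeneration itself.
import os
import sys

print("Python:", sys.version.replace("\n", " "))
for var in ("PATH", "LD_LIBRARY_PATH", "PYTHONPATH"):
    print(f"{var} = {os.environ.get(var, '<unset>')}")
```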
**Results** diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 4eef89c80..4b62fcda6 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -29,17 +29,20 @@ repos: hooks: - id: docformatter args: ["--in-place", "--wrap-descriptions", "79"] - - repo: https://github.com/markdownlint/markdownlint - rev: v0.11.0 - hooks: - - id: markdownlint - args: ["-r", "~MD002,~MD013,~MD029,~MD033,~MD034", - "-t", "allow_different_nesting"] + - repo: https://github.com/executablebooks/mdformat + rev: 0.7.14 + hooks: + - id: mdformat + args: ["--number"] + additional_dependencies: + - mdformat-gfm + - mdformat_frontmatter + - linkify-it-py - repo: https://github.com/codespell-project/codespell rev: v2.1.0 hooks: - id: codespell - args: ["--skip", "*.ipynb,tools/data/hvu/label_map.json", "-L", "formating,te,nd,thre,Gool,gool,lod"] + args: ["--skip", "*.ipynb,tools/data/hvu/label_map.json,docs/zh_cn/*", "-L", "formating,te,nd,thre,Gool,gool,lod"] - repo: local hooks: - id: update-model-index diff --git a/README.md b/README.md index 1da80e361..7d4568295 100644 --- a/README.md +++ b/README.md @@ -31,7 +31,6 @@ MMGeneration is a powerful toolkit for generative models, especially for GANs no - ## Major Features - **High-quality Training Performance:** We currently support training on Unconditional GANs, Internal GANs, and Image Translation Models. Support for conditional models will come soon. @@ -39,7 +38,6 @@ MMGeneration is a powerful toolkit for generative models, especially for GANs no - **Efficient Distributed Training for Generative Models:** For the highly dynamic training in generative models, we adopt a new way to train dynamic models with `MMDDP`. ([Tutorial for DDP](docs/en/tutorials/ddp_train_gans.md)) - **New Modular Design for Flexible Combination:** A new design for complex loss modules is proposed for customizing the links between modules, which can achieve flexible combination among different modules. ([Tutorial for new modular design](docs/en/tutorials/customize_losses.md)) - @@ -73,9 +71,10 @@ MMGeneration is a powerful toolkit for generative models, especially for GANs no ## Highlight -* **Positional Encoding as Spatial Inductive Bias in GANs (CVPR2021)** has been released in `MMGeneration`. [\[Config\]](configs/positional_encoding_in_gans/README.md), [\[Project Page\]](https://nbei.github.io/gan-pos-encoding.html) -* Conditional GANs have been supported in our toolkit. More methods and pre-trained weights will come soon. -* Mixed-precision training (FP16) for StyleGAN2 has been supported. Please check [the comparison](configs/styleganv2/README.md) between different implementations. +- **Positional Encoding as Spatial Inductive Bias in GANs (CVPR2021)** has been released in `MMGeneration`. [\[Config\]](configs/positional_encoding_in_gans/README.md), [\[Project Page\]](https://nbei.github.io/gan-pos-encoding.html) +- Conditional GANs have been supported in our toolkit. More methods and pre-trained weights will come soon. +- Mixed-precision training (FP16) for StyleGAN2 has been supported. Please check [the comparison](configs/styleganv2/README.md) between different implementations. + ## Changelog v0.7.1 was released on 30/04/2022. Please refer to [changelog.md](docs/en/changelog.md) for details and release history. @@ -84,7 +83,6 @@ v0.7.1 was released on 30/04/2022. Please refer to [changelog.md](docs/en/change These methods have been carefully studied and supported in our frameworks: -
Unconditional GANs (click to collapse) @@ -133,6 +131,7 @@ These methods have been carefully studied and supported in our frameworks:
## Related-Applications + - ✅ [MMGEN-FaceStylor](https://github.com/open-mmlab/MMGEN-FaceStylor) ## License diff --git a/README_zh-CN.md b/README_zh-CN.md index 522f4e6fb..2be5f3477 100644 --- a/README_zh-CN.md +++ b/README_zh-CN.md @@ -15,7 +15,6 @@ MMGeneration 是一个基于 PyTorch 和[MMCV](https://github.com/open-mmlab/mmc - ## 主要特性 - **高质量高性能的训练:** 我们目前支持 Unconditional GANs, Internal GANs, 以及 Image Translation Models 的训练。很快将会支持 conditional models 的训练。 @@ -23,7 +22,6 @@ MMGeneration 是一个基于 PyTorch 和[MMCV](https://github.com/open-mmlab/mmc - **生成模型的高效分布式训练:** 对于生成模型中的高度动态训练,我们采用 `MMDDP` 的新方法来训练动态模型。([DDP教程](docs/tutorials/ddp_train_gans.md)) - **灵活组合的新型模块化设计:** 针对复杂的损失模块,我们提出了一种新的设计,可以自定义模块之间的链接,实现不同模块之间的灵活组合。 ([新模块化设计教程](docs/tutorials/customize_losses.md)) -
@@ -57,9 +55,10 @@ MMGeneration 是一个基于 PyTorch 和[MMCV](https://github.com/open-mmlab/mmc ## 亮点 -* **Positional Encoding as Spatial Inductive Bias in GANs (CVPR2021)** 已在 `MMGeneration` 中发布. [\[配置文件\]](configs/positional_encoding_in_gans/README.md), [\[项目主页\]](https://nbei.github.io/gan-pos-encoding.html) -* 我们已经支持训练目前主流的 Conditional GANs 模型,更多的方法和预训练权重马上就会发布,敬请期待。 -* 混合精度训练已经在 `StyleGAN2` 中进行了初步支持,请到[这里](configs/styleganv2/README.md)查看各种实现方式的详细比较。 +- **Positional Encoding as Spatial Inductive Bias in GANs (CVPR2021)** 已在 `MMGeneration` 中发布. [\[配置文件\]](configs/positional_encoding_in_gans/README.md), [\[项目主页\]](https://nbei.github.io/gan-pos-encoding.html) +- 我们已经支持训练目前主流的 Conditional GANs 模型,更多的方法和预训练权重马上就会发布,敬请期待。 +- 混合精度训练已经在 `StyleGAN2` 中进行了初步支持,请到[这里](configs/styleganv2/README.md)查看各种实现方式的详细比较。 + ## 更新日志 v0.7.1 在 30/04/2022 发布。 关于细节和发布历史,请参考 [changelog.md](docs/zh_cn/changelog.md)。 @@ -68,7 +67,6 @@ v0.7.1 在 30/04/2022 发布。 关于细节和发布历史,请参考 [changel 这些算法在我们的框架中得到了认真研究和支持。 -
Unconditional GANs (点击折叠) @@ -92,7 +90,6 @@ v0.7.1 在 30/04/2022 发布。 关于细节和发布历史,请参考 [changel - ✅ [SAGAN](configs/sagan/README.md) (ICML'2019) - ✅ [BIGGAN/BIGGAN-DEEP](configs/biggan/README.md) (ICLR'2019) -
@@ -148,6 +145,7 @@ pip3 install -e .[all] 更详细的安装指南请参考 [get_started.md](docs/zh/get_started.md) . ## 相关应用 + - ✅ [MMGEN-FaceStylor](https://github.com/open-mmlab/MMGEN-FaceStylor) ## 开源许可证 diff --git a/configs/biggan/README.md b/configs/biggan/README.md index 1af92cc70..f4076e107 100644 --- a/configs/biggan/README.md +++ b/configs/biggan/README.md @@ -11,6 +11,7 @@ Despite recent progress in generative image modeling, successfully generating high-resolution, diverse samples from complex datasets such as ImageNet remains an elusive goal. To this end, we train Generative Adversarial Networks at the largest scale yet attempted, and study the instabilities specific to such scale. We find that applying orthogonal regularization to the generator renders it amenable to a simple "truncation trick," allowing fine control over the trade-off between sample fidelity and variety by reducing the variance of the Generator's input. Our modifications lead to models which set the new state of the art in class-conditional image synthesis. When trained on ImageNet at 128x128 resolution, our models (BigGANs) achieve an Inception Score (IS) of 166.5 and Frechet Inception Distance (FID) of 7.4, improving over the previous best IS of 52.52 and FID of 18.6. +
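As a rough illustration of the "truncation trick" mentioned in the abstract above (trading sample variety for fidelity by reducing the variance of the generator's input), here is a minimal PyTorch sketch; the function name and the rejection-resampling strategy are our assumptions, not MMGeneration's implementation:

```python
import torch

def truncated_noise(batch_size: int, dim: int, truncation: float = 0.4) -> torch.Tensor:
    """Redraw latent entries until all lie within [-truncation, truncation]."""
    z = torch.randn(batch_size, dim)
    mask = z.abs() > truncation
    while mask.any():
        z[mask] = torch.randn(int(mask.sum()))  # resample only the rejected entries
        mask = z.abs() > truncation
    return z
```

A smaller `truncation` narrows the sampled latent distribution, which typically raises fidelity at the cost of variety.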
@@ -20,6 +21,7 @@ Despite recent progress in generative image modeling, successfully generating hi The `BigGAN/BigGAN-Deep` is a conditional generation model that can generate both high-resolution and high-quality images by scaling up the batch size and the number of model parameters. We have finished training `BigGAN` in `Cifar10` (32x32) and are aligning training performance in `ImageNet1k` (128x128). Some sampled results are shown below for your reference. +
Results from our BigGAN trained in CIFAR10
@@ -33,26 +35,29 @@ We have finished training `BigGAN` in `Cifar10` (32x32) and are aligning trainin
Evaluation of our trained BigGAN. -| Models | Dataset | FID (Iter) | IS (Iter) | Config | Download | -|:--------------:|:----------:|:---------------:|:--------------:|:------------------------------------------------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| -| BigGAN 32x32 | CIFAR10 | 9.78(390000) | 8.70(390000) | [config](https://github.com/open-mmlab/mmgeneration/blob/master/configs/biggan/biggan_cifar10_32x32_b25x2_500k.py) | [model](https://download.openmmlab.com/mmgen/biggan/biggan_cifar10_32x32_b25x2_500k_20210728_110906-08b61a44.pth)\|[log](https://download.openmmlab.com/mmgen/biggan/biggan_cifar10_32_b25x2_500k_20210706_171051.log.json) | -| BigGAN 128x128 Best FID| ImageNet1k | **8.69**(1232000) | 101.15(1232000) | [config](https://github.com/open-mmlab/mmgeneration/blob/master/configs/biggan/biggan_ajbrock-sn_imagenet1k_128x128_b32x8_1500k.py) | [model](https://download.openmmlab.com/mmgen/biggan/biggan_imagenet1k_128x128_b32x8_best_fid_iter_1232000_20211111_122548-5315b13d.pth)\|[log](https://download.openmmlab.com/mmgen/biggan/biggan_imagenet1k_128x128_b32x8_1500k_20211111_122548-5315b13d.log.json) | -| BigGAN 128x128 Best IS| ImageNet1k | 13.51(1328000) | **129.07**(1328000) | [config](https://github.com/open-mmlab/mmgeneration/blob/master/configs/biggan/biggan_ajbrock-sn_imagenet1k_128x128_b32x8_1500k.py) | [model](https://download.openmmlab.com/mmgen/biggan/biggan_imagenet1k_128x128_b32x8_best_is_iter_1328000_20211111_122911-28c688bc.pth)\|[log](https://download.openmmlab.com/mmgen/biggan/biggan_imagenet1k_128x128_b32x8_1500k_20211111_122548-5315b13d.log.json) | -Note: `BigGAN-Deep` trained on `ImageNet1k` will come later. 
+
+| Models | Dataset | FID (Iter) | IS (Iter) | Config | Download |
+| :---------------------: | :--------: | :---------------: | :-----------------: | :---------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+| BigGAN 32x32 | CIFAR10 | 9.78(390000) | 8.70(390000) | [config](https://github.com/open-mmlab/mmgeneration/blob/master/configs/biggan/biggan_cifar10_32x32_b25x2_500k.py) | [model](https://download.openmmlab.com/mmgen/biggan/biggan_cifar10_32x32_b25x2_500k_20210728_110906-08b61a44.pth)\|[log](https://download.openmmlab.com/mmgen/biggan/biggan_cifar10_32_b25x2_500k_20210706_171051.log.json) |
+| BigGAN 128x128 Best FID | ImageNet1k | **8.69**(1232000) | 101.15(1232000) | [config](https://github.com/open-mmlab/mmgeneration/blob/master/configs/biggan/biggan_ajbrock-sn_imagenet1k_128x128_b32x8_1500k.py) | [model](https://download.openmmlab.com/mmgen/biggan/biggan_imagenet1k_128x128_b32x8_best_fid_iter_1232000_20211111_122548-5315b13d.pth)\|[log](https://download.openmmlab.com/mmgen/biggan/biggan_imagenet1k_128x128_b32x8_1500k_20211111_122548-5315b13d.log.json) |
+| BigGAN 128x128 Best IS | ImageNet1k | 13.51(1328000) | **129.07**(1328000) | [config](https://github.com/open-mmlab/mmgeneration/blob/master/configs/biggan/biggan_ajbrock-sn_imagenet1k_128x128_b32x8_1500k.py) | [model](https://download.openmmlab.com/mmgen/biggan/biggan_imagenet1k_128x128_b32x8_best_is_iter_1328000_20211111_122911-28c688bc.pth)\|[log](https://download.openmmlab.com/mmgen/biggan/biggan_imagenet1k_128x128_b32x8_1500k_20211111_122548-5315b13d.log.json) |
+
+Note: `BigGAN-Deep` trained on `ImageNet1k` will come later.

## Converted weights
+
Since we haven't finished training our models, we provide you with several pre-trained weights which have been evaluated. Here, we refer to [BigGAN-PyTorch](https://github.com/ajbrock/BigGAN-PyTorch) and [pytorch-pretrained-BigGAN](https://github.com/huggingface/pytorch-pretrained-BigGAN). Evaluation results and download links are provided below. 
| Models | Dataset | FID | IS | Config | Download | Original Download link | -|:-------------------:|:----------:|:-------:|:-------:|:-----------------------------------------------------------------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------:| +| :-----------------: | :--------: | :-----: | :-----: | :---------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------: | | BigGAN 128x128 | ImageNet1k | 10.1414 | 96.728 | [config](https://github.com/open-mmlab/mmgeneration/blob/master/configs/_base_/models/biggan/biggan_128x128_cvt_BigGAN-PyTorch_rgb.py) | [model](https://download.openmmlab.com/mmgen/biggan/biggan_imagenet1k_128x128_cvt_BigGAN-PyTorch_rgb_20210730_125223-3e353fef.pth) | [link](https://drive.google.com/open?id=1nAle7FCVFZdix2--ks0r5JBkFnKw8ctW) | | BigGAN-Deep 128x128 | ImageNet1k | 5.9471 | 107.161 | [config](https://github.com/open-mmlab/mmgeneration/blob/master/configs/_base_/models/biggan/biggan-deep_128x128_cvt_hugging-face_rgb.py) | [model](https://download.openmmlab.com/mmgen/biggan/biggan-deep_imagenet1k_128x128_cvt_hugging-face_rgb_20210728_111659-099e96f9.pth) | [link](https://s3.amazonaws.com/models.huggingface.co/biggan/biggan-deep-128-pytorch_model.bin) | | BigGAN-Deep 256x256 | ImageNet1k | 11.3151 | 135.107 | [config](https://github.com/open-mmlab/mmgeneration/blob/master/configs/_base_/models/biggan/biggan-deep_256x256_cvt_hugging-face_rgb.py) | [model](https://download.openmmlab.com/mmgen/biggan/biggan-deep_imagenet1k_256x256_cvt_hugging-face_rgb_20210728_111735-28651569.pth) | [link](https://s3.amazonaws.com/models.huggingface.co/biggan/biggan-deep-256-pytorch_model.bin) | | BigGAN-Deep 512x512 | ImageNet1k | 16.8728 | 124.368 | [config](https://github.com/open-mmlab/mmgeneration/blob/master/configs/_base_/models/biggan/biggan-deep_512x512_cvt_hugging-face_rgb.py) | [model](https://download.openmmlab.com/mmgen/biggan/biggan-deep_imagenet1k_512x512_cvt_hugging-face_rgb_20210728_112346-a42585f2.pth) | [link](https://s3.amazonaws.com/models.huggingface.co/biggan/biggan-deep-512-pytorch_model.bin) | Sampling results are shown below. +
Results from our BigGAN-Deep with Pre-trained weights in ImageNet 128x128 with truncation factor 0.4
@@ -75,19 +80,24 @@ Sampling with truncation trick above can be performed by command below. ```bash python demo/conditional_demo.py CONFIG_PATH CKPT_PATH --sample-cfg truncation=0.4 # set truncation value as you want ``` + For converted weights, we provide model configs under `configs/_base_/models` listed as follows: + ```bash # biggan_128x128_cvt_BigGAN-PyTorch_rgb.py # biggan-deep_128x128_cvt_hugging-face_rgb.py # biggan-deep_256x256_cvt_hugging-face_rgb.py # biggan-deep_512x512_cvt_hugging-face_rgb.py ``` + ## Interpolation To perform image Interpolation on BigGAN(or other conditional models), run + ```bash python apps/conditional_interpolate.py CONFIG_PATH CKPT_PATH --samples-path SAMPLES_PATH ``` +
Image interpolating Results of our BigGAN-Deep
@@ -95,9 +105,11 @@ python apps/conditional_interpolate.py CONFIG_PATH CKPT_PATH --samples-path SA
To perform image Interpolation on BigGAN with fixed noise, run + ```bash python apps/conditional_interpolate.py CONFIG_PATH CKPT_PATH --samples-path SAMPLES_PATH --fix-z ``` +
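Conceptually, interpolation walks between two sampled codes; with `--fix-z`, the noise stays fixed and only the class condition moves. A hedged sketch of the underlying linear interpolation (not the actual internals of `apps/conditional_interpolate.py`):

```python
import torch

def lerp_codes(code_a: torch.Tensor, code_b: torch.Tensor, steps: int = 8) -> torch.Tensor:
    """Linearly interpolate between two 1-D codes, returning `steps` codes."""
    ratios = torch.linspace(0.0, 1.0, steps).unsqueeze(1)
    return (1 - ratios) * code_a + ratios * code_b
```

Each returned row can then be fed to the generator, together with a fixed `z` when only the condition is interpolated.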
Image interpolating Results of our BigGAN-Deep with fixed noise
diff --git a/configs/cyclegan/README.md b/configs/cyclegan/README.md index d00bcbb87..ae23783ed 100644 --- a/configs/cyclegan/README.md +++ b/configs/cyclegan/README.md @@ -8,15 +8,16 @@ -Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. However, for many tasks, paired training data will not be available. We present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples. Our goal is to learn a mapping G: X \rightarrow Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss. Because this mapping is highly under-constrained, we couple it with an inverse mapping F: Y \rightarrow X and introduce a cycle consistency loss to push F(G(X)) \approx X (and vice versa). Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc. Quantitative comparisons against several prior methods demonstrate the superiority of our approach. +Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. However, for many tasks, paired training data will not be available. We present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples. Our goal is to learn a mapping G: X \\rightarrow Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss. Because this mapping is highly under-constrained, we couple it with an inverse mapping F: Y \\rightarrow X and introduce a cycle consistency loss to push F(G(X)) \\approx X (and vice versa). Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc. Quantitative comparisons against several prior methods demonstrate the superiority of our approach. +
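The cycle-consistency idea in the abstract, pushing F(G(X)) ≈ X and G(F(Y)) ≈ Y, amounts to an L1 reconstruction penalty on both directions; a minimal sketch, with the weight `lam` an assumed hyper-parameter:

```python
import torch.nn.functional as F

def cycle_consistency_loss(real_x, rec_x, real_y, rec_y, lam: float = 10.0):
    """L1 penalty tying F(G(x)) back to x and G(F(y)) back to y."""
    return lam * (F.l1_loss(rec_x, real_x) + F.l1_loss(rec_y, real_y))
```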
- ## Results and Models +
Results from CycleGAN trained by MMGeneration
@@ -26,7 +27,7 @@ Image-to-image translation is a class of vision and graphics problems where the We use `FID` and `IS` metrics to evaluate the generation performance of CycleGAN.1 | Models | Dataset | FID | IS | Config | Download | -|:------:|:-----------------:|:--------:|:-----:|:-----------------------------------------------------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| +| :----: | :---------------: | :------: | :---: | :---------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | | Ours | facades | 124.8033 | 1.792 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/cyclegan/cyclegan_lsgan_resnet_in_facades_b1x1_80k.py) | [model](https://download.openmmlab.com/mmgen/cyclegan/refactor/cyclegan_lsgan_resnet_in_1x1_80k_facades_20210902_165905-5e2c0876.pth) \| [log](https://download.openmmlab.com/mmgen/cyclegan/cyclegan_lsgan_resnet_in_1x1_80k_facades_20210317_160938.log.json) 2 | | Ours | facades-id0 | 125.1694 | 1.905 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/cyclegan/cyclegan_lsgan_id0_resnet_in_facades_b1x1_80k.py) | [model](https://download.openmmlab.com/mmgen/cyclegan/refactor/cyclegan_lsgan_id0_resnet_in_1x1_80k_facades_convert-bgr_20210902_164411-d8e72b45.pth) | | Ours | summer2winter | 83.7177 | 2.771 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/cyclegan/cyclegan_lsgan_resnet_in_summer2winter_b1x1_250k.py) | [model](https://download.openmmlab.com/mmgen/cyclegan/refactor/cyclegan_lsgan_resnet_in_1x1_246200_summer2winter_convert-bgr_20210902_165932-fcf08dc1.pth) | @@ -41,18 +42,19 @@ We use `FID` and `IS` metrics to evaluate the generation performance of CycleGAN `FID` comparison with official: | Dataset | facades | facades-id0 | summer2winter | summer2winter-id0 | winter2summer | winter2summer-id0 | horse2zebra | horse2zebra-id0 | zebra2horse | zebra2horse-id0 | average | -|:--------:|:-----------:|:-----------:|:-------------:|:-----------------:|:-------------:|:-----------------:|:-----------:|:---------------:|:-----------:|:---------------:|:----------:| +| :------: | :---------: | :---------: | :-----------: | :---------------: | :-----------: | :---------------: | :---------: | :-------------: | :---------: | :-------------: | :--------: | | official | **123.626** | **119.726** | **77.342** | **76.773** | **72.631** | 74.239 | **62.111** | 77.202 | **138.646** | **137.050** | **95.935** | | ours | 124.8033 | 125.1694 | 83.7177 | 83.1418 | 72.8025 | **73.5001** | 64.5225 | **74.7770** | 141.1571 | **134.3728** | 97.79 | `IS` comparison with evaluation: | Dataset | facades | facades-id0 | summer2winter | summer2winter-id0 | winter2summer | winter2summer-id0 | horse2zebra | horse2zebra-id0 | zebra2horse | zebra2horse-id0 | average | 
-|:--------:|:---------:|:-----------:|:-------------:|:-----------------:|:-------------:|:-----------------:|:-----------:|:---------------:|:-----------:|:---------------:|:---------:|
+| :------: | :-------: | :---------: | :-----------: | :---------------: | :-----------: | :---------------: | :---------: | :-------------: | :---------: | :-------------: | :-------: |
| official | 1.638 | 1.697 | 2.762 | **2.750** | **3.293** | **3.110** | 1.375 | **1.584** | **3.186** | 3.047 | 2.444 |
| ours | **1.792** | **1.905** | **2.771** | 2.720 | 3.129 | 3.107 | **1.418** | 1.542 | 3.154 | **3.091** | **2.462** |

Note:
+
1. With a larger identity loss, the image-to-image translation becomes more conservative, which makes fewer changes. The original authors did not say what the best weight for the identity loss is. Thus, in addition to the default setting, we also set the weight of identity loss to 0 (denoting `id0`) to make a more comprehensive comparison.
2. This is the training log before refactoring. Updated logs will be released soon.

diff --git a/configs/dcgan/README.md b/configs/dcgan/README.md index fd3c98dc9..f2d160753 100644 --- a/configs/dcgan/README.md +++ b/configs/dcgan/README.md @@ -11,6 +11,7 @@

In recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications. Comparatively, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning. We introduce a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning. Training on various image datasets, we show convincing evidence that our deep convolutional adversarial pair learns a hierarchy of representations from object parts to scenes in both the generator and discriminator. Additionally, we use the learned features for novel tasks - demonstrating their applicability as general image representations. +
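The "architectural constraints" the DCGAN abstract refers to are commonly summarized as fractionally-strided convolutions, batch normalization, and ReLU activations in the generator; one up-sampling stage in that style might look as follows (channel sizes here are arbitrary illustrations):

```python
import torch.nn as nn

# One DCGAN-style generator stage: 2x spatial upsampling via a
# fractionally-strided convolution, followed by batch norm and ReLU.
upsample_stage = nn.Sequential(
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)
```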
@@ -23,11 +24,11 @@ In recent years, supervised learning with convolutional networks (CNNs) has seen
-| Models | Dataset | SWD | MS-SSIM | Config | Download | -|:-----------:|:--------------:|:------------------------:|:-------:|:--------------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| -| DCGAN 64x64 | MNIST (64x64) | 21.16, 4.4, 8.41/11.32 | 0.1395 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/dcgan/dcgan_mnist-64_b128x1_Glr4e-4_Dlr1e-4_5k.py) | [model](https://download.openmmlab.com//mmgen/dcgan/dcgan_mnist-64_b128x1_Glr4e-4_Dlr1e-4_5k_20210512_163926-207a1eaf.pth) | [log](https://download.openmmlab.com//mmgen/dcgan/dcgan_mnist-64_b128x1_Glr4e-4_Dlr1e-4_5k_20210512_163926-207a1eaf.json) | -| DCGAN 64x64 | CelebA-Cropped | 8.93,10.53,50.32/23.26 | 0.2899 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/dcgan/dcgan_celeba-cropped_64_b128x1_300k.py) | [model](https://download.openmmlab.com/mmgen/dcgan/dcgan_celeba-cropped_64_b128x1_300kiter_20210408_161607-1f8a2277.pth) | [log](https://download.openmmlab.com/mmgen/dcgan/dcgan_celeba-cropped_64_b128x1_300kiter_20210408_161607-1f8a2277.json) | -| DCGAN 64x64 | LSUN-Bedroom | 42.79, 34.55, 98.46/58.6 | 0.2095 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/dcgan/dcgan_lsun-bedroom_64x64_b128x1_5e.py) | [model](https://download.openmmlab.com/mmgen/dcgan/dcgan_lsun-bedroom_64_b128x1_5e_20210408_161713-117c498b.pth) | [log](https://download.openmmlab.com/mmgen/dcgan/dcgan_lsun-bedroom_64_b128x1_5e_20210408_161713-117c498b.json) | +| Models | Dataset | SWD | MS-SSIM | Config | Download | +| :---------: | :------------: | :----------------------: | :-----: | :------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| DCGAN 64x64 | MNIST (64x64) | 21.16, 4.4, 8.41/11.32 | 0.1395 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/dcgan/dcgan_mnist-64_b128x1_Glr4e-4_Dlr1e-4_5k.py) | [model](https://download.openmmlab.com//mmgen/dcgan/dcgan_mnist-64_b128x1_Glr4e-4_Dlr1e-4_5k_20210512_163926-207a1eaf.pth) \| [log](https://download.openmmlab.com//mmgen/dcgan/dcgan_mnist-64_b128x1_Glr4e-4_Dlr1e-4_5k_20210512_163926-207a1eaf.json) | +| DCGAN 64x64 | CelebA-Cropped | 8.93,10.53,50.32/23.26 | 0.2899 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/dcgan/dcgan_celeba-cropped_64_b128x1_300k.py) | [model](https://download.openmmlab.com/mmgen/dcgan/dcgan_celeba-cropped_64_b128x1_300kiter_20210408_161607-1f8a2277.pth) \| [log](https://download.openmmlab.com/mmgen/dcgan/dcgan_celeba-cropped_64_b128x1_300kiter_20210408_161607-1f8a2277.json) | +| DCGAN 64x64 | LSUN-Bedroom | 42.79, 34.55, 98.46/58.6 | 0.2095 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/dcgan/dcgan_lsun-bedroom_64x64_b128x1_5e.py) | [model](https://download.openmmlab.com/mmgen/dcgan/dcgan_lsun-bedroom_64_b128x1_5e_20210408_161713-117c498b.pth) \| 
[log](https://download.openmmlab.com/mmgen/dcgan/dcgan_lsun-bedroom_64_b128x1_5e_20210408_161713-117c498b.json) |

## Citation diff --git a/configs/ggan/README.md b/configs/ggan/README.md index 19f39becc..b94abd772 100644 --- a/configs/ggan/README.md +++ b/configs/ggan/README.md @@ -11,6 +11,7 @@

Generative Adversarial Nets (GANs) represent an important milestone for effective generative models, which have inspired numerous variants seemingly different from each other. One of the main contributions of this paper is to reveal a unified geometric structure in GAN and its variants. Specifically, we show that the adversarial generative model training can be decomposed into three geometric steps: separating hyperplane search, discriminator parameter update away from the separating hyperplane, and the generator update along the normal vector direction of the separating hyperplane. This geometric intuition reveals the limitations of the existing approaches and leads us to propose a new formulation called geometric GAN using the SVM separating hyperplane that maximizes the margin. Our theoretical analysis shows that the geometric GAN converges to a Nash equilibrium between the discriminator and generator. In addition, extensive numerical results show the superior performance of geometric GAN. +
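The SVM-style margin objective described above is what is now widely known as the hinge loss for GANs; a minimal sketch, where the logits are raw discriminator outputs:

```python
import torch
import torch.nn.functional as F

def d_hinge_loss(real_logits: torch.Tensor, fake_logits: torch.Tensor) -> torch.Tensor:
    """Push real logits above +1 and fake logits below -1 (max-margin)."""
    return F.relu(1 - real_logits).mean() + F.relu(1 + fake_logits).mean()

def g_hinge_loss(fake_logits: torch.Tensor) -> torch.Tensor:
    """Generator moves samples along the normal of the separating hyperplane."""
    return -fake_logits.mean()
```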
@@ -23,16 +24,17 @@ Generative Adversarial Nets (GANs) represent an important milestone for effectiv
-| Models | Dataset | SWD | MS-SSIM | FID | Config | Download | -|:------------:|:--------------:|:-------------------------------:|:-------:|:-------:|:--------------------------------------------------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| -| GGAN 64x64 | CelebA-Cropped | 11.18, 12.21, 39.16/20.85 | 0.3318 | 20.1797 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/ggan/ggan_celeba-cropped_dcgan-archi_lr-1e-3_64_b128x1_12m.py) | [model](https://download.openmmlab.com/mmgen/ggan/ggan_celeba-cropped_dcgan-archi_lr-1e-3_64_b128x1_12m.pth) | [log](https://download.openmmlab.com/mmgen/ggan/ggan_celeba-cropped_dcgan-archi_lr-1e-3_64_b128x1_12m_20210430_113839.log.json) | -| GGAN 128x128 | CelebA-Cropped | 9.81, 11.29, 19.22, 47.79/22.03 | 0.3149 | 18.7647 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/ggan/ggan_celeba-cropped_dcgan-archi_lr-1e-4_128_b64x1_10m.py) | [model](https://download.openmmlab.com/mmgen/ggan/ggan_celeba-cropped_dcgan-archi_lr-1e-4_128_b64x1_10m_20210430_143027-516423dc.pth) | [log](https://download.openmmlab.com/mmgen/ggan/ggan_celeba-cropped_dcgan-archi_lr-1e-4_128_b64x1_10m_20210423_154258.log.json) | -| GGAN 64x64 | LSUN-Bedroom | 9.1, 6.2, 12.27/9.19 | 0.0649 | 85.6629 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/ggan/ggan_lsun-bedroom_lsgan_archi_lr-1e-4_64_b128x1_20m.py) | [model](https://download.openmmlab.com/mmgen/ggan/ggan_lsun-bedroom_lsgan_archi_lr-1e-4_64_b128x1_20m_20210430_143114-5d99b76c.pth) | [log](https://download.openmmlab.com/mmgen/ggan/ggan_lsun-bedroom_lsgan_archi_lr-1e-4_64_b128x1_20m_20210428_202027.log.json) | +| Models | Dataset | SWD | MS-SSIM | FID | Config | Download | +| :----------: | :------------: | :-----------------------------: | :-----: | :-----: | :------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| GGAN 64x64 | CelebA-Cropped | 11.18, 12.21, 39.16/20.85 | 0.3318 | 20.1797 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/ggan/ggan_celeba-cropped_dcgan-archi_lr-1e-3_64_b128x1_12m.py) | [model](https://download.openmmlab.com/mmgen/ggan/ggan_celeba-cropped_dcgan-archi_lr-1e-3_64_b128x1_12m.pth) \| [log](https://download.openmmlab.com/mmgen/ggan/ggan_celeba-cropped_dcgan-archi_lr-1e-3_64_b128x1_12m_20210430_113839.log.json) | +| GGAN 128x128 | CelebA-Cropped | 9.81, 11.29, 19.22, 47.79/22.03 | 0.3149 | 18.7647 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/ggan/ggan_celeba-cropped_dcgan-archi_lr-1e-4_128_b64x1_10m.py) | [model](https://download.openmmlab.com/mmgen/ggan/ggan_celeba-cropped_dcgan-archi_lr-1e-4_128_b64x1_10m_20210430_143027-516423dc.pth) \| [log](https://download.openmmlab.com/mmgen/ggan/ggan_celeba-cropped_dcgan-archi_lr-1e-4_128_b64x1_10m_20210423_154258.log.json) | +| GGAN 64x64 | LSUN-Bedroom | 9.1, 6.2, 12.27/9.19 | 0.0649 | 
85.6629 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/ggan/ggan_lsun-bedroom_lsgan_archi_lr-1e-4_64_b128x1_20m.py) | [model](https://download.openmmlab.com/mmgen/ggan/ggan_lsun-bedroom_lsgan_archi_lr-1e-4_64_b128x1_20m_20210430_143114-5d99b76c.pth) \| [log](https://download.openmmlab.com/mmgen/ggan/ggan_lsun-bedroom_lsgan_archi_lr-1e-4_64_b128x1_20m_20210428_202027.log.json) |

Note: In the original implementation of [GGAN](https://github.com/lim0606/pytorch-geometric-gan), they set `G_iters` to 10. However, our framework does not support `G_iters` currently, so we dropped the original settings and conducted several experiments with our own settings. We have shown above the experimental results with the lowest `FID` score. \

Original settings and our settings:
+
| Models | Dataset | Architecture | optimizer | lr_G | lr_D | G_iters | D_iters |
-|:------------------:|:--------------:|:------------:|:---------:|:------:|:------:|:-------:|:-------:|
+| :----------------: | :------------: | :----------: | :-------: | :----: | :----: | :-----: | :-----: |
| GGAN(origin) 64x64 | CelebA-Cropped | dcgan-archi | RMSprop | 0.0002 | 0.0002 | 10 | 1 |
| GGAN(ours) 64x64 | CelebA-Cropped | dcgan-archi | Adam | 0.001 | 0.001 | 1 | 1 |
| GGAN(origin) 64x64 | LSUN-Bedroom | dcgan-archi | RMSprop | 0.0002 | 0.0002 | 10 | 1 |

diff --git a/configs/improved_ddpm/README.md b/configs/improved_ddpm/README.md index 49a6cfc75..11f4afecf 100644 --- a/configs/improved_ddpm/README.md +++ b/configs/improved_ddpm/README.md @@ -11,6 +11,7 @@

Denoising diffusion probabilistic models (DDPM) are a class of generative models which have recently been shown to produce excellent samples. We show that with a few simple modifications, DDPMs can also achieve competitive log-likelihoods while maintaining high sample quality. Additionally, we find that learning variances of the reverse diffusion process allows sampling with an order of magnitude fewer forward passes with a negligible difference in sample quality, which is important for the practical deployment of these models. We additionally use precision and recall to compare how well DDPMs and GANs cover the target distribution. Finally, we show that the sample quality and likelihood of these models scale smoothly with model capacity and training compute, making them easily scalable. We release our code at this https URL. +
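The "cosine" in the config names below refers to the noise schedule proposed in this paper; a short sketch of that schedule (the 0.999 clipping value follows the paper, while the function name is our own):

```python
import math
import torch

def cosine_beta_schedule(timesteps: int, s: float = 0.008) -> torch.Tensor:
    """Betas derived from a cosine alpha-bar curve (Nichol & Dhariwal, 2021)."""
    steps = torch.arange(timesteps + 1, dtype=torch.float64)
    alpha_bar = torch.cos(((steps / timesteps) + s) / (1 + s) * math.pi / 2) ** 2
    betas = 1 - alpha_bar[1:] / alpha_bar[:-1]
    return betas.clamp(max=0.999).float()
```

Compared with a linear schedule, this curve destroys information more gradually near both ends of the diffusion process.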
@@ -24,17 +25,16 @@ Denoising diffusion probabilistic models (DDPM) are a class of generative models
- -| Models | Dataset | FID | Config | Download | -|:------------------------------:|:----------:|:-------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| -| Improve-DDPM 32x32 Dropout=0.3 | CIFAR10 | 3.8848 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/improve_ddpm/ddpm_cosine_hybird_timestep-4k_drop0.3_cifar10_32x32_b8x16_500k.py) | [model](https://download.openmmlab.com/mmgen/improved_ddpm/ddpm_cosine_hybird_timestep-4k_drop0.3_cifar10_32x32_b8x16_500k_20220103_222621-2f42f476.pth)| [log](https://download.openmmlab.com/mmgen/improved_ddpm/ddpm_cosine_hybird_timestep-4k_drop0.3_cifar10_32x32_b8x16_500k_20220103_222621-2f42f476.json) | -| Improve-DDPM 64x64 | ImageNet1k | 13.5181 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/improve_ddpm/ddpm_cosine_hybird_timestep-4k_imagenet1k_64x64_b8x16_1500k) | [model](https://download.openmmlab.com/mmgen/improved_ddpm/ddpm_cosine_hybird_timestep-4k_imagenet1k_64x64_b8x16_1500k_20220103_223919-b8f1a310.pth)| [log](https://download.openmmlab.com/mmgen/improved_ddpm/ddpm_cosine_hybird_timestep-4k_imagenet1k_64x64_b8x16_1500k_20220103_223919-b8f1a310.json) | -| Improve-DDPM 64x64 Dropout=0.3 | ImageNet1k | 13.4094 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/improve_ddpm/ddpm_cosine_hybird_timestep-4k_drop0.3_imagenet1k_64x64_b8x16_1500k.py) | [model](https://download.openmmlab.com/mmgen/improved_ddpm/ddpm_cosine_hybird_timestep-4k_drop0.3_imagenet1k_64x64_b8x16_1500k_20220103_224427-7bb55975.pth)| [log](https://download.openmmlab.com/mmgen/improved_ddpm/ddpm_cosine_hybird_timestep-4k_drop0.3_imagenet1k_64x64_b8x16_1500k_20220103_224427-7bb55975.json) | +| Models | Dataset | FID | Config | Download | +| :----------------------------: | :--------: | :-----: | :----------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| Improve-DDPM 32x32 Dropout=0.3 | CIFAR10 | 3.8848 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/improve_ddpm/ddpm_cosine_hybird_timestep-4k_drop0.3_cifar10_32x32_b8x16_500k.py) | [model](https://download.openmmlab.com/mmgen/improved_ddpm/ddpm_cosine_hybird_timestep-4k_drop0.3_cifar10_32x32_b8x16_500k_20220103_222621-2f42f476.pth)\| [log](https://download.openmmlab.com/mmgen/improved_ddpm/ddpm_cosine_hybird_timestep-4k_drop0.3_cifar10_32x32_b8x16_500k_20220103_222621-2f42f476.json) | +| Improve-DDPM 64x64 | ImageNet1k | 13.5181 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/improve_ddpm/ddpm_cosine_hybird_timestep-4k_imagenet1k_64x64_b8x16_1500k) | 
[model](https://download.openmmlab.com/mmgen/improved_ddpm/ddpm_cosine_hybird_timestep-4k_imagenet1k_64x64_b8x16_1500k_20220103_223919-b8f1a310.pth)\| [log](https://download.openmmlab.com/mmgen/improved_ddpm/ddpm_cosine_hybird_timestep-4k_imagenet1k_64x64_b8x16_1500k_20220103_223919-b8f1a310.json) | +| Improve-DDPM 64x64 Dropout=0.3 | ImageNet1k | 13.4094 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/improve_ddpm/ddpm_cosine_hybird_timestep-4k_drop0.3_imagenet1k_64x64_b8x16_1500k.py) | [model](https://download.openmmlab.com/mmgen/improved_ddpm/ddpm_cosine_hybird_timestep-4k_drop0.3_imagenet1k_64x64_b8x16_1500k_20220103_224427-7bb55975.pth)\| [log](https://download.openmmlab.com/mmgen/improved_ddpm/ddpm_cosine_hybird_timestep-4k_drop0.3_imagenet1k_64x64_b8x16_1500k_20220103_224427-7bb55975.json) | `FID` comparison with official: | Dataset | CIFAR10 | ImageNet1k-64x64 | -|:--------:|:--------:|:----------------:| +| :------: | :------: | :--------------: | | Ours | 3.8848 | **13.5181** | | Official | **3.19** | 19.2 | diff --git a/configs/lsgan/README.md b/configs/lsgan/README.md index 3e1a79194..9ecbd6fee 100644 --- a/configs/lsgan/README.md +++ b/configs/lsgan/README.md @@ -11,6 +11,7 @@ Unsupervised learning with generative adversarial networks (GANs) has proven hugely successful. Regular GANs hypothesize the discriminator as a classifier with the sigmoid cross entropy loss function. However, we found that this loss function may lead to the vanishing gradients problem during the learning process. To overcome such a problem, we propose in this paper the Least Squares Generative Adversarial Networks (LSGANs) which adopt the least squares loss function for the discriminator. We show that minimizing the objective function of LSGAN yields minimizing the Pearson χ2 divergence. There are two benefits of LSGANs over regular GANs. First, LSGANs are able to generate higher quality images than regular GANs. Second, LSGANs perform more stable during the learning process. We evaluate LSGANs on five scene datasets and the experimental results show that the images generated by LSGANs are of better quality than the ones generated by regular GANs. We also conduct two comparison experiments between LSGANs and regular GANs to illustrate the stability of LSGANs. +
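The least-squares objective from the abstract replaces sigmoid cross-entropy with an MSE toward fixed targets; a minimal sketch using the common 0/1 target coding (one of the variants discussed in the paper):

```python
import torch
import torch.nn.functional as F

def lsgan_d_loss(real_pred: torch.Tensor, fake_pred: torch.Tensor) -> torch.Tensor:
    """Real predictions regress to 1, fake predictions to 0."""
    return 0.5 * (F.mse_loss(real_pred, torch.ones_like(real_pred))
                  + F.mse_loss(fake_pred, torch.zeros_like(fake_pred)))

def lsgan_g_loss(fake_pred: torch.Tensor) -> torch.Tensor:
    """Generator pulls fake predictions toward the 'real' target."""
    return 0.5 * F.mse_loss(fake_pred, torch.ones_like(fake_pred))
```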
@@ -23,13 +24,12 @@ Unsupervised learning with generative adversarial networks (GANs) has proven hug - -| Models | Dataset | SWD | MS-SSIM | FID | Config | Download | -|:-------------:|:--------------:|:-------------------------------:|:-------:|:-------:|:----------------------------------------------------------------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| -| LSGAN 64x64 | CelebA-Cropped | 6.16, 6.83, 37.64/16.87 | 0.3216 | 11.9258 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/lsgan/lsgan_dcgan-archi_lr-1e-3_celeba-cropped_64_b128x1_12m.py) | [model](https://download.openmmlab.com/mmgen/lsgan/lsgan_celeba-cropped_dcgan-archi_lr-1e-3_64_b128x1_12m_20210429_144001-92ca1d0d.pth)| [log](https://download.openmmlab.com/mmgen/lsgan/lsgan_celeba-cropped_dcgan-archi_lr-1e-3_64_b128x1_12m_20210422_131925.log.json) | -| LSGAN 64x64 | LSUN-Bedroom | 5.66, 9.0, 18.6/11.09 | 0.0671 | 30.7390 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/lsgan/lsgan_dcgan-archi_lr-1e-4_lsun-bedroom_64_b128x1_12m.py) | [model](https://download.openmmlab.com/mmgen/lsgan/lsgan_lsun-bedroom_dcgan-archi_lr-1e-4_64_b128x1_12m_20210429_144602-ec4ec6bb.pth)| [log](https://download.openmmlab.com/mmgen/lsgan/lsgan_lsun-bedroom_dcgan-archi_lr-1e-4_64_b128x1_12m_20210423_005020.log.json) | -| LSGAN 128x128 | CelebA-Cropped | 21.66, 9.83, 16.06, 70.76/29.58 | 0.3691 | 38.3752 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/lsgan/lsgan_dcgan-archi_lr-1e-4_celeba-cropped_128_b64x1_10m.py) | [model](https://download.openmmlab.com/mmgen/lsgan/lsgan_celeba-cropped_dcgan-archi_lr-1e-4_128_b64x1_10m_20210429_144229-01ba67dc.pth)| [log](https://download.openmmlab.com/mmgen/lsgan/lsgan_celeba-cropped_dcgan-archi_lr-1e-4_128_b64x1_10m_20210423_132126.log.json) | -| LSGAN 128x128 | LSUN-Bedroom | 19.52, 9.99, 7.48, 14.3/12.82 | 0.0612 | 51.5500 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/lsgan/lsgan_lsgan-archi_lr-1e-4_lsun-bedroom_128_b64x1_10m.py) | [model](https://download.openmmlab.com/mmgen/lsgan/lsgan_lsun-bedroom_lsgan-archi_lr-1e-4_128_b64x1_10m_20210429_155605-cf78c0a8.pth)| [log](https://download.openmmlab.com/mmgen/lsgan/lsgan_lsun-bedroom_lsgan-archi_lr-1e-4_128_b64x1_10m_20210429_142302.log.json) | +| Models | Dataset | SWD | MS-SSIM | FID | Config | Download | +| :-----------: | :------------: | :-----------------------------: | :-----: | :-----: | :--------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| LSGAN 64x64 | CelebA-Cropped | 6.16, 6.83, 37.64/16.87 | 0.3216 | 11.9258 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/lsgan/lsgan_dcgan-archi_lr-1e-3_celeba-cropped_64_b128x1_12m.py) | [model](https://download.openmmlab.com/mmgen/lsgan/lsgan_celeba-cropped_dcgan-archi_lr-1e-3_64_b128x1_12m_20210429_144001-92ca1d0d.pth)\| 
[log](https://download.openmmlab.com/mmgen/lsgan/lsgan_celeba-cropped_dcgan-archi_lr-1e-3_64_b128x1_12m_20210422_131925.log.json) | +| LSGAN 64x64 | LSUN-Bedroom | 5.66, 9.0, 18.6/11.09 | 0.0671 | 30.7390 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/lsgan/lsgan_dcgan-archi_lr-1e-4_lsun-bedroom_64_b128x1_12m.py) | [model](https://download.openmmlab.com/mmgen/lsgan/lsgan_lsun-bedroom_dcgan-archi_lr-1e-4_64_b128x1_12m_20210429_144602-ec4ec6bb.pth)\| [log](https://download.openmmlab.com/mmgen/lsgan/lsgan_lsun-bedroom_dcgan-archi_lr-1e-4_64_b128x1_12m_20210423_005020.log.json) | +| LSGAN 128x128 | CelebA-Cropped | 21.66, 9.83, 16.06, 70.76/29.58 | 0.3691 | 38.3752 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/lsgan/lsgan_dcgan-archi_lr-1e-4_celeba-cropped_128_b64x1_10m.py) | [model](https://download.openmmlab.com/mmgen/lsgan/lsgan_celeba-cropped_dcgan-archi_lr-1e-4_128_b64x1_10m_20210429_144229-01ba67dc.pth)\| [log](https://download.openmmlab.com/mmgen/lsgan/lsgan_celeba-cropped_dcgan-archi_lr-1e-4_128_b64x1_10m_20210423_132126.log.json) | +| LSGAN 128x128 | LSUN-Bedroom | 19.52, 9.99, 7.48, 14.3/12.82 | 0.0612 | 51.5500 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/lsgan/lsgan_lsgan-archi_lr-1e-4_lsun-bedroom_128_b64x1_10m.py) | [model](https://download.openmmlab.com/mmgen/lsgan/lsgan_lsun-bedroom_lsgan-archi_lr-1e-4_128_b64x1_10m_20210429_155605-cf78c0a8.pth)\| [log](https://download.openmmlab.com/mmgen/lsgan/lsgan_lsun-bedroom_lsgan-archi_lr-1e-4_128_b64x1_10m_20210429_142302.log.json) | ## Citation diff --git a/configs/pggan/README.md b/configs/pggan/README.md index e18734a5d..dbfad871b 100644 --- a/configs/pggan/README.md +++ b/configs/pggan/README.md @@ -11,6 +11,7 @@ We describe a new training methodology for generative adversarial networks. The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality, e.g., CelebA images at 1024^2. We also propose a simple way to increase the variation in generated images, and achieve a record inception score of 8.80 in unsupervised CIFAR10. Additionally, we describe several implementation details that are important for discouraging unhealthy competition between the generator and discriminator. Finally, we suggest a new metric for evaluating GAN results, both in terms of image quality and variation. As an additional contribution, we construct a higher-quality version of the CelebA dataset. +
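Growing the networks "progressively", as the PGGAN abstract puts it, is usually implemented by fading a newly added resolution block in over an `alpha` ramp; a hedged sketch of that blend (the exact schedule for `alpha` is an implementation choice):

```python
import torch

def fade_in(upsampled_prev: torch.Tensor, new_block_out: torch.Tensor,
            alpha: float) -> torch.Tensor:
    """Blend the up-sampled previous-stage output with the new layer's output;
    alpha ramps from 0 to 1 while the new resolution stabilizes."""
    return (1.0 - alpha) * upsampled_prev + alpha * new_block_out
```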
diff --git a/configs/pix2pix/README.md b/configs/pix2pix/README.md index 92f09702b..427f9b639 100644 --- a/configs/pix2pix/README.md +++ b/configs/pix2pix/README.md @@ -11,11 +11,13 @@

We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Moreover, since the release of the pix2pix software associated with this paper, hundreds of Twitter users have posted their own artistic experiments using our system. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either. +
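On the generator side, the "learned mapping plus learned loss" described above boils down to a conditional adversarial term plus a weighted L1 reconstruction term (the paper uses λ = 100); a minimal sketch:

```python
import torch
import torch.nn.functional as F

def pix2pix_g_loss(fake_logits: torch.Tensor, fake_img: torch.Tensor,
                   target_img: torch.Tensor, lam: float = 100.0) -> torch.Tensor:
    """cGAN term (fool the discriminator) + weighted L1 toward the target."""
    adv = F.binary_cross_entropy_with_logits(
        fake_logits, torch.ones_like(fake_logits))
    return adv + lam * F.l1_loss(fake_img, target_img)
```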
## Results and Models +
Results from Pix2Pix trained by MMGeneration
@@ -23,34 +25,34 @@ We investigate conditional adversarial networks as a general-purpose solution to
We use `FID` and `IS` metrics to evaluate the generation performance of pix2pix.1 -| Models | Dataset | FID | IS | Config | Download | -| :----: | :---------: | :------: | :---: | :---------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | -| Ours | facades | 124.9773 | 1.620 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/pix2pix/pix2pix_vanilla_unet_bn_facades_b1x1_80k.py) | [model](https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_1x1_80k_facades_20210902_170442-c0958d50.pth) \| [log](https://download.openmmlab.com/mmgen/pix2pix/pix2pix_vanilla_unet_bn_1x1_80k_facades_20210317_172625.log.json)2 -| Ours | aerial2maps | 122.5856 | 3.137 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/pix2pix/pix2pix_vanilla_unet_bn_aerial2maps_b1x1_220k.py) | [model](https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_a2b_1x1_219200_maps_convert-bgr_20210902_170729-59a31517.pth) | -| Ours | maps2aerial | 88.4635 | 3.310 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/pix2pix/pix2pix_vanilla_unet_bn_maps2aerial_b1x1_220k.py) | [model](https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_b2a_1x1_219200_maps_convert-bgr_20210902_170814-6d2eac4a.pth) | -| Ours | edges2shoes | 84.3750 | 2.815 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/pix2pix/pix2pix_vanilla_unet_bn_wo_jitter_flip_edges2shoes_b1x4_190k.py) | [model](https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_wo_jitter_flip_1x4_186840_edges2shoes_convert-bgr_20210902_170902-0c828552.pth) | - +| Models | Dataset | FID | IS | Config | Download | +| :----: | :---------: | :------: | :---: | :----------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| Ours | facades | 124.9773 | 1.620 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/pix2pix/pix2pix_vanilla_unet_bn_facades_b1x1_80k.py) | [model](https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_1x1_80k_facades_20210902_170442-c0958d50.pth) \| [log](https://download.openmmlab.com/mmgen/pix2pix/pix2pix_vanilla_unet_bn_1x1_80k_facades_20210317_172625.log.json)2 | +| Ours | aerial2maps | 122.5856 | 3.137 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/pix2pix/pix2pix_vanilla_unet_bn_aerial2maps_b1x1_220k.py) | [model](https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_a2b_1x1_219200_maps_convert-bgr_20210902_170729-59a31517.pth) | +| Ours | maps2aerial | 88.4635 | 3.310 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/pix2pix/pix2pix_vanilla_unet_bn_maps2aerial_b1x1_220k.py) | 
[model](https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_b2a_1x1_219200_maps_convert-bgr_20210902_170814-6d2eac4a.pth) | +| Ours | edges2shoes | 84.3750 | 2.815 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/pix2pix/pix2pix_vanilla_unet_bn_wo_jitter_flip_edges2shoes_b1x4_190k.py) | [model](https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_wo_jitter_flip_1x4_186840_edges2shoes_convert-bgr_20210902_170902-0c828552.pth) | `FID` comparison with official: | Dataset | facades | aerial2maps | maps2aerial | edges2shoes | average | -|:--------:|:-----------:|:------------:|:-----------:|:-----------:|:------------:| +| :------: | :---------: | :----------: | :---------: | :---------: | :----------: | | official | **119.135** | 149.731 | 102.072 | **75.774** | 111.678 | | ours | 124.9773 | **122.5856** | **88.4635** | 84.3750 | **105.1003** | `IS` comparison with official: | Dataset | facades | aerial2maps | maps2aerial | edges2shoes | average | -|:--------:|:---------:|:-----------:|:-----------:|:-----------:|:----------:| +| :------: | :-------: | :---------: | :---------: | :---------: | :--------: | | official | **1.650** | 2.529 | **3.552** | 2.766 | 2.624 | | ours | 1.620 | **3.137** | 3.310 | **2.815** | **2.7205** | Note: + 1. we strictly follow the [paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Isola_Image-To-Image_Translation_With_CVPR_2017_paper.pdf) setting in Section 3.3: "*At inference time, we run the generator net in exactly -the same manner as during the training phase. This differs -from the usual protocol in that we apply dropout at test time, -and we apply batch normalization using the statistics of -the test batch, rather than aggregated statistics of the training batch.*" (i.e., use model.train() mode), thus may lead to slightly different inference results every time. + the same manner as during the training phase. This differs + from the usual protocol in that we apply dropout at test time, + and we apply batch normalization using the statistics of + the test batch, rather than aggregated statistics of the training batch.*" (i.e., use model.train() mode), thus may lead to slightly different inference results every time. 2. This is the training log before refactoring. Updated logs will be released soon. ## Citation diff --git a/configs/positional_encoding_in_gans/README.md b/configs/positional_encoding_in_gans/README.md index 872c213ae..3c116a6f9 100644 --- a/configs/positional_encoding_in_gans/README.md +++ b/configs/positional_encoding_in_gans/README.md @@ -11,6 +11,7 @@ SinGAN shows impressive capability in learning internal patch distribution despite its limited effective receptive field. We are interested in knowing how such a translation-invariant convolutional generator could capture the global structure with just a spatially i.i.d. input. In this work, taking SinGAN and StyleGAN2 as examples, we show that such capability, to a large extent, is brought by the implicit positional encoding when using zero padding in the generators. Such positional encoding is indispensable for generating images with high fidelity. The same phenomenon is observed in other generative architectures such as DCGAN and PGGAN. We further show that zero padding leads to an unbalanced spatial bias with a vague relation between locations. To offer a better spatial inductive bias, we investigate alternative positional encodings and analyze their effects. 
Based on a more flexible positional encoding explicitly, we propose a new multi-scale training strategy and demonstrate its effectiveness in the state-of-the-art unconditional generator StyleGAN2. Besides, the explicit spatial inductive bias substantially improves SinGAN for more versatile image manipulation.
+
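+As a small, self-contained illustration of the zero-padding effect discussed above (our own sketch, not code from the paper or from this repo), a zero-padded convolution applied to a spatially constant input already responds differently at the borders than in the interior, which is exactly the positional information the generator can exploit:
+
+```python
+import torch
+import torch.nn as nn
+
+torch.manual_seed(0)
+# One zero-padded conv applied to a spatially constant input.
+conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
+x = torch.ones(1, 1, 8, 8)
+with torch.no_grad():
+    y = conv(x)[0, 0]
+# The interior response is constant, but the borders differ:
+print(torch.allclose(y[1:-1, 1:-1], y[4, 4]))  # True
+print(torch.isclose(y[0, 0], y[4, 4]).item())  # False: corners leak position
+```
+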
@@ -23,7 +24,6 @@ SinGAN shows impressive capability in learning internal patch distribution despi - | Models | Reference in Paper | Scales | FID50k | P&R10k | Config | Download | | :--------------------------: | :----------------: | :------------: | :----: | :---------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------: | | stylegan2_c2_256_baseline | Tab.5 config-a | 256 | 5.56 | 75.92/51.24 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/stylegan2_c2_ffhq_256_b3x8_1100k.py) | [model](https://download.openmmlab.com/mmgen/pe_in_gans/stylegan2_c2_config-a_ffhq_256x256_b3x8_1100k_20210406_145127-71d9634b.pth) | @@ -50,17 +50,16 @@ Note that we report the FID and P&R metric (FFHQ dataset) in the largest scale. - -| Model | Data | Num Scales | Config | Download | -| :-----------------------------: | :------------------------------------------------------------------------------: | :--------: | :------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | -| SinGAN + no pad | [balloons.png](https://download.openmmlab.com/mmgen/dataset/singan/balloons.png) | 8 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_interp-pad_balloons.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_interp-pad_balloons_20210406_180014-96f51555.pth) | [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_interp-pad_balloons_20210406_180014-96f51555.pkl) | -| SinGAN + no pad + no bn in disc | [balloons.png](https://download.openmmlab.com/mmgen/dataset/singan/balloons.png) | 8 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_interp-pad_disc-nobn_balloons.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_interp-pad_disc-nobn_balloons_20210406_180059-7d63e65d.pth) | [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_interp-pad_disc-nobn_balloons_20210406_180059-7d63e65d.pkl) | -| SinGAN + no pad + no bn in disc | [fish.jpg](https://download.openmmlab.com/mmgen/dataset/singan/fish-crop.jpg) | 10 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_interp-pad_disc-nobn_fish.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_interp-pad_disc-nobn_fis_20210406_175720-9428517a.pth) | [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_interp-pad_disc-nobn_fis_20210406_175720-9428517a.pkl) | -| SinGAN + CSG | [fish.jpg](https://download.openmmlab.com/mmgen/dataset/singan/fish-crop.jpg) | 10 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_csg_fish.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_csg_fis_20210406_175532-f0ec7b61.pth) | [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_csg_fis_20210406_175532-f0ec7b61.pkl) | -| SinGAN + CSG | 
[bohemian.png](https://download.openmmlab.com/mmgen/dataset/singan/bohemian.png) | 10 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_csg_bohemian.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_csg_bohemian_20210407_195455-5ed56db2.pth) | [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_csg_bohemian_20210407_195455-5ed56db2.pkl) | -| SinGAN + SPE-dim4 | [fish.jpg](https://download.openmmlab.com/mmgen/dataset/singan/fish-crop.jpg) | 10 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_spe-dim4_fish.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_spe-dim4_fish_20210406_175933-f483a7e3.pth) | [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_spe-dim4_fish_20210406_175933-f483a7e3.pkl) | -| SinGAN + SPE-dim4 | [bohemian.png](https://download.openmmlab.com/mmgen/dataset/singan/bohemian.png) | 10 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_spe-dim4_bohemian.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_spe-dim4_bohemian_20210406_175820-6e484a35.pth) | [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_spe-dim4_bohemian_20210406_175820-6e484a35.pkl) | -| SinGAN + SPE-dim8 | [bohemian.png](https://download.openmmlab.com/mmgen/dataset/singan/bohemian.png) | 10 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_spe-dim8_bohemian.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_spe-dim8_bohemian_20210406_175858-7faa50f3.pth) | [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_spe-dim8_bohemian_20210406_175858-7faa50f3.pkl) | +| Model | Data | Num Scales | Config | Download | +| :-----------------------------: | :------------------------------------------------------------------------------: | :--------: | :------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| SinGAN + no pad | [balloons.png](https://download.openmmlab.com/mmgen/dataset/singan/balloons.png) | 8 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_interp-pad_balloons.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_interp-pad_balloons_20210406_180014-96f51555.pth) \| [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_interp-pad_balloons_20210406_180014-96f51555.pkl) | +| SinGAN + no pad + no bn in disc | [balloons.png](https://download.openmmlab.com/mmgen/dataset/singan/balloons.png) | 8 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_interp-pad_disc-nobn_balloons.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_interp-pad_disc-nobn_balloons_20210406_180059-7d63e65d.pth) \| [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_interp-pad_disc-nobn_balloons_20210406_180059-7d63e65d.pkl) | +| SinGAN + no pad + no bn in disc | [fish.jpg](https://download.openmmlab.com/mmgen/dataset/singan/fish-crop.jpg) | 10 | 
[config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_interp-pad_disc-nobn_fish.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_interp-pad_disc-nobn_fis_20210406_175720-9428517a.pth) \| [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_interp-pad_disc-nobn_fis_20210406_175720-9428517a.pkl) | +| SinGAN + CSG | [fish.jpg](https://download.openmmlab.com/mmgen/dataset/singan/fish-crop.jpg) | 10 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_csg_fish.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_csg_fis_20210406_175532-f0ec7b61.pth) \| [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_csg_fis_20210406_175532-f0ec7b61.pkl) | +| SinGAN + CSG | [bohemian.png](https://download.openmmlab.com/mmgen/dataset/singan/bohemian.png) | 10 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_csg_bohemian.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_csg_bohemian_20210407_195455-5ed56db2.pth) \| [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_csg_bohemian_20210407_195455-5ed56db2.pkl) | +| SinGAN + SPE-dim4 | [fish.jpg](https://download.openmmlab.com/mmgen/dataset/singan/fish-crop.jpg) | 10 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_spe-dim4_fish.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_spe-dim4_fish_20210406_175933-f483a7e3.pth) \| [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_spe-dim4_fish_20210406_175933-f483a7e3.pkl) | +| SinGAN + SPE-dim4 | [bohemian.png](https://download.openmmlab.com/mmgen/dataset/singan/bohemian.png) | 10 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_spe-dim4_bohemian.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_spe-dim4_bohemian_20210406_175820-6e484a35.pth) \| [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_spe-dim4_bohemian_20210406_175820-6e484a35.pkl) | +| SinGAN + SPE-dim8 | [bohemian.png](https://download.openmmlab.com/mmgen/dataset/singan/bohemian.png) | 10 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_spe-dim8_bohemian.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_spe-dim8_bohemian_20210406_175858-7faa50f3.pth) \| [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_spe-dim8_bohemian_20210406_175858-7faa50f3.pkl) | ## Citation diff --git a/configs/sagan/README.md b/configs/sagan/README.md index b6aaec34b..233c2addd 100644 --- a/configs/sagan/README.md +++ b/configs/sagan/README.md @@ -11,6 +11,7 @@ In this paper, we propose the Self-Attention Generative Adversarial Network (SAGAN) which allows attention-driven, long-range dependency modeling for image generation tasks. Traditional convolutional GANs generate high-resolution details as a function of only spatially local points in lower-resolution feature maps. In SAGAN, details can be generated using cues from all feature locations. Moreover, the discriminator can check that highly detailed features in distant portions of the image are consistent with each other. Furthermore, recent work has shown that generator conditioning affects GAN performance. 
Leveraging this insight, we apply spectral normalization to the GAN generator and find that this improves training dynamics. The proposed SAGAN performs better than prior work, boosting the best published Inception score from 36.8 to 52.52 and reducing Fréchet Inception distance from 27.62 to 18.65 on the challenging ImageNet dataset. Visualization of the attention layers shows that the generator leverages neighborhoods that correspond to object shapes rather than local regions of fixed shape. +
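+For readers who want a concrete picture of the attention module the abstract describes, here is a simplified PyTorch sketch we wrote for illustration (the class and variable names are ours; the maintained implementation lives in mmgen's model zoo):
+
+```python
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+class SelfAttention2d(nn.Module):
+    """SAGAN-style self-attention over all spatial locations."""
+
+    def __init__(self, channels):
+        super().__init__()
+        self.query = nn.Conv2d(channels, channels // 8, 1)
+        self.key = nn.Conv2d(channels, channels // 8, 1)
+        self.value = nn.Conv2d(channels, channels, 1)
+        self.gamma = nn.Parameter(torch.zeros(1))  # starts as an identity map
+
+    def forward(self, x):
+        n, c, h, w = x.shape
+        q = self.query(x).flatten(2).transpose(1, 2)  # (N, HW, C//8)
+        k = self.key(x).flatten(2)                    # (N, C//8, HW)
+        v = self.value(x).flatten(2)                  # (N, C, HW)
+        attn = F.softmax(q @ k, dim=-1)               # every location attends to all others
+        out = (v @ attn.transpose(1, 2)).view(n, c, h, w)
+        return self.gamma * out + x
+
+print(SelfAttention2d(64)(torch.randn(2, 64, 16, 16)).shape)  # (2, 64, 16, 16)
+```
+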
@@ -23,19 +24,19 @@ In this paper, we propose the Self-Attention Generative Adversarial Network (SAG - -| Models | Dataset | Inplace ReLU | dist_step | Total Batchsize (BZ_PER_GPU \* NGPU) | Total Iters* | Iter | IS | FID | Config | Download | Log | -|:--------------------------------------:|:--------:|:------------:|:---------:|:------------------------------------:|:------------:|:------:|:-------:|:-------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------:| -| SAGAN-32x32-woInplaceReLU Best IS | CIFAR10 | w/o | 5 | 64x1 | 500000 | 400000 | 9.3217 | 10.5030 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/sagan/sagan_32_woReLUinplace_lr-2e-4_ndisc5_cifar10_b64x1.py) | [model](https://download.openmmlab.com/mmgen/sagan/sagan_cifar10_32_lr2e-4_ndisc5_b64x1_woReUinplace_is-iter400000_20210730_125743-4008a9ca.pth) | [Log](https://download.openmmlab.com/mmgen/sagan/sagan_cifar10_32_lr2e-4_ndisc5_b64x1_woReUinplace_20210730_125449_fid-d50568a4_is-04008a9ca.json) | -| SAGAN-32x32-woInplaceReLU Best FID | CIFAR10 | w/o | 5 | 64x1 | 500000 | 480000 | 9.3174 | 9.4252 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/sagan/sagan_32_woReLUinplace_lr-2e-4_ndisc5_cifar10_b64x1.py) | [model](https://download.openmmlab.com/mmgen/sagan/sagan_cifar10_32_lr2e-4_ndisc5_b64x1_woReUinplace_fid-iter480000_20210730_125449-d50568a4.pth) | [Log](https://download.openmmlab.com/mmgen/sagan/sagan_cifar10_32_lr2e-4_ndisc5_b64x1_woReUinplace_20210730_125449_fid-d50568a4_is-04008a9ca.json) | -| SAGAN-32x32-wInplaceReLU Best IS | CIFAR10 | w | 5 | 64x1 | 500000 | 380000 | 9.2286 | 11.7760 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/sagan/sagan_32_wReLUinplace_lr-2e-4_ndisc5_cifar10_b64x1.py) | [model](https://download.openmmlab.com/mmgen/sagan/sagan_cifar10_32_lr2e-4_ndisc5_b64x1_wReLUinplace_is-iter380000_20210730_124937-c77b4d25.pth) | [Log](https://download.openmmlab.com/mmgen/sagan/sagan_cifar10_32_lr2e-4_ndisc5_b64x1_wReLUinplace_20210730_125155_fid-cbefb354_is-c77b4d25.json) | -| SAGAN-32x32-wInplaceReLU Best FID | CIFAR10 | w | 5 | 64x1 | 500000 | 460000 | 9.2061 | 10.7781 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/sagan/sagan_32_wReLUinplace_lr-2e-4_ndisc5_cifar10_b64x1.py) | [model](https://download.openmmlab.com/mmgen/sagan/sagan_cifar10_32_lr2e-4_ndisc5_b64x1_wReLUinplace_fid-iter460000_20210730_125155-cbefb354.pth) | [Log](https://download.openmmlab.com/mmgen/sagan/sagan_cifar10_32_lr2e-4_ndisc5_b64x1_wReLUinplace_20210730_125155_fid-cbefb354_is-c77b4d25.json) | -| SAGAN-128x128-woInplaceReLU Best IS | ImageNet | w/o | 1 | 64x4 | 1000000 | 980000 | 31.5938 | 36.7712 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/sagan/sagan_128_woReLUinplace_Glr-1e-4_Dlr-4e-4_ndisc1_imagenet1k_b64x4.py) | [model](https://download.openmmlab.com/mmgen/sagan/sagan_imagenet1k_128_Glr1e-4_Dlr4e-4_ndisc1_b32x4_woReLUinplace_is-iter980000_20210730_163140-cfbebfc6.pth) | 
[Log](https://download.openmmlab.com/mmgen/sagan/sagan_imagenet1k_128_Glr1e-4_Dlr4e-4_ndisc1_b32x4_woReLUinplace_20210730_163431_fid-d7916963_is-cfbebfc6.json) | -| SAGAN-128x128-woInplaceReLU Best FID | ImageNet | w/o | 1 | 64x4 | 1000000 | 950000 | 28.4936 | 34.7838 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/sagan/sagan_128_woReLUinplace_Glr-1e-4_Dlr-4e-4_ndisc1_imagenet1k_b64x4.py) | [model](https://download.openmmlab.com/mmgen/sagan/sagan_imagenet1k_128_Glr1e-4_Dlr4e-4_ndisc1_b32x4_woReLUinplace_fid-iter950000_20210730_163431-d7916963.pth) | [Log](https://download.openmmlab.com/mmgen/sagan/sagan_imagenet1k_128_Glr1e-4_Dlr4e-4_ndisc1_b32x4_woReLUinplace_20210730_163431_fid-d7916963_is-cfbebfc6.json) | -| SAGAN-128x128-BigGAN Schedule Best IS | ImageNet | w/o | 1 | 32x8 | 1000000 | 826000 | 69.5350 | 12.8295 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/sagan/sagan_128_woReLUinplace_noaug_bigGAN_Glr-1e-4_Dlr-4e-4_ndisc1_imagenet1k_b32x8.py) | [model](https://download.openmmlab.com/mmgen/sagan/sagan_128_woReLUinplace_noaug_bigGAN_imagenet1k_b32x8_Glr1e-4_Dlr-4e-4_ndisc1_20210818_210232-3f5686af.pth) | [Log](https://download.openmmlab.com/mmgen/sagan/sagan_128_woReLUinplace_noaug_bigGAN_imagenet1k_b32x8_Glr1e-4_Dlr-4e-4_ndisc1_20210818_210232-3f5686af.json) | -| SAGAN-128x128-BigGAN Schedule Best FID | ImageNet | w/o | 1 | 32x8 | 1000000 | 826000 | 69.5350 | 12.8295 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/sagan/sagan_128_woReLUinplace_noaug_bigGAN_Glr-1e-4_Dlr-4e-4_ndisc1_imagenet1k_b32x8.py) | [model](https://download.openmmlab.com/mmgen/sagan/sagan_128_woReLUinplace_noaug_bigGAN_imagenet1k_b32x8_Glr1e-4_Dlr-4e-4_ndisc1_20210818_210232-3f5686af.pth) | [Log](https://download.openmmlab.com/mmgen/sagan/sagan_128_woReLUinplace_noaug_bigGAN_imagenet1k_b32x8_Glr1e-4_Dlr-4e-4_ndisc1_20210818_210232-3f5686af.json) | +| Models | Dataset | Inplace ReLU | dist_step | Total Batchsize (BZ_PER_GPU * NGPU) | Total Iters\* | Iter | IS | FID | Config | Download | Log | +| :------------------------------------: | :------: | :----------: | :-------: | :---------------------------------: | :-----------: | :----: | :-----: | :-----: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| SAGAN-32x32-woInplaceReLU Best IS | CIFAR10 | w/o | 5 | 64x1 | 500000 | 400000 | 9.3217 | 10.5030 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/sagan/sagan_32_woReLUinplace_lr-2e-4_ndisc5_cifar10_b64x1.py) | [model](https://download.openmmlab.com/mmgen/sagan/sagan_cifar10_32_lr2e-4_ndisc5_b64x1_woReUinplace_is-iter400000_20210730_125743-4008a9ca.pth) | [Log](https://download.openmmlab.com/mmgen/sagan/sagan_cifar10_32_lr2e-4_ndisc5_b64x1_woReUinplace_20210730_125449_fid-d50568a4_is-04008a9ca.json) | +| SAGAN-32x32-woInplaceReLU Best FID | CIFAR10 | w/o | 5 | 64x1 | 500000 | 480000 | 9.3174 | 9.4252 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/sagan/sagan_32_woReLUinplace_lr-2e-4_ndisc5_cifar10_b64x1.py) | 
[model](https://download.openmmlab.com/mmgen/sagan/sagan_cifar10_32_lr2e-4_ndisc5_b64x1_woReUinplace_fid-iter480000_20210730_125449-d50568a4.pth) | [Log](https://download.openmmlab.com/mmgen/sagan/sagan_cifar10_32_lr2e-4_ndisc5_b64x1_woReUinplace_20210730_125449_fid-d50568a4_is-04008a9ca.json) | +| SAGAN-32x32-wInplaceReLU Best IS | CIFAR10 | w | 5 | 64x1 | 500000 | 380000 | 9.2286 | 11.7760 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/sagan/sagan_32_wReLUinplace_lr-2e-4_ndisc5_cifar10_b64x1.py) | [model](https://download.openmmlab.com/mmgen/sagan/sagan_cifar10_32_lr2e-4_ndisc5_b64x1_wReLUinplace_is-iter380000_20210730_124937-c77b4d25.pth) | [Log](https://download.openmmlab.com/mmgen/sagan/sagan_cifar10_32_lr2e-4_ndisc5_b64x1_wReLUinplace_20210730_125155_fid-cbefb354_is-c77b4d25.json) | +| SAGAN-32x32-wInplaceReLU Best FID | CIFAR10 | w | 5 | 64x1 | 500000 | 460000 | 9.2061 | 10.7781 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/sagan/sagan_32_wReLUinplace_lr-2e-4_ndisc5_cifar10_b64x1.py) | [model](https://download.openmmlab.com/mmgen/sagan/sagan_cifar10_32_lr2e-4_ndisc5_b64x1_wReLUinplace_fid-iter460000_20210730_125155-cbefb354.pth) | [Log](https://download.openmmlab.com/mmgen/sagan/sagan_cifar10_32_lr2e-4_ndisc5_b64x1_wReLUinplace_20210730_125155_fid-cbefb354_is-c77b4d25.json) | +| SAGAN-128x128-woInplaceReLU Best IS | ImageNet | w/o | 1 | 64x4 | 1000000 | 980000 | 31.5938 | 36.7712 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/sagan/sagan_128_woReLUinplace_Glr-1e-4_Dlr-4e-4_ndisc1_imagenet1k_b64x4.py) | [model](https://download.openmmlab.com/mmgen/sagan/sagan_imagenet1k_128_Glr1e-4_Dlr4e-4_ndisc1_b32x4_woReLUinplace_is-iter980000_20210730_163140-cfbebfc6.pth) | [Log](https://download.openmmlab.com/mmgen/sagan/sagan_imagenet1k_128_Glr1e-4_Dlr4e-4_ndisc1_b32x4_woReLUinplace_20210730_163431_fid-d7916963_is-cfbebfc6.json) | +| SAGAN-128x128-woInplaceReLU Best FID | ImageNet | w/o | 1 | 64x4 | 1000000 | 950000 | 28.4936 | 34.7838 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/sagan/sagan_128_woReLUinplace_Glr-1e-4_Dlr-4e-4_ndisc1_imagenet1k_b64x4.py) | [model](https://download.openmmlab.com/mmgen/sagan/sagan_imagenet1k_128_Glr1e-4_Dlr4e-4_ndisc1_b32x4_woReLUinplace_fid-iter950000_20210730_163431-d7916963.pth) | [Log](https://download.openmmlab.com/mmgen/sagan/sagan_imagenet1k_128_Glr1e-4_Dlr4e-4_ndisc1_b32x4_woReLUinplace_20210730_163431_fid-d7916963_is-cfbebfc6.json) | +| SAGAN-128x128-BigGAN Schedule Best IS | ImageNet | w/o | 1 | 32x8 | 1000000 | 826000 | 69.5350 | 12.8295 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/sagan/sagan_128_woReLUinplace_noaug_bigGAN_Glr-1e-4_Dlr-4e-4_ndisc1_imagenet1k_b32x8.py) | [model](https://download.openmmlab.com/mmgen/sagan/sagan_128_woReLUinplace_noaug_bigGAN_imagenet1k_b32x8_Glr1e-4_Dlr-4e-4_ndisc1_20210818_210232-3f5686af.pth) | [Log](https://download.openmmlab.com/mmgen/sagan/sagan_128_woReLUinplace_noaug_bigGAN_imagenet1k_b32x8_Glr1e-4_Dlr-4e-4_ndisc1_20210818_210232-3f5686af.json) | +| SAGAN-128x128-BigGAN Schedule Best FID | ImageNet | w/o | 1 | 32x8 | 1000000 | 826000 | 69.5350 | 12.8295 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/sagan/sagan_128_woReLUinplace_noaug_bigGAN_Glr-1e-4_Dlr-4e-4_ndisc1_imagenet1k_b32x8.py) | [model](https://download.openmmlab.com/mmgen/sagan/sagan_128_woReLUinplace_noaug_bigGAN_imagenet1k_b32x8_Glr1e-4_Dlr-4e-4_ndisc1_20210818_210232-3f5686af.pth) | 
[Log](https://download.openmmlab.com/mmgen/sagan/sagan_128_woReLUinplace_noaug_bigGAN_imagenet1k_b32x8_Glr1e-4_Dlr-4e-4_ndisc1_20210818_210232-3f5686af.json) |

'\*' The iteration counting rule in our implementation differs from that of other codebases. If you want to align with other codebases, you can use the following conversion formula:
+
```
total_iters (biggan/pytorch studio gan) = our_total_iters / dist_step
```
@@ -43,18 +44,16 @@ total_iters (biggan/pytorch studio gan) = our_total_iters / dist_step

We also provide converted pre-trained models from [Pytorch-StudioGAN](https://github.com/POSTECH-CVLab/PyTorch-StudioGAN). Note that, in PyTorch-StudioGAN, **inplace ReLU** is used in the generator and discriminator.

-
| Models | Dataset | Inplace ReLU | n_disc | Total Iters | IS (Our Pipeline) | FID (Our Pipeline) | IS (StudioGAN) | FID (StudioGAN) | Config | Download | Original Download link |
-|:------------------------:|:--------:|:------------:|:------:|:-----------:|:-----------------:|:------------------:|:--------------:|:---------------:|:-------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------:|
+| :----------------------: | :------: | :----------: | :----: | :---------: | :---------------: | :----------------: | :------------: | :-------------: | :-----------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------: |
| SAGAN-32x32 StudioGAN | CIFAR10 | w | 5 | 100000 | 9.116 | 10.2011 | 8.680 | 14.009 | [Config](https://github.com/open-mmlab/mmgeneration/blob/master/configs/_base_/models/sagan_32x32.py) | [model](https://download.openmmlab.com/mmgen/sagan/sagan_32_cifar10_convert-studio-rgb_20210730_153321-080da7e2.pth) | [model](https://drive.google.com/drive/folders/1FA8hcz4MB8-hgTwLuDA0ZUfr8slud5P_) |
| SAGAN-128x128 StudioGAN | ImageNet | w | 1 | 1000000 | 27.367 | 40.1162 | 29.848 | 34.726 | [Config](https://github.com/open-mmlab/mmgeneration/blob/master/configs/_base_/models/sagan_128x128.py) | [model](https://download.openmmlab.com/mmgen/sagan/sagan_128_imagenet1k_convert-studio-rgb_20210730_153357-eddb0d1d.pth) | [model](https://drive.google.com/drive/folders/1ZYaqeeumDgxOPDhRR5QLeLFIpgBJ9S6B) |
-
-
-* `Our Pipeline` denote results evaluated with our pipeline.
-* `StudioGAN` denote results released by Pytorch-StudioGAN.
+- `Our Pipeline` denotes results evaluated with our pipeline.
+- `StudioGAN` denotes results released by PyTorch-StudioGAN.

For the IS metric, our implementation differs from PyTorch-StudioGAN in the following aspects:
+
1. We use [Tero's Inception](https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metrics/inception-2015-12-05.pt) for feature extraction.
2. We use bicubic interpolation with the PIL backend to resize images before feeding them to Inception.
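+
+Because the interpolation backend changes the metric values, point 2 above matters in practice. The snippet below is a rough sketch of such a preprocessing step (our illustration with an assumed helper name, not the exact pipeline code from this repo):
+
+```python
+import numpy as np
+import torch
+from PIL import Image
+
+def resize_for_inception(img_uint8, size=299):
+    """Resize an HWC uint8 image with PIL bicubic interpolation before
+    feature extraction; other backends give slightly different pixels
+    and therefore slightly different IS/FID values."""
+    pil = Image.fromarray(img_uint8).resize((size, size), Image.BICUBIC)
+    arr = np.asarray(pil).astype(np.float32)
+    return torch.from_numpy(arr).permute(2, 0, 1)  # CHW tensor for the network
+
+fake = (np.random.rand(128, 128, 3) * 255).astype(np.uint8)
+print(resize_for_inception(fake).shape)  # torch.Size([3, 299, 299])
+```
+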
@@ -63,6 +62,7 @@ For FID evaluation, we follow the pipeline of [BigGAN](https://github.com/ajbroc

You can download the preprocessed inception states via the following URLs: [CIFAR10](https://download.openmmlab.com/mmgen/evaluation/fid_inception_pkl/cifar10.pkl) and [ImageNet1k](https://download.openmmlab.com/mmgen/evaluation/fid_inception_pkl/imagenet.pkl).

You can use the following commands to extract those inception states yourself.
+
```
# For CIFAR10
python tools/utils/inception_stat.py --data-cfg configs/_base_/datasets/cifar10_inception_stat.py --pklname cifar10.pkl --no-shuffle --inception-style stylegan --num-samples -1 --subset train
diff --git a/configs/sagan/metafile.yml b/configs/sagan/metafile.yml
index 9e0519205..c36875f19 100644
--- a/configs/sagan/metafile.yml
+++ b/configs/sagan/metafile.yml
@@ -20,8 +20,8 @@ Models:
    Inplace ReLU: w/o
    Iter: 400000.0
    Log: '[Log]'
-    Total Batchsize (BZ_PER_GPU \* NGPU): 64x1
-    Total Iters*: 500000.0
+    Total Batchsize (BZ_PER_GPU * NGPU): 64x1
+    Total Iters\*: 500000.0
    dist_step: 5.0
  Task: Conditional GANs
  Weights: https://download.openmmlab.com/mmgen/sagan/sagan_cifar10_32_lr2e-4_ndisc5_b64x1_woReUinplace_is-iter400000_20210730_125743-4008a9ca.pth
@@ -38,8 +38,8 @@ Models:
    Inplace ReLU: w/o
    Iter: 480000.0
    Log: '[Log]'
-    Total Batchsize (BZ_PER_GPU \* NGPU): 64x1
-    Total Iters*: 500000.0
+    Total Batchsize (BZ_PER_GPU * NGPU): 64x1
+    Total Iters\*: 500000.0
    dist_step: 5.0
  Task: Conditional GANs
  Weights: https://download.openmmlab.com/mmgen/sagan/sagan_cifar10_32_lr2e-4_ndisc5_b64x1_woReUinplace_fid-iter480000_20210730_125449-d50568a4.pth
@@ -56,8 +56,8 @@ Models:
    Inplace ReLU: w
    Iter: 380000.0
    Log: '[Log]'
-    Total Batchsize (BZ_PER_GPU \* NGPU): 64x1
-    Total Iters*: 500000.0
+    Total Batchsize (BZ_PER_GPU * NGPU): 64x1
+    Total Iters\*: 500000.0
    dist_step: 5.0
  Task: Conditional GANs
  Weights: https://download.openmmlab.com/mmgen/sagan/sagan_cifar10_32_lr2e-4_ndisc5_b64x1_wReLUinplace_is-iter380000_20210730_124937-c77b4d25.pth
@@ -74,8 +74,8 @@ Models:
    Inplace ReLU: w
    Iter: 460000.0
    Log: '[Log]'
-    Total Batchsize (BZ_PER_GPU \* NGPU): 64x1
-    Total Iters*: 500000.0
+    Total Batchsize (BZ_PER_GPU * NGPU): 64x1
+    Total Iters\*: 500000.0
    dist_step: 5.0
  Task: Conditional GANs
  Weights: https://download.openmmlab.com/mmgen/sagan/sagan_cifar10_32_lr2e-4_ndisc5_b64x1_wReLUinplace_fid-iter460000_20210730_125155-cbefb354.pth
@@ -92,8 +92,8 @@ Models:
    Inplace ReLU: w/o
    Iter: 980000.0
    Log: '[Log]'
-    Total Batchsize (BZ_PER_GPU \* NGPU): 64x4
-    Total Iters*: 1000000.0
+    Total Batchsize (BZ_PER_GPU * NGPU): 64x4
+    Total Iters\*: 1000000.0
    dist_step: 1.0
  Task: Conditional GANs
  Weights: https://download.openmmlab.com/mmgen/sagan/sagan_imagenet1k_128_Glr1e-4_Dlr4e-4_ndisc1_b32x4_woReLUinplace_is-iter980000_20210730_163140-cfbebfc6.pth
@@ -110,8 +110,8 @@ Models:
    Inplace ReLU: w/o
    Iter: 950000.0
    Log: '[Log]'
-    Total Batchsize (BZ_PER_GPU \* NGPU): 64x4
-    Total Iters*: 1000000.0
+    Total Batchsize (BZ_PER_GPU * NGPU): 64x4
+    Total Iters\*: 1000000.0
    dist_step: 1.0
  Task: Conditional GANs
  Weights: https://download.openmmlab.com/mmgen/sagan/sagan_imagenet1k_128_Glr1e-4_Dlr4e-4_ndisc1_b32x4_woReLUinplace_fid-iter950000_20210730_163431-d7916963.pth
@@ -128,8 +128,8 @@ Models:
    Inplace ReLU: w/o
    Iter: 826000.0
    Log: '[Log]'
-    Total Batchsize (BZ_PER_GPU \* NGPU): 32x8
-    Total Iters*: 1000000.0
+    Total Batchsize (BZ_PER_GPU * NGPU): 32x8
+    Total Iters\*: 1000000.0
    dist_step: 1.0
  Task: Conditional GANs
  Weights: 
https://download.openmmlab.com/mmgen/sagan/sagan_128_woReLUinplace_noaug_bigGAN_imagenet1k_b32x8_Glr1e-4_Dlr-4e-4_ndisc1_20210818_210232-3f5686af.pth @@ -146,8 +146,8 @@ Models: Inplace ReLU: w/o Iter: 826000.0 Log: '[Log]' - Total Batchsize (BZ_PER_GPU \* NGPU): 32x8 - Total Iters*: 1000000.0 + Total Batchsize (BZ_PER_GPU * NGPU): 32x8 + Total Iters\*: 1000000.0 dist_step: 1.0 Task: Conditional GANs Weights: https://download.openmmlab.com/mmgen/sagan/sagan_128_woReLUinplace_noaug_bigGAN_imagenet1k_b32x8_Glr1e-4_Dlr-4e-4_ndisc1_20210818_210232-3f5686af.pth diff --git a/configs/singan/README.md b/configs/singan/README.md index ad44f4e34..903dd5366 100644 --- a/configs/singan/README.md +++ b/configs/singan/README.md @@ -11,6 +11,7 @@ We introduce SinGAN, an unconditional generative model that can be learned from a single natural image. Our model is trained to capture the internal distribution of patches within the image, and is then able to generate high quality, diverse samples that carry the same visual content as the image. SinGAN contains a pyramid of fully convolutional GANs, each responsible for learning the patch distribution at a different scale of the image. This allows generating new samples of arbitrary size and aspect ratio, that have significant variability, yet maintain both the global structure and the fine textures of the training image. In contrast to previous single image GAN schemes, our approach is not limited to texture images, and is not conditional (i.e. it generates samples from noise). User studies confirm that the generated samples are commonly confused to be real images. We illustrate the utility of SinGAN in a wide range of image manipulation tasks. +
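+The pyramid structure described above boils down to a simple sampling loop: each scale upsamples the previous output, adds fresh noise, and refines the result with that scale's generator. The following is only an illustrative skeleton we wrote (function and argument names are ours, not this repo's API):
+
+```python
+import torch
+import torch.nn.functional as F
+
+def singan_sample(generators, shapes):
+    """Coarse-to-fine sampling through a pyramid of per-scale generators."""
+    out = None
+    for gen, (h, w) in zip(generators, shapes):
+        noise = torch.randn(1, 3, h, w)
+        if out is None:
+            prev = torch.zeros_like(noise)  # coarsest scale starts from noise only
+        else:
+            prev = F.interpolate(out, size=(h, w), mode='bilinear',
+                                 align_corners=False)
+        out = prev + gen(noise + prev)      # residual refinement at this scale
+    return out
+
+# With identity "generators" this just accumulates upsampled noise:
+gens = [torch.nn.Identity() for _ in range(3)]
+print(singan_sample(gens, [(25, 25), (33, 33), (44, 44)]).shape)  # (1, 3, 44, 44)
+```
+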
@@ -23,13 +24,11 @@ We introduce SinGAN, an unconditional generative model that can be learned from - -| Model | Data | Num Scales | Config | Download | -| :----: | :------------------------------------------------------------------------------: | :--------: | :------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | -| SinGAN | [balloons.png](https://download.openmmlab.com/mmgen/dataset/singan/balloons.png) | 8 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/singan/singan_balloons.py) | [ckpt](https://download.openmmlab.com/mmgen/singan/singan_balloons_20210406_191047-8fcd94cf.pth) | [pkl](https://download.openmmlab.com/mmgen/singan/singan_balloons_20210406_191047-8fcd94cf.pkl) | -| SinGAN | [fish.jpg](https://download.openmmlab.com/mmgen/dataset/singan/fish-crop.jpg) | 10 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/singan/singan_fish.py) | [ckpt](https://download.openmmlab.com/mmgen/singan/singan_fis_20210406_201006-860d91b6.pth) | [pkl](https://download.openmmlab.com/mmgen/singan/singan_fis_20210406_201006-860d91b6.pkl) | -| SinGAN | [bohemian.png](https://download.openmmlab.com/mmgen/dataset/singan/bohemian.png) | 10 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/singan/singan_bohemian.py) | [ckpt](https://download.openmmlab.com/mmgen/singan/singan_bohemian_20210406_175439-f964ee38.pth) | [pkl](https://download.openmmlab.com/mmgen/singan/singan_bohemian_20210406_175439-f964ee38.pkl) | - +| Model | Data | Num Scales | Config | Download | +| :----: | :------------------------------------------------------------------------------: | :--------: | :------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| SinGAN | [balloons.png](https://download.openmmlab.com/mmgen/dataset/singan/balloons.png) | 8 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/singan/singan_balloons.py) | [ckpt](https://download.openmmlab.com/mmgen/singan/singan_balloons_20210406_191047-8fcd94cf.pth) \| [pkl](https://download.openmmlab.com/mmgen/singan/singan_balloons_20210406_191047-8fcd94cf.pkl) | +| SinGAN | [fish.jpg](https://download.openmmlab.com/mmgen/dataset/singan/fish-crop.jpg) | 10 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/singan/singan_fish.py) | [ckpt](https://download.openmmlab.com/mmgen/singan/singan_fis_20210406_201006-860d91b6.pth) \| [pkl](https://download.openmmlab.com/mmgen/singan/singan_fis_20210406_201006-860d91b6.pkl) | +| SinGAN | [bohemian.png](https://download.openmmlab.com/mmgen/dataset/singan/bohemian.png) | 10 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/singan/singan_bohemian.py) | [ckpt](https://download.openmmlab.com/mmgen/singan/singan_bohemian_20210406_175439-f964ee38.pth) \| [pkl](https://download.openmmlab.com/mmgen/singan/singan_bohemian_20210406_175439-f964ee38.pkl) | ## Notes for using SinGAN diff --git a/configs/sngan_proj/README.md b/configs/sngan_proj/README.md index 1d4cc822b..db2314015 100644 --- a/configs/sngan_proj/README.md +++ 
b/configs/sngan_proj/README.md
@@ -25,7 +25,6 @@ One of the challenges in the study of generative adversarial networks is the ins

-
| Models | Dataset | Inplace ReLU | disc_step | Total Iters\* | Iter | IS | FID | Config | Download | Log |
| :---------------------------------------: | :------: | :----------: | :-------: | :-----------: | :----: | :-----: | :-----: | :------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| SNGAN_Proj-32x32-woInplaceReLU Best IS | CIFAR10 | w/o | 5 | 500000 | 400000 | 9.6919 | 9.8203 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/sngan_proj/sngan_proj_32_woReLUinplace_lr-2e-4_ndisc5_cifar10_b64x1.py) | [ckpt](https://download.openmmlab.com/mmgen/sngan_proj/sngan_proj_cifar10_32_lr-2e-4_b64x1_woReLUinplace_is-iter400000_20210709_163823-902ce1ae.pth) | [Log](https://download.openmmlab.com/mmgen/sngan_proj/sngan_proj_cifar10_32_lr-2e-4_b64x1_woReLUinplace_20210624_065306_fid-ba0862a0_is-902ce1ae.json) |
@@ -38,6 +37,7 @@ One of the challenges in the study of generative adversarial networks is the ins
| SNGAN_Proj-128x128-wInplaceReLU Best FID | ImageNet | w | 5 | 1000000 | 988000 | 27.7948 | 33.4821 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/sngan_proj/sngan_proj_128_wReLUinplace_Glr-2e-4_Dlr-5e-5_ndisc5_imagenet1k_b128x2.py) | [ckpt](https://download.openmmlab.com/mmgen/sngan_proj/sngan_proj_imagenet1k_128_Glr2e-4_Dlr5e-5_ndisc5_b128x2_wReLUinplace_fid-iter988000_20210730_132401-9a682411.pth) | [Log](https://download.openmmlab.com/mmgen/sngan_proj/sngan_proj_imagenet1k_128_Glr2e-4_Dlr5e-5_ndisc5_b128x2_wReLUinplace_20210730_132401_fid-9a682411_is-ca0ccd07.json) |

'\*' The iteration counting rule in our implementation differs from that of other codebases. If you want to align with other codebases, you can use the following conversion formula (a worked example follows the notes below):
+
```
total_iters (biggan/pytorch studio gan) = our_total_iters / disc_step
```
@@ -50,11 +50,11 @@ To be noted that, in Pytorch Studio GAN, **inplace ReLU** is used in generator a
| SNGAN_Proj-32x32 StudioGAN | CIFAR10 | w | 5 | 100000 | 9.372 | 10.2011 | 8.677 | 13.248 | [config](https://github.com/open-mmlab/mmgeneration/blob/master/configs/_base_/models/sngan_proj_32x32.py) | [model](https://download.openmmlab.com/mmgen/sngan_proj/sngan_cifar10_convert-studio-rgb_20210709_111346-2979202d.pth) | [model](https://drive.google.com/drive/folders/16s5Cr-V-NlfLyy_uyXEkoNxLBt-8wYSM) |
| SNGAN_Proj-128x128 StudioGAN | ImageNet | w | 2 | 1000000 | 30.218 | 29.8199 | 32.247 | 26.792 | [config](https://github.com/open-mmlab/mmgeneration/blob/master/configs/_base_/models/sngan_proj_128x128.py) | [model](https://download.openmmlab.com/mmgen/sngan_proj/sngan_imagenet1k_convert-studio-rgb_20210709_111406-877b1130.pth) | [model](https://drive.google.com/drive/folders/1Ek2wAMlxpajL_M8aub4DKQ9B313K8XhS) |
-
-* `Our Pipeline` denote results evaluated with our pipeline.
-* `StudioGAN` denote results released by Pytorch-StudioGAN.
+- `Our Pipeline` denotes results evaluated with our pipeline.
+- `StudioGAN` denotes results released by PyTorch-StudioGAN.
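+
+As the worked example promised above, converting between the two iteration conventions is a one-liner; the numbers below match the 32x32 rows in the tables (500k of our iterations with `disc_step=5` correspond to the 100k StudioGAN iterations):
+
+```python
+def to_studiogan_iters(our_total_iters, disc_step):
+    """Convert our iteration count to the BigGAN / PyTorch-StudioGAN
+    convention using the formula above (illustrative helper, our naming)."""
+    return our_total_iters // disc_step
+
+print(to_studiogan_iters(500_000, 5))  # 100000
+```
+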
For the IS metric, our implementation differs from PyTorch-StudioGAN in the following aspects:
+
1. We use [Tero's Inception](https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metrics/inception-2015-12-05.pt) for feature extraction.
2. We use bicubic interpolation with the PIL backend to resize images before feeding them to Inception.
@@ -63,6 +63,7 @@ For FID evaluation, we follow the pipeline of [BigGAN](https://github.com/ajbroc

You can download the preprocessed inception states via the following URLs: [CIFAR10](https://download.openmmlab.com/mmgen/evaluation/fid_inception_pkl/cifar10.pkl) and [ImageNet1k](https://download.openmmlab.com/mmgen/evaluation/fid_inception_pkl/imagenet.pkl).

You can use the following commands to extract those inception states yourself.
+
```
# For CIFAR10
python tools/utils/inception_stat.py --data-cfg configs/_base_/datasets/cifar10_inception_stat.py --pklname cifar10.pkl --no-shuffle --inception-style stylegan --num-samples -1 --subset train
diff --git a/configs/styleganv1/README.md b/configs/styleganv1/README.md
index 45738ec5c..0970d0bd1 100644
--- a/configs/styleganv1/README.md
+++ b/configs/styleganv1/README.md
@@ -11,6 +11,7 @@

We propose an alternative generator architecture for generative adversarial networks, borrowing from style transfer literature. The new architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific control of the synthesis. The new generator improves the state-of-the-art in terms of traditional distribution quality metrics, leads to demonstrably better interpolation properties, and also better disentangles the latent factors of variation. To quantify interpolation quality and disentanglement, we propose two new, automated methods that are applicable to any generator architecture. Finally, we introduce a new, highly varied and high-quality dataset of human faces.
+
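+The scale-specific control mentioned in the abstract comes from injecting a learned style at every resolution, classically via adaptive instance normalization. Below is a minimal sketch we wrote for illustration (names and dimensions are ours, not this repo's API):
+
+```python
+import torch
+import torch.nn as nn
+
+class AdaIN(nn.Module):
+    """Adaptive instance norm: a learned affine map turns the latent w
+    into per-channel scale and bias for the normalized feature map."""
+
+    def __init__(self, channels, w_dim=512):
+        super().__init__()
+        self.norm = nn.InstanceNorm2d(channels)
+        self.affine = nn.Linear(w_dim, channels * 2)
+
+    def forward(self, x, w):
+        style = self.affine(w).view(-1, 2, x.size(1), 1, 1)
+        scale, bias = style[:, 0], style[:, 1]
+        return (1 + scale) * self.norm(x) + bias
+
+x, w = torch.randn(2, 64, 32, 32), torch.randn(2, 512)
+print(AdaIN(64)(x, w).shape)  # torch.Size([2, 64, 32, 32])
+```
+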
diff --git a/configs/styleganv2/README.md b/configs/styleganv2/README.md index 594a3a6ed..b8d890c50 100644 --- a/configs/styleganv2/README.md +++ b/configs/styleganv2/README.md @@ -11,6 +11,7 @@ The style-based GAN architecture (StyleGAN) yields state-of-the-art results in data-driven unconditional generative image modeling. We expose and analyze several of its characteristic artifacts, and propose changes in both model architecture and training methods to address them. In particular, we redesign the generator normalization, revisit progressive growing, and regularize the generator to encourage good conditioning in the mapping from latent codes to images. In addition to improving image quality, this path length regularizer yields the additional benefit that the generator becomes significantly easier to invert. This makes it possible to reliably attribute a generated image to a particular network. We furthermore visualize how well the generator utilizes its output resolution, and identify a capacity problem, motivating us to train larger models for additional quality improvements. Overall, our improved model redefines the state of the art in unconditional image modeling, both in terms of existing distribution quality metrics as well as perceived image quality. +
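+The "redesigned generator normalization" in the abstract replaces per-feature-map normalization with weight (de)modulation. Here is a compact sketch of that idea, written by us for illustration only (the repo's actual implementation differs in details such as fused grouped convolution):
+
+```python
+import torch
+
+def modulated_weight(weight, style, demodulate=True, eps=1e-8):
+    """Scale conv weights by a per-sample style, then rescale each output
+    filter to unit norm (demodulation), instead of normalizing activations."""
+    # weight: (out_c, in_c, k, k); style: (batch, in_c)
+    w = weight[None] * style[:, None, :, None, None]          # modulate
+    if demodulate:
+        d = torch.rsqrt(w.pow(2).sum(dim=(2, 3, 4), keepdim=True) + eps)
+        w = w * d                                             # demodulate
+    return w  # (batch, out_c, in_c, k, k), applied via grouped conv
+
+print(modulated_weight(torch.randn(64, 32, 3, 3), torch.randn(4, 32)).shape)
+# torch.Size([4, 64, 32, 3, 3])
+```
+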
@@ -34,7 +35,6 @@ The style-based GAN architecture (StyleGAN) yields state-of-the-art results in d

 | stylegan2_config-f_ffhq_1024 | our training | 2.8185 | 68.236/49.583 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/styleganv2/stylegan2_c2_ffhq_1024_b4x8.py) | [model](https://download.openmmlab.com/mmgen/stylegan2/stylegan2_c2_ffhq_1024_b4x8_20210407_150045-618c9024.pth) |
 | stylegan2_config-f_lsun-car_384x512 | our training | 2.4116 | 66.760/50.576 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/styleganv2/stylegan2_c2_lsun-car_384x512_b4x8.py) | [model](https://download.openmmlab.com/mmgen/stylegan2/stylegan2_c2_lsun-car_384x512_b4x8_1800k_20210424_160929-fc9072ca.pth) |

-
 ## FP16 Support and Experiments

 Currently, we have supported FP16 training for StyleGAN2, and here are the results for the mixed-precision training. (Experiments for FFHQ1024 will come soon.)

@@ -47,10 +47,9 @@ Currently, we have supported FP16 training for StyleGAN2, and here are the resul

 As shown in the figure, we provide **3** ways to do mixed-precision training for `StyleGAN2`:

-* [stylegan2_c2_fp16_PL-no-scaler](https://github.com/open-mmlab/mmgeneration/tree/master/configs/styleganv2/stylegan2_c2_fp16_partial-GD_PL-no-scaler_ffhq_256_b4x8_800k.py): In this setting, we try our best to follow the official FP16 implementation in [StyleGAN2-ADA](https://github.com/NVlabs/stylegan2-ada). Similar to the official version, we only adopt FP16 training for the higher-resolution feature maps (the last 4 stages in G and the first 4 stages). Note that we do not adopt the `clamp` way to avoid gradient overflow used in the official implementation. We use the `autocast` function from `torch.cuda.amp` package.
-* [stylegan2_c2_fp16-globalG-partialD_PL-R1-no-scaler](https://github.com/open-mmlab/mmgeneration/tree/master/configs/styleganv2/stylegan2_c2_fp16-globalG-partialD_PL-R1-no-scaler_ffhq_256_b4x8_800k.py): In this config, we try to adopt mixed-precision training for the whole generator, but in partial discriminator (the first 4 higher-resolution stages). Note that we do not apply the loss scaler in the path length loss and gradient penalty loss. Because we always meet divergence after adopting the loss scaler to scale the gradient in these two losses.
-* [stylegan2_c2_apex_fp16_PL-R1-no-scaler](https://github.com/open-mmlab/mmgeneration/tree/master/configs/styleganv2/stylegan2_c2_apex_fp16_PL-R1-no-scaler_ffhq_256_b4x8_800k.py): In this setting, we adopt the [APEX](https://github.com/NVIDIA/apex) toolkit to implement mixed-precision training with multiple loss/gradient scalers. In APEX, you can assign different loss scalers for the generator and the discriminator respectively. Note that we still ignore the gradient scaler in the path length loss and gradient penalty loss.
-
+- [stylegan2_c2_fp16_PL-no-scaler](https://github.com/open-mmlab/mmgeneration/tree/master/configs/styleganv2/stylegan2_c2_fp16_partial-GD_PL-no-scaler_ffhq_256_b4x8_800k.py): In this setting, we try our best to follow the official FP16 implementation in [StyleGAN2-ADA](https://github.com/NVlabs/stylegan2-ada). Similar to the official version, we only adopt FP16 training for the higher-resolution feature maps (the last 4 stages in G and the first 4 stages in D). Note that we do not adopt the `clamp` way to avoid gradient overflow used in the official implementation. We use the `autocast` function from the `torch.cuda.amp` package.
+- [stylegan2_c2_fp16-globalG-partialD_PL-R1-no-scaler](https://github.com/open-mmlab/mmgeneration/tree/master/configs/styleganv2/stylegan2_c2_fp16-globalG-partialD_PL-R1-no-scaler_ffhq_256_b4x8_800k.py): In this config, we adopt mixed-precision training for the whole generator but only a partial discriminator (the first 4 higher-resolution stages). Note that we do not apply the loss scaler to the path length loss and the gradient penalty loss, because we always observe divergence after using the loss scaler to scale the gradients in these two losses.
+- [stylegan2_c2_apex_fp16_PL-R1-no-scaler](https://github.com/open-mmlab/mmgeneration/tree/master/configs/styleganv2/stylegan2_c2_apex_fp16_PL-R1-no-scaler_ffhq_256_b4x8_800k.py): In this setting, we adopt the [APEX](https://github.com/NVIDIA/apex) toolkit to implement mixed-precision training with multiple loss/gradient scalers. In APEX, you can assign different loss scalers for the generator and the discriminator respectively. Note that we still ignore the gradient scaler in the path length loss and gradient penalty loss.

| Model | Comment | Dataset | FID50k | Config | Download |
| :-------------------------------------------------------------------: | :-------------------------------------: | :-----: | :----: | :----------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
@@ -59,7 +58,6 @@
| stylegan2_c2_fp16-globalG-partialD_PL-R1-no-scaler_ffhq_256_b4x8_800k | the whole G in fp16 | FFHQ256 | 4.362 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/styleganv2/stylegan2_c2_fp16-globalG-partialD_PL-R1-no-scaler_ffhq_256_b4x8_800k.py) | [ckpt](https://download.openmmlab.com/mmgen/stylegan2/stylegan2_c2_fp16-globalG-partialD_PL-R1-no-scaler_ffhq_256_b4x8_800k_20210508_114930-ef8270d4.pth?versionId=CAEQKxiBgIDOhrOoyhciIDM4ZTQxYzkxZTE4ZjQ2ZjM4ZmU3YzlhOWNkYWI1OWQ1) |
| stylegan2_c2_apex_fp16_PL-R1-no-scaler_ffhq_256_b4x8_800k | the whole G&D in fp16 + two loss scaler | FFHQ256 | 4.614 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/styleganv2/stylegan2_c2_apex_fp16_PL-R1-no-scaler_ffhq_256_b4x8_800k.py) | [ckpt](https://download.openmmlab.com/mmgen/stylegan2/stylegan2_c2_apex_fp16_PL-R1-no-scaler_ffhq_256_b4x8_800k_20210508_114701-c2bb8afd.pth?versionId=CAEQKxiBgMDQhrOoyhciIGE0ZGJkOWM2MTNjMzQ3Mjk4Y2NmMWMyNTViOTNiZTNh) |
-
In addition, we also provide `QuickTestImageDataset` to users for quickly checking whether the code can be run correctly. It's more important for FP16 experiments, because some CUDA operations may not support mixed-precision training.
Especially for `APEX`, you can use [this config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/styleganv2/stylegan2_c2_apex_fp16_quicktest_ffhq_256_b4x8_800k.py) on your local machine by running:

```bash
@@ -70,17 +68,16 @@ bash tools/dist_train.sh \

In a similar way, users can switch to [config for partial-GD](https://github.com/open-mmlab/mmgeneration/tree/master/configs/styleganv2/stylegan2_c2_fp16_quicktest_ffhq_256_b4x8_800k.py) and [config for globalG-partialD](https://github.com/open-mmlab/mmgeneration/tree/master/configs/styleganv2/stylegan2_c2_fp16-globalG-partialD_PL-R1-no-scaler_ffhq_256_b4x8_800k.py) to test the other two mixed-precision training configurations.

-
*Note that to use the [APEX](https://github.com/NVIDIA/apex) toolkit, you have to install it following the official guidance. (APEX is not included in our requirements.) If you are using GPUs without tensor cores, you had better switch to a newer PyTorch version (>= 1.7.0). Otherwise, you may meet several bugs when installing or running APEX.*

## About Different Implementations of FID Metric

-| Model | Comment | FID50k | FID Version | Config | Download |
-| :--------------------------: | :-------------: | :----: | :-------------: | :----------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
-| stylegan2_config-f_ffhq_1024 | official weight | 2.8732 | Tero's StyleGAN | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/styleganv2/stylegan2_c2_ffhq_1024_b4x8.py) | [model](https://download.openmmlab.com/mmgen/stylegan2/official_weights/stylegan2-ffhq-config-f-official_20210327_171224-bce9310c.pth) | [FID-Reals](https://download.openmmlab.com/mmgen/evaluation/fid_inception_pkl/ffhq-1024-50k-stylegan.pkl) |
-| stylegan2_config-f_ffhq_1024 | our training | 2.9413 | Tero's StyleGAN | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/styleganv2/stylegan2_c2_ffhq_1024_b4x8.py) | [model](https://download.openmmlab.com/mmgen/stylegan2/stylegan2_c2_ffhq_1024_b4x8_20210407_150045-618c9024.pth) | [FID-Reals](https://download.openmmlab.com/mmgen/evaluation/fid_inception_pkl/ffhq-1024-50k-stylegan.pkl) |
-| stylegan2_config-f_ffhq_1024 | official weight | 2.8134 | Our PyTorch | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/styleganv2/stylegan2_c2_ffhq_1024_b4x8.py) | [model](https://download.openmmlab.com/mmgen/stylegan2/official_weights/stylegan2-ffhq-config-f-official_20210327_171224-bce9310c.pth) | [FID-Reals](https://download.openmmlab.com/mmgen/evaluation/fid_inception_pkl/ffhq-1024-50k-rgb.pkl) |
-| stylegan2_config-f_ffhq_1024 | our training | 2.8185 | Our PyTorch | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/styleganv2/stylegan2_c2_ffhq_1024_b4x8.py) | [model](https://download.openmmlab.com/mmgen/stylegan2/stylegan2_c2_ffhq_1024_b4x8_20210407_150045-618c9024.pth) | [FID-Reals](https://download.openmmlab.com/mmgen/evaluation/fid_inception_pkl/ffhq-1024-50k-rgb.pkl) |
+| Model | Comment | FID50k | FID Version | Config | Download |
+| :--------------------------: | :-------------: | :----: | :-------------: | :----------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+| stylegan2_config-f_ffhq_1024 | official weight | 2.8732 | Tero's StyleGAN | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/styleganv2/stylegan2_c2_ffhq_1024_b4x8.py) | [model](https://download.openmmlab.com/mmgen/stylegan2/official_weights/stylegan2-ffhq-config-f-official_20210327_171224-bce9310c.pth) \| [FID-Reals](https://download.openmmlab.com/mmgen/evaluation/fid_inception_pkl/ffhq-1024-50k-stylegan.pkl) |
+| stylegan2_config-f_ffhq_1024 | our training | 2.9413 | Tero's StyleGAN | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/styleganv2/stylegan2_c2_ffhq_1024_b4x8.py) | [model](https://download.openmmlab.com/mmgen/stylegan2/stylegan2_c2_ffhq_1024_b4x8_20210407_150045-618c9024.pth) \| [FID-Reals](https://download.openmmlab.com/mmgen/evaluation/fid_inception_pkl/ffhq-1024-50k-stylegan.pkl) |
+| stylegan2_config-f_ffhq_1024 | official weight | 2.8134 | Our PyTorch | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/styleganv2/stylegan2_c2_ffhq_1024_b4x8.py) | [model](https://download.openmmlab.com/mmgen/stylegan2/official_weights/stylegan2-ffhq-config-f-official_20210327_171224-bce9310c.pth) \| [FID-Reals](https://download.openmmlab.com/mmgen/evaluation/fid_inception_pkl/ffhq-1024-50k-rgb.pkl) |
+| stylegan2_config-f_ffhq_1024 | our training | 2.8185 | Our PyTorch | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/styleganv2/stylegan2_c2_ffhq_1024_b4x8.py) | [model](https://download.openmmlab.com/mmgen/stylegan2/stylegan2_c2_ffhq_1024_b4x8_20210407_150045-618c9024.pth) \| [FID-Reals](https://download.openmmlab.com/mmgen/evaluation/fid_inception_pkl/ffhq-1024-50k-rgb.pkl) |

In this table, we observe that the FID with Tero's Inception network is similar to that with PyTorch Inception (in MMGeneration). Thus, we use the FID with PyTorch's Inception net (though the weights are not from the official model zoo) by default, because it can be run on different PyTorch versions. If you use [Tero's Inception net](https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metrics/inception-2015-12-05.pt), your PyTorch version must meet `>=1.6.0`.
diff --git a/configs/styleganv3/README.md b/configs/styleganv3/README.md
old mode 100755
new mode 100644
index 24d433a0c..59ffd8c23
--- a/configs/styleganv3/README.md
+++ b/configs/styleganv3/README.md
@@ -5,6 +5,7 @@

 ## Abstract

+
We observe that despite their hierarchical convolutional nature, the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner. This manifests itself as, e.g., detail appearing to be glued to image coordinates instead of the surfaces of depicted objects. We trace
@@ -16,14 +17,12 @@ the FID of StyleGAN2 but differ dramatically in their internal representations,
they are fully equivariant to translation and rotation even at subpixel scales. Our results pave the way for generative models better suited for video and animation.
-
+
- - ## Results and Models
@@ -36,36 +35,40 @@ We perform experiments on StyleGANv3 paper settings and also experimental settin

 For user convenience, we also offer the converted version of official weights.

 ### Paper Settings

-| Model | Dataset | Iter |FID50k | Config | Log | Download |
-| :---------------------------------: | :-------------: | :-----------: | :-----------: |:---------------------------------------------------------------------------------------------------------------------------: | :-------------: |:--------------------------------------------------------------------------------------------------------------------------------------: |
-| stylegan3-t | ffhq 1024x1024 | 490000 | 3.37*| [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/styleganv3/stylegan3_t_noaug_fp16_gamma32.8_ffhq_1024_b4x8.py) | [log](https://download.openmmlab.com/mmgen/stylegan3/stylegan3_t_noaug_fp16_gamma32.8_ffhq_1024_b4x8_20220322_090417.log.json) |[model](https://download.openmmlab.com/mmgen/stylegan3/stylegan3_t_noaug_fp16_gamma32.8_ffhq_1024_b4x8_best_fid_iter_490000_20220401_120733-4ff83434.pth) |
-| stylegan3-t-ada | metface 1024x1024 | 130000 | 15.09 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/styleganv3/stylegan3_t_ada_fp16_gamma6.6_metfaces_1024_b4x8.py) | [log](https://download.openmmlab.com/mmgen/stylegan3/stylegan3_t_ada_fp16_gamma6.6_metfaces_1024_b4x8_20220328_142211.log.json) |[model](https://download.openmmlab.com/mmgen/stylegan3/stylegan3_t_ada_fp16_gamma6.6_metfaces_1024_b4x8_best_fid_iter_130000_20220401_115101-f2ef498e.pth) |
-Note*: This setting still needs a few days to run through, we put out currently the best checkpoint, and we will update the results the first time on the end of the experiment.

+| Model | Dataset | Iter | FID50k | Config | Log | Download |
+| :-------------: | :---------------: | :----: | :----: | :-------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------: |
+| stylegan3-t | ffhq 1024x1024 | 490000 | 3.37\* | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/styleganv3/stylegan3_t_noaug_fp16_gamma32.8_ffhq_1024_b4x8.py) | [log](https://download.openmmlab.com/mmgen/stylegan3/stylegan3_t_noaug_fp16_gamma32.8_ffhq_1024_b4x8_20220322_090417.log.json) | [model](https://download.openmmlab.com/mmgen/stylegan3/stylegan3_t_noaug_fp16_gamma32.8_ffhq_1024_b4x8_best_fid_iter_490000_20220401_120733-4ff83434.pth) |
+| stylegan3-t-ada | metface 1024x1024 | 130000 | 15.09 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/styleganv3/stylegan3_t_ada_fp16_gamma6.6_metfaces_1024_b4x8.py) | [log](https://download.openmmlab.com/mmgen/stylegan3/stylegan3_t_ada_fp16_gamma6.6_metfaces_1024_b4x8_20220328_142211.log.json) | [model](https://download.openmmlab.com/mmgen/stylegan3/stylegan3_t_ada_fp16_gamma6.6_metfaces_1024_b4x8_best_fid_iter_130000_20220401_115101-f2ef498e.pth) |
+
+Note\*: This experiment has not fully finished yet; we currently release the best checkpoint so far and will update the results as soon as the run completes.
### Experimental Settings -| Model | Dataset |Iter | FID50k | Config | Log | Download | -| :---------------------------------: | :-------------: |:-----------: | :-----------: |:---------------------------------------------------------------------------------------------------------------------------: | :-------------: |:--------------------------------------------------------------------------------------------------------------------------------------: | -| stylegan3-t | ffhq 256x256 | 740000 |7.65 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/styleganv3/stylegan3_t_noaug_fp16_gamma2.0_ffhq_256_b4x8.py) | [log](https://download.openmmlab.com/mmgen/stylegan3/stylegan3_t_noaug_fp16_gamma2.0_ffhq_256_b4x8_20220323_144815.log.json) |[model](https://download.openmmlab.com/mmgen/stylegan3/stylegan3_t_noaug_fp16_gamma2.0_ffhq_256_b4x8_best_fid_iter_740000_20220401_122456-730e1fba.pth) | -### Converted Weights -| Model | Dataset | Comment | FID50k | EQ-T | EQ-R | Config | Download | -| :---------------------------------: | :-------------: |:-------------: | :----: | :-----------: | :-----------: |:---------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------: | -| stylegan3-t | ffhqu 256x256|official weight | 4.62 | 63.01 | 13.12 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/_base_/models/stylegan/stylegan3_t_ffhqu_256_b4x8_cvt_official_rgb.py) | [model](https://download.openmmlab.com/mmgen/stylegan3/stylegan3_t_ffhqu_256_b4x8_cvt_official_rgb_20220329_235046-153df4c8.pth) | -| stylegan3-t |afhqv2 512x512 |official weight | 4.04 | 60.15 | 13.51 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/_base_/models/stylegan/stylegan3_t_afhqv2_512_b4x8_cvt_official_rgb.py) | [model](https://download.openmmlab.com/mmgen/stylegan3/stylegan3_t_afhqv2_512_b4x8_cvt_official_rgb_20220329_235017-ee6b037a.pth) | -| stylegan3-t |ffhq 1024x1024 |official weight | 2.79 | 61.21 | 13.82 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/_base_/models/stylegan/stylegan3_t_ffhq_1024_b4x8_cvt_official_rgb.py) | [model](https://download.openmmlab.com/mmgen/stylegan3/stylegan3_t_ffhq_1024_b4x8_cvt_official_rgb_20220329_235113-db6c6580.pth) | -| stylegan3-r | ffhqu 256x256 |official weight | 4.50| 66.65 | 40.48 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/_base_/models/stylegan/stylegan3_r_ffhqu_256_b4x8_cvt_official_rgb.py) | [model](https://download.openmmlab.com/mmgen/stylegan3/stylegan3_r_ffhqu_256_b4x8_cvt_official_rgb_20220329_234909-4521d963.pth) | -| stylegan3-r | afhqv2 512x512 |official weight |4.40 |64.89 | 40.34 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/_base_/models/stylegan/stylegan3_r_afhqv2_512_b4x8_cvt_official_rgb.py) | [model](https://download.openmmlab.com/mmgen/stylegan3/stylegan3_r_afhqv2_512_b4x8_cvt_official_rgb_20220329_234829-f2eaca72.pth) | -| stylegan3-r |ffhq 1024x1024 |official weight |3.07 | 64.76 | 46.62 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/_base_/models/stylegan/stylegan3_r_ffhq_1024_b4x8_cvt_official_rgb.py) | [model](https://download.openmmlab.com/mmgen/stylegan3/stylegan3_r_ffhq_1024_b4x8_cvt_official_rgb_20220329_234933-ac0500a1.pth) | +| Model | Dataset | Iter | FID50k | Config | Log | 
Download | +| :---------: | :----------: | :----: | :----: | :----------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------: | +| stylegan3-t | ffhq 256x256 | 740000 | 7.65 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/styleganv3/stylegan3_t_noaug_fp16_gamma2.0_ffhq_256_b4x8.py) | [log](https://download.openmmlab.com/mmgen/stylegan3/stylegan3_t_noaug_fp16_gamma2.0_ffhq_256_b4x8_20220323_144815.log.json) | [model](https://download.openmmlab.com/mmgen/stylegan3/stylegan3_t_noaug_fp16_gamma2.0_ffhq_256_b4x8_best_fid_iter_740000_20220401_122456-730e1fba.pth) | +### Converted Weights +| Model | Dataset | Comment | FID50k | EQ-T | EQ-R | Config | Download | +| :---------: | :------------: | :-------------: | :----: | :---: | :---: | :---------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------: | +| stylegan3-t | ffhqu 256x256 | official weight | 4.62 | 63.01 | 13.12 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/_base_/models/stylegan/stylegan3_t_ffhqu_256_b4x8_cvt_official_rgb.py) | [model](https://download.openmmlab.com/mmgen/stylegan3/stylegan3_t_ffhqu_256_b4x8_cvt_official_rgb_20220329_235046-153df4c8.pth) | +| stylegan3-t | afhqv2 512x512 | official weight | 4.04 | 60.15 | 13.51 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/_base_/models/stylegan/stylegan3_t_afhqv2_512_b4x8_cvt_official_rgb.py) | [model](https://download.openmmlab.com/mmgen/stylegan3/stylegan3_t_afhqv2_512_b4x8_cvt_official_rgb_20220329_235017-ee6b037a.pth) | +| stylegan3-t | ffhq 1024x1024 | official weight | 2.79 | 61.21 | 13.82 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/_base_/models/stylegan/stylegan3_t_ffhq_1024_b4x8_cvt_official_rgb.py) | [model](https://download.openmmlab.com/mmgen/stylegan3/stylegan3_t_ffhq_1024_b4x8_cvt_official_rgb_20220329_235113-db6c6580.pth) | +| stylegan3-r | ffhqu 256x256 | official weight | 4.50 | 66.65 | 40.48 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/_base_/models/stylegan/stylegan3_r_ffhqu_256_b4x8_cvt_official_rgb.py) | [model](https://download.openmmlab.com/mmgen/stylegan3/stylegan3_r_ffhqu_256_b4x8_cvt_official_rgb_20220329_234909-4521d963.pth) | +| stylegan3-r | afhqv2 512x512 | official weight | 4.40 | 64.89 | 40.34 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/_base_/models/stylegan/stylegan3_r_afhqv2_512_b4x8_cvt_official_rgb.py) | [model](https://download.openmmlab.com/mmgen/stylegan3/stylegan3_r_afhqv2_512_b4x8_cvt_official_rgb_20220329_234829-f2eaca72.pth) | +| stylegan3-r | ffhq 1024x1024 | official weight | 3.07 | 64.76 | 46.62 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/_base_/models/stylegan/stylegan3_r_ffhq_1024_b4x8_cvt_official_rgb.py) | [model](https://download.openmmlab.com/mmgen/stylegan3/stylegan3_r_ffhq_1024_b4x8_cvt_official_rgb_20220329_234933-ac0500a1.pth) | ## Interpolation + We provide a tool to generate 
video by walking through GAN's latent space. Run this command to get the following video.
+
 ```bash
 python apps/interpolate_sample.py configs/styleganv3/stylegan3_t_afhqv2_512_b4x8_official.py https://download.openmmlab.com/mmgen/stylegan3/stylegan3_t_afhqv2_512_b4x8_cvt_official.pkl --export-video --samples-path work_dirs/demos/ --endpoint 6 --interval 60 --space z --seed 2022 --sample-cfg truncation=0.8
 ```
+
 https://user-images.githubusercontent.com/22982797/151506918-83da9ee3-0d63-4c5b-ad53-a41562b92075.mp4

 ## Equivariance Visualization && Evaluation
@@ -81,16 +84,14 @@ python tools/utils/equivariance_viz.py configs/styleganv3/stylegan3_r_ffhqu_256_
 python tools/utils/equivariance_viz.py configs/styleganv3/stylegan3_r_ffhqu_256_b4x8_official.py https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmgen/stylegan3/stylegan3_r_ffhqu_256_b4x8_cvt_official.pkl --translate_max 0.25 --transform y_t --seed 5432
 ```
-
 https://user-images.githubusercontent.com/22982797/151504902-f3cbfef5-9014-4607-bbe1-deaf48ec6d55.mp4
-
 https://user-images.githubusercontent.com/22982797/151504973-b96e1639-861d-434b-9d7c-411ebd4a653f.mp4
-
 https://user-images.githubusercontent.com/22982797/151505099-cde4999e-aab1-42d4-a458-3bb069db3d32.mp4

 If you want to get the EQ-Metric for StyleGAN3, just add the following code to your config.
+
 ```python
 metrics = dict(
     eqv=dict(
@@ -99,8 +100,8 @@ metrics = dict(
         eq_cfg=dict(
             compute_eqt_int=True, compute_eqt_frac=True, compute_eqr=True)))
 ```
-And we highly recommend you to use [slurm_eval_multi_gpu](tools/slurm_eval_multi_gpu.sh) script to accelerate evaluation time.
+We highly recommend using the [slurm_eval_multi_gpu](tools/slurm_eval_multi_gpu.sh) script to reduce evaluation time.

 ## Citation
diff --git a/configs/styleganv3/metafile.yml b/configs/styleganv3/metafile.yml
index 36ec52aa9..2c8cb0514 100755
--- a/configs/styleganv3/metafile.yml
+++ b/configs/styleganv3/metafile.yml
@@ -15,7 +15,7 @@ Models:
     Results:
       - Dataset: FFHQ
         Metrics:
-          FID50k: 3.37
+          FID50k: 3.37
           Iter: 490000.0
           Log: '[log]'
           Task: Unconditional GANs
diff --git a/configs/wgan-gp/README.md b/configs/wgan-gp/README.md
index dec3d1ef5..df6d994e8 100644
--- a/configs/wgan-gp/README.md
+++ b/configs/wgan-gp/README.md
@@ -11,6 +11,7 @@
 Generative Adversarial Networks (GANs) are powerful generative models, but suffer from training instability. The recently proposed Wasserstein GAN (WGAN) makes progress toward stable training of GANs, but sometimes can still generate only low-quality samples or fail to converge. We find that these problems are often due to the use of weight clipping in WGAN to enforce a Lipschitz constraint on the critic, which can lead to undesired behavior. We propose an alternative to clipping weights: penalize the norm of gradient of the critic with respect to its input. Our proposed method performs better than standard WGAN and enables stable training of a wide variety of GAN architectures with almost no hyperparameter tuning, including 101-layer ResNets and language models over discrete data. We also achieve high quality generations on CIFAR-10 and LSUN bedrooms.
+
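As the abstract above describes, WGAN-GP replaces weight clipping with a penalty on the norm of the critic's gradient at points interpolated between real and fake samples. Below is a minimal PyTorch sketch of that penalty term; the `critic` module is a stand-in for any discriminator, so treat this as an illustration of the technique rather than MMGeneration's actual loss-module API.

```python
import torch


def gradient_penalty(critic, real, fake):
    """WGAN-GP penalty: E[(||grad D(x_hat)||_2 - 1)^2] on interpolated samples."""
    # One random interpolation coefficient per sample (images assumed NCHW).
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    score = critic(interp)
    # Gradient of the critic's output w.r.t. the interpolated input.
    grad, = torch.autograd.grad(score.sum(), interp, create_graph=True)
    # Penalize deviation of the per-sample gradient norm from 1 (Lipschitz constraint).
    return ((grad.flatten(1).norm(2, dim=1) - 1) ** 2).mean()
```

The penalty is simply added to the critic loss with a fixed weight (the paper uses 10), which is why the method needs almost no hyperparameter tuning compared with choosing a clipping range.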
diff --git a/docs/en/changelog.md b/docs/en/changelog.md index 90d069576..328d13bbb 100644 --- a/docs/en/changelog.md +++ b/docs/en/changelog.md @@ -13,7 +13,6 @@ - Efficient Distributed Training for Generative Models: For the highly dynamic training in generative models, we adopt a new way to train dynamic models with `MMDDP`. - New Modular Design for Flexible Combination: A new design for complex loss modules is proposed for customizing the links between modules, which can achieve flexible combination among different modules. - ## v0.2.0 (30/05/2021) #### Highlights @@ -34,7 +33,6 @@ - Fix error when data_root option in val_cfg or test_cfg are set as None (#28) - Change latex in quick_run.md to svg url and fix number of checkpoints in modelzoo_statistics.md (#34) - ## v0.3.0 (02/08/2021) #### Highlights @@ -52,7 +50,6 @@ - Revise the logic of `num_classes` in basic conditional gan #69 - Support dynamic eval internal in eval hook #73 - ## v0.4.0 (03/11/2021) #### Highlights @@ -71,7 +68,6 @@ - Add support for PyTorch1.9 #115 - Add pre-commit hook for spell checking #135 - ## v0.5.0 (12/01/2022) #### Highlights @@ -93,7 +89,6 @@ - Fix bug in SinGAN dataset (#192) - Fix SAGAN, SNGAN and BigGAN's default `sn_style` (#199, #213) - ## v0.6.0 (07/03/2022) #### Highlights @@ -106,14 +101,12 @@ - Support training on CPU (#238) - Speed up training (#231) - #### Fix bugs and Improvements - Fix bug in non-distributed training/testing (#239) - Fix typos and invalid links (#221, #226, #228, #244, #249) - Add part of Chinese documentation (#250, #257) - ## v0.7.0 (02/04/2022) #### Highlights @@ -128,14 +121,15 @@ - Add multi machine distribute train (#267) #### Fix bugs and Improvements + - Add brief installation steps in README (#270) - Support random seed for distributed sampler (#271) - Use hyphen for command line args in apps (#273) - ## v0.7.1 (30/04/2022) #### Fix bugs and Improvements + - Support train_dataloader, val_dataloader and test_dataloader settings (#281) - Fix ada typo (#283) - Add chinese application tutorial (#284) diff --git a/docs/en/get_started.md b/docs/en/get_started.md index 1591426ef..55ba29736 100644 --- a/docs/en/get_started.md +++ b/docs/en/get_started.md @@ -18,95 +18,93 @@ If mmcv and mmcv-full are both installed, there will be `ModuleNotFoundError`. ## Installation -1. Create a conda virtual environment and activate it. (Here, we assume the new environment is called ``open-mmlab``) +1. Create a conda virtual environment and activate it. (Here, we assume the new environment is called `open-mmlab`) - ```shell - conda create -n open-mmlab python=3.7 -y - conda activate open-mmlab - ``` + ```shell + conda create -n open-mmlab python=3.7 -y + conda activate open-mmlab + ``` 2. Install PyTorch and torchvision following the [official instructions](https://pytorch.org/), e.g., - ```shell - conda install pytorch torchvision -c pytorch - ``` + ```shell + conda install pytorch torchvision -c pytorch + ``` - Note: Make sure that your compilation CUDA version and runtime CUDA version match. - You can check the supported CUDA version for precompiled packages on the [PyTorch website](https://pytorch.org/). + Note: Make sure that your compilation CUDA version and runtime CUDA version match. + You can check the supported CUDA version for precompiled packages on the [PyTorch website](https://pytorch.org/). - `E.g.1` If you have CUDA 10.1 installed under `/usr/local/cuda` and would like to install - PyTorch 1.5, you need to install the prebuilt PyTorch with CUDA 10.1. 
+   `E.g.1` If you have CUDA 10.1 installed under `/usr/local/cuda` and would like to install
+   PyTorch 1.5, you need to install the prebuilt PyTorch with CUDA 10.1.

-   ```shell
-   conda install pytorch cudatoolkit=10.1 torchvision -c pytorch
-   ```
+   ```shell
+   conda install pytorch cudatoolkit=10.1 torchvision -c pytorch
+   ```

-   `E.g. 2` If you have CUDA 9.2 installed under `/usr/local/cuda` and would like to install
-   PyTorch 1.5.1., you need to install the prebuilt PyTorch with CUDA 9.2.
+   `E.g. 2` If you have CUDA 9.2 installed under `/usr/local/cuda` and would like to install
+   PyTorch 1.5.1, you need to install the prebuilt PyTorch with CUDA 9.2.

-   ```shell
-   conda install pytorch=1.5.1 cudatoolkit=9.2 torchvision=0.6.1 -c pytorch
-   ```
+   ```shell
+   conda install pytorch=1.5.1 cudatoolkit=9.2 torchvision=0.6.1 -c pytorch
+   ```

-   If you build PyTorch from source instead of installing the prebuilt package,
-   you can use more CUDA versions such as 9.0.
+   If you build PyTorch from source instead of installing the prebuilt package,
+   you can use more CUDA versions such as 9.0.

3. Install mmcv-full. We recommend installing the pre-built package as below.

-   ```shell
-   pip install mmcv-full={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
-   ```
+   ```shell
+   pip install mmcv-full={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
+   ```

-   Please replace `{cu_version}` and `{torch_version}` in the url to your desired one. For example, to install the latest `mmcv-full` with `CUDA 11` and `PyTorch 1.7.0`, use the following command:
+   Please replace `{cu_version}` and `{torch_version}` in the url with your desired versions. For example, to install the latest `mmcv-full` with `CUDA 11` and `PyTorch 1.7.0`, use the following command:

-   ```shell
-   pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
-   ```
+   ```shell
+   pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
+   ```

-   See [here](https://github.com/open-mmlab/mmcv#install-with-pip) for different versions of MMCV compatible to different PyTorch and CUDA versions.
-   Optionally you can choose to compile mmcv from source by the following command
+   See [here](https://github.com/open-mmlab/mmcv#install-with-pip) for versions of MMCV compatible with different PyTorch and CUDA versions.
+   Optionally, you can choose to compile mmcv from source with the following command

-   ```shell
-   git clone https://github.com/open-mmlab/mmcv.git
-   cd mmcv
-   MMCV_WITH_OPS=1 pip install -e .  # package mmcv-full will be installed after this step
-   cd ..
-   ```
+   ```shell
+   git clone https://github.com/open-mmlab/mmcv.git
+   cd mmcv
+   MMCV_WITH_OPS=1 pip install -e .  # package mmcv-full will be installed after this step
+   cd ..
+   ```

-   Or directly run
+   Or directly run

-   ```shell
-   pip install mmcv-full
-   ```
+   ```shell
+   pip install mmcv-full
+   ```

4. Clone the MMGeneration repository.

-   ```shell
-   git clone https://github.com/open-mmlab/mmgeneration.git
-   cd mmgeneration
-   ```
+   ```shell
+   git clone https://github.com/open-mmlab/mmgeneration.git
+   cd mmgeneration
+   ```

5. Install build requirements and then install MMGeneration.

-   ```shell
-   pip install -r requirements.txt
-   pip install -v -e .  # or "python setup.py develop"
-   ```
+   ```shell
+   pip install -r requirements.txt
+   pip install -v -e .  # or "python setup.py develop"
+   ```

Note:

a.
Following the above instructions, MMGeneration is installed in `dev` mode; any local modifications made to the code will take effect without the need to reinstall it.

-b. If you would like to use `opencv-python-headless` instead of `opencv --python`,
+b. If you would like to use `opencv-python-headless` instead of `opencv-python`,
 you can install it before installing MMCV.

### Install with CPU only

The code can be built for a CPU-only environment (where CUDA isn't available).
-
### A from-scratch setup script

Assuming that you already have CUDA 10.1 installed, here is a full script for setting up MMGeneration with conda.
diff --git a/docs/en/modelzoo_statistics.md b/docs/en/modelzoo_statistics.md
index 6530d3090..4fa28ed55 100644
--- a/docs/en/modelzoo_statistics.md
+++ b/docs/en/modelzoo_statistics.md
@@ -1,49 +1,35 @@
-
 # Model Zoo Statistics
-* Number of papers: 15
-* Number of checkpoints: 91
-
-  * [Large Scale GAN Training for High Fidelity Natural Image Synthesis](https://github.com/open-mmlab/mmgeneration/blob/master/configs/biggan) (7 ckpts)
-
-
-  * [CycleGAN: Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks](https://github.com/open-mmlab/mmgeneration/blob/master/configs/cyclegan) (6 ckpts)
-
-
-  * [Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://github.com/open-mmlab/mmgeneration/blob/master/configs/dcgan) (3 ckpts)
-
-
-  * [Geometric GAN](https://github.com/open-mmlab/mmgeneration/blob/master/configs/ggan) (3 ckpts)
-
-
-  * [Improved Denoising Diffusion Probabilistic Models](https://github.com/open-mmlab/mmgeneration/blob/master/configs/improved_ddpm) (3 ckpts)
-
-
-  * [Least Squares Generative Adversarial Networks](https://github.com/open-mmlab/mmgeneration/blob/master/configs/lsgan) (4 ckpts)
-
-
-  * [Progressive Growing of GANs for Improved Quality, Stability, and Variation](https://github.com/open-mmlab/mmgeneration/blob/master/configs/pggan) (3 ckpts)
+- Number of papers: 15
+- Number of checkpoints: 91

-  * [Pix2Pix: Image-to-Image Translation with Conditional Adversarial Networks](https://github.com/open-mmlab/mmgeneration/blob/master/configs/pix2pix) (4 ckpts)
+  - [Large Scale GAN Training for High Fidelity Natural Image Synthesis](https://github.com/open-mmlab/mmgeneration/blob/master/configs/biggan) (7 ckpts)
+  - [CycleGAN: Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks](https://github.com/open-mmlab/mmgeneration/blob/master/configs/cyclegan) (6 ckpts)

-  * [Positional Encoding as Spatial Inductive Bias in GANs (CVPR'2021)](https://github.com/open-mmlab/mmgeneration/blob/master/configs/positional_encoding_in_gans) (21 ckpts)
+  - [Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://github.com/open-mmlab/mmgeneration/blob/master/configs/dcgan) (3 ckpts)
+  - [Geometric GAN](https://github.com/open-mmlab/mmgeneration/blob/master/configs/ggan) (3 ckpts)

-  * [Self-attention generative adversarial networks](https://github.com/open-mmlab/mmgeneration/blob/master/configs/sagan) (9 ckpts)
+  - [Improved Denoising Diffusion Probabilistic Models](https://github.com/open-mmlab/mmgeneration/blob/master/configs/improved_ddpm) (3 ckpts)
+  - [Least Squares Generative Adversarial Networks](https://github.com/open-mmlab/mmgeneration/blob/master/configs/lsgan) (4 ckpts)

-  * [Singan: Learning a Generative Model from a Single Natural Image (ICCV'2019)](https://github.com/open-mmlab/mmgeneration/blob/master/configs/singan) (3
ckpts)
+  - [Progressive Growing of GANs for Improved Quality, Stability, and Variation](https://github.com/open-mmlab/mmgeneration/blob/master/configs/pggan) (3 ckpts)
+  - [Pix2Pix: Image-to-Image Translation with Conditional Adversarial Networks](https://github.com/open-mmlab/mmgeneration/blob/master/configs/pix2pix) (4 ckpts)

-  * [Spectral Normalization for Generative Adversarial Networks](https://github.com/open-mmlab/mmgeneration/blob/master/configs/sngan_proj) (10 ckpts)
+  - [Positional Encoding as Spatial Inductive Bias in GANs (CVPR'2021)](https://github.com/open-mmlab/mmgeneration/blob/master/configs/positional_encoding_in_gans) (21 ckpts)
+  - [Self-attention generative adversarial networks](https://github.com/open-mmlab/mmgeneration/blob/master/configs/sagan) (9 ckpts)

-  * [A Style-Based Generator Architecture for Generative Adversarial Networks (CVPR'2019)](https://github.com/open-mmlab/mmgeneration/blob/master/configs/styleganv1) (2 ckpts)
+  - [Singan: Learning a Generative Model from a Single Natural Image (ICCV'2019)](https://github.com/open-mmlab/mmgeneration/blob/master/configs/singan) (3 ckpts)
+  - [Spectral Normalization for Generative Adversarial Networks](https://github.com/open-mmlab/mmgeneration/blob/master/configs/sngan_proj) (10 ckpts)

-  * [Analyzing and Improving the Image Quality of Stylegan (CVPR'2020)](https://github.com/open-mmlab/mmgeneration/blob/master/configs/styleganv2) (11 ckpts)
+  - [A Style-Based Generator Architecture for Generative Adversarial Networks (CVPR'2019)](https://github.com/open-mmlab/mmgeneration/blob/master/configs/styleganv1) (2 ckpts)
+  - [Analyzing and Improving the Image Quality of Stylegan (CVPR'2020)](https://github.com/open-mmlab/mmgeneration/blob/master/configs/styleganv2) (11 ckpts)

-  * [Improved Training of Wasserstein GANs](https://github.com/open-mmlab/mmgeneration/blob/master/configs/wgan-gp) (2 ckpts)
+  - [Improved Training of Wasserstein GANs](https://github.com/open-mmlab/mmgeneration/blob/master/configs/wgan-gp) (2 ckpts)
diff --git a/docs/en/quick_run.md b/docs/en/quick_run.md
index c70515d14..480054765 100644
--- a/docs/en/quick_run.md
+++ b/docs/en/quick_run.md
@@ -39,6 +39,7 @@ python demo/unconditional_demo.py \
     [--save-path ${SAVE_PATH}] \
     [--device ${GPU_ID}]
 ```
+
 Note that more arguments are also offered for customizing your sampling procedure. Please use `python demo/unconditional_demo.py --help` to check more details.

 ### Sample images with conditional GANs
@@ -79,6 +80,7 @@ python demo/conditional_demo.py \
     [--save-path ${SAVE_PATH}] \
     [--device ${GPU_ID}]
 ```
+
 If `--label` is not passed, images with random labels would be generated.
 If `--label` is passed, we would generate `${SAMPLES_PER_CLASSES}` images for each input label. If `sample_all_classes` is set to true on the command line, `--label` would be ignored and the generator will output images for all categories.
@@ -86,6 +88,7 @@
 Note that more arguments are also offered for customizing your sampling procedure. Please use `python demo/conditional_demo.py --help` to check more details.

 ### Sample images with image translation models
+
 MMGeneration provides high-level APIs for translating images by using image translation models. Here is an example of building Pix2Pix and obtaining the translated images.
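A minimal sketch of such a script is shown below. It assumes MMGeneration's high-level APIs `init_model` and `sample_img2img_model`, and it reuses the facades config and checkpoint that appear in the IS evaluation example later in this document; `demo.jpg` is a placeholder input image.

```python
from mmgen.apis import init_model, sample_img2img_model

# Config and checkpoint reused from the IS evaluation example below;
# replace them with your own Pix2Pix config/weights as needed.
config_file = 'configs/pix2pix/pix2pix_vanilla_unet_bn_facades_b1x1_80k.py'
checkpoint_file = 'https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_1x1_80k_facades_20210902_170442-c0958d50.pth'

# Build the model from config and checkpoint, then translate one image
# ('demo.jpg' is a placeholder path) into the target domain.
model = init_model(config_file, checkpoint_file, device='cuda:0')
translated = sample_img2img_model(model, 'demo.jpg', target_domain='photo')
```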
```python
@@ -116,6 +119,7 @@ python demo/translation_demo.py \
     [--save-path ${SAVE_PATH}] \
     [--device ${GPU_ID}]
 ```
+
 Note that more customized arguments are also offered for customizing your sampling procedure. Please use `python demo/translation_demo.py --help` to check more details.

 # 2: Prepare dataset for training and testing
@@ -141,6 +145,7 @@ Here, we provide several download links of datasets frequently used in unconditi
 For translation models, we now offer two settings for datasets, called paired image dataset and unpaired image dataset. For a paired image dataset, every image is formed by concatenating two corresponding images from two domains along the width dimension. You are supposed to make two folders "train" and "test" filled with images of this format for training and testing. The folder structure is presented below.
+
 ```
 ./data/dataset_name/
 ├── test
@@ -227,9 +232,11 @@ export CUDA_VISIBLE_DEVICES=-1
 ```
 And then run this script.
+
 ```shell
 python tools/train.py config --work-dir WORK_DIR
 ```
+
 **Note**: We do not recommend using the CPU for training because it is too slow. We support this feature to allow users to debug on machines without a GPU for convenience. Also, you cannot train dynamic GANs on CPU. For more details, please refer to [ddp training](docs/en/tutorials/ddp_train_gans.md).
@@ -247,12 +254,14 @@ metrics = dict(
         inception_pkl='work_dirs/inception_pkl/ffhq-256-50k-rgb.pkl',
         bgr2rgb=True))
 ```
+
 (We will specify how to obtain `inception_pkl` in the [FID](#FID) section.) Then, users can use the evaluation script with the following command:

 ```shell
 sh eval.sh ${CONFIG_FILE} ${CKPT_FILE} --batch-size 10 --online
 ```
+
 If you are in a slurm environment, please switch to [tools/slurm_eval.sh](https://github.com/open-mmlab/mmgeneration/tree/master/tools/slurm_eval.sh) by using the following commands:

 ```shell
@@ -273,11 +282,13 @@ sh slurm_eval.sh ${PLATFORM} ${JOBNAME} ${CONFIG_FILE} ${CKPT_FILE} \
 ```

 We also provide [tools/utils/translation_eval.py](https://github.com/open-mmlab/mmgeneration/blob/master/tools/utils/translation_eval.py) for users to evaluate their translation models. You are supposed to set the `target-domain` of the output images and run the following command:
+
 ```shell
 python tools/utils/translation_eval.py ${CONFIG_FILE} ${CKPT_FILE} --t ${target-domain}
 ```

 Note that the current version of MMGeneration supports multi-GPU [FID](#fid) and [IS](#is) evaluation and image saving. You can use the following command to use this feature:
+
 ```shell
 # online evaluation
 sh dist_eval.sh ${CONFIG_FILE} ${CKPT_FILE} ${GPUS_NUMBER} --batch-size 10 --online
@@ -294,11 +305,13 @@ sh dist_eval.sh${CONFIG_FILE} ${CKPT_FILE} ${GPUS_NUMBER} --eval none --samples-
 # image saving with slurm
 sh slurm_eval_multi_gpu.sh ${PLATFORM} ${JOBNAME} ${CONFIG_FILE} ${CKPT_FILE} --eval none --samples-path ${SAMPLES_PATH}
 ```
+
 In subsequent versions, multi-GPU evaluation for more metrics will be supported.

 Next, we will specify the details of different metrics one by one.

 ## **FID**
+
 Fréchet Inception Distance is a measure of similarity between two datasets of images. It was shown to correlate well with the human judgment of visual quality and is most often used to evaluate the quality of samples of Generative Adversarial Networks. FID is calculated by computing the Fréchet distance between two Gaussians fitted to feature representations of the Inception network.

 In `MMGeneration`, we provide two versions for FID calculation.
One is the commonly used PyTorch version, and the other is the one used in the StyleGAN paper. Meanwhile, we have compared the difference between these two implementations in the StyleGAN2-FFHQ1024 model (the details can be found [here](https://github.com/open-mmlab/mmgeneration/blob/master/configs/styleganv2/README.md)). Fortunately, there is only a marginal difference in the final results. Thus, we recommend users adopt the more convenient PyTorch version.
@@ -310,12 +323,14 @@ In `MMGeneration`, we provide two versions for FID calculation. One is the commo
 ```shell
 python tools/utils/inception_stat.py --imgsdir ${IMGS_PATH} --pklname ${PKLNAME} --size ${SIZE}
 ```
+
 In the aforementioned command, the script will take the PyTorch InceptionV3 by default. If you want Tero's InceptionV3, you will need to switch to the script module:

 ```shell
 python tools/utils/inception_stat.py --imgsdir ${IMGS_PATH} --pklname ${PKLNAME} --size ${SIZE} \
     --inception-style stylegan --inception-pth ${PATH_SCRIPT_MODULE}
 ```
+
 For more information about how to extract the inception state, please refer to this [doc](https://github.com/open-mmlab/mmgeneration/blob/master/docs/en/tutorials/inception_stat.md).

 To use the FID metric, you should add the metric in a config file like this:
@@ -328,6 +343,7 @@ metrics = dict(
         inception_pkl='work_dirs/inception_pkl/ffhq-256-50k-rgb.pkl',
         bgr2rgb=True))
 ```
+
 If the `inception_pkl` is not set, the metric will calculate the real inception statistics on the fly. If you hope to use Tero's InceptionV3, please use the following metric configuration:

 ```python
@@ -340,9 +356,11 @@ metrics = dict(
             inception_path='work_dirs/cache/inception-2015-12-05.pt')))
 ```
+
 The `inception_path` indicates the path to Tero's script module.

 ## Precision and Recall
+
 Our `Precision and Recall` implementation follows the version used in StyleGAN2. In this metric, a VGG network will be adopted to extract the features for images. Unfortunately, we have not found a PyTorch VGG implementation that leads to results similar to Tero's version used in StyleGAN2. (About the differences, please see this [file](https://github.com/open-mmlab/mmgeneration/blob/master/configs/styleganv2/README.md).) Thus, in our implementation, we adopt [Tero's VGG](https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metrics/vgg16.pt) network by default. Importantly, applying this script module needs `PyTorch >= 1.6.0`. With a lower PyTorch version, we will use the official PyTorch VGG network for feature extraction.

 To evaluate with `P&R`, please add the following configuration in the config file:
@@ -355,20 +373,25 @@ metrics = dict(
 ```

 ## IS
+
 Inception score is an objective metric for evaluating the quality of generated images, proposed in [Improved Techniques for Training GANs](https://arxiv.org/pdf/1606.03498.pdf). It uses an InceptionV3 model to predict the class of the generated images, and assumes that 1) if an image is of high quality, it will be categorized into a specific class, and 2) if images are of high diversity, the range of the images' classes will be wide. So the KL-divergence between the conditional class probability and the marginal class probability can indicate the quality and diversity of the generated images. You can see the complete implementation in `metrics.py`, which refers to https://github.com/sbarratt/inception-score-pytorch/blob/master/inception_score.py.
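To make this computation concrete, here is a small sketch of the IS formula itself, assuming the class logits of the generated images have already been extracted with an InceptionV3 network; the function name and the single-split simplification are ours, not MMGeneration's API.

```python
import torch
import torch.nn.functional as F


def inception_score(logits, eps=1e-10):
    """Compute IS from class logits of generated images, shape (N, num_classes)."""
    probs = F.softmax(logits, dim=1)            # conditional p(y|x) per image
    marginal = probs.mean(dim=0, keepdim=True)  # marginal p(y) over the batch
    # KL(p(y|x) || p(y)) averaged over images, then exponentiated.
    kl = (probs * (torch.log(probs + eps) - torch.log(marginal + eps))).sum(dim=1)
    return kl.mean().exp()
```

A sharp conditional distribution (high quality) and a wide marginal distribution (high diversity) both increase the KL term, so a higher score is better.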
If you want to evaluate models with `IS` metrics, you can add the `metrics` into your config file like this:
+
 ```python
 # at the end of the configs/pix2pix/pix2pix_vanilla_unet_bn_facades_b1x1_80k.py
 metrics = dict(
     IS=dict(type='IS', num_images=106, image_shape=(3, 256, 256)))
 ```
+
 You can run the command below to calculate IS.
+
 ```shell
 python tools/utils/translation_eval.py --t photo \
     ./configs/pix2pix/pix2pix_vanilla_unet_bn_facades_b1x1_80k.py \
     https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_1x1_80k_facades_20210902_170442-c0958d50.pth \
     --eval IS
 ```
+
 Note that the selection of the Inception V3 model and the image resize method can significantly influence the final IS score. Therefore, we strongly recommend that users download [Tero's script model of Inception V3](https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metrics/inception-2015-12-05.pt) (loading this script model requires torch >= 1.6) and use `Bicubic` interpolation with the `Pillow` backend. We provide a template for the [data processing pipeline](https://github.com/open-mmlab/mmgeneration/tree/master/configs/_base_/datasets/Inception_Score.py) as well.

 We also perform a survey on the influence of the data loading pipeline and the version of pretrained Inception V3 on the IS result. All IS are evaluated on the same group of images, which are randomly selected from the ImageNet dataset.
@@ -376,7 +399,7 @@
Show the Comparison Results | Code Base | Inception V3 Version | Data Loader Backend | Resize Interpolation Method | IS | -|:---------------------------------------------------------------:|:--------------------:|:-------------------:|:---------------------------:|:---------------------:| +| :-------------------------------------------------------------: | :------------------: | :-----------------: | :-------------------------: | :-------------------: | | [OpenAI (baseline)](https://github.com/openai/improved-gan) | Tensorflow | Pillow | Pillow Bicubic | **312.255 +/- 4.970** | | [StyleGAN-Ada](https://github.com/NVlabs/stylegan2-ada-pytorch) | Tero's Script Model | Pillow | Pillow Bicubic | 311.895 +/ 4.844 | | mmgen (Ours) | Pytorch Pretrained | cv2 | cv2 Bilinear | 322.932 +/- 2.317 | @@ -394,26 +417,29 @@ We also perform a survey on the influence of data loading pipeline and the versi
## PPL + Perceptual path length measures the difference between consecutive images (their VGG16 embeddings) when interpolating between two random inputs. Drastic changes mean that multiple features have changed together and that they might be entangled. Thus, a smaller PPL score appears to indicate higher overall image quality by experiments. \ As a basis for our metric, we use a perceptually-based pairwise image distance that is calculated as a weighted difference between two VGG16 embeddings, where the weights are fit so that the metric agrees with human perceptual similarity judgments. -If we subdivide a latent space interpolation path into linear segments, we can define the total perceptual length of this segmented path as the sum of perceptual differences over each segment, and a natural definition for the perceptual path length would be the limit of this sum under infinitely fine subdivision, but in practice we approximate it using a small subdivision ``$`\epsilon=10^{-4}`$``. +If we subdivide a latent space interpolation path into linear segments, we can define the total perceptual length of this segmented path as the sum of perceptual differences over each segment, and a natural definition for the perceptual path length would be the limit of this sum under infinitely fine subdivision, but in practice we approximate it using a small subdivision `` $`\epsilon=10^{-4}`$ ``. The average perceptual path length in latent `space` Z, over all possible endpoints, is therefore -``$$`L_Z = E[\frac{1}{\epsilon^2}d(G(slerp(z_1,z_2;t))), G(slerp(z_1,z_2;t+\epsilon)))]`$$`` +`` $$`L_Z = E[\frac{1}{\epsilon^2}d(G(slerp(z_1,z_2;t))), G(slerp(z_1,z_2;t+\epsilon)))]`$$ `` Computing the average perceptual path length in latent `space` W is carried out in a similar fashion: -``$$`L_Z = E[\frac{1}{\epsilon^2}d(G(slerp(z_1,z_2;t))), G(slerp(z_1,z_2;t+\epsilon)))]`$$`` +`` $$`L_Z = E[\frac{1}{\epsilon^2}d(G(slerp(z_1,z_2;t))), G(slerp(z_1,z_2;t+\epsilon)))]`$$ `` -Where ``$`z_1, z_2 \sim P(z)`$``, and ``$` t \sim U(0,1)`$`` if we set `sampling` to full, ``$` t \in \{0,1\}`$`` if we set `sampling` to end. ``$` G`$`` is the generator(i.e. ``$` g \circ f`$`` for style-based networks), and ``$` d(.,.)`$`` evaluates the perceptual distance between the resulting images.We compute the expectation by taking 100,000 samples (set `num_images` to 50,000 in our code). +Where `` $`z_1, z_2 \sim P(z)`$ ``, and `` $` t \sim U(0,1)`$ `` if we set `sampling` to full, `` $` t \in \{0,1\}`$ `` if we set `sampling` to end. `` $` G`$ `` is the generator(i.e. `` $` g \circ f`$ `` for style-based networks), and `` $` d(.,.)`$ `` evaluates the perceptual distance between the resulting images.We compute the expectation by taking 100,000 samples (set `num_images` to 50,000 in our code). You can find the complete implementation in `metrics.py`, which refers to https://github.com/rosinality/stylegan2-pytorch/blob/master/ppl.py. If you want to evaluate models with `PPL` metrics, you can add the `metrics` into your config file like this: + ```python # at the end of the configs/styleganv2/stylegan2_c2_ffhq_1024_b4x8.py metrics = dict( ppl_wend=dict(type='PPL', space='W', sampling='end', num_images=50000, image_shape=(3, 1024, 1024))) ``` + You can run the command below to calculate PPL. 
```shell
@@ -423,13 +449,16 @@ python tools/evaluation.py ./configs/styleganv2/stylegan2_c2_ffhq_1024_b4x8.py \
 ```

 ## SWD
+
 Sliced Wasserstein distance is a discrepancy measure for probability distributions, and a smaller distance indicates that the generated images look like the real ones. We obtain the Laplacian pyramid of every image and extract patches from the Laplacian pyramids as descriptors; then SWD can be calculated by taking the sliced Wasserstein distance of the real and fake descriptors. You can see the complete implementation in `metrics.py`, which refers to https://github.com/tkarras/progressive_growing_of_gans/blob/master/metrics/sliced_wasserstein.py.
 If you want to evaluate models with `SWD` metrics, you can add the `metrics` into your config file like this:
+
 ```python
 # at the end of the configs/pggan/pggan_celeba-cropped_128_g8_12Mimgs.py
 metrics = dict(swd16k=dict(type='SWD', num_images=16384, image_shape=(3, 128, 128)))
 ```
+
 You can run the command below to calculate SWD.

 ```shell
@@ -439,12 +468,15 @@ python tools/evaluation.py ./configs/pggan/pggan_celeba-cropped_128_g8_12Mimgs.p
 ```

 ## MS-SSIM
+
 Multi-scale structural similarity is used to measure the similarity of two images. We use MS-SSIM here to measure the diversity of generated images, and a low MS-SSIM score indicates high diversity of the generated images. You can see the complete implementation in `metrics.py`, which refers to https://github.com/tkarras/progressive_growing_of_gans/blob/master/metrics/ms_ssim.py.
 If you want to evaluate models with `MS-SSIM` metrics, you can add the `metrics` into your config file like this:
+
 ```python
 # at the end of the configs/pggan/pggan_celeba-cropped_128_g8_12Mimgs.py
 metrics = dict(ms_ssim10k=dict(type='MS_SSIM', num_images=10000))
 ```
+
 You can run the command below to calculate MS-SSIM.

 ```shell
@@ -453,7 +485,6 @@ python tools/evaluation.py ./configs/pggan/pggan_celeba-cropped_128_g8_12Mimgs.p
     --batch-size 64 --online --eval ms_ssim10k
 ```
-
 # 5: Evaluation during training

 In this section, we will discuss how to evaluate generative models, especially GANs, during training. Note that `MMGeneration` only supports distributed training, and the evaluation metric adopted in the training procedure should also be run in a distributed style. Currently, only `FID` has been implemented and tested in an efficient distributed version. Efficient distributed versions of other metrics will be supported in the near future. Thus, in the following part, we will specify how to evaluate your models with the `FID` metric during training.
@@ -503,6 +534,7 @@ data = dict(
 We highly recommend that users pre-calculate the inception pickle file in advance, which will reduce the evaluation cost significantly.

 We also provide `TranslationEvalHook` for users to evaluate translation models during training. The only difference from `GenerativeEvalHook` is that you need to specify the target domain of the evaluated model. For example, to evaluate the model with the `FID` metric, please add the following python code in your config file:
+
 ```python
 evaluation = dict(
     type='TranslationEvalHook',
diff --git a/docs/en/tutorials/applications.md b/docs/en/tutorials/applications.md
index 49d8221d0..6c71df906 100644
--- a/docs/en/tutorials/applications.md
+++ b/docs/en/tutorials/applications.md
@@ -1,6 +1,7 @@
 # Tutorial 8: Applications with Generative Models

 ## Interpolation
+
 The generative model in the GAN architecture learns to map points in the latent space to generated images.
The latent space has no meaning other than the meaning applied to it via the generative model. Generally, when we want to explore the structure of the latent space, one thing we can do is to interpolate a sequence of points between two endpoints in the latent space and see the results these points yield. (E.g., if features that are absent in either endpoint appear in the middle of a linear interpolation path, we take this as a sign that the latent space is entangled and the factors of variation are not properly separated.)

Indeed, we have provided an application script to users. You can use [apps/interpolate_sample.py](https://github.com/open-mmlab/mmgeneration/tree/master/apps/interpolate_sample.py) with the following commands for unconditional models' interpolation:
@@ -16,6 +17,7 @@ python apps/interpolate_sample.py \
     [--samples-path ${SAMPLES_PATH}] \
     [--batch-size ${BATCH_SIZE}] \
 ```
+
 Here, we provide two kinds of `show-mode`: `sequence` and `group`. In `sequence` mode, we sample a sequence of endpoints first and then interpolate points between two endpoints in order; generated images will be saved individually. In `group` mode, we sample several pairs of endpoints and then interpolate points between the two endpoints in a pair; generated images will be saved in a single picture. What's more, `space` refers to the latent code space; you can choose 'z' or 'w' (the latter especially refers to the style space in the StyleGAN series). `endpoint` indicates the number of endpoints you want to sample (it should be set to an even number in `group` mode), and `interval` means the number of points (including endpoints) you interpolate between two endpoints. Note that more customized arguments are also offered for customizing your interpolation procedure.
@@ -36,9 +38,11 @@ python apps/conditional_interpolate.py \
     [--samples-path ${SAMPLES_PATH}] \
     [--batch-size ${BATCH_SIZE}] \
 ```
+
 Here, unlike unconditional models, you need to provide the name of the embedding layer if the label embedding is shared among conv_blocks. Otherwise, you can set the `embedding-name` to 'NULL'. Considering that conditional models have noise and label as inputs, we provide `fix-z` to fix the noise and `fix-y` to fix the label when performing image interpolation.

 ## Projection
+
 Inverting the synthesis network g is an interesting problem that has many applications. For example, manipulating a given image in the latent feature space requires finding a matching latent code for it first. Generally, you can reconstruct a target image by optimizing over the latent vector, using lpips and pixel-wise loss as the objective function. Indeed, we have provided an application script for users to find the matching latent vector w of the StyleGAN-series synthesis network for given images. You can use [apps/stylegan_projector.py](https://github.com/open-mmlab/mmgeneration/tree/master/apps/stylegan_projector.py) with the following commands:
@@ -50,11 +54,13 @@ python apps/stylegan_projector.py \
     ${FILES}
     [--results-path ${RESULTS_PATH}]
 ```
+
 Here, `FILES` refers to the image paths, and the projection latents and reconstructed images will be saved in `results-path`.
 Note that more customized arguments are also offered for customizing your projection procedure.
 Please use `python apps/stylegan_projector.py --help` to check more details.

 ## Manipulation
+
 A general application of StyleGAN-based models is manipulating the latent space to control the attributes of the synthesized images.
Here, we provide a simple but popular algorithm, based on [SeFa](https://arxiv.org/pdf/2007.06600.pdf), to users. Of course, we modify how the original version calculates eigenvectors and offer a more flexible interface.
To manipulate your generator, you can run the script [apps/modified_sefa.py](https://github.com/open-mmlab/mmgeneration/tree/master/apps/modified_sefa.py) with the following command:
diff --git a/docs/en/tutorials/config.md b/docs/en/tutorials/config.md
index 71de8d543..48d66f46a 100644
--- a/docs/en/tutorials/config.md
+++ b/docs/en/tutorials/config.md
@@ -21,7 +21,7 @@ When submitting jobs using "tools/train.py" or "tools/evaluation.py", you may sp

 - Update values of list/tuples.
   If the value to be updated is a list or a tuple. For example, the config file normally sets `workflow=[('train', 1)]`. If you want to
-  change this key, you may specify `--cfg-options workflow="[(train,1),(val,1)]"`. Note that the quotation mark \" is necessary to
+  change this key, you may specify `--cfg-options workflow="[(train,1),(val,1)]"`. Note that the quotation mark " is necessary to
   support list/tuple data types, and that **NO** white space is allowed inside the quotation marks in the specified value.

 ## Config File Structure
@@ -55,7 +55,6 @@ We follow the below style to name config files. Contributors are advised to foll
 - `[batch_per_gpu x gpu]`: GPUs and samples per GPU, `b4x8` is used by default in stylegan2.
 - `{schedule}`: training schedule. Following Tero's convention, we recommend to use the number of images shown to the discriminator, like 5M, 800k. Of course, you can use 5e indicating 5 epochs or 80k-iters for 80k iterations.
-
 ## An Example of StyleGAN2

 To help the users have a basic idea of a complete config and the modules in a modern generative system,
diff --git a/docs/en/tutorials/customize_dataset.md b/docs/en/tutorials/customize_dataset.md
index 1f9b6b8cb..f1298155d 100644
--- a/docs/en/tutorials/customize_dataset.md
+++ b/docs/en/tutorials/customize_dataset.md
@@ -1,8 +1,8 @@
 # Tutorial 2: Customize Datasets

 In this section, we will detail how to prepare data and adopt a proper dataset in our repo for different methods.
-## Datasets for unconditional models
+## Datasets for unconditional models

 **Data preparation for unconditional models** is simple. What you need to do is to download the images and put them into a directory. Next, you should set a symlink in the `data` directory. For standard unconditional GANs with static architectures, like DCGAN and StyleGAN2, `UnconditionalImageDataset` is designed to train such unconditional models. Here is an example config for the FFHQ dataset:
@@ -39,6 +39,7 @@ data = dict(
             pipeline=train_pipeline)))
 ```
+
 Here, we adopt `RepeatDataset` to avoid frequent dataloader reloading, which will accelerate the training procedure. As shown in the example, `pipeline` provides the data processing pipeline for images, including loading from the file system, resizing, cropping and transferring to `torch.Tensor`. All supported data pipelines can be found in `mmgen/datasets/pipelines`.

 For unconditional GANs with dynamic architectures like PGGAN and StyleGANv1, `GrowScaleImgDataset` is recommended for training. Since such dynamic architectures need real images at different scales, directly adopting `UnconditionalImageDataset` will bring heavy I/O cost for loading multiple high-resolution images.
Here is an example we use for training PGGAN on the CelebA-HQ dataset:
@@ -90,6 +91,7 @@ data = dict(
         },
         len_per_stage=300000))
 ```
+
 In this dataset, you should provide a dictionary of image paths to `imgs_roots`. Thus, you should resize the images in the dataset in advance. For the resizing methods in the data pre-processing, we adopt bilinear interpolation methods in all of the experiments studied in MMGeneration.

 Note that this dataset should be used with `PGGANFetchDataHook`. In this config file, this hook should be added in the customized hooks, as shown below.
@@ -108,9 +110,11 @@ custom_hooks = [
         priority='VERY_HIGH')
 ]
 ```
+
 This data fetching hook helps the dataloader update the status of the dataset to change the data source and batch size during training.

 ## Datasets for image translation models
+
 **Data preparation for translation models** needs a little attention. You should organize the files in the way we told you in `quick_run.md`. Fortunately, for most official datasets like facades and summer2winter_yosemite, they already have the right format. Also, you should set a symlink in the `data` directory. For translation models trained with paired data, like Pix2Pix, `PairedImageDataset` is designed to train such translation models. Here is an example config for the facades dataset:

 ```python
@@ -275,4 +279,5 @@ data = dict(
         test_mode=True))
 ```
+
 Here, `UnpairedImageDataset` will load both images (domain A and B) from different paths and transform them at the same time.
diff --git a/docs/en/tutorials/customize_losses.md b/docs/en/tutorials/customize_losses.md
index 3f5daf313..753e33c48 100644
--- a/docs/en/tutorials/customize_losses.md
+++ b/docs/en/tutorials/customize_losses.md
@@ -22,7 +22,6 @@ class DiscShiftLoss(nn.Module):

 # codes can be found in ``mmgen/models/losses/disc_auxiliary_loss.py``
 ```
-
 The goal of this design for loss modules is to allow them to be used automatically in the generative models (`MODELS`), without other complex code to define the mapping between data and keyword arguments. Thus, differently from other frameworks in `OpenMMLab`, our loss modules contain a special keyword, `data_info`, which is a dictionary defining the mapping between the input arguments and the data from the generative models. Taking the `DiscShiftLoss` as an example, when writing the config file, users may use this loss as follows:

 ```python
 dict(type='DiscShiftLoss',
     loss_weight=0.001 * 0.5,
     data_info=dict(pred='disc_pred_real'))
 ```
+
 The information in `data_info` tells the module to use the `disc_pred_real` data as the input tensor for the `pred` argument. Once `data_info` is not `None`, our loss module will automatically build up the computational graph, as sketched below.
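For illustration, here is a condensed sketch of this pattern. It is simplified from the structure of `mmgen/models/losses/disc_auxiliary_loss.py`, and the routing comment at the end stands in for what the generative model does internally, so treat the helper code as ours rather than the library's exact implementation.

```python
import torch.nn as nn


class DiscShiftLoss(nn.Module):
    """Simplified sketch: a loss module that stores `data_info` for routing."""

    def __init__(self, loss_weight=1.0, data_info=None):
        super().__init__()
        self.loss_weight = loss_weight
        # e.g. dict(pred='disc_pred_real'): route outputs['disc_pred_real']
        # into the `pred` argument of forward().
        self.data_info = data_info

    def forward(self, pred):
        # Disc-shift regularization: penalize large discriminator outputs.
        return self.loss_weight * pred.pow(2).mean()


# In spirit, the generative model resolves `data_info` like this:
#   outputs = dict(disc_pred_real=..., disc_pred_fake=..., ...)
#   kwargs = {arg: outputs[key] for arg, key in loss.data_info.items()}
#   loss_value = loss(**kwargs)
```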
```python
diff --git a/docs/en/tutorials/customize_models.md b/docs/en/tutorials/customize_models.md
index f062735f1..8ec67f838 100644
--- a/docs/en/tutorials/customize_models.md
+++ b/docs/en/tutorials/customize_models.md
@@ -3,8 +3,8 @@
 We basically categorize our supported models into 3 main streams according to tasks:

 - Unconditional GANs:
-  - Static architectures: DCGAN, StyleGANv2
-  - Dynamic architectures: PGGAN, StyleGANv1
+  - Static architectures: DCGAN, StyleGANv2
+  - Dynamic architectures: PGGAN, StyleGANv1
 - Image Translation Models: Pix2Pix, CycleGAN
 - Internal Learning (Single Image Model): SinGAN
@@ -18,7 +18,6 @@
 All of the other modules in `MMGeneration` will be registered as `MODULES`, incl

 In all of the related repos in OpenMMLab, users may follow similar steps to build up new components:
-
 - Implement a class
 - Decorate the class with one of the registries (`MODELS` or `MODULES` in our repo)
 - Import this component in related `__init__.py` files
diff --git a/docs/en/tutorials/customize_runtime.md b/docs/en/tutorials/customize_runtime.md
index 7e863a82a..9b0787f7a 100644
--- a/docs/en/tutorials/customize_runtime.md
+++ b/docs/en/tutorials/customize_runtime.md
@@ -41,8 +41,8 @@ To find the `MyOptimizer` module defined above, this module should be imported i

 - Modify `mmgen/core/optimizer/__init__.py` to import it.

-  The newly defined module should be imported in `mmgen/core/optimizer/__init__.py` so that the registry will
-  find the new module and add it:
+  The newly defined module should be imported in `mmgen/core/optimizer/__init__.py` so that the registry will
+  find the new module and add it:

 ```python
 from .my_optimizer import MyOptimizer
@@ -106,34 +106,34 @@ The default optimizer constructor is implemented [here](https://github.com/open-

 Tricks not implemented by the optimizer should be implemented through the optimizer constructor (e.g., setting parameter-wise learning rates) or hooks. We list some common settings that could stabilize or accelerate the training. Feel free to create a PR or an issue for more settings.

 - __Use gradient clip to stabilize training__:

-  Some models need gradient clip to clip the gradients to stabilize the training process. An example is as below:
+  Some models need gradient clipping to stabilize the training process. An example is as below:

-  ```python
-  optimizer_config = dict(
-      _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
-  ```
+  ```python
+  optimizer_config = dict(
+      _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
+  ```

-  If your config inherits the base config which already sets the `optimizer_config`, you might need `_delete_=True` to override the unnecessary settings. See the [config documentation](https://mmgeneration.readthedocs.io/en/latest/config.html) for more details.
+  If your config inherits a base config which already sets the `optimizer_config`, you might need `_delete_=True` to override the unnecessary settings. See the [config documentation](https://mmgeneration.readthedocs.io/en/latest/config.html) for more details.

 - __Use momentum schedule to accelerate model convergence__:

-  We support momentum scheduler to modify model's momentum according to learning rate, which could make the model converge in a faster way.
-  Momentum scheduler is usually used with LR scheduler, for example, the following config is used in 3D detection to accelerate convergence.
- For more details, please refer to the implementation of [CyclicLrUpdater](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/lr_updater.py#L327) and [CyclicMomentumUpdater](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/momentum_updater.py#L130).
-
-  ```python
-  lr_config = dict(
-      policy='cyclic',
-      target_ratio=(10, 1e-4),
-      cyclic_times=1,
-      step_ratio_up=0.4,
-  )
-  momentum_config = dict(
-      policy='cyclic',
-      target_ratio=(0.85 / 0.95, 1),
-      cyclic_times=1,
-      step_ratio_up=0.4,
-  )
-  ```
+  We support momentum scheduler to modify the model's momentum according to the learning rate, which could make the model converge faster.
+  Momentum scheduler is usually used with LR scheduler; for example, the following config is used in 3D detection to accelerate convergence.
+  For more details, please refer to the implementation of [CyclicLrUpdater](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/lr_updater.py#L327) and [CyclicMomentumUpdater](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/momentum_updater.py#L130).
+
+  ```python
+  lr_config = dict(
+      policy='cyclic',
+      target_ratio=(10, 1e-4),
+      cyclic_times=1,
+      step_ratio_up=0.4,
+  )
+  momentum_config = dict(
+      policy='cyclic',
+      target_ratio=(0.85 / 0.95, 1),
+      cyclic_times=1,
+      step_ratio_up=0.4,
+  )
+  ```

 ## Customize training schedules

@@ -142,20 +142,20 @@ We support many other learning rate schedules [here](https://github.com/open-mml

 - Poly schedule:

-  ```python
-  lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False)
-  ```
+  ```python
+  lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False)
+  ```

 - CosineAnnealing schedule:

-  ```python
-  lr_config = dict(
-      policy='CosineAnnealing',
-      warmup='linear',
-      warmup_iters=1000,
-      warmup_ratio=1.0 / 10,
-      min_lr_ratio=1e-5)
-  ```
+  ```python
+  lr_config = dict(
+      policy='CosineAnnealing',
+      warmup='linear',
+      warmup_iters=1000,
+      warmup_ratio=1.0 / 10,
+      min_lr_ratio=1e-5)
+  ```

 ## Customize workflow

@@ -229,8 +229,8 @@ Then we need to make `MyHook` imported. Assuming the file is in `mmgen/core/util

 - Modify `mmgen/core/utils/__init__.py` to import it.

-  The newly defined module should be imported in `mmgen/core/utils/__init__.py` so that the registry will
-  find the new module and add it:
+  The newly defined module should be imported in `mmgen/core/utils/__init__.py` so that the registry will
+  find the new module and add it:

 ```python
 from .my_hook import MyHook
@@ -264,7 +264,6 @@
 By default, the hook's priority is set as `NORMAL` during registration.
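For reference, a skeleton of what `MyHook` itself can look like is shown below; the hook methods come from MMCV's `Hook` base class, while `a` and `b` are placeholder constructor arguments.

```python
from mmcv.runner import HOOKS, Hook


@HOOKS.register_module()
class MyHook(Hook):
    """A placeholder custom hook; `a` and `b` are example arguments."""

    def __init__(self, a, b):
        self.a = a
        self.b = b

    def before_run(self, runner):
        # Called once before the whole training process starts.
        pass

    def after_train_iter(self, runner):
        # Called after every training iteration.
        pass
```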
If the hook is already implemented in MMCV, you can directly modify the config to use the hook as below
-
### Modify default runtime hooks

Some common hooks are not registered through `custom_hooks`; they are
diff --git a/docs/en/tutorials/ddp_train_gans.md b/docs/en/tutorials/ddp_train_gans.md
index 86ecd5f6c..275e569ce 100644
--- a/docs/en/tutorials/ddp_train_gans.md
+++ b/docs/en/tutorials/ddp_train_gans.md
@@ -34,6 +34,7 @@ if self.is_dynamic_ddp:
     kwargs.update(dict(ddp_reducer=self.model.reducer))
 outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
 ```
+
 The reducer can help us rebuild the bucket for the current backward path by just adding this line in the `train_step` function:

 ```python
@@ -54,6 +55,7 @@ if ddp_reducer is not None:
 loss_disc.backward()
 ```
+
 That is, users should add the reducer preparation between the loss calculation and the loss backward. In `MMGeneration`, this feature is adopted as the default way to train DDP models. In configs, users should only add the following configuration to use the dynamic DDP runner:
@@ -68,8 +70,6 @@ runner = dict(

 *We have to admit that this implementation uses a private interface of PyTorch, and we will keep maintaining this feature.*
-
-
 ## DDP Wrapper

 Of course, we still support using the `DDP Wrapper` to train your GANs. If you want to switch to the `DDP Wrapper`, you should modify the config file like this:
diff --git a/docs/en/tutorials/inception_stat.md b/docs/en/tutorials/inception_stat.md
index 2a2530595..13893eb18 100644
--- a/docs/en/tutorials/inception_stat.md
+++ b/docs/en/tutorials/inception_stat.md
@@ -5,12 +5,13 @@ In MMGeneration, we provide a [script](https://github.com/open-mmlab/mmgeneratio

 - [Load images](#load-images)
-  - [Load from directory](#load-from-directory)
-  - [Load with dataset config](#load-with-dataset-config)
+  - [Load from directory](#load-from-directory)
+  - [Load with dataset config](#load-with-dataset-config)
 - [Define the version of Inception Net](#define-the-version-of-inception-net)
 - [Control number of images to calculate inception state](#control-number-of-images-to-calculate-inception-state)
 - [Control the shuffle operation in data loading](#control-the-shuffle-operation-in-data-loading)
 - [Note on inception state extraction between various code bases](#note-on-inception-state-extraction-between-various-code-bases)
+

 ## Load Images

@@ -20,10 +21,13 @@ We provide two ways to load real data, namely, pass the path of directory that c

 ### Load from Directory

 If you want to pass the path of real images, you can use the `--imgsdir` argument as in the following command.
+
 ```shell
 python tools/utils/inception_stat.py --imgsdir ${IMGS_PATH} --pklname ${PKLNAME} --size ${SIZE} --flip ${FLIP}
 ```
+
 Then a pre-defined pipeline will be used to load the images in `${IMGS_PATH}`.
+
 ```python
 pipeline = [
     dict(type='LoadImageFromFile', key='real_img'),
@@ -40,16 +44,21 @@ pipeline = [
     dict(type='ImageToTensor', keys=['real_img'])
 ]
 ```
+
 If `${FLIP}` is set as `True`, the following config of the horizontal flip operation would be added to the end of the pipeline.
+
 ```python
 dict(type='Flip', keys=['real_img'], direction='horizontal')
 ```

 If you want to use a specific pipeline other than the pre-defined one, you can use `--pipeline-cfg` to pass a config file that contains the data pipeline you want to use.
+
 ```shell
 python tools/utils/inception_stat.py --imgsdir ${IMGS_PATH} --pklname ${PKLNAME} --pipeline-cfg ${PIPELINE}
 ```
+
 Note that the name of the pipeline dict in `${PIPELINE}` should be fixed as `inception_pipeline`. For example,
+
 ```python
 # an example of ${PIPELINE}
 inception_pipeline = [
@@ -61,11 +70,13 @@ inception_pipeline = [
 ### Load with Dataset Config

 If you want to use a dataset config, you can use the `--data-config` argument as in the following command.
+
 ```shell
 python tools/utils/inception_stat.py --data-config ${CONFIG} --pklname ${PKLNAME} --subset ${SUBSET}
 ```

 Then a dataset will be instantiated following `${SUBSET}` in the config, which defaults to `test`. Take the following dataset config as an example,
+
 ```python
 # from `imagenet_128x128_inception_stat.py`
 data = dict(
@@ -86,6 +97,7 @@ data = dict(
         ann_file='data/imagenet/meta/val.txt',
         pipeline=test_pipeline))
 ```
+
 If not defined, the config in `data['test']` will be used in the data loading process. If you want to extract the inception state of the training set, you can set `--subset train` in the command. The dataset will then be built under the guidance of the config in `data['train']`, using the images under `data/imagenet/train` and the `train_pipeline` processing pipeline.

 ## Define the Version of Inception Net

@@ -120,6 +132,7 @@ python tools/utils/inception_stat.py --data-config ${CONFIG} --pklname ${PKLNAME
 For FID evaluation, the differences between [PyTorch Studio GAN](https://github.com/POSTECH-CVLab/PyTorch-StudioGAN) and ours are mainly in the selection of real samples. In MMGen, we follow the pipeline of [BigGAN](https://github.com/ajbrock/BigGAN-PyTorch), where the whole training set is adopted to extract inception statistics. Besides, we also use [Tero's Inception](https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metrics/inception-2015-12-05.pt) for feature extraction. You can download the preprocessed inception states from the following URLs:
+
 - [CIFAR10](https://download.openmmlab.com/mmgen/evaluation/fid_inception_pkl/cifar10.pkl)
 - [ImageNet1k](https://download.openmmlab.com/mmgen/evaluation/fid_inception_pkl/imagenet.pkl)
 - [ImageNet1k-64x64](https://download.openmmlab.com/mmgen/evaluation/fid_inception_pkl/imagenet_64x64.pkl)
diff --git a/docs/zh_cn/get_started.md b/docs/zh_cn/get_started.md
index d8ebc1784..acb47a596 100644
--- a/docs/zh_cn/get_started.md
+++ b/docs/zh_cn/get_started.md
@@ -17,78 +17,78 @@

 ## 安装

-1. 创建conda虚拟环境并激活。 (这里假设新环境叫 ``open-mmlab``)
+1. 创建conda虚拟环境并激活。 (这里假设新环境叫 `open-mmlab`)

-   ```shell
-   conda create -n open-mmlab python=3.7 -y
-   conda activate open-mmlab
-   ```
+   ```shell
+   conda create -n open-mmlab python=3.7 -y
+   conda activate open-mmlab
+   ```
2. 安装 PyTorch 和 torchvision,参考[官方安装指令](https://pytorch.org/),比如,

-   ```shell
-   conda install pytorch torchvision -c pytorch
-   ```
+   ```shell
+   conda install pytorch torchvision -c pytorch
+   ```

-   注:确保您编译的CUDA版本和运行时CUDA版本相匹配。您可以在[PyTorch官网](https://pytorch.org/)检查预编译库支持的CUDA版本。
+   注:确保您编译的CUDA版本和运行时CUDA版本相匹配。您可以在[PyTorch官网](https://pytorch.org/)检查预编译库支持的CUDA版本。

-   `示例1` 如果您在`/usr/local/cuda`下安装了 CUDA 10.1 并想要安装
-   PyTorch 1.5,您需要安装支持CUDA 10.1的PyTorch预编译版本。
+   `示例1` 如果您在`/usr/local/cuda`下安装了 CUDA 10.1 并想要安装
+   PyTorch 1.5,您需要安装支持CUDA 10.1的PyTorch预编译版本。

-   ```shell
-   conda install pytorch cudatoolkit=10.1 torchvision -c pytorch
-   ```
+   ```shell
+   conda install pytorch cudatoolkit=10.1 torchvision -c pytorch
+   ```

-   `示例2`如果您在`/usr/local/cuda`下安装了 CUDA 9.2 并想要安装
-   PyTorch 1.5.1,您需要安装支持CUDA 9.2的PyTorch预编译版本。
+   `示例2`如果您在`/usr/local/cuda`下安装了 CUDA 9.2 并想要安装
+   PyTorch 1.5.1,您需要安装支持CUDA 9.2的PyTorch预编译版本。

-   ```shell
-   conda install pytorch=1.5.1 cudatoolkit=9.2 torchvision=0.6.1 -c pytorch
-   ```
+   ```shell
+   conda install pytorch=1.5.1 cudatoolkit=9.2 torchvision=0.6.1 -c pytorch
+   ```

-   如果您从源码编译PyTorch 而非安装预编译库, 您可以使用更多CUDA版本如9.0。
+   如果您从源码编译PyTorch 而非安装预编译库, 您可以使用更多CUDA版本如9.0。

3. 安装 mmcv-full, 我们建议您按照下述方法安装预编译库。

-   ```shell
-   pip install mmcv-full={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
-   ```
+   ```shell
+   pip install mmcv-full={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
+   ```

-   请替换链接中的 `{cu_version}` 和 `{torch_version}` 为您想要的版本。 比如, 要安装支持 `CUDA 11` 和 `PyTorch 1.7.0`的 `mmcv-full`, 使用下面命令:
+   请替换链接中的 `{cu_version}` 和 `{torch_version}` 为您想要的版本。 比如, 要安装支持 `CUDA 11` 和 `PyTorch 1.7.0`的 `mmcv-full`, 使用下面命令:

-   ```shell
-   pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
-   ```
+   ```shell
+   pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
+   ```

-   可在[这里](https://github.com/open-mmlab/mmcv#install-with-pip)查看兼容了不同PyTorch和CUDA的MMCV版本信息。
-   您也可以选择按照下方命令从源码编译mmcv
+   可在[这里](https://github.com/open-mmlab/mmcv#install-with-pip)查看兼容了不同PyTorch和CUDA的MMCV版本信息。
+   您也可以选择按照下方命令从源码编译mmcv

-   ```shell
-   git clone https://github.com/open-mmlab/mmcv.git
-   cd mmcv
-   MMCV_WITH_OPS=1 pip install -e . # package mmcv-full will be installed after this step
-   cd ..
-   ```
+   ```shell
+   git clone https://github.com/open-mmlab/mmcv.git
+   cd mmcv
+   MMCV_WITH_OPS=1 pip install -e . # package mmcv-full will be installed after this step
+   cd ..
+   ```

-   或者直接运行
+   或者直接运行

-   ```shell
-   pip install mmcv-full
-   ```
+   ```shell
+   pip install mmcv-full
+   ```

4. 克隆MMGeneration仓库。

-   ```shell
-   git clone https://github.com/open-mmlab/mmgeneration.git
-   cd mmgeneration
-   ```
+   ```shell
+   git clone https://github.com/open-mmlab/mmgeneration.git
+   cd mmgeneration
+   ```

5. 安装构建依赖项并安装MMGeneration。

-   ```shell
-   pip install -r requirements.txt
-   pip install -v -e . # or "python setup.py develop"
-   ```
+   ```shell
+   pip install -r requirements.txt
+   pip install -v -e . # or "python setup.py develop"
+   ```

 注:
diff --git a/docs/zh_cn/modelzoo_statistics.md b/docs/zh_cn/modelzoo_statistics.md
index 79eae5b98..0ceb52125 100644
--- a/docs/zh_cn/modelzoo_statistics.md
+++ b/docs/zh_cn/modelzoo_statistics.md
@@ -1,37 +1,27 @@
-
 # Model Zoo Statistics

-* Number of papers: 11
-* Number of checkpoints: 62
-
-  * [CycleGAN: Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks](https://github.com/open-mmlab/mmgeneration/blob/master/configs/cyclegan) (6 ckpts)
-
-
-  * [Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://github.com/open-mmlab/mmgeneration/blob/master/configs/dcgan) (3 ckpts)
-
-
-  * [Geometric GAN](https://github.com/open-mmlab/mmgeneration/blob/master/configs/ggan) (3 ckpts)
-
-
-  * [Least Squares Generative Adversarial Networks](https://github.com/open-mmlab/mmgeneration/blob/master/configs/lsgan) (4 ckpts)
-
-
-  * [Progressive Growing of GANs for Improved Quality, Stability, and Variation](https://github.com/open-mmlab/mmgeneration/blob/master/configs/pggan) (3 ckpts)
+- Number of papers: 11
+- Number of checkpoints: 62

-  * [Pix2Pix: Image-to-Image Translation with Conditional Adversarial Networks](https://github.com/open-mmlab/mmgeneration/blob/master/configs/pix2pix) (4 ckpts)
+  - [CycleGAN: Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks](https://github.com/open-mmlab/mmgeneration/blob/master/configs/cyclegan) (6 ckpts)
+  - [Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://github.com/open-mmlab/mmgeneration/blob/master/configs/dcgan) (3 ckpts)

-  * [Positional Encoding as Spatial Inductive Bias in GANs (CVPR'2021)](https://github.com/open-mmlab/mmgeneration/blob/master/configs/positional_encoding_in_gans) (21 ckpts)
+  - [Geometric GAN](https://github.com/open-mmlab/mmgeneration/blob/master/configs/ggan) (3 ckpts)
+  - [Least Squares Generative Adversarial Networks](https://github.com/open-mmlab/mmgeneration/blob/master/configs/lsgan) (4 ckpts)

-  * [Singan: Learning a Generative Model from a Single Natural Image (ICCV'2019)](https://github.com/open-mmlab/mmgeneration/blob/master/configs/singan) (3 ckpts)
+  - [Progressive Growing of GANs for Improved Quality, Stability, and Variation](https://github.com/open-mmlab/mmgeneration/blob/master/configs/pggan) (3 ckpts)
+  - [Pix2Pix: Image-to-Image Translation with Conditional Adversarial Networks](https://github.com/open-mmlab/mmgeneration/blob/master/configs/pix2pix) (4 ckpts)

-  * [A Style-Based Generator Architecture for Generative Adversarial Networks (CVPR'2019)](https://github.com/open-mmlab/mmgeneration/blob/master/configs/styleganv1) (2 ckpts)
+  - [Positional Encoding as Spatial Inductive Bias in GANs (CVPR'2021)](https://github.com/open-mmlab/mmgeneration/blob/master/configs/positional_encoding_in_gans) (21 ckpts)
+  - [Singan: Learning a Generative Model from a Single Natural Image (ICCV'2019)](https://github.com/open-mmlab/mmgeneration/blob/master/configs/singan) (3 ckpts)

-  * [Analyzing and Improving the Image Quality of Stylegan (CVPR'2020)](https://github.com/open-mmlab/mmgeneration/blob/master/configs/styleganv2) (11 ckpts)
+  - [A Style-Based Generator Architecture for Generative Adversarial Networks (CVPR'2019)](https://github.com/open-mmlab/mmgeneration/blob/master/configs/styleganv1) (2 ckpts)
+  - [Analyzing and Improving the Image Quality of Stylegan (CVPR'2020)](https://github.com/open-mmlab/mmgeneration/blob/master/configs/styleganv2) (11 ckpts)
-  * [Improved Training of Wasserstein GANs](https://github.com/open-mmlab/mmgeneration/blob/master/configs/wgan-gp) (2 ckpts)
+  - [Improved Training of Wasserstein GANs](https://github.com/open-mmlab/mmgeneration/blob/master/configs/wgan-gp) (2 ckpts)
diff --git a/docs/zh_cn/quick_run.md b/docs/zh_cn/quick_run.md
index 9c04df55b..70d111665 100644
--- a/docs/zh_cn/quick_run.md
+++ b/docs/zh_cn/quick_run.md
@@ -1,5 +1,3 @@
 # 1: 在标准的数据集上训练和推理现有的模型
-
-
 ## 用现有的生成模型来生成图像
diff --git a/docs/zh_cn/tutorials/applications.md b/docs/zh_cn/tutorials/applications.md
index eaa13f9ca..55a9f6bb1 100644
--- a/docs/zh_cn/tutorials/applications.md
+++ b/docs/zh_cn/tutorials/applications.md
@@ -1,6 +1,7 @@
 # Tutorial 8: 生成模型的应用

 ## 插值
+
 以GAN为架构的生成模型学习将潜码空间中的点映射到生成的图像上。生成模型赋予了潜码空间的具体意义。一般来说,我们想探索潜码空间的结构,我们可以做的一件事是在潜码空间的两个端点之间插入一系列点,观察这些点生成的结果。(例如,我们认为,如果任何一个端点都不存在的特征出现在线性插值路径的中间点,则说明潜码空间是纠缠在一起的,动态属性没有得到适当的分离。)

 我们为用户提供了一个应用脚本。你可以使用[apps/interpolate_sample.py](https://github.com/open-mmlab/mmgeneration/tree/master/apps/interpolate_sample.py)的以下命令进行无条件模型的插值。
@@ -16,6 +17,7 @@ python apps/interpolate_sample.py \
     [--samples-path ${SAMPLES_PATH}] \
     [--batch-size ${BATCH_SIZE}] \
 ```
+
 在这里,我们提供两种显示模式(SHOW_MODE),即序列(sequence)和组(group)。在序列模式下,我们首先对一连串的端点进行采样,然后按顺序对两个端点之间的点进行插值,生成的图像将被单独保存。在组模式下,我们先采样几对端点,然后在每对端点之间进行插值,生成的图像将被保存在一张图片中。此外,`space` 指的是潜码空间,你可以选择'z'或'w'(指StyleGAN系列中的风格空间),`endpoint` 表示你要采样的端点数量(在 `group` 模式中应设置为偶数),`interval`表示你在两个端点之间插值的点的数量(包括端点)。

 注意,我们还提供了更多的自定义参数来定制你的插值程序。
@@ -40,6 +42,7 @@ python apps/conditional_interpolate.py \
 在这里,与无条件模型不同,如果标签嵌入在 `conv_blocks` 之间共享,你需要提供嵌入层的名称。否则,你应该将 `embedding-name` 设置为 `NULL`。考虑到条件模型有噪声和标签作为输入,我们提供 `fix-z` 来固定噪声,`fix-y` 来固定标签。

 ## 投影
+
 求生成网络 `g` 的逆是一个有趣的问题,有很多应用。例如,在潜码空间中操作一个给定的图像需要先为它找到一个匹配的潜码。一般来说,你可以通过对潜码进行优化来重建目标图像,使用 `lpips` 和像素级损失作为目标函数。

 事实上,我们已经向用户提供了一个应用脚本,为给定的图像找到 `StyleGAN` 系列生成网络的匹配潜码向量w。你可以使用[apps/stylegan_projector.py](https://github.com/open-mmlab/mmgeneration/tree/master/apps/stylegan_projector.py)的以下命令来执行投影。
@@ -51,14 +54,15 @@ python apps/stylegan_projector.py \
     ${FILES}
     [--results-path ${RESULTS_PATH}]
 ```
+
 这里,`FILES` 指的是图像的路径,而投影的潜码和重建的图像将被保存在 `results-path` 中。

 注意,我们还提供了更多的自定义参数来定制你的投影程序。请使用`python apps/stylegan_projector.py --help`来查看更多细节。

 ## 编辑

-基于 StyleGAN 模型的一个常见应用是操纵潜码空间来控制合成图像的属性。在这里,我们向用户提供了一个基于[SeFa](https://arxiv.org/pdf/2007.06600.pdf)的简单而流行的算法。这里,我们在计算特征向量时对原始版本进行了修改,并提供了一个更灵活的接口。
-为了操纵你的生成器,你可以用以下命令运行脚本[apps/modified_sefa.py](https://github.com/open-mmlab/mmgeneration/tree/master/apps/modified_sefa.py)。
+基于 StyleGAN 模型的一个常见应用是操纵潜码空间来控制合成图像的属性。在这里,我们向用户提供了一个基于 [SeFa](https://arxiv.org/pdf/2007.06600.pdf) 的简单而流行的算法。这里,我们在计算特征向量时对原始版本进行了修改,并提供了一个更灵活的接口。
+为了操纵你的生成器,你可以用以下命令运行脚本[apps/modified_sefa.py](https://github.com/open-mmlab/mmgeneration/tree/master/apps/modified_sefa.py)。

 ```shell
 python apps/modified_sefa.py --cfg ${CONFIG} --ckpt ${CKPT} \
     -i ${INDEX} -d ${DEGREE} \
     -l ${LAYER_NO} \
     [--eigen-vector ${PATH_EIGEN_VEC}]
 ```
+
 在这个脚本中,如果 `eigen-vector` 为 `None`,程序将计算生成器参数的特征向量。同时,我们将把该向量保存在 `ckpt` 文件的同一目录下,这样用户就可以应用这个预先计算的向量。`Positional Encoding as Spatial Inductive Bias in GANs` 的演示就来自这个脚本。下面是一个例子,供用户获得与我们的演示类似的结果。
 `${INDEX}`表示我们将应用哪个特征向量来操作图像。在一般情况下,每个索引控制一个独立的属性,这是由 `StyleGAN` 中的解耦表示保证的。我们建议用户可以尝试不同的索引来找到你想要的那个属性。`--degree` 设定了乘法因子的范围。在我们的实验中,我们观察到像 `[-3, 8]` 这样的非对称范围是非常有帮助的。因此,我们允许在这个参数中设置下限和上限。`--layer` 或`--l` 定义了我们将应用哪一层的特征向量。有些属性,比如光照,只与生成器中的 1-2 层有关。

@@ -79,4 +84,5 @@ python apps/modified_sefa.py \
     -i 15 -d 8. --degree-step 0.5 -l 8 9 --sample-path ./work_dirs/sefa-exp/ \
     --sample-cfg chosen_scale=4 randomize_noise=False
 ```
+
 注意到,在设置 `chosen_scale=4` 之后,我们可以用一个简单的分辨率为256的生成器来操作512x512的图像。
diff --git a/docs/zh_cn/tutorials/ddp_train_gans.md b/docs/zh_cn/tutorials/ddp_train_gans.md
index 55be3aedc..c940f967d 100644
--- a/docs/zh_cn/tutorials/ddp_train_gans.md
+++ b/docs/zh_cn/tutorials/ddp_train_gans.md
@@ -36,6 +36,7 @@ outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
 ```

 通过如下对 train_step 的修改,reducer 可以帮助我们在当前反传中重建桶:
+
 ```python
 if ddp_reducer is not None:
     ddp_reducer.prepare_for_backward(_find_tensors(loss_disc))
@@ -54,6 +55,7 @@ if ddp_reducer is not None:
 loss_disc.backward()
 ```
+
 也就是说,用户应该在损失计算和损失反传之间准备 reducer。
 在我们的 `MMGeneration` 中,这个功能被作为训练 DDP 模型的默认方式。在配置文件中,用户只需要添加以下配置来使用动态 ddp runner。
@@ -68,8 +70,6 @@ runner = dict(
 *这个实现将使用 PyTorch 中的私有接口,我们将继续维护这一实现。*

-
-
 ## DDP Wrapper

 当然,我们仍然支持使用 `DDP Wrapper` 来训练你的 GANs。如果你想切换到使用 DDP Wrapper,你应该这样修改配置文件。
diff --git a/mmgen/__init__.py b/mmgen/__init__.py
index b746ec280..51fedc8b8 100644
--- a/mmgen/__init__.py
+++ b/mmgen/__init__.py
@@ -17,7 +17,7 @@ def digit_version(version_str):

 mmcv_minimum_version = '1.3.0'
-mmcv_maximum_version = '1.5.0'
+mmcv_maximum_version = '1.6.0'

 mmcv_version = digit_version(mmcv.__version__)
diff --git a/requirements/mminstall.txt b/requirements/mminstall.txt
index 8c4d4f0d2..9a0dc4ed7 100644
--- a/requirements/mminstall.txt
+++ b/requirements/mminstall.txt
@@ -1,2 +1,2 @@
-mmcls>=0.18.0,<=0.18.0
-mmcv-full>=1.3.0,<=1.5.0
+mmcls>=0.18.0
+mmcv-full>=1.3.0,<=1.6.0
diff --git a/requirements/runtime.txt b/requirements/runtime.txt
index a6cc8e352..09fe26997 100644
--- a/requirements/runtime.txt
+++ b/requirements/runtime.txt
@@ -1,4 +1,4 @@
-mmcls==0.18.0
+mmcls
 ninja
 numpy
 prettytable
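As context for the `mmgen/__init__.py` hunk above, here is a minimal sketch of how a `digit_version`-based guard typically enforces these version bounds; the assert below is an assumption for illustration, not the repository's verbatim code.

```python
import mmcv


def digit_version(version_str):
    # Turn a version string such as '1.6.0' into a comparable tuple of ints,
    # skipping non-numeric parts like 'rc1'.
    return tuple(int(x) for x in version_str.split('.') if x.isdigit())


mmcv_minimum_version = '1.3.0'
mmcv_maximum_version = '1.6.0'
mmcv_version = digit_version(mmcv.__version__)

# Assumed guard: fail fast when the installed mmcv-full falls outside the
# supported range, matching the bounds declared in requirements/mminstall.txt.
assert (digit_version(mmcv_minimum_version) <= mmcv_version
        <= digit_version(mmcv_maximum_version)), (
    f'MMCV=={mmcv.__version__} is incompatible; please install '
    f'mmcv-full>={mmcv_minimum_version},<={mmcv_maximum_version}.')
```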