FEAT: add SD3 support (#1723)

xorbitsai · Jun 27, 2024 · 341e008
1 parent 66c66b7
commit 341e008
Showing 10 changed files with 83 additions and 15 deletions.
diff --git a/doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/image.po b/doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/image.po
@@ -8,7 +8,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: Xinference \n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2024-02-01 16:47+0800\n"
+"POT-Creation-Date: 2024-06-26 12:25+0000\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language: zh_CN\n"
@@ -17,7 +17,7 @@ msgstr ""
 "MIME-Version: 1.0\n"
 "Content-Type: text/plain; charset=utf-8\n"
 "Content-Transfer-Encoding: 8bit\n"
-"Generated-By: Babel 2.13.1\n"
+"Generated-By: Babel 2.14.0\n"
 
 #: ../../source/models/model_abilities/image.rst:5
 msgid "Images (Experimental)"
@@ -97,35 +97,60 @@ msgstr ""
 msgid "stable-diffusion-xl-base-1.0"
 msgstr ""
 
-#: ../../source/models/model_abilities/image.rst:46
+#: ../../source/models/model_abilities/image.rst:43
+msgid "sd3-medium"
+msgstr ""
+
+#: ../../source/models/model_abilities/image.rst:47
 msgid "Quickstart"
 msgstr "快速入门"
 
-#: ../../source/models/model_abilities/image.rst:49
+#: ../../source/models/model_abilities/image.rst:50
 msgid "Text-to-image"
 msgstr "文生图"
 
-#: ../../source/models/model_abilities/image.rst:51
+#: ../../source/models/model_abilities/image.rst:52
 msgid ""
 "The Text-to-image API mimics OpenAI's `create images API "
 "<https://platform.openai.com/docs/api-reference/images/create>`_. We can "
 "try Text-to-image API out either via cURL, OpenAI Client, or Xinference's"
 " python client:"
-msgstr "可以通过 cURL、OpenAI Client 或 Xinference 的方式尝试使用 Text-to-image API。"
+msgstr ""
+"可以通过 cURL、OpenAI Client 或 Xinference 的方式尝试使用 Text-to-image "
+"API。"
+
+#: ../../source/models/model_abilities/image.rst:108
+msgid ""
+"If you are running ``sd3-medium`` on a GPU with less than 24 GB of memory "
+"and hit out-of-memory errors, consider adding an extra launch parameter, "
+"as described in `this article "
+"<https://huggingface.co/docs/diffusers/v0.29.1/en/api/pipelines/stable_diffusion/stable_diffusion_3"
+"#dropping-the-t5-text-encoder-during-inference>`_."
+msgstr ""
+"如果你在小于 24GB 的显卡上运行 ``sd3-medium`` 碰到内存不足的问题时，根据 "
+"`这篇文章 <https://huggingface.co/docs/diffusers/v0.29.1/en/api/"
+"pipelines/stable_diffusion/stable_diffusion_3#dropping-the-t5-text-"
+"encoder-during-inference>`_ 考虑在加载模型时增加额外选项。"
 
-#: ../../source/models/model_abilities/image.rst:107
+#: ../../source/models/model_abilities/image.rst:111
+msgid ""
+"xinference launch --model-name sd3-medium --model-type image "
+"--text_encoder_3 None"
+msgstr ""
+
+#: ../../source/models/model_abilities/image.rst:114
 msgid "Image-to-image"
 msgstr "图生图"
 
-#: ../../source/models/model_abilities/image.rst:109
+#: ../../source/models/model_abilities/image.rst:116
 msgid "You can find more examples of Images API in the tutorial notebook:"
 msgstr "你可以在教程笔记本中找到更多 Images API 的示例。"
 
-#: ../../source/models/model_abilities/image.rst:113
+#: ../../source/models/model_abilities/image.rst:120
 msgid "Stable Diffusion ControlNet"
 msgstr ""
 
-#: ../../source/models/model_abilities/image.rst:116
+#: ../../source/models/model_abilities/image.rst:123
 msgid "Learn from a Stable Diffusion ControlNet example"
 msgstr "学习一个 Stable Diffusion 控制网络的示例"
 
diff --git a/doc/source/models/builtin/image/index.rst b/doc/source/models/builtin/image/index.rst
@@ -13,6 +13,8 @@ The following is a list of built-in image models in Xinference:
 
    sd-turbo
 
+   sd3-medium
+
    sdxl-turbo
 
    stable-diffusion-v1.5

diff --git a/doc/source/models/builtin/image/sd3-medium.rst b/doc/source/models/builtin/image/sd3-medium.rst
@@ -0,0 +1,19 @@
+.. _models_builtin_sd3-medium:
+
+==========
+sd3-medium
+==========
+
+- **Model Name:** sd3-medium
+- **Model Family:** stable_diffusion
+- **Abilities:** text-to-image
+- **Available ControlNet:** None
+
+Specifications
+^^^^^^^^^^^^^^
+
+- **Model ID:** stabilityai/stable-diffusion-3-medium-diffusers
+
+Execute the following command to launch the model::
+
+   xinference launch --model-name sd3-medium --model-type image
diff --git a/doc/source/models/builtin/llm/glm-4v.rst b/doc/source/models/builtin/llm/glm-4v.rst
@@ -19,7 +19,7 @@ Model Spec 1 (pytorch, 9 Billion)
 
 - **Model Format:** pytorch
 - **Model Size (in billions):** 9
-- **Quantizations:** none
+- **Quantizations:** 4-bit, 8-bit, none
 - **Engines**: Transformers
 - **Model ID:** THUDM/glm-4v-9b
 - **Model Hubs**:  `Hugging Face <https://huggingface.co/THUDM/glm-4v-9b>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-4v-9b>`__

diff --git a/doc/source/models/builtin/llm/index.rst b/doc/source/models/builtin/llm/index.rst
@@ -397,7 +397,7 @@ The following is a list of built-in LLM in Xinference:
      - Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data.
 
    * - :ref:`qwen1.5-moe-chat <models_llm_qwen1.5-moe-chat>`
-     - chat
+     - chat, tools
      - 32768
      - Qwen1.5-MoE is a transformer-based MoE decoder-only language model pretrained on a large amount of data.
 
@@ -407,7 +407,7 @@ The following is a list of built-in LLM in Xinference:
      - Qwen2 is the new series of Qwen large language models
 
    * - :ref:`qwen2-moe-instruct <models_llm_qwen2-moe-instruct>`
-     - chat
+     - chat, tools
      - 32768
      - Qwen2 is the new series of Qwen large language models. 
 

diff --git a/doc/source/models/builtin/llm/qwen1.5-moe-chat.rst b/doc/source/models/builtin/llm/qwen1.5-moe-chat.rst
@@ -7,7 +7,7 @@ qwen1.5-moe-chat
 - **Context Length:** 32768
 - **Model Name:** qwen1.5-moe-chat
 - **Languages:** en, zh
-- **Abilities:** chat
+- **Abilities:** chat, tools
 - **Description:** Qwen1.5-MoE is a transformer-based MoE decoder-only language model pretrained on a large amount of data.
 
 Specifications

diff --git a/doc/source/models/builtin/llm/qwen2-moe-instruct.rst b/doc/source/models/builtin/llm/qwen2-moe-instruct.rst
@@ -7,7 +7,7 @@ qwen2-moe-instruct
 - **Context Length:** 32768
 - **Model Name:** qwen2-moe-instruct
 - **Languages:** en, zh
-- **Abilities:** chat
+- **Abilities:** chat, tools
 - **Description:** Qwen2 is the new series of Qwen large language models. 
 
 Specifications

diff --git a/doc/source/models/model_abilities/image.rst b/doc/source/models/model_abilities/image.rst
@@ -40,6 +40,7 @@ The Text-to-image API is supported with the following models in Xinference:
 * sdxl-turbo
 * stable-diffusion-v1.5
 * stable-diffusion-xl-base-1.0
+* sd3-medium
 
 
 Quickstart
@@ -102,6 +103,14 @@ We can try Text-to-image API out either via cURL, OpenAI Client, or Xinference's
     }
 
 
+.. note::
+
+  If you are running ``sd3-medium`` on a GPU with less than 24 GB of memory and hit out-of-memory errors,
+  consider adding an extra launch parameter, as described in `this article <https://huggingface.co/docs/diffusers/v0.29.1/en/api/pipelines/stable_diffusion/stable_diffusion_3#dropping-the-t5-text-encoder-during-inference>`_.
+
+  .. code:: bash
+
+    xinference launch --model-name sd3-medium --model-type image --text_encoder_3 None
 
 Image-to-image
 --------------------
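
Reviewer note: the note added in this hunk passes `--text_encoder_3 None` as an extra launch parameter. The sketch below only assembles that command line so it can be inspected or handed to `subprocess.run`; it does not start a model. `build_launch_command` and `extra_params` are hypothetical names introduced for illustration and are not part of Xinference. The only flags used are the ones that appear in the diff above.

```python
import shlex
from typing import Dict, List, Optional


def build_launch_command(
    model_name: str,
    model_type: str = "image",
    extra_params: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Assemble the argv list for `xinference launch` (hypothetical helper)."""
    argv = [
        "xinference", "launch",
        "--model-name", model_name,
        "--model-type", model_type,
    ]
    # Extra parameters are appended as `--key value` pairs, mirroring the
    # `--text_encoder_3 None` form shown in the documentation note.
    for key, value in (extra_params or {}).items():
        argv += [f"--{key}", str(value)]
    return argv


# Default launch, as in sd3-medium.rst.
default_cmd = build_launch_command("sd3-medium")

# Low-memory launch from the note: pass `--text_encoder_3 None`.
low_memory_cmd = build_launch_command(
    "sd3-medium", extra_params={"text_encoder_3": "None"}
)

print(shlex.join(default_cmd))
print(shlex.join(low_memory_cmd))
```

With a running Xinference installation, the resulting list could be passed to `subprocess.run(low_memory_cmd, check=True)`; that step is left out here because it needs a live server and a GPU.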

diff --git a/xinference/model/image/model_spec.json b/xinference/model/image/model_spec.json
@@ -1,4 +1,10 @@
 [
+  {
+    "model_name": "sd3-medium",
+    "model_family": "stable_diffusion",
+    "model_id": "stabilityai/stable-diffusion-3-medium-diffusers",
+    "model_revision": "ea42f8cef0f178587cf766dc8129abd379c90671"
+  },
   {
     "model_name": "sd-turbo",
     "model_family": "stable_diffusion",

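Reviewer note: the new Hugging Face entry pins `model_revision` to a full commit hash. As a minimal sketch of how such an entry might be looked up by name, the snippet below parses a one-element copy of the spec list taken from the hunk above. `find_spec` is a hypothetical helper, not Xinference's own loader, and the real `model_spec.json` contains more entries than shown here.

```python
import json
from typing import Any, Dict, List

# Single entry copied from the hunk above; the real file lists more models.
SPEC_JSON = """
[
  {
    "model_name": "sd3-medium",
    "model_family": "stable_diffusion",
    "model_id": "stabilityai/stable-diffusion-3-medium-diffusers",
    "model_revision": "ea42f8cef0f178587cf766dc8129abd379c90671"
  }
]
"""


def find_spec(specs: List[Dict[str, Any]], model_name: str) -> Dict[str, Any]:
    """Return the first spec whose model_name matches (hypothetical helper)."""
    for spec in specs:
        if spec["model_name"] == model_name:
            return spec
    raise KeyError(f"no image model named {model_name!r}")


specs = json.loads(SPEC_JSON)
sd3 = find_spec(specs, "sd3-medium")

# Print the hub repository and the abbreviated pinned revision.
print(sd3["model_id"], sd3["model_revision"][:7])
```
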
diff --git a/xinference/model/image/model_spec_modelscope.json b/xinference/model/image/model_spec_modelscope.json
@@ -1,4 +1,11 @@
 [
+  {
+    "model_name": "sd3-medium",
+    "model_family": "stable_diffusion",
+    "model_hub": "modelscope",
+    "model_id": "AI-ModelScope/stable-diffusion-3-medium-diffusers",
+    "model_revision": "master"
+  },
   {
     "model_name": "sd-turbo",
     "model_family": "stable_diffusion",