Best Practices for Inference and Fine-Tuning with MiniCPM-V 2.6 #1613

Jintao-Huang · 2024-08-06T14:42:56Z

模型：https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6

通常，多模态大模型微调会使用自定义数据集进行微调。在这里，我们将展示可直接运行的demo。

在开始微调之前，请确保您的环境已准备妥当。

git clone https://github.com/modelscope/swift.git
cd swift
pip install -e .[llm]

模型推理

CUDA_VISIBLE_DEVICES=0 swift infer \
  --model_type minicpm-v-v2_6-chat \
  --model_id_or_path OpenBMB/MiniCPM-V-2_6

<<< 你好
你好！今天我能为您提供什么帮助？
--------------------------------------------------
<<< clear
<<< <image>描述这张图片
Input an image path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
这张图片展示了一只小猫的特写，它有着引人注目的外貌。小猫有着大大的、圆圆的、蓝色的眼睛，看起来充满了好奇和天真。它的毛色主要是白色，带有灰色和黑色的条纹，特别是在脸部和耳朵周围，这些地方的条纹更加明显。小猫的耳朵竖立着，尖尖的，内侧是粉红色的。它的胡须又长又白，从脸颊上伸出来。小猫的鼻子是粉红色的，嘴巴微微张开，露出一点粉红色的舌头。背景模糊，将焦点集中在小猫身上，暗示着一个室内环境，柔和的光线照亮了小猫的毛发。
--------------------------------------------------
<<< clear
<<< <video>描述这段视频
Input a video path or URL <<< https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/baby.mp4
这段视频展示了一个年幼的孩子坐在床上，专心阅读一本书。孩子戴着深色眼镜，穿着浅蓝色无袖上衣和粉色裤子。床上铺着白色床单，孩子旁边放着一件白色衣物。背景中有一个木制婴儿床，暗示着一个家庭环境。房间光线柔和，氛围平静。视频中没有明显的动作或活动，孩子似乎完全沉浸在阅读中。

图片微调

我们使用 coco-en-mini 数据集进行微调，该数据集的任务是对图片内容进行描述。您可以在 modelscope 上找到该数据集：https://modelscope.cn/datasets/modelscope/coco_2014_caption/summary

# 默认会将lora_target_modules设置为llm和resampler所有的linear
CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
  --model_type minicpm-v-v2_6-chat \
  --model_id_or_path OpenBMB/MiniCPM-V-2_6 \
  --sft_type lora \
  --dataset coco-en-mini#20000 \
  --deepspeed default-zero2

如果要使用自定义数据集，只需按以下方式进行指定：

  --dataset train.jsonl \
  --val_dataset val.jsonl \

自定义数据集支持json和jsonl样式，以下是自定义数据集的样例：

{"query": "<image>55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee<image>eeeee<image>eeeee", "response": "fffff", "history": [], "images": ["image_path1", "image_path2"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response2"], ["query2", "response2"]], "images": []}

显存占用：

微调后推理脚本如下：

# 如果要全量测试请设置: `--show_dataset_sample -1`
CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir output/minicpm-v-v2_6-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true --merge_lora true

微调后模型对验证集进行推理的示例（时间原因，只训练了300个step）：

视频微调

我们使用 video-chatgpt 数据集进行微调，该数据集的任务是对视频内容进行描述。您可以在 modelscope 上找到该数据集：https://modelscope.cn/datasets/swift/VideoChatGPT

CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
  --model_type minicpm-v-v2_6-chat \
  --model_id_or_path OpenBMB/MiniCPM-V-2_6 \
  --sft_type lora \
  --dataset video-chatgpt \
  --deepspeed default-zero2

自定义数据集支持json和jsonl样式，以下是自定义数据集的样例：

{"query": "<video>55555", "response": "66666", "videos": ["video_path"]}
{"query": "eeeee<video>eeeee<video>eeeee", "response": "fffff", "history": [], "videos": ["video_path1", "video_path2"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response2"], ["query2", "response2"]], "videos": []}

显存占用：

微调后推理脚本如下：

CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir output/minicpm-v-v2_6-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true --merge_lora true

微调后模型对验证集进行推理的示例（时间原因，只训练了50个step）：

The text was updated successfully, but these errors were encountered:

demoninpiano · 2024-08-07T15:16:29Z

官方文档的多图理解和in-context有在swift api里支持吗？

Jintao-Huang · 2024-08-07T15:34:43Z

支持多图和多轮的

多图需要使用多个标签即可. 可以查看上面的自定义数据集的格式

guihonghao · 2024-08-08T12:15:45Z

需要升级swift到什么版本啊？

Jintao-Huang · 2024-08-08T12:56:51Z

还在main分支

compleXuan · 2024-08-11T08:26:05Z

单样本视频推理的代码可以提供吗

Jintao-Huang · 2024-08-12T02:26:10Z

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType,
    get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch

model_type = ModelType.minicpm_v_v2_6_chat
model_id_or_path = None
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.bfloat16, model_id_or_path=model_id_or_path,
                                       model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

query = '<video>描述这段视频'
videos = ['https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/baby.mp4']
response, history = inference(model, template, query, videos=videos)
print(f'query: {query}')
print(f'response: {response}')

# 流式（streaming）
query = '<image>描述这张图片'
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png']
gen = inference_stream(model, template, query, images=images)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
"""
query: <video>描述这段视频
response: 这段视频展示了一个年幼的孩子，可能是一个蹒跚学步的幼儿，坐在床上专心阅读一本书。孩子戴着深色眼镜，穿着浅绿色无袖上衣和粉色裤子。床上铺着白色床单，背景中有一个木制婴儿床，暗示着一个家庭环境。房间光线充足，氛围温馨舒适。孩子专注的表情和姿势表明他们对书本内容很投入。
query: <image>描述这张图片
response: 这张图片展示了一只小猫的特写，它有着引人注目的面部特征。小猫的毛色主要是白色，带有灰色和黑色的条纹，特别是在眼睛周围和耳朵上。它的眼睛又大又圆，有着蓝色的虹膜，看起来非常好奇或专注。小猫的耳朵竖立着，内耳是粉红色的，与毛色形成对比。小猫的鼻子是粉红色的，有着小小的黑色鼻子，嘴巴微微张开，露出一点粉红色的舌头。小猫的胡须又长又白，从脸颊上伸出来。背景模糊，将焦点集中在小猫身上，暗示着一个室内环境，有自然光线，可能来自窗户。
"""

okideal · 2024-08-12T02:28:02Z

请问官方的few-shot推理方式 swift有支持么?

yingdachen · 2024-08-12T02:28:24Z

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType,
    get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch

model_type = ModelType.minicpm_v_v2_6_chat
model_id_or_path = None
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.bfloat16, model_id_or_path=model_id_or_path,
                                       model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

query = '<video>描述这段视频'
videos = ['https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/baby.mp4']
response, history = inference(model, template, query, videos=videos)
print(f'query: {query}')
print(f'response: {response}')

# 流式（streaming）
query = '<image>描述这张图片'
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png']
gen = inference_stream(model, template, query, images=images)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
"""
query: <video>描述这段视频
response: 这段视频展示了一个年幼的孩子，可能是一个蹒跚学步的幼儿，坐在床上专心阅读一本书。孩子戴着深色眼镜，穿着浅绿色无袖上衣和粉色裤子。床上铺着白色床单，背景中有一个木制婴儿床，暗示着一个家庭环境。房间光线充足，氛围温馨舒适。孩子专注的表情和姿势表明他们对书本内容很投入。
query: <image>描述这张图片
response: 这张图片展示了一只小猫的特写，它有着引人注目的面部特征。小猫的毛色主要是白色，带有灰色和黑色的条纹，特别是在眼睛周围和耳朵上。它的眼睛又大又圆，有着蓝色的虹膜，看起来非常好奇或专注。小猫的耳朵竖立着，内耳是粉红色的，与毛色形成对比。小猫的鼻子是粉红色的，有着小小的黑色鼻子，嘴巴微微张开，露出一点粉红色的舌头。小猫的胡须又长又白，从脸颊上伸出来。背景模糊，将焦点集中在小猫身上，暗示着一个室内环境，有自然光线，可能来自窗户。
"""

is this included in documentation somewhere...

Jintao-Huang · 2024-08-12T03:07:09Z

is this included in documentation somewhere...

Thank you for the excellent suggestions. We will update the document within this week.

Jintao-Huang · 2024-08-12T05:35:59Z

使用vllm：

pip install vllm>=0.5.4

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    get_vllm_engine, get_template, inference_vllm, ModelType,
    get_default_template_type, inference_stream_vllm
)
from swift.utils import seed_everything
import torch

model_type = ModelType.minicpm_v_v2_6_chat
model_id_or_path = None
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

vllm_engine = get_vllm_engine(model_type, torch.bfloat16, model_id_or_path=model_id_or_path,
                              max_model_len=8192)
tokenizer = vllm_engine.hf_tokenizer
vllm_engine.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

query = '<image>描述这张图片'
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png']
generation_info = {}
request_list = [{'query': query, 'images': images} for _ in range(100)]  # batch推理的示例
resp_list = inference_vllm(vllm_engine, template, request_list, generation_info=generation_info, use_tqdm=True)
print(f'query: {query}')
print(f'response: {resp_list[0]["response"]}')
print(generation_info)

# 流式（streaming）
generation_info = {}
gen = inference_stream_vllm(vllm_engine, template, request_list, generation_info=generation_info)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
# only show first
for resp_list in gen:
    resp = resp_list[0]
    if resp is None:
        continue
    response = resp['response']
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(generation_info)
"""
100%|██████████████████████████████████████████████████████████████████████████████| 100/100 [00:01<00:00, 91.47it/s]
100%|██████████████████████████████████████████████████████████████████████████████| 100/100 [00:22<00:00,  4.48it/s]
query: <image>描述这张图片
response: 这张图片展示了一只小猫咪的特写，可能是美国短毛猫品种，因为其花纹和毛发质地。猫咪有着引人注目的蓝色眼睛，这是其外貌中非常突出的特征。它皮毛上有着独特的黑色条纹，从面颊延伸至头顶，暗示着一种有条纹的花纹图案。它的耳朵小而尖，内侧是粉色的。猫咪的胡须细长而突出，围绕在它的下颌两侧和眼睛周围。猫咪坐着，用一种表达丰富的方式直视着，嘴巴微微张开，露出粉红色的内唇。背景模糊，柔和的光线增强了猫咪的特征。
{'num_prompt_tokens': 2700, 'num_generated_tokens': 14734, 'num_samples': 100, 'runtime': 23.53027338697575, 'samples/s': 4.249844375176322, 'tokens/s': 626.1720702384794}
query: <image>描述这张图片
response: 这张图片展示了一只小猫的特写，可能是一只幼年猫，在模糊的背景中，集中注意力在猫的表情上。这只猫长着一身白色与黑色条纹相间的毛皮，带有微妙的灰褐色。它的眼睛大而圆，具有高度的反光度，表明它们可能含有异色瞳，即一只眼睛是蓝色的，另一只是绿色的，但这只猫两只眼睛都是绿色的。睫毛清晰可见，增添了一种生动的表情。猫的耳朵竖立着，内部呈粉红色，边缘有浅色的阴影，显示出柔软的毛发。胡须又长又明显，突显了小猫的脸部形状。这个品种的猫看起来是一个常见品种，毛皮图案和眼睛颜色表明它可能是一只虎斑猫。光线柔和，产生一种天鹅绒般的效果，突出了猫绒毛的质感。
{'num_prompt_tokens': 2700, 'num_generated_tokens': 14986, 'num_samples': 100, 'runtime': 23.375922130944673, 'samples/s': 4.277906105257837, 'tokens/s': 641.0870089339394}
"""

samaritan1998 · 2024-08-12T07:52:14Z

微调minicpm-v-v2_6-chat出现报错:
File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 491, in backward
torch.autograd.backward(
File "/usr/local/lib/python3.10/dist-packages/torch/autograd/init.py", line 251, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

微调其他模型是可以的，微调命令如下：

CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft
--model_type minicpm-v-v2_6-chat
--model_id_or_path OpenBMB/MiniCPM-V-2_6
--sft_type lora
--dataset **.jsonl
--deepspeed default-zero2 @Jintao-Huang

PancakeAwesome · 2024-08-12T10:05:39Z

请教一下，可以提供一下 Async+VLLM 推理 minicpmv2-6的代码么。

Jintao-Huang · 2024-08-12T10:28:53Z

请教一下，可以提供一下 Async+VLLM 推理 minicpmv2-6的代码么。

swift deploy 走的是 Async+VLLM的

客户端调用方式可以查看这里的文档：

https://swift.readthedocs.io/zh-cn/latest/Multi-Modal/vLLM%E6%8E%A8%E7%90%86%E5%8A%A0%E9%80%9F%E6%96%87%E6%A1%A3.html#id4

Jintao-Huang · 2024-08-12T10:29:34Z

CUDA_VISIBLE_DEVICES=0 swift deploy \
  --model_type minicpm-v-v2_6-chat \
  --model_id_or_path OpenBMB/MiniCPM-V-2_6 \
  --infer_backend vllm

PancakeAwesome · 2024-08-12T10:32:17Z

请教一下，可以提供一下 Async+VLLM 推理 minicpmv2-6的代码么。

swift deploy 走的是 Async+VLLM的

客户端调用方式可以查看这里的文档：

https://swift.readthedocs.io/zh-cn/latest/Multi-Modal/vLLM%E6%8E%A8%E7%90%86%E5%8A%A0%E9%80%9F%E6%96%87%E6%A1%A3.html#id4

这个文档显示的是 openai的客户端调用方法，openai 是同步调用吧？异步调用代码是不是得用 asyncio 包吧？

Jintao-Huang · 2024-08-12T11:58:58Z

服务端:

CUDA_VISIBLE_DEVICES=0 swift deploy --model_type minicpm-v-v2_6-chat --infer_backend vllm --max_model_len 8192

客户端：

import asyncio
from swift.llm import get_model_list_client, XRequestConfig, inference_client_async

model_list = get_model_list_client()
model_type = model_list.data[0].id
print(f'model_type: {model_type}')
request_config = XRequestConfig(seed=42)

query = '<image>Describe this image.'
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png']
tasks = [inference_client_async(model_type, query, request_config=request_config) for _ in range(100)]
async def _batch_run(tasks):
    return await asyncio.gather(*tasks)

resp_list = asyncio.run(_batch_run(tasks))
print(f'query: {query}')
print(f'response0: {resp_list[0].choices[0].message.content}')
print(f'response1: {resp_list[1].choices[0].message.content}')

query = '<image>How many sheep are in the picture?'
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png']

async def _stream():
    global query
    request_config = XRequestConfig(seed=42, stream=True)
    stream_resp = await inference_client_async(model_type, query, images=images, request_config=request_config)
    print(f'query: {query}')
    print('response: ', end='')
    async for chunk in stream_resp:
        print(chunk.choices[0].delta.content, end='', flush=True)
    print()

asyncio.run(_stream())
"""
query: <image>Describe this image.
response0: The video showcases a serene and picturesque landscape. The scene is dominated by a vast expanse of lush greenery, with a dense forest stretching out into the distance. The trees, varying in shades of green, create a vibrant tapestry that fills the frame. The forest appears to be thriving, with the sunlight filtering through the leaves and casting dappled shadows on the forest floor.

In the foreground, a small clearing is visible, providing a glimpse of the open sky above. The sky is a clear blue, with a few wispy clouds scattered across it, adding depth to the scene. The overall atmosphere of the video is tranquil and peaceful, with the natural beauty of the landscape taking center stage.

The video is likely shot during the day, as the lighting is bright and natural. The camera angle is slightly elevated, offering a panoramic view of the forest and the surrounding area. The focus is sharp, allowing for the intricate details of the trees and the forest floor to be clearly visible.

Overall, the video captures the essence of a peaceful forest, with its lush greenery, clear blue sky, and tranquil ambiance. It's a beautiful representation of nature's beauty, inviting viewers to appreciate the serenity and majesty of the natural world.
response1: The video showcases a serene and picturesque landscape. The scene is dominated by a vast expanse of lush greenery, with a dense forest stretching out into the distance. The trees, varying in shades of green, create a vibrant tapestry that fills the frame. The forest appears to be thriving, with the sunlight filtering through the leaves and casting dappled shadows on the forest floor.

In the foreground, a small clearing is visible, providing a glimpse of the open sky above. The sky is a clear blue, with a few wispy clouds scattered across it, adding depth to the scene. The overall atmosphere of the video is tranquil and peaceful, with the natural beauty of the landscape taking center stage.

The video is likely shot during the day, as the lighting is bright and natural. The camera angle is slightly elevated, offering a panoramic view of the forest and the surrounding area. The focus is sharp, allowing for the intricate details of the trees and the forest floor to be clearly visible.

Overall, the video captures the essence of a peaceful forest, with its lush greenery, clear blue sky, and tranquil ambiance. It's a beautiful representation of nature's beauty, inviting viewers to appreciate the serenity and majesty of the natural world.
query: <image>How many sheep are in the picture?
response: There are five sheep in the picture.
"""

PancakeAwesome · 2024-08-12T12:14:41Z

服务端:

CUDA_VISIBLE_DEVICES=0 swift deploy --model_type minicpm-v-v2_6-chat --infer_backend vllm --max_model_len 8192

客户端：

import asyncio
from swift.llm import get_model_list_client, XRequestConfig, inference_client_async

model_list = get_model_list_client()
model_type = model_list.data[0].id
print(f'model_type: {model_type}')
request_config = XRequestConfig(seed=42)

query = '<image>Describe this image.'
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png']
tasks = [inference_client_async(model_type, query, request_config=request_config) for _ in range(100)]
async def _batch_run(tasks):
    return await asyncio.gather(*tasks)

resp_list = asyncio.run(_batch_run(tasks))
print(f'query: {query}')
print(f'response0: {resp_list[0].choices[0].message.content}')
print(f'response1: {resp_list[1].choices[0].message.content}')

query = '<image>How many sheep are in the picture?'
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png']

async def _stream():
    global query
    request_config = XRequestConfig(seed=42, stream=True)
    stream_resp = await inference_client_async(model_type, query, images=images, request_config=request_config)
    print(f'query: {query}')
    print('response: ', end='')
    async for chunk in stream_resp:
        print(chunk.choices[0].delta.content, end='', flush=True)
    print()

asyncio.run(_stream())
"""
query: <image>Describe this image.
response0: The video showcases a serene and picturesque landscape. The scene is dominated by a vast expanse of lush greenery, with a dense forest stretching out into the distance. The trees, varying in shades of green, create a vibrant tapestry that fills the frame. The forest appears to be thriving, with the sunlight filtering through the leaves and casting dappled shadows on the forest floor.

In the foreground, a small clearing is visible, providing a glimpse of the open sky above. The sky is a clear blue, with a few wispy clouds scattered across it, adding depth to the scene. The overall atmosphere of the video is tranquil and peaceful, with the natural beauty of the landscape taking center stage.

The video is likely shot during the day, as the lighting is bright and natural. The camera angle is slightly elevated, offering a panoramic view of the forest and the surrounding area. The focus is sharp, allowing for the intricate details of the trees and the forest floor to be clearly visible.

Overall, the video captures the essence of a peaceful forest, with its lush greenery, clear blue sky, and tranquil ambiance. It's a beautiful representation of nature's beauty, inviting viewers to appreciate the serenity and majesty of the natural world.
response1: The video showcases a serene and picturesque landscape. The scene is dominated by a vast expanse of lush greenery, with a dense forest stretching out into the distance. The trees, varying in shades of green, create a vibrant tapestry that fills the frame. The forest appears to be thriving, with the sunlight filtering through the leaves and casting dappled shadows on the forest floor.

In the foreground, a small clearing is visible, providing a glimpse of the open sky above. The sky is a clear blue, with a few wispy clouds scattered across it, adding depth to the scene. The overall atmosphere of the video is tranquil and peaceful, with the natural beauty of the landscape taking center stage.

The video is likely shot during the day, as the lighting is bright and natural. The camera angle is slightly elevated, offering a panoramic view of the forest and the surrounding area. The focus is sharp, allowing for the intricate details of the trees and the forest floor to be clearly visible.

Overall, the video captures the essence of a peaceful forest, with its lush greenery, clear blue sky, and tranquil ambiance. It's a beautiful representation of nature's beauty, inviting viewers to appreciate the serenity and majesty of the natural world.
query: <image>How many sheep are in the picture?
response: There are five sheep in the picture.
"""

非常感谢你jintao-huang，

请问如何使用 python sdk启动服务呢
如何保障每次异步请求的每次结果都是不一样的呢，因为我看seed 都是一样的
相关其他多模态模型是否也是通用以上代码呢，比如 internvl2

Looking forward ur reply, Thank u!

Jintao-Huang · 2024-08-12T12:44:33Z

如何使用 python sdk启动服务

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import deploy_main, DeployArguments

# 与swift deploy相同的参数
deploy_main(DeployArguments(...))

保障每次异步请求的每次结果都是不一样

seed为None即可（默认）

相关其他多模态模型是否也是通用以上代码

是的

PancakeAwesome · 2024-08-12T12:55:43Z

如何使用 python sdk启动服务
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import deploy_main, DeployArguments

# 与swift deploy相同的参数
deploy_main(DeployArguments(...))
保障每次异步请求的每次结果都是不一样

seed为None即可（默认）

相关其他多模态模型是否也是通用以上代码

是的

我是否可以使用 get_vllm_engine 的接口方式，启动 vllm 服务呢？和 deploy_main 的方式有什么区别呢？

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    get_vllm_engine, get_template, inference_vllm, ModelType,
    get_default_template_type, inference_stream_vllm
)
from swift.utils import seed_everything
import torch

model_type = ModelType.minicpm_v_v2_6_chat
model_id_or_path = None
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

vllm_engine = get_vllm_engine(model_type, torch.bfloat16, model_id_or_path=model_id_or_path,
                              max_model_len=8192)
tokenizer = vllm_engine.hf_tokenizer
vllm_engine.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

Jintao-Huang · 2024-08-12T13:20:45Z

minicpmv2-6 & vllm 开启服务要求安装flash-attn的问题已经修复

PancakeAwesome · 2024-08-12T15:38:27Z

如何使用 python sdk启动服务
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import deploy_main, DeployArguments

# 与swift deploy相同的参数
deploy_main(DeployArguments(...))
保障每次异步请求的每次结果都是不一样

seed为None即可（默认）

相关其他多模态模型是否也是通用以上代码

是的

我是否可以使用 get_vllm_engine 的接口方式，启动 vllm 服务呢？和 deploy_main 的方式有什么区别呢？

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    get_vllm_engine, get_template, inference_vllm, ModelType,
    get_default_template_type, inference_stream_vllm
)
from swift.utils import seed_everything
import torch

model_type = ModelType.minicpm_v_v2_6_chat
model_id_or_path = None
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

vllm_engine = get_vllm_engine(model_type, torch.bfloat16, model_id_or_path=model_id_or_path,
                              max_model_len=8192)
tokenizer = vllm_engine.hf_tokenizer
vllm_engine.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

用 deploy_main sdk同样的 cli 参数会报错：

INFO: 2024-08-12 23:36:53,874 vllm_utils.py:567] generation_config: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.3, top_p=0.7, top_k=20, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=2048, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=False, spaces_between_special_tokens=True, truncate_prompt_tokens=None)
INFO: 2024-08-12 23:36:53,876 vllm_utils.py:578] system: You are a helpful assistant.
INFO:     Started server process [298157]
INFO:     Waiting for application startup.
Exception in thread Thread-7:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/threading.py", line 932, in _bootstrap_inner
INFO:     Application startup complete.
    self.run()
  File "/opt/conda/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/ossfs/workspace/ms-swift-main/swift/llm/deploy.py", line 70, in <lambda>
INFO:     Uvicorn running on http://127.0.0.1:8000/ (Press CTRL+C to quit)
    thread = Thread(target=lambda: asyncio.run(_log_stats_hook(_args.log_interval)))
  File "/opt/conda/lib/python3.8/site-packages/nest_asyncio.py", line 27, in run
    loop = asyncio.get_event_loop()
  File "/opt/conda/lib/python3.8/asyncio/events.py", line 639, in get_event_loop
    raise RuntimeError('There is no current event loop in thread %r.'
RuntimeError: There is no current event loop in thread 'Thread-7'.
/opt/conda/lib/python3.8/threading.py:934: RuntimeWarning: coroutine '_log_stats_hook' was never awaited
  self._invoke_excepthook(self)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback

PancakeAwesome · 2024-08-12T15:49:28Z

如何使用 python sdk启动服务
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import deploy_main, DeployArguments

# 与swift deploy相同的参数
deploy_main(DeployArguments(...))
保障每次异步请求的每次结果都是不一样

seed为None即可（默认）

相关其他多模态模型是否也是通用以上代码

是的

我是否可以使用 get_vllm_engine 的接口方式，启动 vllm 服务呢？和 deploy_main 的方式有什么区别呢？

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    get_vllm_engine, get_template, inference_vllm, ModelType,
    get_default_template_type, inference_stream_vllm
)
from swift.utils import seed_everything
import torch

model_type = ModelType.minicpm_v_v2_6_chat
model_id_or_path = None
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

vllm_engine = get_vllm_engine(model_type, torch.bfloat16, model_id_or_path=model_id_or_path,
                              max_model_len=8192)
tokenizer = vllm_engine.hf_tokenizer
vllm_engine.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

楼主已经修复值最新 main 分支，pip install -e '.[all]'
区别在于，python sdk get_vllm_engine 开启的服务，不能用异步调用；而 CLI 开启的 vllm 服务默认是 Async 服务，可以异步调用

PancakeAwesome · 2024-08-12T16:32:55Z

请教一下 VLLM+异步客户端调用支持官方的Fewshot 功能么？fewshot 功能如下：
来自：https://huggingface.co/openbmb/MiniCPM-V-2_6#in-context-few-shot-learning

import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True,
    attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)

question = "production date" 
image1 = Image.open('example1.jpg').convert('RGB')
answer1 = "2023.08.04"
image2 = Image.open('example2.jpg').convert('RGB')
answer2 = "2007.04.24"
image_test = Image.open('test.jpg').convert('RGB')

msgs = [
    {'role': 'user', 'content': [image1, question]}, {'role': 'assistant', 'content': [answer1]},
    {'role': 'user', 'content': [image2, question]}, {'role': 'assistant', 'content': [answer2]},
    {'role': 'user', 'content': [image_test, question]}
]

answer = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(answer)

Jintao-Huang · 2024-08-12T16:33:45Z

支持的, 这个就是多轮对话

pramanik2289 · 2024-08-26T04:46:05Z

How to evaluate with custom dataset(test video data) its throwing error ?
raise APIConnectionError(request=request) from err
openai.APIConnectionError: Connection error.

I am creating UI using flask but getting error - NotImplementedError: Cannot copy out of meta tensor; no data!- any reason

@yingdachen, any input ???

Jintao-Huang · 2024-08-26T05:19:47Z

How to evaluate with custom dataset(test video data) its throwing error ? raise APIConnectionError(request=request) from err openai.APIConnectionError: Connection error.

I am creating UI using flask but getting error - NotImplementedError: Cannot copy out of meta tensor; no data!- any reason

@yingdachen, any input ???

This error indicates insufficient GPU memory.

pramanik2289 · 2024-08-26T05:51:35Z

@yingdachen for which error I am getting two error !!

zhaoyangwei123 · 2024-08-29T05:44:19Z

利用zero3微调MiniCPM-V2.6报错，只是将图片微调的命令从zero2改成了默认的zero3，就出现了报错：
Traceback (most recent call last):
File "/home/ubuntu/disk2T_1/wzy/MiniCPM-V/swift/swift/cli/sft.py", line 5, in
sft_main()
File "/home/ubuntu/disk2T_1/wzy/MiniCPM-V/swift/swift/utils/run_utils.py", line 32, in x_main
result = llm_x(args, **kwargs)
File "/home/ubuntu/disk2T_1/wzy/MiniCPM-V/swift/swift/llm/sft.py", line 417, in llm_sft
trainer.train(training_args.resume_from_checkpoint)
File "/home/ubuntu/disk2T_1/wzy/MiniCPM-V/swift/swift/trainers/mixin.py", line 552, in train
res = super().train(resume_from_checkpoint, *args, **kwargs)
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
return inner_training_loop(
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/transformers/trainer.py", line 2015, in _inner_training_loop
model, self.optimizer, self.lr_scheduler = self.accelerator.prepare(
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/accelerate/accelerator.py", line 1284, in prepare
result = self._prepare_deepspeed(*args)
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/accelerate/accelerator.py", line 1751, in _prepare_deepspeed
engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
File "/home/ubuntu/disk2T_1/wzy/MiniCPM-V/swift/swift/llm/utils/template.py", line 337, in _initialize
res = _old_initialize(*args, **kwargs)
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/deepspeed/init.py", line 179, in initialize
config_class = DeepSpeedConfig(config, mpu, mesh_device=mesh_device)
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/deepspeed/runtime/config.py", line 797, in init
self._initialize_params(copy.copy(self._param_dict))
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/deepspeed/runtime/config.py", line 817, in _initialize_params
self.zero_config = get_zero_config(param_dict)
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/deepspeed/runtime/zero/config.py", line 71, in get_zero_config
return DeepSpeedZeroConfig(**zero_config_dict)
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/deepspeed/runtime/config_utils.py", line 57, in init
super().init(**data)
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/pydantic/main.py", line 193, in init
self.pydantic_validator.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for DeepSpeedZeroConfig
stage3_prefetch_bucket_size
Input should be a valid integer, got a number with a fractional part [type=int_from_float, input_value=11560550.4, input_type=float]
For further information visit https://errors.pydantic.dev/2.8/v/int_from_float
Traceback (most recent call last):
File "/home/ubuntu/disk2T_1/wzy/MiniCPM-V/swift/swift/cli/sft.py", line 5, in
sft_main()
File "/home/ubuntu/disk2T_1/wzy/MiniCPM-V/swift/swift/utils/run_utils.py", line 32, in x_main
result = llm_x(args, **kwargs)
File "/home/ubuntu/disk2T_1/wzy/MiniCPM-V/swift/swift/llm/sft.py", line 417, in llm_sft
trainer.train(training_args.resume_from_checkpoint)
File "/home/ubuntu/disk2T_1/wzy/MiniCPM-V/swift/swift/trainers/mixin.py", line 552, in train
res = super().train(resume_from_checkpoint, *args, **kwargs)
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
return inner_training_loop(
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/transformers/trainer.py", line 2015, in _inner_training_loop
model, self.optimizer, self.lr_scheduler = self.accelerator.prepare(
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/accelerate/accelerator.py", line 1284, in prepare
result = self._prepare_deepspeed(*args)
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/accelerate/accelerator.py", line 1751, in _prepare_deepspeed
engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
File "/home/ubuntu/disk2T_1/wzy/MiniCPM-V/swift/swift/llm/utils/template.py", line 337, in _initialize
res = _old_initialize(*args, **kwargs)
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/deepspeed/init.py", line 179, in initialize
config_class = DeepSpeedConfig(config, mpu, mesh_device=mesh_device)
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/deepspeed/runtime/config.py", line 797, in init
self._initialize_params(copy.copy(self._param_dict))
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/deepspeed/runtime/config.py", line 817, in _initialize_params
self.zero_config = get_zero_config(param_dict)
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/deepspeed/runtime/zero/config.py", line 71, in get_zero_config
return DeepSpeedZeroConfig(**zero_config_dict)
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/deepspeed/runtime/config_utils.py", line 57, in init
super().init(**data)
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/pydantic/main.py", line 193, in init
self.pydantic_validator.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for DeepSpeedZeroConfig
stage3_prefetch_bucket_size
Input should be a valid integer, got a number with a fractional part [type=int_from_float, input_value=11560550.4, input_type=float]
For further information visit https://errors.pydantic.dev/2.8/v/int_from_float
请问在swift中微调MiniCPM-V2.6只能使用zero2吗？，我使用的机器是4张3090

Jintao-Huang · 2024-08-29T05:47:38Z

deepspeed版本调整一下

zhaoyangwei123 · 2024-08-30T04:35:01Z

deepspeed版本调整一下

您好，由于GPU显存不够所以想尝试用int4的模型取finetune，但是我看swift得官方文档里面没有说支持minicpm的int4模型微调https://github.com/modelscope/ms-swift/blob/main/docs/source/LLM/%E6%94%AF%E6%8C%81%E7%9A%84%E6%A8%A1%E5%9E%8B%E5%92%8C%E6%95%B0%E6%8D%AE%E9%9B%86.md#%E6%A8%A1%E5%9E%8B
我直接指定--model_id_or_path OpenBMB/MiniCPM-V-2_6-int4然后在--sft_type lora 后面增加--quantization_bit 4，但是model_type仍然是minicpm-v-v2_6-chat发现也可以训练，请问这样的设置是对的吗，还是说后续您会专门为int4模型增加相应的model_type呢？

learn01one · 2024-09-03T06:53:03Z

偏好数据训练时，格式需要怎样的，其他模型可以用的，训练这个模型报错

ztianlin · 2024-09-09T06:46:25Z

请问微调完成后怎么获得用于部署的gguf模型呢

yingdachen · 2024-09-09T07:19:00Z

请问微调完成后怎么获得用于部署的gguf模型呢

minicpm转gguf的流程需要一些定制化操作，可以参考minicmp的官方文档：
https://modelbest.feishu.cn/wiki/LZxLwp4Lzi29vXklYLFchwN5nCf

qgl1818 · 2024-09-13T07:43:07Z

CUDA_VISIBLE_DEVICES=0 swift sft --model_type minicpm-v-v2_6-chat --model_id_or_path OpenBMB/MiniCPM-V-2_6 --sft_type lora --dataset /data --deepspeed zero3-offload --output_dir output --num_train_epoch 5
没有报错但是进程直接停止了

[INFO:swift] Downloading the model from ModelScope Hub, model_id: OpenBMB/MiniCPM-V-2_6
[WARNING:modelscope] Using branch: master as version is unstable, use with caution
[INFO:swift] Loading the model using model_dir: /home/shy/.cache/modelscope/hub/OpenBMB/MiniCPM-V-2_6
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO:swift] model_kwargs: {'device_map': None}
[2024-09-13 15:40:49,803] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-09-13 15:40:50,825] [INFO] [config.py:733:init] Config mesh_device None world_size = 1
[2024-09-13 15:40:50,826] [INFO] [comm.py:652:init_distributed] cdb=None
[2024-09-13 15:40:50,826] [INFO] [comm.py:667:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2024-09-13 15:40:51,245] [INFO] [comm.py:717:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=1, master_addr=192.168.31.119, master_port=29500
[2024-09-13 15:40:51,245] [INFO] [comm.py:683:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl

zhaoyangwei123 · 2024-09-13T07:45:50Z

CUDA_VISIBLE_DEVICES=0 swift sft --model_type minicpm-v-v2_6-chat --model_id_or_path OpenBMB/MiniCPM-V-2_6 --sft_type lora --dataset /data --deepspeed zero3-offload --output_dir output --num_train_epoch 5 没有报错但是进程直接停止了

[INFO:swift] Downloading the model from ModelScope Hub, model_id: OpenBMB/MiniCPM-V-2_6
[WARNING:modelscope] Using branch: master as version is unstable, use with caution
[INFO:swift] Loading the model using model_dir: /home/shy/.cache/modelscope/hub/OpenBMB/MiniCPM-V-2_6
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO:swift] model_kwargs: {'device_map': None}
[2024-09-13 15:40:49,803] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-09-13 15:40:50,825] [INFO] [config.py:733:init] Config mesh_device None world_size = 1
[2024-09-13 15:40:50,826] [INFO] [comm.py:652:init_distributed] cdb=None
[2024-09-13 15:40:50,826] [INFO] [comm.py:667:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2024-09-13 15:40:51,245] [INFO] [comm.py:717:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=1, master_addr=192.168.31.119, master_port=29500
[2024-09-13 15:40:51,245] [INFO] [comm.py:683:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl

我也遇到过这个问题，用--deepspeed zero3-offload需要很大的内存，一般卡住不动都是因为机器内存满了导致的

qgl1818 · 2024-09-13T07:47:31Z

CUDA_VISIBLE_DEVICES=0 swift sft --model_type minicpm-v-v2_6-chat --model_id_or_path OpenBMB/MiniCPM-V-2_6 --sft_type lora --dataset /data --deepspeed zero3-offload --output_dir output --num_train_epoch 5 没有报错但是进程直接停止了

[INFO:swift] Downloading the model from ModelScope Hub, model_id: OpenBMB/MiniCPM-V-2_6
[WARNING:modelscope] Using branch: master as version is unstable, use with caution
[INFO:swift] Loading the model using model_dir: /home/shy/.cache/modelscope/hub/OpenBMB/MiniCPM-V-2_6
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO:swift] model_kwargs: {'device_map': None}
[2024-09-13 15:40:49,803] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-09-13 15:40:50,825] [INFO] [config.py:733:init] Config mesh_device None world_size = 1
[2024-09-13 15:40:50,826] [INFO] [comm.py:652:init_distributed] cdb=None
[2024-09-13 15:40:50,826] [INFO] [comm.py:667:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2024-09-13 15:40:51,245] [INFO] [comm.py:717:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=1, master_addr=192.168.31.119, master_port=29500
[2024-09-13 15:40:51,245] [INFO] [comm.py:683:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl

我也遇到过这个问题，用--deepspeed zero3-offload需要很大的内存，一般卡住不动都是因为机器内存满了导致的

deepspeed不能用虚拟内存是吗？
我是进程很快就停止运行了

zhaoyangwei123 · 2024-09-13T08:09:59Z

CUDA_VISIBLE_DEVICES=0 swift sft --model_type minicpm-v-v2_6-chat --model_id_or_path OpenBMB/MiniCPM-V-2_6 --sft_type lora --dataset /data --deepspeed zero3-offload --output_dir output --num_train_epoch 5 没有报错但是进程直接停止了

[INFO:swift] Downloading the model from ModelScope Hub, model_id: OpenBMB/MiniCPM-V-2_6
[WARNING:modelscope] Using branch: master as version is unstable, use with caution
[INFO:swift] Loading the model using model_dir: /home/shy/.cache/modelscope/hub/OpenBMB/MiniCPM-V-2_6
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO:swift] model_kwargs: {'device_map': None}
[2024-09-13 15:40:49,803] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-09-13 15:40:50,825] [INFO] [config.py:733:init] Config mesh_device None world_size = 1
[2024-09-13 15:40:50,826] [INFO] [comm.py:652:init_distributed] cdb=None
[2024-09-13 15:40:50,826] [INFO] [comm.py:667:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2024-09-13 15:40:51,245] [INFO] [comm.py:717:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=1, master_addr=192.168.31.119, master_port=29500
[2024-09-13 15:40:51,245] [INFO] [comm.py:683:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl

我也遇到过这个问题，用--deepspeed zero3-offload需要很大的内存，一般卡住不动都是因为机器内存满了导致的

deepspeed不能用虚拟内存是吗？我是进程很快就停止运行了

可以用虚拟内存，我也是用了的，但是还是会卡住，我是在train出来之后卡住的，后来多加了几根内存条就好了

liangyi-qianwan · 2024-09-13T08:20:29Z

我们使用
CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft
--model_type minicpm-v-v2_6-chat
--model_id_or_path OpenBMB/MiniCPM-V-2_6
--sft_type lora
--dataset coco-en-mini#20000
--deepspeed default-zero2
对模型进行微调后，使用
model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
model = PeftModel.from_pretrained(model, lora_path)
部署模型时，报错：Target module Qwen2ForCauselLM() is not supported. currently, only the followingrmodules are supported: 'torch.nn.Linear'……。请问该如何解决那？

qxzheng · 2024-09-15T02:30:00Z

请教一下，用swift在进行视频微调的时候是不是也是通过抽帧实现的？这个抽帧率默认用的多少，能修改吗？在命令行里没找到这个参数，试着用了下sample_n_frames.显示不支持。因为现在用图片和视频微调，就得算下图片的配比，需要知道抽帧率，谢谢。

pavanvenkatsai · 2024-09-28T22:44:08Z

I Successfully finetuned OpenBMB/MiniCPM-V-2_6 model using custom dataset,

CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft
--model_type minicpm-v-v2_6-chat
--model_id_or_path OpenBMB/MiniCPM-V-2_6
--sft_type lora
--deepspeed default-zero2
--dataset train.jsonl
--val_dataset val.jsonl

An my checkpoint is also created....

I used the below format for both dataset and val_dataset:
{"query": "55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response1"], ["query2", "response2"]], "images": ["image_path"]}

I want to infer the fine tuned model by passing my own prompt and image url as an input,
i did

Can someone help me with,

How to use and infer the fine tuned model (checkpoint-xxx-merged) using my input prompt and image url.
How to deploy the fine tuned model using vllm or lmdeploy
I am not understanding the evaluation part, can you please share me the evaluation data set format.
How to use eval dataset and test the accuracy of the model.

Some one please help me...

Jintao-Huang · 2024-09-29T02:35:42Z

我们使用 CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft --model_type minicpm-v-v2_6-chat --model_id_or_path OpenBMB/MiniCPM-V-2_6 --sft_type lora --dataset coco-en-mini#20000 --deepspeed default-zero2 对模型进行微调后，使用 model = AutoModel.from_pretrained(model_path, trust_remote_code=True) model = PeftModel.from_pretrained(model, lora_path) 部署模型时，报错：Target module Qwen2ForCauselLM() is not supported. currently, only the followingrmodules are supported: 'torch.nn.Linear'……。请问该如何解决那？

升级一下ms-swift

Jintao-Huang · 2024-09-29T02:39:43Z

请教一下，用swift在进行视频微调的时候是不是也是通过抽帧实现的？这个抽帧率默认用的多少，能修改吗？在命令行里没找到这个参数，试着用了下sample_n_frames.显示不支持。因为现在用图片和视频微调，就得算下图片的配比，需要知道抽帧率，谢谢。

ms-swift/swift/llm/utils/template.py

Line 3346 in 6330c70

    
           load_video = partial(load_video_minicpmv_mplug_owl3, max_num_frames=max_num_frames)

MAX_NUM_FRAMES

Jintao-Huang · 2024-09-29T02:41:29Z

I Successfully finetuned OpenBMB/MiniCPM-V-2_6 model using custom dataset,

CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft --model_type minicpm-v-v2_6-chat --model_id_or_path OpenBMB/MiniCPM-V-2_6 --sft_type lora --deepspeed default-zero2 --dataset train.jsonl --val_dataset val.jsonl

An my checkpoint is also created....

I used the below format for both dataset and val_dataset: {"query": "55555", "response": "66666", "images": ["image_path"]} {"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path"]} {"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response1"], ["query2", "response2"]], "images": ["image_path"]}

I want to infer the fine tuned model by passing my own prompt and image url as an input, i did

Can someone help me with,

How to use and infer the fine tuned model (checkpoint-xxx-merged) using my input prompt and image url.

How to deploy the fine tuned model using vllm or lmdeploy

I am not understanding the evaluation part, can you please share me the evaluation data set format.

How to use eval dataset and test the accuracy of the model.

Some one please help me...

https://swift.readthedocs.io/en/latest/Multi-Modal/index.html

vjaideep08 · 2024-09-30T08:31:18Z

Hi I finetuned MiniCPM-V 2.6 model using #1613.

And deployed the merged model using CUDA_VISIBLE_DEVICES=0 swift deploy --ckpt_dir output/minicpm-v-v2_5-chat/vx-xxx/checkpoint-xxx-merged

when trying to call the post api, it is not responding

INFO: 2024-09-30 08:25:57,729 deploy.py:157] {'request_id': 'chatcmpl-f515986bf3d24c9e9b66f6a83d48a0eb', 'model': 'minicpm-v-v2_6-chat', 'messages': [{'role': 'user', 'content': 'Describe this image.'}], 'generation_config': GenerationConfig({'bos_token_id': 151643, 'eos_token_id': 151645, 'max_new_tokens': 32410, 'pad_token_id': 151643, 'return_dict_in_generate': True}), 'seed': None, 'stop': [], 'stream': False}
Starting from v4.46, the logits model output will have the same type as the model (except at train time, where it will always be FP32)
INFO: 2024-09-30 08:26:04,559 deploy.py:56] {'num_prompt_tokens': 0, 'num_generated_tokens': 0, 'num_samples': 0, 'runtime': 10.00989026, 'samples/s': 0.0, 'tokens/s': 0.0}

I can see the hit is made in the terminal logs but no response can be found on postman

vjaideep08 · 2024-09-30T08:33:10Z

What is the format of eval dataset.

How to validate the eval dataset and what is the meaning of label key in result dataset.

Should the response key in the eval data should be empty?

middleflames · 2024-09-30T13:11:04Z

What is the format of eval dataset.

How to validate the eval dataset and what is the meaning of label key in result dataset.

Should the response key in the eval data should be empty?

Same issue here. My inference dataset is the same foramt with my finetuning dataset, but the inference cli doesn't work

NirHeaven · 2024-10-22T12:22:19Z

A800上使用
CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft
--model_type minicpm-v-v2_6-chat
--model_id_or_path $MODEL
--sft_type lora
--dataset coco-en-mini#20000
--deepspeed default-zero2

会直接卡死

2013358072 · 2024-10-28T02:29:32Z

eval 数据集的格式是什么。

如何验证 eval 数据集以及 result dataset 中 label key 的含义是什么。

eval 数据中的响应键应该为空吗？

the same question

chen10089 · 2024-11-11T07:36:43Z

如何使用sft后的模型进行推理？
CUDA_VISIBLE_DEVICES=1 swift infer
--model_type minicpm-v-v2_6-chat
--model_id_or_path OpenBMB/MiniCPM-V-2_6
--ckpt_dir output/minicpm-v-v2_6-chat/v1-20241111-150624/checkpoint-10
是这样设置吗

chen10089 · 2024-11-11T07:55:45Z

如何使用sft后的模型进行推理？ CUDA_VISIBLE_DEVICES=1 swift infer --model_type minicpm-v-v2_6-chat --model_id_or_path OpenBMB/MiniCPM-V-2_6 --ckpt_dir output/minicpm-v-v2_6-chat/v1-20241111-150624/checkpoint-10 是这样设置吗

参考了https://swift.readthedocs.io/en/latest/Multi-Modal/minicpm-v-best-practice.html，原来是这样设置：
CUDA_VISIBLE_DEVICES=1 swift infer
--ckpt_dir output/minicpm-v-v2_6-chat/v1-20241111-150624/checkpoint-10-merged

zhaoyangwei123 · 2024-11-13T03:35:20Z

@Jintao-Huang
利用8卡4090微调minicpm-v-v2_6-chat-int4模型报CUDA out of memory，int4模型要比正常模型要小很多呀，但是lora的时候显存持续上涨一直到超过24G，这是为什么呢？
命令行：
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 NPROC_PER_NODE=8 swift sft --model_type minicpm-v-v2_6-chat-int4 --model_id_or_path OpenBMB/MiniCPM-V-2_6-int4 --sft_type lora --dataset /home/wzy/disk1/MiniCPM-V/finetune/relation_datasets/standard_train_no_history.jsonl --deepspeed zero2-offload --eval_steps 1000 --lora_dtype fp16

{'loss': 1.6643486, 'acc': 0.54588017, 'grad_norm': 4.12383413, 'learning_rate': 9.978e-05, 'memory(GiB)': 18.58, 'train_speed(iter/s)': 0.49618, 'epoch': 0.08, 'global_step/max_steps': '865/11043', 'percentage': '7.83%', 'elapsed_time': '28m 56s', 'remaining_time': '5h 40m 31s'}
{'loss': 1.60014324, 'acc': 0.59105425, 'grad_norm': 4.04182339, 'learning_rate': 9.977e-05, 'memory(GiB)': 18.58, 'train_speed(iter/s)': 0.49637, 'epoch': 0.08, 'global_step/max_steps': '870/11043', 'percentage': '7.88%', 'elapsed_time': '29m 5s', 'remaining_time': '5h 40m 14s'}
Train: 8%|████████▌ | 874/11043 [29:14<6:30:05, 2.30s/it]Traceback (most recent call last):
File "/mnt/sdb/MiniCPM-V/ms-swift/swift/cli/sft.py", line 5, in
sft_main()
File "/mnt/sdb/MiniCPM-V/ms-swift/swift/utils/run_utils.py", line 32, in x_main
result = llm_x(args, **kwargs)
File "/mnt/sdb/MiniCPM-V/ms-swift/swift/llm/sft.py", line 546, in llm_sft
return trainer_train(args, model, template, train_dataset, val_dataset, callbacks=callbacks, msg=msg)
File "/mnt/sdb/MiniCPM-V/ms-swift/swift/llm/sft.py", line 496, in trainer_train
trainer.train(training_args.resume_from_checkpoint)
File "/mnt/sdb/MiniCPM-V/ms-swift/swift/trainers/mixin.py", line 493, in train
res = super().train(resume_from_checkpoint, *args, **kwargs)
File "/home/wzy/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
return inner_training_loop(
File "/home/wzy/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/transformers/trainer.py", line 2203, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/wzy/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/transformers/trainer.py", line 3147, in training_step
self.accelerator.backward(loss)
File "/home/wzy/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/accelerate/accelerator.py", line 2117, in backward
self.deepspeed_engine_wrapped.backward(loss, **kwargs)
File "/home/wzy/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/accelerate/utils/deepspeed.py", line 166, in backward
self.engine.backward(loss, **kwargs)
File "/home/wzy/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/wzy/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1976, in backward
self.optimizer.backward(loss, retain_graph=retain_graph)
File "/home/wzy/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 2051, in backward
self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
File "/home/wzy/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
scaled_loss.backward(retain_graph=retain_graph)
File "/home/wzy/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/torch/_tensor.py", line 492, in backward
torch.autograd.backward(
File "/home/wzy/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/torch/autograd/init.py", line 251, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.96 GiB. GPU 0 has a total capacty of 23.65 GiB of which 7.77 GiB is free. Including non-PyTorch memory, this process has 15.87 GiB memory in use. Of the allocated memory 13.98 GiB is allocated by PyTorch, and 1.29 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Jintao-Huang added the good first issue Good for newcomers label Aug 6, 2024

Jintao-Huang changed the title ~~MiniCPM-V 2.6的图片和视频微调~~ MiniCPM-V 2.6的图片和视频微调最佳实践 Aug 6, 2024

Jintao-Huang changed the title ~~MiniCPM-V 2.6的图片和视频微调最佳实践~~ MiniCPM-V 2.6的图片和视频微调的最佳实践 Aug 6, 2024

Jintao-Huang pinned this issue Aug 6, 2024

Jintao-Huang unpinned this issue Aug 6, 2024

Jintao-Huang changed the title ~~MiniCPM-V 2.6的图片和视频微调的最佳实践~~ MiniCPM-V 2.6的推理和微调最佳实践 Aug 6, 2024

tastelikefeet pinned this issue Aug 28, 2024

tastelikefeet unpinned this issue Aug 29, 2024

mycroft1603 mentioned this issue Oct 9, 2024

请问MiniCPM V2_6的视频微调如何设置视频的采样帧数 OpenBMB/MiniCPM-V#604

Open

Best Practices for Inference and Fine-Tuning with MiniCPM-V 2.6 #1613

Best Practices for Inference and Fine-Tuning with MiniCPM-V 2.6 #1613

Comments

Jintao-Huang commented Aug 6, 2024 • edited Loading

模型推理

图片微调

视频微调

demoninpiano commented Aug 7, 2024

Jintao-Huang commented Aug 7, 2024

guihonghao commented Aug 8, 2024

Jintao-Huang commented Aug 8, 2024

compleXuan commented Aug 11, 2024

Jintao-Huang commented Aug 12, 2024

okideal commented Aug 12, 2024

yingdachen commented Aug 12, 2024

Jintao-Huang commented Aug 12, 2024

Jintao-Huang commented Aug 12, 2024

samaritan1998 commented Aug 12, 2024

PancakeAwesome commented Aug 12, 2024

Jintao-Huang commented Aug 12, 2024

Jintao-Huang commented Aug 12, 2024

PancakeAwesome commented Aug 12, 2024 • edited Loading

Jintao-Huang commented Aug 12, 2024

PancakeAwesome commented Aug 12, 2024

Jintao-Huang commented Aug 12, 2024

PancakeAwesome commented Aug 12, 2024

Jintao-Huang commented Aug 12, 2024

PancakeAwesome commented Aug 12, 2024

PancakeAwesome commented Aug 12, 2024

PancakeAwesome commented Aug 12, 2024

Jintao-Huang commented Aug 12, 2024

pramanik2289 commented Aug 26, 2024

Jintao-Huang commented Aug 26, 2024

pramanik2289 commented Aug 26, 2024

zhaoyangwei123 commented Aug 29, 2024

Jintao-Huang commented Aug 29, 2024

zhaoyangwei123 commented Aug 30, 2024 • edited Loading

learn01one commented Sep 3, 2024

ztianlin commented Sep 9, 2024

yingdachen commented Sep 9, 2024

qgl1818 commented Sep 13, 2024

zhaoyangwei123 commented Sep 13, 2024

qgl1818 commented Sep 13, 2024 • edited Loading

zhaoyangwei123 commented Sep 13, 2024

liangyi-qianwan commented Sep 13, 2024

qxzheng commented Sep 15, 2024

pavanvenkatsai commented Sep 28, 2024

Jintao-Huang commented Sep 29, 2024

Jintao-Huang commented Sep 29, 2024

Jintao-Huang commented Sep 29, 2024

vjaideep08 commented Sep 30, 2024

vjaideep08 commented Sep 30, 2024

middleflames commented Sep 30, 2024

NirHeaven commented Oct 22, 2024

2013358072 commented Oct 28, 2024

chen10089 commented Nov 11, 2024

chen10089 commented Nov 11, 2024

zhaoyangwei123 commented Nov 13, 2024 • edited Loading

Jintao-Huang commented Aug 6, 2024 •

edited

Loading

PancakeAwesome commented Aug 12, 2024 •

edited

Loading

zhaoyangwei123 commented Aug 30, 2024 •

edited

Loading

qgl1818 commented Sep 13, 2024 •

edited

Loading

zhaoyangwei123 commented Nov 13, 2024 •

edited

Loading