Running an embedding model fails with "Remote server 192.0.0.181:44667 closed" #2579

minglong-huang · 2024-11-25T05:29:17Z

System Info

Linux
vllm=0.5.2
python=3.10.0
CUDA Version: 12.0

Running Xinference with Docker?

docker
pip install
installation from source

Version info

xinference=0.15.4

The command used to start Xinference

xinference-local --host 0.0.0.0 --port 9997

Reproduction

The code is as follows:

from xinference.client import Client

# Connect to the Xinference server and look up the running bge-m3 model.
client = Client("http://192.0.0.181:9997")
list_models_run = client.list_models()
model_uid = list_models_run['bge-m3']['id']
embedding_client = client.get_model(model_uid)

text_list = [...]  # placeholder: the list of text chunks, each under 5K characters
text_list_len = len(text_list)
step = 100  # number of chunks sent per create_embedding call
for index in range(0, text_list_len, step):
    text_embeddings = embedding_client.create_embedding(text_list[index:index + step])

The error is as follows:

  File "/home/netted/img_process_ml/nlp/net/embed.py", line 34, in text_embed
    text_embeddings = embedding_client.create_embedding(text_list[index:index + step])
  File "/home/netted/anaconda3/envs/nlp/lib/python3.10/site-packages/xinference/client/restful/restful_client.py", line 122, in create_embedding
    raise RuntimeError(
RuntimeError: Failed to create the embeddings, detail: Remote server 192.0.0.181:40919 closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/netted/img_process_ml/nlp/net/embed.py", line 68, in <module>
    text_embed(text_units, embedding_client)
  File "/home/netted/img_process_ml/nlp/net/embed.py", line 38, in text_embed
    text_embeddings = embedding_client.create_embedding(text_list[index:index + step])
  File "/home/netted/anaconda3/envs/nlp/lib/python3.10/site-packages/xinference/client/restful/restful_client.py", line 122, in create_embedding
    raise RuntimeError(
RuntimeError: Failed to create the embeddings, detail: Remote server 192.0.0.181:44667 closed
 74%|███████▍  | 3271500/4425878 [00:12<00:04, 263091.43it/s] 

Process finished with exit code 1

Expected behavior

Resolve this issue.


qinxuye · 2024-11-26T03:34:27Z

This problem is usually caused by OOM (out of memory).

minglong-huang · 2024-11-26T06:19:25Z

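> This problem is usually caused by OOM (out of memory).

That's strange. With it set to 100,000 it runs for a while, with 10,000 it also runs for a while, and with 1,000 it runs for several hours. If this were OOM, shouldn't it fail as soon as the input is sent?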

qinxuye · 2024-11-26T07:24:38Z

The input is also split into batches internally. Try sending less data per call.
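As a rough illustration of sending less per call, the following sketch splits the input into small slices and makes one embedding call per slice. The helper names `iter_batches` and `embed_in_batches` and the default `batch_size=16` are assumptions made for this example, not part of Xinference; a suitable batch size depends on the available memory and the length of the texts.

```python
from typing import Any, Callable, Iterator, List, Sequence


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], Any],
    batch_size: int = 16,  # hypothetical default; tune for your hardware
) -> List[Any]:
    """Call `embed_fn` once per small batch and return the raw responses in order."""
    return [embed_fn(list(batch)) for batch in iter_batches(texts, batch_size)]
```

With the client from the reproduction code, this could be called as `embed_in_batches(text_list, embedding_client.create_embedding, batch_size=16)`. Each element of the returned list is whatever `create_embedding` returned for that batch.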

minglong-huang · 2024-11-26T08:11:07Z

OK, I'll try adjusting the parameters.

XprobeBot added the gpu label Nov 25, 2024

XprobeBot modified the milestones: v0.16, v1.x Nov 25, 2024
