BUG: embedding and rerank models keep holding GPU memory #1741

sunzao413 · 2024-06-28T09:12:14Z

With xinference 0.12.2 deployed via Docker, the embedding and rerank models keep holding GPU memory.

Host CUDA version: 12.1. OS: Windows 10. Docker version: Docker Desktop 4.29.0 (145265). GPU: RTX 3090 (24 GB).

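Because Docker deployments of xinference 0.12.1 and 0.12.2 stop automatically shortly after starting, I followed the fix from the earlier issue and added the following install step: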
RUN pip install -U "llama-cpp-python==0.2.77" --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
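I then rebuilt the image, and it started without any error messages.
I loaded the bge-reranker-v2-m3 model and called it through fastgpt + oneapi.
Main symptoms:
1. Each search that calls the model adds to GPU memory usage; after 4 or 5 searches the 24 GB is full.
2. The memory stays occupied and is never released automatically. It is freed immediately only when I manually shut down the model in question (bge-reranker-v2-m3 in this case).
Desired behavior:
1. Only one replica is created, as configured, and GPU memory does not grow with each search.
2. GPU memory is released automatically when the model is idle instead of being held long-term.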
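
To put numbers on symptom 1, GPU memory can be sampled before and after each search. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH wherever the script runs; the function names (`parse_used_mib`, `gpu_used_mib`, `growth_per_call`) are hypothetical helpers for illustration, not part of xinference.

```python
import subprocess


def parse_used_mib(smi_output):
    """Parse the text printed by
    `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`
    into a list of used-memory values in MiB, one per GPU."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def gpu_used_mib():
    """Query nvidia-smi and return used memory (MiB) for each GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_used_mib(result.stdout)


def growth_per_call(samples):
    """Given readings taken before the first call and after each later call,
    return the increase between consecutive readings."""
    return [after - before for before, after in zip(samples, samples[1:])]
```

Taking one reading with `gpu_used_mib()[0]` before the first search and one after each subsequent search, then passing the list to `growth_per_call`, shows whether usage rises by a roughly constant amount per request or levels off after a warm-up.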

sunzao413 · 2024-06-29T01:35:26Z

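Yesterday I pulled the v0.12.3 Docker image. It starts normally without installing llama-cpp-python==0.2.77.
However, the rerank model problem persists: each search adds roughly 5 GB of GPU memory usage, the card fills up quickly, and nothing is released. The only way to free it is to shut down the model manually.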

goldeneave · 2024-07-04T02:24:12Z

Has this been solved? GPU memory usage also climbs when I launch a rerank model.

lhs0627 · 2024-08-08T10:28:23Z

Hi, has this been solved? I'm using bge-m3.

lhs0627 · 2024-08-08T10:44:21Z

Hi, I'm a beginner. How did you shut down the model manually? When I run this official example, each run also adds several GB:
from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(model_name="bge-m3", model_type="embedding")
model = client.get_model(model_uid)

input_text = "What is the capital of China?"
model.create_embedding(input_text)
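
One possible reading of the example above: every run calls `launch_model`, so each run may start another instance of the model unless an earlier one is terminated. Whether that also explains the original report (calls through fastgpt + oneapi) is not established in this thread. The sketch below reuses an already running model and shuts it down explicitly. It assumes the client's `list_models()` returns a dict keyed by model UID whose values include a `model_name` field; `get_or_launch` and `shutdown` are hypothetical helper names, not xinference functions.

```python
def get_or_launch(client, model_name, model_type):
    """Return the UID of a running model with this name; launch only if absent."""
    # Assumption: list_models() maps model UID -> description dict
    # that contains a "model_name" entry.
    for uid, info in client.list_models().items():
        if info.get("model_name") == model_name:
            return uid
    return client.launch_model(model_name=model_name, model_type=model_type)


def shutdown(client, model_uid):
    """Terminate a running model so the server can release its GPU memory."""
    client.terminate_model(model_uid)


# Usage against a running server (commented out so the block has no
# dependency on xinference being installed):
# from xinference.client import Client
# client = Client("http://localhost:9997")
# uid = get_or_launch(client, "bge-m3", "embedding")
# client.get_model(uid).create_embedding("What is the capital of China?")
# shutdown(client, uid)
```

Calling `shutdown` is the scripted equivalent of manually closing the model, which the original poster reports frees the memory immediately.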

XprobeBot added the gpu label Jun 28, 2024

XprobeBot added this to the v0.12.2 milestone Jun 28, 2024

sunzao413 changed the title ~~BUG~~ BUG: embedding and rerank models keep holding GPU memory Jun 28, 2024

XprobeBot modified the milestones: v0.12.2, v0.12.4 Jun 28, 2024

sunzao413 closed this as completed Jun 29, 2024
