BUG: embedding and rerank models continuously occupy GPU memory #1741

Closed
sunzao413 opened this issue Jun 28, 2024 · 4 comments

@sunzao413

Xinference 0.12.2 deployed with Docker: the embedding and rerank models continuously occupy GPU memory.

Host CUDA version: 12.1. OS: Windows 10. Docker version: Docker Desktop 4.29.0 (145265). GPU: RTX 3090 (24 GB).

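Docker deployments of Xinference 0.12.1 and 0.12.2 stop automatically right after starting, so I followed the fix from the earlier issue and added this install step:
RUN pip install -U "llama-cpp-python==0.2.77" --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
I then rebuilt the image, and the container started with no errors.
I loaded the bge-reranker-v2-m3 model and called it through FastGPT + OneAPI.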
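Main symptoms:
1. Every search that calls the model adds to GPU memory usage; after 4 or 5 searches the 24 GB is full.
2. The memory stays occupied and is never released automatically. Only manually shutting down the model (bge-reranker-v2-m3 in this case) frees it, and then it is released immediately.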
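To put numbers on the growth described above, GPU memory can be sampled before and after each search. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH of wherever the script runs; the helper names are illustrative and not part of Xinference.

```python
import subprocess
from typing import List


def parse_memory_used(output: str) -> List[int]:
    """Parse one integer (MiB) per GPU from nvidia-smi's CSV output."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def gpu_memory_used_mib() -> List[int]:
    """Query the current memory use of every visible GPU, in MiB."""
    output = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_memory_used(output)


def memory_deltas(samples: List[int]) -> List[int]:
    """Growth between consecutive samples; steady positive values suggest accumulation."""
    return [after - before for before, after in zip(samples, samples[1:])]
```

Recording `gpu_memory_used_mib()[0]` after each search and passing the list to `memory_deltas` shows whether every call adds a similar amount or whether usage levels off after the first few requests.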
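Desired behavior:
1. Only one replica is created, as configured, and searches do not keep adding to GPU memory usage.
2. GPU memory is released automatically when the model is idle instead of being held indefinitely.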
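Nothing in this thread confirms that Xinference offers idle release itself, so the second point can only be approximated from the client side. The sketch below is a hypothetical wrapper (the `IdleModel` class and its parameters are invented for illustration) that launches the model on demand and terminates it after a period without use. It relies only on the client's `launch_model`, `get_model`, and `terminate_model` methods.

```python
import time
from typing import Any, Callable, Optional


class IdleModel:
    """Hypothetical client-side wrapper: launch on demand, terminate when idle."""

    def __init__(
        self,
        client: Any,
        model_name: str,
        model_type: str,
        idle_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._client = client
        self._model_name = model_name
        self._model_type = model_type
        self._idle_seconds = idle_seconds
        self._clock = clock
        self._uid: Optional[str] = None
        self._last_used = 0.0

    def get(self) -> Any:
        """Return a model handle, launching the model first if it is not running."""
        if self._uid is None:
            self._uid = self._client.launch_model(
                model_name=self._model_name, model_type=self._model_type
            )
        self._last_used = self._clock()
        return self._client.get_model(self._uid)

    def release_if_idle(self) -> bool:
        """Terminate the model if it has been unused for idle_seconds or longer."""
        if self._uid is None:
            return False
        if self._clock() - self._last_used < self._idle_seconds:
            return False
        self._client.terminate_model(self._uid)
        self._uid = None
        return True
```

A caller would invoke `release_if_idle()` periodically, for example from a timer thread. The trade-off is that the next request after a release pays the full model load time again.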

@XprobeBot XprobeBot added the gpu label Jun 28, 2024
@XprobeBot XprobeBot added this to the v0.12.2 milestone Jun 28, 2024
@sunzao413 sunzao413 changed the title from "BUG" to "BUG: embedding and rerank models continuously occupy GPU memory" Jun 28, 2024
@XprobeBot XprobeBot modified the milestones: v0.12.2, v0.12.4 Jun 28, 2024
@sunzao413
Author

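Yesterday I pulled the v0.12.3 Docker image. It starts normally without the extra llama-cpp-python==0.2.77 install step.
The rerank model problem is still there, though: every search adds roughly 5 GB of GPU memory, which fills up quickly and is never released. I have to shut the model down manually.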

@goldeneave

Has this been resolved? Launching a rerank model drives up GPU memory usage for me as well.

@lhs0627

lhs0627 commented Aug 8, 2024

Hi, has this been resolved? I am using bge-m3.

@lhs0627

lhs0627 commented Aug 8, 2024

Hi, I'm new to this. How do you shut the model down manually? When I run this official example, each run also adds several GB:
from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(model_name="bge-m3", model_type="embedding")
model = client.get_model(model_uid)

input_text = "What is the capital of China?"
model.create_embedding(input_text)
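On the question of shutting a model down manually: the Xinference client has a `terminate_model(model_uid)` method. The example above also calls `launch_model` on every run, so one possible contributor (an assumption, not confirmed in this thread) is that each run starts another instance instead of reusing the running one. The sketch below uses hypothetical helper names and assumes `list_models()` returns a dict keyed by model UID whose values include a `"model_name"` field.

```python
from typing import Any, List


def get_or_launch(client: Any, model_name: str, model_type: str) -> str:
    """Return the UID of a running model with this name, launching one only if none exists."""
    # Assumption: list_models() maps model UID -> description dict with "model_name".
    for uid, spec in client.list_models().items():
        if spec.get("model_name") == model_name:
            return uid
    return client.launch_model(model_name=model_name, model_type=model_type)


def terminate_by_name(client: Any, model_name: str) -> List[str]:
    """Terminate every running model with this name and return the terminated UIDs."""
    terminated = []
    # Copy the items first so termination cannot affect the iteration.
    for uid, spec in list(client.list_models().items()):
        if spec.get("model_name") == model_name:
            client.terminate_model(uid)
            terminated.append(uid)
    return terminated
```

Against a real server, `client` would be `Client("http://localhost:9997")` from `xinference.client`, as in the example above. Calling `get_or_launch(client, "bge-m3", "embedding")` in place of `launch_model` reuses an existing instance, and `terminate_by_name(client, "bge-m3")` shuts it down when it is no longer needed.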
