Model loaded via model repository API does not appear after querying it with v2/repository/index endpoint #7066
Comments
Can you share the corresponding Triton server log?
As a reference, I was able to do the following locally: start the server, and then from a separate shell load the example model and query the repository index; both steps produced the expected output.
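A minimal sketch of such reference steps, assuming tritonclient[http] is installed, a server is already running on localhost:8000 with --model-control-mode=explicit, and its model repository contains the example simple model (an approximation, not necessarily the exact commands used here):

# Hedged sketch: load the example "simple" model by name from the repository on
# disk, then query the repository index.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Explicit load by name: the server reads the model from its --model-repository.
client.load_model("simple")

# The index is expected to list "simple" with state READY.
for entry in client.get_model_repository_index():
    print(entry)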
@nnshah1 Sorry, I was a little in a hurry and missed some key details.
docker run -it --rm \
--name triton \
--gpus all --network host \
--shm-size=1g --ulimit memlock=-1 \
nvcr.io/nvidia/tritonserver:24.01-py3 tritonserver --model-control-mode=explicit --model-repository=/home --log-verbose=6 --log-error=1
The idea is that I'm launching tritonserver in explicit model-control mode and then loading models through the repository API.
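For reference, the raw HTTP calls behind this workflow might look roughly like the sketch below, using the Python requests library against localhost:8000; the request body format follows the model_repository protocol extension, and the model name, file path, and config are placeholders:

# Hedged sketch: load a model into an explicit-mode server purely over HTTP,
# then query the repository index.
import base64
import json
import requests

BASE = "http://localhost:8000"

with open("simple/1/model.graphdef", "rb") as f:   # placeholder model file
    model_b64 = base64.b64encode(f.read()).decode("utf-8")

with open("simple_config.json") as f:              # config.pbtxt converted to JSON
    config_json = f.read()

# Per the model_repository extension, the config is passed as a JSON string and each
# model file as a base64-encoded "file:<version>/<filename>" parameter.
load_body = {
    "parameters": {
        "config": config_json,
        "file:1/model.graphdef": model_b64,
    }
}
requests.post(f"{BASE}/v2/repository/models/simple/load", json=load_body).raise_for_status()

# The index is expected to list the freshly loaded model.
index = requests.post(f"{BASE}/v2/repository/index", json={})
print(json.dumps(index.json(), indent=2))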
Can you provide the server logs? I ran the server without loading the model (but still pointing to the example artifacts), loaded the example model directly, and everything worked as expected. Can you check that as a sanity test? My guess is that there is an error either in the pbtxt-to-JSON conversion or in the way the model bytes are loaded. If you can share the pbtxt-to-JSON conversion code you are using, we could also see whether the exact steps reproduce on our end.
@nnshah1 But I expect that if I upload a model to the server via the API, it should show up when I query the index, independently of whether it exists in the folder that the model repository points to. To reproduce my case, you need to point to an empty model repository, as I suggested. Since I'm running tritonserver in explicit model-control mode, my use case is: I start the Triton container on some server with an empty model repository and then gradually load or unload models as my needs change. My code to convert pbtxt to JSON:
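A minimal sketch of such a conversion, assuming the model_config_pb2 bindings that ship with tritonclient plus the standard protobuf helpers (not necessarily the exact code used here):

# Hedged sketch: parse config.pbtxt into a ModelConfig message and serialize it to
# JSON, which can then be passed as the "config" parameter of the load request.
from google.protobuf import json_format, text_format
from tritonclient.grpc import model_config_pb2

def pbtxt_to_json(pbtxt_path: str) -> str:
    config = model_config_pb2.ModelConfig()
    with open(pbtxt_path, "r") as f:
        text_format.Parse(f.read(), config)
    # preserving_proto_field_name keeps the snake_case names used in config.pbtxt
    return json_format.MessageToJson(config, preserving_proto_field_name=True)

print(pbtxt_to_json("simple/config.pbtxt"))  # placeholder path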
@ogvalt I understand your use case. Are there any errors in the server-side log when loading the model? Can you confirm that loading the example as above (explicitly from a directory via the client) works as well? I'd like to see at which point things diverge between loading the example model directly from disk and loading it by passing the bits in manually.
Quick update: I believe I'm able to reproduce what you are describing and will investigate.
@nnshah1 Sanity test: checked.
Thanks! Appreciate it. I suspect that, since the models get loaded into a temp directory and not /home, there is a difference in how they are listed in the index. We need to investigate whether that is by design or a bug.
Looking forward to an answer too.
@ogvalt I've filed an internal ticket to track this. Let us know if there is a timeline / priority for this on your end.
It's not urgent, but I hope it won't take months to see a release with this fix.
@ogvalt We're discussing internally and will get back with an ETA.
@ogvalt For a temporary workaround, see triton-inference-server/core#340. We still need to finalize the change in behavior, but it's there in case you'd like to see it sooner rather than later.
@nnshah1 Thanks for the update. I was wondering what kind of side effects to expect after a dynamically loaded model is unloaded. Will some amount of RAM or disk space be left occupied, or will it be completely released?
It will generally depend on the backend and how it handles things. For the Python backend, model instances run in separate processes, so memory would be reclaimed. For in-process backends like TensorFlow and PyTorch, mileage can vary on how quickly, and whether, all memory is reclaimed. For TensorFlow specifically we have seen memory being held.
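As an illustration, a minimal sketch of unloading a dynamically loaded model through the client (assuming a server on localhost:8000 and a previously loaded model named simple); how much memory actually comes back afterwards depends on the backend, as described above:

# Hedged sketch: explicitly unload a dynamically loaded model.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
client.unload_model("simple")

# Once the unload completes, the model should no longer report as ready.
print(client.is_model_ready("simple"))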
Just checking, how are things going?
@nnshah1 hey, any updates? |
Description
I've loaded a model via the v2/repository/models/simple/load endpoint, but when querying the v2/repository/index endpoint I get [] as a response.
Triton Information
What version of Triton are you using?
2.42.0
Are you using the Triton container or did you build it yourself?
Triton container, version nvcr.io/nvidia/tritonserver:24.01-py3
To Reproduce
Load the model through tritonclient and then query the repository index (see the sketch below).
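A minimal sketch of these steps, assuming tritonclient[http], a server started as in the docker command above (explicit model control), and placeholder paths for the model file and its JSON configuration; an approximation of the reported reproduction, not the exact snippet:

# Hedged sketch: load a model by passing its configuration and file contents
# through the client, then query the repository index.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

with open("simple/1/model.graphdef", "rb") as f:   # placeholder model file
    model_bytes = f.read()

with open("simple_config.json") as f:              # config.pbtxt converted to JSON
    config_json = f.read()

client.load_model(
    "simple",
    config=config_json,
    # keys follow the "file:<version>/<filename>" convention; values are raw bytes
    files={"file:1/model.graphdef": model_bytes},
)

print(client.is_model_ready("simple"))        # True: the model is loaded and serving
print(client.get_model_repository_index())    # observed here: [] instead of an entry for "simple"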
Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).
Model mentioned above
Expected behavior
I expect the code above to return a response according to this specification:
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/protocol/extension_model_repository.html