Check the model status before traffic gets redirected to the pods #66

Open
bdattoma opened this issue Aug 22, 2023 · 3 comments
Labels
kind/feature New feature

Comments

@bdattoma

/kind feature

Describe the solution you'd like
As of now, while the model is being loaded into memory, user queries may still reach a pod that is not yet ready to answer (since the model is not loaded). The larger the model, the longer it takes to load, and the longer this "downtime" lasts.

Anything else you would like to add:
This could be useful in different tasks like model/runtime canary rollout, replica auto-scaling, etc.
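As a temporary client-side workaround sketch (assuming the serving runtime implements the KServe V2 / Open Inference Protocol, which exposes `GET /v2/models/{model_name}/ready`), a caller could poll the model readiness endpoint before sending requests. `BASE_URL` and `MODEL_NAME` below are hypothetical placeholders, not values from this issue:

```python
import time
import urllib.error
import urllib.request

# Hypothetical placeholders: adjust to the actual predictor Service and model name.
BASE_URL = "http://my-model-predictor.my-namespace.svc.cluster.local:8080"
MODEL_NAME = "my-model"


def wait_until_model_ready(timeout_s: float = 300.0, poll_s: float = 5.0) -> bool:
    """Poll the V2-protocol model-ready endpoint until the runtime reports the model loaded."""
    ready_url = f"{BASE_URL}/v2/models/{MODEL_NAME}/ready"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(ready_url, timeout=5) as resp:
                if resp.status == 200:  # 200 => model is loaded and ready to serve
                    return True
        except (urllib.error.URLError, OSError):
            pass  # pod unreachable or model still loading; keep polling
        time.sleep(poll_s)
    return False


if __name__ == "__main__":
    if wait_until_model_ready():
        print("Model is ready; safe to send inference traffic.")
    else:
        print("Model did not become ready in time.")
```

This only shifts the check to the client; the proper fix is for the platform to gate traffic to the pod until the model reports ready, which is what this issue asks for.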

@openshift-ci openshift-ci bot added the kind/feature New feature label Aug 22, 2023
@bdattoma bdattoma changed the title Check the model status before traffic being redirected to the pod Check the model status before traffic gets redirected to the pods Aug 22, 2023
@danielezonca

I think this ticket should be moved to the upstream kserve repo (https://github.com/kserve/kserve/issues).
We can keep this one as a "clone" to simplify tracking, but the solution should be implemented upstream.

@bdattoma
Author

bdattoma commented Sep 7, 2023

upstream issue: kserve#3113

@heyselbi heyselbi moved this from New/Backlog to To-do/Groomed in ODH Model Serving Planning Sep 14, 2023