feat(llm): Determine the best LLM deployment config automatically #2396
Comments
I guess it may belong to the scope of KServe, since Katib focuses on hyperparameter tuning of models :)
It's more like a tuning job. You could consider tuning the deployment configs (e.g., the distributed strategy).
Thank you for creating this @gaocegege! Yes, I think optimization of LLM deployment makes sense, since Katib is able to perform any optimization task (not necessarily ML) and orchestrate any resources as Trials. It would be nice to get someone from the Kubeflow community who can explore the Vidur aspects and see how Katib can be useful. /help
@andreyvelich: Please ensure the request meets the requirements listed here. If this request no longer meets these requirements, the label can be removed. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@gaocegege @andreyvelich I would love to look into this. Can I work on it? /assign
Yes, that would be amazing @gjyotin305! /assign @gjyotin305
Sure |
What would you like to be added?
Inspired by the research paper Vidur: A Large-Scale Simulation Framework for LLM Inference.
Optimizing the deployment of large language models (LLMs) is expensive today, since it requires experimentally running an application workload against an LLM implementation while exploring the large configuration space formed by system knobs such as parallelization strategies, batching techniques, and scheduling policies.
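To make the configuration space concrete, here is a minimal, hypothetical sketch of the kind of sweep such a feature might automate. The knob names, values, and cost model below are illustrative only, not an actual Katib or Vidur API; a real setup would score each configuration by running a simulator or a benchmark Trial rather than the toy formula used here.

```python
from itertools import product

# Hypothetical deployment knobs to explore (illustrative values).
TENSOR_PARALLEL = [1, 2, 4]        # parallelization strategy
MAX_BATCH_SIZE = [8, 16, 32]       # batching technique
SCHEDULING = ["fcfs", "priority"]  # scheduling policy

def simulated_cost(tp, batch, policy):
    """Toy stand-in for a Vidur-style simulator: returns a fake
    latency score for one configuration. In practice this would
    invoke the simulator or launch a benchmark run."""
    latency = 100.0 / tp + 2.0 * batch
    if policy == "priority":
        latency *= 0.9
    return latency

def best_config():
    """Exhaustive sweep over the configuration space. A tuner like
    Katib would replace this brute-force loop with a search
    algorithm (random, Bayesian, etc.) and run each point as a Trial."""
    return min(
        product(TENSOR_PARALLEL, MAX_BATCH_SIZE, SCHEDULING),
        key=lambda cfg: simulated_cost(*cfg),
    )

print(best_config())  # the lowest-latency (tp, batch, policy) triple
```

Even this tiny grid has 3 × 3 × 2 = 18 points; real deployment spaces are far larger, which is why simulation-driven search (as in Vidur) is attractive compared to benchmarking every configuration on hardware.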
Why is this needed?
Not sure if it is in the scope of Katib, but glad to raise an issue here.
Love this feature?
Give it a 👍 We prioritize the features with the most 👍