
[WIP][REP][Serve] Add proposal for API allowing user-defined autoscaling and scheduling algorithms #56

Open

wants to merge 2 commits into base: main
Conversation

@arcyleung commented Oct 17, 2024

This REP aims to extend the existing Ray serve.deployment functionality so that users can define their own autoscaling and scheduling policies. The existing policies are request-queue-length autoscaling and power-of-2 scheduling; while their configs can be modified (e.g. target_ongoing_requests), the policies' algorithms and monitored heuristics currently cannot be changed by the user. For instance, a user who wants to proactively scale deployments based on the number of detected SLA violations cannot rely on a request-queue-length heuristic when each request has variable latency.
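To make the proposal concrete, here is a minimal sketch of what a user-defined, SLA-violation-driven autoscaling policy might look like. The function name, the `AutoscalingContext` structure, and all of its fields are assumptions made for illustration; they are not the actual API proposed in the REP.

```python
# Hypothetical sketch of a pluggable autoscaling policy. All names here
# (AutoscalingContext, sla_aware_policy, field names) are illustrative
# assumptions, not the REP's real API surface.
from dataclasses import dataclass


@dataclass
class AutoscalingContext:
    current_num_replicas: int
    sla_violations_per_replica: list[int]  # recent SLA violations seen per replica
    min_replicas: int = 1
    max_replicas: int = 10


def sla_aware_policy(ctx: AutoscalingContext) -> int:
    """Return a target replica count based on observed SLA violations."""
    total_violations = sum(ctx.sla_violations_per_replica)
    if total_violations > 0:
        # Add one extra replica per 5 violations (threshold is arbitrary).
        desired = ctx.current_num_replicas + max(1, total_violations // 5)
    else:
        # No violations anywhere: try to scale down by one.
        desired = ctx.current_num_replicas - 1
    # Clamp to the deployment's configured bounds.
    return max(ctx.min_replicas, min(ctx.max_replicas, desired))
```

A deployment could then register such a callable instead of the built-in queue-length policy, letting the controller call it on each autoscaling tick with fresh metrics.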

@arcyleung arcyleung changed the title [WIP] Add proposal for API allowing user-defined autoscaling and scheduling algorithms [WIP][REP][Serve] Add proposal for API allowing user-defined autoscaling and scheduling algorithms Oct 17, 2024
@arcyleung (Author) commented Nov 22, 2024

We ran some experiments comparing the behaviour of power-of-2 scheduling / max-queue-length autoscaling (Ray-like) against this adaptive SLA-aware scheduling and autoscaling, in a scenario serving 2 LLM applications (wf1, wf2) simultaneously. An adaptive policy for both actions is beneficial for meeting SLAs beyond "lowest-latency" or "highest-throughput" goals, reaching a compromise that is just right.
[experiment result screenshots attached]
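For contrast with the Ray-like baseline described above, here is a sketch of both replica-selection strategies side by side. The SLA-aware variant, its `latency_estimates` input, and the `sla_ms` parameter are assumptions for illustration only; the power-of-two-choices baseline follows the standard algorithm of sampling two replicas and picking the less loaded one.

```python
# Illustrative comparison of a power-of-2-choices baseline with a
# hypothetical SLA-aware replica selector. Function names and inputs
# are assumptions, not the experiment's actual implementation.
import random


def power_of_two_choices(queue_lengths: list[int]) -> int:
    """Baseline: sample two distinct replicas, route to the shorter queue."""
    i, j = random.sample(range(len(queue_lengths)), 2)
    return i if queue_lengths[i] <= queue_lengths[j] else j


def sla_aware_choice(latency_estimates: list[float], sla_ms: float) -> int:
    """Prefer the fastest replica whose estimated latency fits the SLA;
    if none fits, fall back to the globally lowest estimate."""
    within_sla = [i for i, est in enumerate(latency_estimates) if est <= sla_ms]
    candidates = within_sla or range(len(latency_estimates))
    return min(candidates, key=lambda i: latency_estimates[i])
```

The key difference is the signal consulted: queue length is a proxy that ignores per-request latency variance, whereas an SLA-aware selector can route around replicas that are technically short-queued but slow for the current workload mix.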
