
[WIP][REP][Serve] Add proposal for API allowing user-defined autoscaling and scheduling algorithms #56

Open

wants to merge 2 commits into base: main
Conversation

@arcyleung commented Oct 17, 2024

This REP aims to extend the existing Ray serve.deployment functionality so that users can define their own autoscaling and scheduling policies. The existing policies are request-queue-length autoscaling and power-of-2 scheduling; while their configs can be modified (e.g. target_ongoing_requests), the policies' algorithms and monitored heuristics currently cannot be changed by the user. For instance, a user who wants to proactively scale deployments based on the number of detected SLA violations cannot rely on a request-queue-length heuristic when each request has variable latency.
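To make the proposal concrete, here is a minimal sketch of what a user-defined, SLA-violation-driven autoscaling policy might look like. The function name, the `AutoscalingContext` structure, and all of its fields are assumptions made for illustration; they are not the actual API proposed in the REP.

```python
# Hypothetical sketch of a pluggable autoscaling policy. All names here
# (AutoscalingContext, sla_aware_policy, field names) are illustrative
# assumptions, not the REP's real API surface.
from dataclasses import dataclass


@dataclass
class AutoscalingContext:
    current_num_replicas: int
    sla_violations_per_replica: list[int]  # recent SLA violations seen per replica
    min_replicas: int = 1
    max_replicas: int = 10


def sla_aware_policy(ctx: AutoscalingContext) -> int:
    """Return a target replica count based on observed SLA violations."""
    total_violations = sum(ctx.sla_violations_per_replica)
    if total_violations > 0:
        # Add one extra replica per 5 violations (threshold is arbitrary).
        desired = ctx.current_num_replicas + max(1, total_violations // 5)
    else:
        # No violations anywhere: try to scale down by one.
        desired = ctx.current_num_replicas - 1
    # Clamp to the deployment's configured bounds.
    return max(ctx.min_replicas, min(ctx.max_replicas, desired))
```

A deployment could then register such a callable instead of the built-in queue-length policy, letting the controller call it on each autoscaling tick with fresh metrics.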

@arcyleung arcyleung changed the title [WIP] Add proposal for API allowing user-defined autoscaling and scheduling algorithms [WIP][REP][Serve] Add proposal for API allowing user-defined autoscaling and scheduling algorithms Oct 17, 2024
@arcyleung (Author) commented Nov 22, 2024

We ran some experiments comparing the behaviour of power-of-2 scheduling / max-queue-length autoscaling (Ray-like) against this adaptive SLA-aware scheduling and autoscaling, in a scenario serving 2 LLM applications (wf1, wf2) simultaneously. An adaptive policy for both actions is beneficial for meeting SLAs beyond "lowest-latency" or "highest-throughput" goals, reaching a compromise that is just right.
[experiment result screenshots attached]
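For contrast with the Ray-like baseline described above, here is a sketch of both replica-selection strategies side by side. The SLA-aware variant, its `latency_estimates` input, and the `sla_ms` parameter are assumptions for illustration only; the power-of-two-choices baseline follows the standard algorithm of sampling two replicas and picking the less loaded one.

```python
# Illustrative comparison of a power-of-2-choices baseline with a
# hypothetical SLA-aware replica selector. Function names and inputs
# are assumptions, not the experiment's actual implementation.
import random


def power_of_two_choices(queue_lengths: list[int]) -> int:
    """Baseline: sample two distinct replicas, route to the shorter queue."""
    i, j = random.sample(range(len(queue_lengths)), 2)
    return i if queue_lengths[i] <= queue_lengths[j] else j


def sla_aware_choice(latency_estimates: list[float], sla_ms: float) -> int:
    """Prefer the fastest replica whose estimated latency fits the SLA;
    if none fits, fall back to the globally lowest estimate."""
    within_sla = [i for i, est in enumerate(latency_estimates) if est <= sla_ms]
    candidates = within_sla or range(len(latency_estimates))
    return min(candidates, key=lambda i: latency_estimates[i])
```

The key difference is the signal consulted: queue length is a proxy that ignores per-request latency variance, whereas an SLA-aware selector can route around replicas that are technically short-queued but slow for the current workload mix.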
