[llm.serving] Reconfigure router to better perform under high concurrency #50876

GeneDer · 2025-02-25T01:44:00Z

Why are these changes needed?

According to internal benchmarks, the router deployment becomes a bottleneck under high concurrency. This PR reconfigures the router to use a lower target ongoing requests value (16) so that the router scales out sooner. It also sets the router's min_replicas, initial_replicas, and max_replicas to a multiple N of the model replicas across all models. The multiplication factor can be configured via the env var RAYLLM_ROUTER_TO_MODEL_REPLICA_RATIO and defaults to 2 (meaning there will be roughly 2 router replicas for every model replica).
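The scaling rule described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function name `sketch_router_autoscaling`, the shape of `model_configs`, summing replica counts across models, and the rounding behavior are all assumptions. Only the env var name, its default of 2, and the target of 16 come from the description.

```python
import math
import os

# Value quoted in the PR description for the router's target ongoing requests.
ROUTER_TARGET_ONGOING_REQUESTS = 16


def sketch_router_autoscaling(model_configs, ratio=None):
    """Derive router replica bounds from the models' replica bounds.

    `model_configs` is a hypothetical list of dicts, one per model, each
    holding `min_replicas`, `initial_replicas`, and `max_replicas`.
    """
    if ratio is None:
        # Env var name and default of 2 are taken from the PR description.
        ratio = float(os.environ.get("RAYLLM_ROUTER_TO_MODEL_REPLICA_RATIO", "2"))

    def scaled(key):
        # Assumption: sum the value over all models, then apply the ratio.
        total = sum(cfg[key] for cfg in model_configs)
        # Assumption: round up and keep at least one router replica.
        return max(1, math.ceil(total * ratio))

    return {
        "min_replicas": scaled("min_replicas"),
        "initial_replicas": scaled("initial_replicas"),
        "max_replicas": scaled("max_replicas"),
        "target_ongoing_requests": ROUTER_TARGET_ONGOING_REQUESTS,
    }


# Hypothetical example: two models with different replica bounds.
models = [
    {"min_replicas": 1, "initial_replicas": 1, "max_replicas": 4},
    {"min_replicas": 2, "initial_replicas": 2, "max_replicas": 2},
]
print(sketch_router_autoscaling(models, ratio=2))
```

Passing `ratio` explicitly skips the environment lookup, which keeps the sketch easy to try without changing the process environment.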

Benchmark plots:
- Fixed number of router replicas + varying number of model replicas
- Variable number of router replicas + model replicas

Related issue number

Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
  - I've added any new APIs to the API Reference. For example, if I added a
    method in Tune, I've added it in doc/source/tune/api/ under the
    corresponding .rst file.
- I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(


[llm.serving] Reconfigure router to better perform under high concurr…

142b40c

…ency Signed-off-by: Gene Su <e870252314@gmail.com>

GeneDer requested a review from a team as a code owner February 25, 2025 01:44

GeneDer added the go (add ONLY when ready to merge, run all tests) label Feb 25, 2025

factor in the default when autoscaling is not configured explicitly

32ffbc6

Signed-off-by: Gene Su <e870252314@gmail.com>

kouroshHakha approved these changes Feb 25, 2025


kouroshHakha enabled auto-merge (squash) February 25, 2025 02:50

fix possible dict values for autoscaling config

779df69

Signed-off-by: Gene Su <e870252314@gmail.com>

github-actions bot disabled auto-merge February 25, 2025 03:17

fix again

cdd1392

Signed-off-by: Gene Su <e870252314@gmail.com>

kouroshHakha enabled auto-merge (squash) February 25, 2025 04:27

kouroshHakha merged commit fb60d66 into ray-project:master Feb 25, 2025
6 checks passed

GeneDer deleted the reconfig-model-router branch February 25, 2025 05:02

GeneDer mentioned this pull request Feb 25, 2025

[llm.serving] Reconfigure router to better perform under high concurrency (#50876) #50884

Merged

8 tasks

GeneDer added a commit to GeneDer/ray that referenced this pull request Feb 25, 2025

[llm.serving] Reconfigure router to better perform under high concurr…

d201bbb

…ency (ray-project#50876) Signed-off-by: Gene Su <e870252314@gmail.com>

aslonnie pushed a commit that referenced this pull request Feb 25, 2025

[llm.serving] Reconfigure router to better perform under high concurr…

ecdcdc6

…ency (#50876) (#50884) Cherry pick: #50876 Signed-off-by: Gene Su <e870252314@gmail.com>

kevin85421 pushed a commit to kevin85421/ray that referenced this pull request Feb 28, 2025

[llm.serving] Reconfigure router to better perform under high concurr…

0950dec

…ency (ray-project#50876) Signed-off-by: kaihsun <kaihsun@anyscale.com>

xsuler pushed a commit to antgroup/ant-ray that referenced this pull request Mar 4, 2025

[llm.serving] Reconfigure router to better perform under high concurr…

8cd6edf

…ency (ray-project#50876)

park12sj pushed a commit to park12sj/ray that referenced this pull request Mar 18, 2025

[llm.serving] Reconfigure router to better perform under high concurr…

ccdf4d6

…ency (ray-project#50876)

hainesmichaelc added the community-backlog label May 22, 2025
