@GeneDer GeneDer commented Feb 25, 2025

Why are these changes needed?

According to internal benchmarks, the router deployment becomes a bottleneck under high concurrency. This PR reconfigures the router to use a lower target ongoing requests value (16) so that the router scales better. It also sets the router's min_replicas, initial_replicas, and max_replicas to a factor N times the corresponding values across all the models. The multiplication factor can be configured via the env var RAYLLM_ROUTER_TO_MODEL_REPLICA_RATIO and defaults to 2 (meaning there will be roughly 2 router replicas for every model replica).
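A minimal sketch of the scaling rule described above, not the code from this PR. The helper name `router_autoscaling_config`, the plain-dict config shape, summing each bound across models, rounding up, and the floor of one replica are all assumptions made for illustration; only the env var name, its default of 2, and the target of 16 ongoing requests come from the description.

```python
import math
import os
from typing import Dict, List

# Env var name and default ratio are taken from the PR description.
RATIO_ENV_VAR = "RAYLLM_ROUTER_TO_MODEL_REPLICA_RATIO"
DEFAULT_RATIO = 2.0
# Target ongoing requests per router replica, as stated in the PR description.
ROUTER_TARGET_ONGOING_REQUESTS = 16


def router_autoscaling_config(model_configs: List[Dict[str, int]]) -> Dict[str, int]:
    """Hypothetical helper: derive router replica bounds from the models' bounds.

    Assumption: each router bound is the ratio times the sum of that bound
    over all models, rounded up, and never less than one replica.
    """
    ratio = float(os.environ.get(RATIO_ENV_VAR, DEFAULT_RATIO))
    config = {"target_ongoing_requests": ROUTER_TARGET_ONGOING_REQUESTS}
    for key in ("min_replicas", "initial_replicas", "max_replicas"):
        total = sum(model[key] for model in model_configs)
        config[key] = max(1, math.ceil(ratio * total))
    return config


if __name__ == "__main__":
    # Two illustrative models; the numbers are made up for the example.
    models = [
        {"min_replicas": 1, "initial_replicas": 1, "max_replicas": 2},
        {"min_replicas": 1, "initial_replicas": 2, "max_replicas": 4},
    ]
    print(router_autoscaling_config(models))
```

With the env var unset (ratio 2), the two example models total 2, 3, and 6 replicas for the three bounds, so this sketch gives the router 4, 6, and 12.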

Benchmark plots:
Fixed number of router replicas + varying number of model replicas
[Plot: QPS per replica vs. total latency]

Variable number of router replicas + model replicas
[Plot: QPS per replica vs. total latency]

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(
…ency Signed-off-by: Gene Su <e870252314@gmail.com>
@GeneDer GeneDer requested a review from a team as a code owner February 25, 2025 01:44
@GeneDer GeneDer added the go add ONLY when ready to merge, run all tests label Feb 25, 2025
Signed-off-by: Gene Su <e870252314@gmail.com>
@kouroshHakha kouroshHakha enabled auto-merge (squash) February 25, 2025 02:50
Signed-off-by: Gene Su <e870252314@gmail.com>
@github-actions github-actions bot disabled auto-merge February 25, 2025 03:17
Signed-off-by: Gene Su <e870252314@gmail.com>
@kouroshHakha kouroshHakha enabled auto-merge (squash) February 25, 2025 04:27
@kouroshHakha kouroshHakha merged commit fb60d66 into ray-project:master Feb 25, 2025
6 checks passed
@GeneDer GeneDer deleted the reconfig-model-router branch February 25, 2025 05:02
GeneDer added a commit to GeneDer/ray that referenced this pull request Feb 25, 2025
aslonnie pushed a commit that referenced this pull request Feb 25, 2025
…ency (#50876) (#50884) Cherry pick: #50876 Signed-off-by: Gene Su <e870252314@gmail.com>
kevin85421 pushed a commit to kevin85421/ray that referenced this pull request Feb 28, 2025
xsuler pushed a commit to antgroup/ant-ray that referenced this pull request Mar 4, 2025
park12sj pushed a commit to park12sj/ray that referenced this pull request Mar 18, 2025

Labels

community-backlog go add ONLY when ready to merge, run all tests
