[Serve][LLM] Add Pause/Resume Control Plane API for Ray Serve LLM #59523
Conversation
…trl-plane-mixins-v2 Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Code Review
This pull request introduces a pause/resume control plane API, which is a valuable addition for RL training workflows. The refactoring of existing control plane endpoints into modular mixins (SleepableIngressMixin, PausableIngressMixin, CacheManagerIngressMixin) is a significant improvement, making the code cleaner and more extensible. The test coverage, including unit, isolation, and a new end-to-end release test, is thorough and well-executed. I've found one high-severity issue related to a blocking call in an async context that could impact server performance, which I've detailed in a specific comment. Overall, this is a high-quality contribution.
Review thread on python/ray/llm/_internal/serve/core/ingress/mixins/broadcastable.py (outdated, resolved).
…ause-resume-2 Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
/gemini review
Code Review
This pull request introduces a pause/resume control plane API for Ray Serve LLM, which is a valuable addition for workflows like reinforcement learning training. The implementation is well-structured, following existing patterns for control plane endpoints. The changes are spread across the ingress, server, and engine layers, and are accompanied by both unit tests and a new end-to-end release test.
My review focuses on improving the robustness of the new release test by replacing fixed-time sleeps with polling, which will help prevent flakiness and unnecessary delays. The core logic of the feature itself appears solid.
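For illustration, a minimal polling helper of the kind the review suggests; the helper name, timeout, and interval are invented for this sketch and are not from the test itself:

```python
import time
from typing import Callable

def wait_until(predicate: Callable[[], bool],
               timeout_s: float = 30.0,
               interval_s: float = 0.5) -> None:
    """Poll `predicate` until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if predicate():
            return
        time.sleep(interval_s)
    raise TimeoutError("condition not met within timeout")

# e.g. wait_until(lambda: client.get("/is_paused").json()["is_paused"])
```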
eicherseiji left a comment
LGTM
```python
        IsPausedResponse with is_paused boolean.
        """
        results = await self._broadcast_to_replicas(model, "is_paused")
        is_paused_result = any(results) if results else False
```
Some debug logs about the results could be helpful?
Discussed offline. I think we can add it if we need to debug; I am not sure how to tune the verbosity in a nice way.
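For concreteness, a sketch of what such debug logging could look like if it were ever added; the logger setup and method shape are assumptions, and `_broadcast_to_replicas` is the helper from the snippet above:

```python
import logging

logger = logging.getLogger(__name__)

class PausableIngressMixin:  # name from this PR; body is illustrative only
    async def is_paused(self, model: str) -> bool:
        results = await self._broadcast_to_replicas(model, "is_paused")
        # Log raw per-replica results at debug level, per the suggestion above.
        logger.debug("is_paused replica results for %s: %s", model, results)
        # Treat the deployment as paused if any replica reports paused.
        return any(results) if results else False
```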
…y-project#59523) Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Summary
This PR adds pause/resume control plane endpoints to DevIngress, enabling RL training workflows to temporarily halt generation without offloading model weights from GPU memory.
Must merge before this PR:
Motivation
During RL training, there's often a need to pause inference for weight updates. While sleep/wakeup already exists for memory management (it offloads weights to CPU), pause/resume provides a lightweight alternative that halts generation while keeping model weights resident in GPU memory.
Changes
1. New Control Plane Endpoints
Added three new HTTP endpoints via PausableIngressMixin (see the usage sketch after this list):
- POST /pause - Pause generation on all replicas
- POST /resume - Resume generation after pause
- GET /is_paused - Check pause status
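As a rough illustration of calling these endpoints from a client: the paths and option names come from this PR, but the base URL, model id, and exact request shape (JSON body vs. query params) are assumptions for the sketch.

```python
import requests

BASE_URL = "http://localhost:8000"  # assumed Serve ingress address
MODEL = "my-model"                  # hypothetical served model id

# Pause generation on all replicas; option names mirror the PR description.
requests.post(
    f"{BASE_URL}/pause",
    json={"model": MODEL, "wait_for_inflight_requests": True, "clear_cache": False},
).raise_for_status()

# Check aggregate pause status across replicas.
print(requests.get(f"{BASE_URL}/is_paused", params={"model": MODEL}).json())

# Resume generation once weights have been updated.
requests.post(f"{BASE_URL}/resume", json={"model": MODEL}).raise_for_status()
```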
2. vLLM Integration
Integrated with vLLM's pause_generation() / resume_generation() / is_paused() API, with configurable options (see the sketch after this list):
- wait_for_inflight_requests: Wait for in-flight requests to complete vs. abort immediately
- clear_cache: Whether to clear the KV cache on pause
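To make the option semantics concrete, here is a minimal stand-in for the pause options; this dataclass is illustrative only, not the PR's actual request schema, and the defaults are guesses:

```python
from dataclasses import dataclass

@dataclass
class PauseOptions:
    # Option names come from the PR description; defaults here are assumed.
    wait_for_inflight_requests: bool = True  # drain in-flight requests vs. abort them
    clear_cache: bool = False                # drop KV cache contents on pause

# Example: abort in-flight requests and free the cache before a weight update.
opts = PauseOptions(wait_for_inflight_requests=False, clear_cache=True)
```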
Testing
- End-to-end release test (test_llm_serve_pause_resume.py)