
Conversation

Contributor

@angelayi angelayi commented Oct 17, 2025

Purpose

Based on #24604, this modifies the sequence-parallelism pass to do custom-op matching without needing to enable the custom op.
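To illustrate the idea, here is a minimal, self-contained sketch (not vLLM's actual pass; the pattern/replacement bodies, shapes, dtype, and eps value are illustrative assumptions) of registering a pattern against the decomposed, native-torch form of RMSNorm with torch._inductor's pattern matcher, so matching works even when "+rms_norm" is not in custom_ops:

```python
import torch
from torch._inductor.pattern_matcher import (
    PatternMatcherPass,
    fwd_only,
    register_replacement,
)

patterns = PatternMatcherPass()


def rms_norm_pattern(x, weight):
    # RMSNorm as it appears in the graph when the custom op is *not* enabled,
    # i.e. the decomposed native-torch form.
    x_f = x.float()
    var = x_f.pow(2).mean(dim=-1, keepdim=True)
    return (x_f * torch.rsqrt(var + 1e-6)).to(x.dtype) * weight


def rms_norm_replacement(x, weight):
    # Toy replacement; only here so the registration is complete.
    return torch.nn.functional.rms_norm(x, (x.shape[-1],), weight, eps=1e-6)


example_inputs = [
    torch.empty(4, 8, dtype=torch.bfloat16),
    torch.empty(8, dtype=torch.bfloat16),
]

register_replacement(
    rms_norm_pattern, rms_norm_replacement, example_inputs, fwd_only, patterns
)
# A compile pass would later call patterns.apply(graph) on the FX graph.
```

Roughly speaking, the real pass registers sequence-parallel rewrites of the surrounding collectives (e.g. turning an all_reduce + RMSNorm region into reduce_scatter/all_gather) rather than this toy replacement; the sketch only shows the matching mechanics.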

Test Plan

pytest -sv tests/compile/test_sequence_parallelism.py

Performance numbers

I did some benchmarking on H100 (without FlashInfer) with the following command:

VLLM_DISABLE_COMPILE_CACHE=1 VLLM_USE_STANDALONE_COMPILE=1 VLLM_LOGGING_LEVEL=DEBUG vllm bench latency --model=nvidia/Llama-3.3-70B-Instruct-FP8 --output-len 1 --input-len 8192 --batch-size 1 --tensor-parallel-size 8 --load-format dummy --num_iters_warmup 5 --num_iters 15 -O '{"level": 3, "use_inductor_graph_partition": false, "splitting_ops": [], "cudagraph_mode": "FULL"}' --no-enable-prefix-caching

while varying the following (a small sketch for generating the four combinations follows the list):

  • "pass_config": {"enable_async_tp": true, "enable_sequence_parallelism": true} vs. "pass_config": {"enable_async_tp": false, "enable_sequence_parallelism": false}
  • "custom_ops": ["+quant_fp8", "+rms_norm"] vs. "custom_ops": []
[image: benchmark latency results for the four configurations]
Collaborator

@ProExpertProg ProExpertProg left a comment

Thanks for taking this on! Could you just add me as a co-author on one of the commits?

"""Base helper for RMSNorm and RMSNorm + Quantization functionalization."""
def get_first_out_wrapper(fn):
@functools.wraps(fn)
def wrapper(*args):
Collaborator

Does this work? I thought that during tracing, the pattern-matching tracer would treat `args` as a single parameter.

Contributor Author

Yes! I updated the test to assert on the number of all_reduce/all_gather ops in the graph.
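For context, a hedged sketch of that style of check (the helper and the op handles below are illustrative, not the actual test code):

```python
import torch.fx


def count_calls(graph: torch.fx.Graph, target) -> int:
    """Count call_function nodes in an FX graph whose target matches `target`."""
    return sum(
        1
        for node in graph.nodes
        if node.op == "call_function" and node.target == target
    )


# After the sequence-parallelism pass, assertions of roughly this shape
# (the op handles are placeholders for the real collective op targets):
#   assert count_calls(graph, all_reduce_op) == expected_all_reduces
#   assert count_calls(graph, all_gather_op) == expected_all_gathers
```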

Collaborator

@ProExpertProg ProExpertProg left a comment

@cascade812 could you take a look at this please?

@ProExpertProg
Collaborator

Also @angelayi, I just noticed there are no e2e tests. Could you make the existing e2e tests use no custom ops by default (tests/distributed/test_sequence_parallelism.py or something like that), and also add tests to test_fusions_e2e.py (feel free to grab from #27062)?

@cascade812
Contributor

> @cascade812 could you take a look at this please?

Sure!

@ProExpertProg ProExpertProg added the ready label (ONLY add when PR is ready to merge/full CI is needed) Oct 22, 2025
@ProExpertProg ProExpertProg enabled auto-merge (squash) October 23, 2025 06:26
@cascade812
Contributor

@angelayi I get the error below if custom_ops=["+rms_norm"] is not specified:

torch._inductor.exc.InductorError: RuntimeError: The size of tensor a (s72) must match the size of tensor b ((s72//2)) at non-singleton dimension 0
@cascade812
Contributor

@angelayi It seems odd to me that enabling AsyncTP results in higher latency for Llama-70B. In our earlier benchmarks, we observed about a 10% reduction in average prefill latency with AsyncTP enabled for the same model on 4xH200.

Collaborator

@ProExpertProg ProExpertProg left a comment

We no longer have to skip the FP4 tests!

Signed-off-by: angelayi <yiangela7@gmail.com>
@ProExpertProg ProExpertProg enabled auto-merge (squash) November 14, 2025 20:49
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
@ProExpertProg ProExpertProg merged commit f36292d into vllm-project:main Nov 15, 2025
50 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in torch.compile integration Nov 15, 2025
geodavic pushed a commit to geodavic/vllm that referenced this pull request Nov 16, 2025
khluu pushed a commit that referenced this pull request Nov 16, 2025
bwasti pushed a commit to bwasti/vllm that referenced this pull request Nov 17, 2025
wangxiyuan added a commit to vllm-project/vllm-ascend that referenced this pull request Nov 26, 2025
Bump vLLM version to v0.11.2

What's broken and changed by vLLM:
1. structured_output is broken by vllm-project/vllm#26866
2. get_mrope_input_positions is broken by vllm-project/vllm#28399
3. graph mode is broken by vllm-project/vllm#25110; we'll upgrade torch to 2.8 to fix the problem later
4. embedding is broken by vllm-project/vllm#27583
5. `get_attn_backend_cls` and the attention backend are broken by vllm-project/vllm#28534
6. spec decode is broken by vllm-project/vllm#28771
7. the SP feature is broken by vllm-project/vllm#27126
8. mtp is broken by vllm-project/vllm#27922
9. lora is broken by vllm-project/vllm#21068
10. execute_model is broken by vllm-project/vllm#26866
11. the `VLLM_DISABLE_SHARED_EXPERTS_STREAM` env is broken by vllm-project/vllm#28159
12. kv cache is broken by vllm-project/vllm#27753
13. dp is broken by vllm-project/vllm#25110

What's broken and changed by ourselves:
1. qwen vl is broken by vllm-project/vllm#28455; we'll remove model files in the future to avoid this kind of error
2. Engine core is broken by vllm-project/vllm#23691; we'll remove the patch file in the future
3. Ascend scheduler is broken by vllm-project/vllm#28733; we'll remove the Ascend scheduler later
4. qwen3-next is broken by vllm-project/vllm#28083; we'll remove model files in the future to avoid this kind of error
5. qwen vl is broken by vllm-project/vllm#27764; we'll remove model files in the future

Known issues:
1. ray doesn't work
2. the accuracy of qwen3-next is not correct
3. qwen3-vl is broken
4. prefix cache + Ascend scheduler + deepseek v2 lite is broken

vLLM version: v0.11.2

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: 22dimensions <waitingwind@foxmail.com>
Co-authored-by: shen-shanshan <467638484@qq.com>
Kurumi5210 pushed a commit to lidenghui1110/vllm-ascend that referenced this pull request Nov 26, 2025
bringlein pushed a commit to bringlein/vllm that referenced this pull request Nov 26, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Nov 29, 2025
kitaekatt pushed a commit to kitaekatt/vllm that referenced this pull request Dec 1, 2025
charlotte12l pushed a commit to charlotte12l/vllm that referenced this pull request Dec 5, 2025
Meihan-chen pushed a commit to Meihan-chen/vllm-ascend that referenced this pull request Dec 5, 2025
Zhathw pushed a commit to Zhathw/vllm that referenced this pull request Dec 6, 2025
Clorist33 pushed a commit to Clorist33/vllm-ascend that referenced this pull request Dec 9, 2025
Clorist33 pushed a commit to Clorist33/vllm-ascend that referenced this pull request Dec 10, 2025

Labels

ci/build, ready (ONLY add when PR is ready to merge/full CI is needed), torch.compile

5 participants