CUHKSZzxy (Collaborator) commented Jul 4, 2025

Refer to dlBLAS for details. Dumping the expert distribution is not compatible with CUDA graph, so it must be used with `--eager-mode`.

LMDEPLOY_DUMP_EXPERT_DISTRIBUTION=1 \
LMDEPLOY_EXPERT_DUMP_DIR="your_expert_distribution_dir" \
LMDEPLOY_DP_MASTER_ADDR=0.0.0.0 \
LMDEPLOY_DP_MASTER_PORT=29555 \
lmdeploy serve api_server \
    Qwen/Qwen3-235B-A22B-FP8 \
    --backend pytorch \
    --tp 1 \
    --dp 4 \
    --ep 4 \
    --proxy-url http://0.0.0.0:8001 \
    --nnodes 1 \
    --node-rank 0 \
    --eager-mode \
    --log-level INFO
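
For scripted launches, the same invocation can be assembled in Python with a guard for the `--eager-mode` requirement. This is only a minimal sketch: `build_launch` and `check_eager` are hypothetical helper names, not part of LMDeploy, and the environment variables and flags are copied from the command above rather than verified against the CLI.

```python
import os
import shlex


def build_launch(dump_dir, model="Qwen/Qwen3-235B-A22B-FP8", tp=1, dp=4, ep=4):
    """Hypothetical helper: return (env, argv) mirroring the command above."""
    env = dict(os.environ)
    env.update({
        "LMDEPLOY_DUMP_EXPERT_DISTRIBUTION": "1",
        "LMDEPLOY_EXPERT_DUMP_DIR": dump_dir,
        "LMDEPLOY_DP_MASTER_ADDR": "0.0.0.0",
        "LMDEPLOY_DP_MASTER_PORT": "29555",
    })
    argv = [
        "lmdeploy", "serve", "api_server", model,
        "--backend", "pytorch",
        "--tp", str(tp), "--dp", str(dp), "--ep", str(ep),
        "--proxy-url", "http://0.0.0.0:8001",
        "--nnodes", "1", "--node-rank", "0",
        "--eager-mode",
        "--log-level", "INFO",
    ]
    return env, argv


def check_eager(env, argv):
    """Fail early if dumping is enabled without --eager-mode."""
    dumping = env.get("LMDEPLOY_DUMP_EXPERT_DISTRIBUTION") == "1"
    if dumping and "--eager-mode" not in argv:
        raise ValueError("expert distribution dump requires --eager-mode")


if __name__ == "__main__":
    env, argv = build_launch("your_expert_distribution_dir")
    check_eager(env, argv)
    # Print the command for inspection; pass env and argv to
    # subprocess.run(argv, env=env) to actually start the server.
    print(shlex.join(argv))
```

Checking the flag before launch surfaces the CUDA graph incompatibility as an immediate error instead of a failure after the model has loaded.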
CUHKSZzxy marked this pull request as ready for review July 4, 2025 04:16