Commit a65a934

[CI/Build] Temporary fix to LM Eval Small Models (vllm-project#28324)
Signed-off-by: zhewenli <zhewenli@meta.com>
1 parent 4a8d6bd commit a65a934

File tree: 3 files changed (+8, -3 lines)

.buildkite/test-pipeline.yaml
Lines changed: 1 addition & 1 deletion

@@ -1253,7 +1253,7 @@ steps:
     - pytest -v -s tests/compile/test_fusions_e2e.py::test_tp2_attn_quant_allreduce_rmsnorm
     - pytest -v -s tests/distributed/test_context_parallel.py
     - CUDA_VISIBLE_DEVICES=1,2 VLLM_ALL2ALL_BACKEND=deepep_high_throughput VLLM_USE_DEEP_GEMM=1 VLLM_LOGGING_LEVEL=DEBUG python3 examples/offline_inference/data_parallel.py --model Qwen/Qwen1.5-MoE-A2.7B --tp-size=1 --dp-size=2 --max-model-len 2048
-    - pytest -v -s tests/v1/distributed/test_dbo.py
+    - pytest -v -s tests/v1/distributed/test_dbo.py

 ##### B200 test #####
 - label: Distributed Tests (B200) # optional

tests/evals/gsm8k/configs/Qwen1.5-MoE-W4A16-CT.yaml
Lines changed: 4 additions & 1 deletion

@@ -2,4 +2,7 @@ model_name: "nm-testing/Qwen1.5-MoE-A2.7B-Chat-quantized.w4a16"
 accuracy_threshold: 0.45
 num_questions: 1319
 num_fewshot: 5
-max_model_len: 4096
+max_model_len: 4096
+# Dual stream is incompatible with this model: https://github.com/vllm-project/vllm/issues/28220
+env:
+  VLLM_DISABLE_SHARED_EXPERTS_STREAM: "1"
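
For context: the gsm8k harness loads each of these YAML configs, and after this change any optional env mapping is forwarded to the server launch (see the next file). A minimal sketch of reading the mapping, assuming PyYAML and that the path resolves from the repo root:

    # Minimal sketch: read the eval config and pull out the optional "env"
    # mapping. Assumes PyYAML and a checkout of the vllm repo.
    import yaml

    with open("tests/evals/gsm8k/configs/Qwen1.5-MoE-W4A16-CT.yaml") as f:
        eval_config = yaml.safe_load(f)

    # Configs without an "env" key yield None, mirroring the
    # eval_config.get("env", None) call added in test_gsm8k_correctness.py.
    env_dict = eval_config.get("env", None)
    # For this config: {'VLLM_DISABLE_SHARED_EXPERTS_STREAM': '1'}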

tests/evals/gsm8k/test_gsm8k_correctness.py
Lines changed: 3 additions & 1 deletion

@@ -62,9 +62,11 @@ def test_gsm8k_correctness_param(config_filename, tp_size):
         str(tp_size),
     ]

+    env_dict = eval_config.get("env", None)
+
     # Launch server and run evaluation
     with RemoteOpenAIServer(
-        eval_config["model_name"], server_args, max_wait_seconds=480
+        eval_config["model_name"], server_args, env_dict=env_dict, max_wait_seconds=480
     ) as remote_server:
         server_url = remote_server.url_for("v1")
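
RemoteOpenAIServer is the test helper that spawns the server process for the eval, and the env_dict keyword added here lets a config inject environment variables into that process. A simplified sketch of the pattern, assuming the overrides are overlaid on the inherited environment (illustrative, not the helper's actual code):

    # Illustrative sketch of applying per-test env overrides to a server
    # subprocess; not vLLM's actual RemoteOpenAIServer implementation.
    import os
    import subprocess


    def launch_server(model, server_args, env_dict=None):
        env = os.environ.copy()
        if env_dict is not None:
            # Overrides such as VLLM_DISABLE_SHARED_EXPERTS_STREAM="1"
            # take precedence over the inherited environment.
            env.update(env_dict)
        return subprocess.Popen(["vllm", "serve", model, *server_args], env=env)

With this in place, the Qwen1.5-MoE config above runs its eval with the shared-experts stream disabled, while configs that omit the env key behave exactly as before.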
