Open
Labels
not a bug (Some known limitation, but not a bug.)
Description
Hi, is this planned to be supported? With a total of 768 GB of VRAM, this setup should be able to load the original FP8 weights, and with 14 TB/s of combined bandwidth it should also be very fast at low batch sizes in tensor-parallel mode!
However, when trying to load the model with the sample script, the following error is currently returned:
Unsupported SM version for FP8 block scaling GEMM
The FP4 model does load, but it is vastly slower than expected (only 27 t/s at batch size 1; I would have expected an order of magnitude faster for 37B active parameters).
Maybe RTX 6000 PRO support is still a work in progress in general? I have been seeing a bunch of sm120-related commits recently.
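For anyone hitting the same error, a quick way to confirm which SM version the runtime actually sees is to query the GPU's compute capability. This is a minimal sketch using PyTorch (assumption: PyTorch is installed in the same environment; the `sm_version` helper is just illustrative formatting, not part of any library):

```python
def sm_version(major: int, minor: int) -> str:
    """Format a CUDA compute capability pair as an sm_XY string,
    e.g. (12, 0) -> "sm_120"."""
    return f"sm_{major}{minor}"

try:
    import torch
    if torch.cuda.is_available():
        # Report the name and SM version of GPU 0; the FP8 block-scaling
        # GEMM error suggests this value is not in the supported set yet.
        major, minor = torch.cuda.get_device_capability(0)
        print(torch.cuda.get_device_name(0), sm_version(major, minor))
    else:
        print("No CUDA device visible")
except ImportError:
    print("PyTorch not installed")
```

On an RTX 6000 PRO (Blackwell) this should report sm_120, which would match the sm120-related commits mentioned above.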