Open
Labels
not a bug (Some known limitation, but not a bug.)
Description
Hi, is this planned to be supported? With a total of 768 GB of VRAM, this setup should be able to load the original FP8 weights, and with 14 TB/s of combined bandwidth it should also be very fast at low batch sizes in tensor-parallel mode!
However, when trying to load the model with the sample script, the following error is currently returned:
Unsupported SM version for FP8 block scaling GEMM
The FP4 model does load, but it is vastly slower than expected (only 27 t/s at batch size 1; I would have expected an order of magnitude faster for 37B active parameters).
Maybe RTX 6000 PRO support is still a work in progress in general? I have been seeing a bunch of sm120-related commits recently.
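For anyone hitting the same error, a quick way to confirm which SM version the runtime actually sees is to query the GPU's compute capability. This is a minimal sketch using PyTorch (assumption: PyTorch is installed in the same environment; the `sm_version` helper is just illustrative formatting, not part of any library):

```python
def sm_version(major: int, minor: int) -> str:
    """Format a CUDA compute capability pair as an sm_XY string,
    e.g. (12, 0) -> "sm_120"."""
    return f"sm_{major}{minor}"

try:
    import torch
    if torch.cuda.is_available():
        # Report the name and SM version of GPU 0; the FP8 block-scaling
        # GEMM error suggests this value is not in the supported set yet.
        major, minor = torch.cuda.get_device_capability(0)
        print(torch.cuda.get_device_name(0), sm_version(major, minor))
    else:
        print("No CUDA device visible")
except ImportError:
    print("PyTorch not installed")
```

On an RTX 6000 PRO (Blackwell) this should report sm_120, which would match the sm120-related commits mentioned above.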