Asymmetric quantized matmul support #7626

lsy323 · 2024-07-03T21:01:33Z

This PR depends on #7605 to land first

With asymmetric quantization, w_dq = w_int * weight_scaler - zero_point.

Thus the matmul becomes
matmul_out = x @ w_int * weight_scaler - x @ zero_point.unsqueeze(0).broadcast(x.shape[-1])

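To compute the zero-point term x @ zero_point.unsqueeze(0).broadcast(x.shape[-1]), we use einsum('...c, z', x, zero_point) for per-channel quant, and matmul(x.sum(-1), zero_point) for blockwise quant. In both cases x is first summed over the contraction dimension, because every row of the broadcast zero_point is identical.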
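The decomposition above can be checked numerically. The following is a minimal sketch in NumPy rather than the torch code from this PR. The function names, the weight layouts (`[in, out]` for per-channel, `[n_blocks, block_size, out]` for blockwise) and the explicit einsum output subscripts (`->...z`) are assumptions made for illustration, not taken from the PR's implementation.

```python
import numpy as np


def per_channel_matmul(x, w_int, weight_scaler, zero_point):
    """Hypothetical per-channel variant.

    x: [..., in], w_int: [in, out], weight_scaler and zero_point: [out].
    """
    main = (x @ w_int) * weight_scaler
    # Zero-point term: sum x over the input channel c, then scale by
    # zero_point[z] for each output channel z.
    zp_term = np.einsum("...c,z->...z", x, zero_point)
    return main - zp_term


def blockwise_matmul(x, w_int, weight_scaler, zero_point):
    """Hypothetical blockwise variant.

    x: [..., in], w_int: [n_blocks, block_size, out],
    weight_scaler and zero_point: [n_blocks, out].
    """
    n_blocks, block_size, _ = w_int.shape
    # Split the contraction dimension of x into blocks.
    x_blocks = x.reshape(*x.shape[:-1], n_blocks, block_size)
    main = np.einsum("...nb,nbo,no->...o", x_blocks, w_int, weight_scaler)
    # Zero-point term: per-block sums of x, contracted against zero_point.
    zp_term = np.matmul(x_blocks.sum(-1), zero_point)
    return main - zp_term


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 8))

# Per-channel: compare against a matmul with the dequantized weight.
w_pc = rng.integers(-8, 8, size=(8, 4)).astype(np.int8)
s_pc = rng.uniform(0.01, 0.1, size=4)
zp_pc = rng.uniform(-1.0, 1.0, size=4)
ref_pc = x @ (w_pc * s_pc - zp_pc)
print(np.allclose(per_channel_matmul(x, w_pc, s_pc, zp_pc), ref_pc))

# Blockwise: 2 blocks of size 4 along the input dimension.
w_bw = rng.integers(-8, 8, size=(2, 4, 4)).astype(np.int8)
s_bw = rng.uniform(0.01, 0.1, size=(2, 4))
zp_bw = rng.uniform(-1.0, 1.0, size=(2, 4))
w_dq_bw = (w_bw * s_bw[:, None, :] - zp_bw[:, None, :]).reshape(8, 4)
ref_bw = x @ w_dq_bw
print(np.allclose(blockwise_matmul(x, w_bw, s_bw, zp_bw), ref_bw))
```

Each function is compared against a plain matmul with the fully dequantized weight w_dq, so both checks should report agreement within floating-point tolerance.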

update test; update readme; fix test; add asymmetric quant op support

docs/quantized_ops.md

lsy323 added the quantization label Jul 8, 2024

add blockwise quant

69dc1e8

update test; update readme; fix test; add asymmetric quant op support

lsy323 force-pushed the lsiyuan/asymmetric-quant branch from c5e9081 to 69dc1e8 Compare July 9, 2024 18:20

fix docstr

842d927

lsy323 marked this pull request as ready for review July 9, 2024 18:22

lsy323 requested review from JackCaoG and miladm July 9, 2024 18:22

JackCaoG approved these changes Jul 9, 2024

docs/quantized_ops.md (review thread resolved)

lsy323 merged commit 289471c into master Jul 9, 2024

lsy323 deleted the lsiyuan/asymmetric-quant branch July 9, 2024 21:37
