@lsy323 (Collaborator) commented Jul 3, 2024

This PR depends on #7605 to land first

With asymmetric quantization, w_dq = w_int * weight_scaler - zero_point.

Thus the matmul becomes:
matmul_out = (x @ w_int) * weight_scaler - x @ zero_point.unsqueeze(0).broadcast(x.shape[-1])
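
For readers who want the step spelled out, here is the expansion for a single output channel in the per-channel case. The symbols are introduced here for illustration only: $x_c$ is the activation over the contracted dimension $c$, $W^{\mathrm{int}}_{c,n}$ is the integer weight, and $s_n$ and $z_n$ stand for `weight_scaler` and `zero_point` of output channel $n$.

```latex
\begin{aligned}
y_n &= \sum_{c} x_c \left( W^{\mathrm{int}}_{c,n}\, s_n - z_n \right) \\
    &= s_n \sum_{c} x_c\, W^{\mathrm{int}}_{c,n} \;-\; z_n \sum_{c} x_c
\end{aligned}
```

The second term depends on the weights only through $z_n$, so it reduces to the sum of `x` over its last dimension multiplied by the zero point.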

To compute the term x @ zero_point.unsqueeze(0).broadcast(x.shape[-1]), we use einsum('...c, z', x, zero_point) for per-channel quant, and matmul(x.sum(-1), zero_point) for blockwise quant.
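
The identity above can be checked numerically. The following is a minimal sketch, not the code from this PR: it uses NumPy in place of the tensor ops, and the shapes, the variable names, the blockwise layout `(n_blocks, block_size, out_features)`, and the explicit `->...z` output subscript in the einsum are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for this illustration.
batch, in_features, out_features, block_size = 3, 8, 5, 4
n_blocks = in_features // block_size

x = rng.standard_normal((batch, in_features))

# ---- Per-channel: one scaler and one zero point per output channel ----
w_int = rng.integers(-8, 8, size=(in_features, out_features)).astype(np.float64)
scaler = rng.uniform(0.01, 0.1, size=out_features)
zero_point = rng.uniform(-0.5, 0.5, size=out_features)

# Reference: dequantize first (w_dq = w_int * scaler - zero_point), then matmul.
w_dq = w_int * scaler - zero_point
reference = x @ w_dq

# Decomposed form: integer matmul, scale, then subtract the zero-point term.
# The einsum sums x over its last dim and multiplies by each zero point.
zp_term = np.einsum("...c,z->...z", x, zero_point)
per_channel_out = (x @ w_int) * scaler - zp_term
per_channel_ok = np.allclose(per_channel_out, reference)

# ---- Blockwise: one scaler and one zero point per (block, output channel) ----
w_int_b = rng.integers(
    -8, 8, size=(n_blocks, block_size, out_features)
).astype(np.float64)
scaler_b = rng.uniform(0.01, 0.1, size=(n_blocks, out_features))
zero_point_b = rng.uniform(-0.5, 0.5, size=(n_blocks, out_features))

# Reference: dequantize each block, flatten the blocks, then matmul.
w_dq_b = w_int_b * scaler_b[:, None, :] - zero_point_b[:, None, :]
reference_b = x @ w_dq_b.reshape(in_features, out_features)

# Decomposed form: split x into blocks, contract within each block,
# apply the per-block scaler, then subtract matmul(x.sum(-1), zero_point).
x_b = x.reshape(batch, n_blocks, block_size)
partial = np.einsum("scn,...sc->...sn", w_int_b, x_b)
scaled = np.einsum("sn,...sn->...n", scaler_b, partial)
zp_term_b = np.matmul(x_b.sum(-1), zero_point_b)
blockwise_out = scaled - zp_term_b
blockwise_ok = np.allclose(blockwise_out, reference_b)
```

In both cases the zero-point correction only needs a reduction of `x`, so the dequantized weight never has to be materialized.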

update test update readme fix test add asymmetric quant op support
@lsy323 force-pushed the lsiyuan/asymmetric-quant branch from c5e9081 to 69dc1e8 on July 9, 2024 18:20
@lsy323 marked this pull request as ready for review on July 9, 2024 18:22
@lsy323 requested review from JackCaoG and miladm on July 9, 2024 18:22
@lsy323 merged commit 289471c into master on Jul 9, 2024
@lsy323 deleted the lsiyuan/asymmetric-quant branch on July 9, 2024 21:37