@lsy323 lsy323 commented Jun 5, 2024

Add the first XLA quantized op: per-channel weight-only quantized matmul.

The math is out[bf16] = matmul(act[bf16], weight[s8]) * scale[bf16], the same as what was adopted in the XLA Llama quantization implementation.
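As a minimal reference sketch of that math (shapes and tensor names below are illustrative, not taken from this PR), in plain PyTorch:

```python
import torch

# Illustrative shapes: a batch of bf16 activations against an int8 per-channel
# quantized weight with one bf16 scale per output channel.
act = torch.randn(4, 16).to(torch.bfloat16)                    # [batch, in_features]
weight = torch.randint(-128, 128, (16, 8), dtype=torch.int8)   # [in_features, out_features]
scale = torch.rand(8).to(torch.bfloat16)                       # [out_features]

# out[bf16] = matmul(act[bf16], weight[s8]) * scale[bf16]:
# the int8 weight is upcast for the matmul, then rescaled per output channel.
out = torch.matmul(act, weight.to(torch.bfloat16)) * scale     # [batch, out_features]
```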

User experience:

  • Call the quantized op directly with an already-quantized weight in model code
  • Swap the nn.Linear module with the added quantized module in model code (see the sketch below)

More details about user experience can be found in the added README.
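A rough sketch of the module-swap flow; the module below is a stand-in written for illustration, and the op and module actually added in this PR may have different names and signatures (see the README):

```python
import torch
import torch.nn as nn

# Stand-in weight-only quantized linear module, illustrating the swap flow only.
class WeightOnlyQuantizedLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        # Quantized weight and per-channel scale are produced offline and loaded here.
        self.register_buffer('weight', torch.zeros(out_features, in_features, dtype=torch.int8))
        self.register_buffer('scale', torch.ones(out_features, dtype=torch.bfloat16))

    def load_quantized_weight(self, weight, scale):
        self.weight = weight
        self.scale = scale

    def forward(self, x):
        # Same math as above: matmul against the upcast int8 weight, then per-channel rescale.
        return torch.matmul(x, self.weight.to(x.dtype).t()) * self.scale

# Swap the nn.Linear in an existing model with the quantized module.
model = nn.Sequential(nn.Linear(16, 8))
q_weight = torch.randint(-128, 128, (8, 16), dtype=torch.int8)  # [out_features, in_features]
per_channel_scale = torch.rand(8).to(torch.bfloat16)
q_linear = WeightOnlyQuantizedLinear(16, 8)
q_linear.load_quantized_weight(q_weight, per_channel_scale)
model[0] = q_linear

out = model(torch.randn(2, 16).to(torch.bfloat16))  # [2, 8], bf16
```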

Changes:

  • Added a custom torch op and an nn.Module for the quantized op
  • Added a user guide

Test:

  • Test that the lowered HLO does what we expect
  • Test that it works with Dynamo
  • Test numerical correctness

int4 and blockwise quantization support will be added in follow-up PRs.

@lsy323 lsy323 requested review from JackCaoG and miladm June 5, 2024 23:42
@lsy323 lsy323 marked this pull request as ready for review June 5, 2024 23:44
@lsy323 lsy323 changed the title from Add int8 per channel quantized matmul to Add int8 per channel weight-only quantized matmul Jun 5, 2024
@lsy323 lsy323 requested a review from qihqi June 6, 2024 00:08

@JackCaoG JackCaoG left a comment

mostly lgtm, minor nits

@lsy323 lsy323 requested a review from JackCaoG June 6, 2024 16:54
@lsy323 lsy323 merged commit 56ddd5d into master Jun 7, 2024
@lsy323 lsy323 deleted the lsiyuan/quant-ops branch June 7, 2024 04:57