This repository was archived by the owner on Oct 25, 2024. It is now read-only.

Conversation

@changwangss (Contributor) commented Feb 7, 2024

Type of Change

Add a limitation check in the Python config.py, because the kernel only supports the asym scheme in certain cases (when asym is enabled, compute_dtype must not be int8, the weight dtype must be an integer type, and the scale dtype must be fp32).
Update the other code to derive the scheme automatically.
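
For context, here is a minimal sketch of the kind of guard described above, assuming a config object that stores these dtypes as strings. The class name, attribute names, and fall-back-to-sym behavior are illustrative assumptions, not the exact implementation in config.py:

```python
# Illustrative sketch only: real class/attribute names in config.py may differ.
class WeightOnlyQuantConfigSketch:
    """Holds weight-only quantization settings and enforces the asym limitation."""

    def __init__(self, weight_dtype="int4_clip", scale_dtype="fp32",
                 compute_dtype="fp32", scheme="sym"):
        self.weight_dtype = weight_dtype
        self.scale_dtype = scale_dtype
        self.compute_dtype = compute_dtype
        self.scheme = scheme
        self._check_asym_support()

    def _check_asym_support(self):
        # The kernel only supports scheme="asym" when compute_dtype is not int8,
        # the weight dtype is an integer type, and the scale dtype is fp32.
        if self.scheme != "asym":
            return
        if (self.compute_dtype == "int8"
                or not self.weight_dtype.startswith("int")
                or self.scale_dtype != "fp32"):
            # Fall back to sym instead of passing an unsupported combination
            # to the kernel (hypothetical handling for this sketch).
            print("asym is not supported for this dtype combination; "
                  "falling back to sym.")
            self.scheme = "sym"


# Example: int8 compute with asym is rejected, so the scheme falls back to sym.
cfg = WeightOnlyQuantConfigSketch(compute_dtype="int8", scheme="asym")
assert cfg.scheme == "sym"
```

Whether the real check warns and falls back or raises an error is a design choice left to config.py; the sketch only illustrates the dtype conditions.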

Here are the local test results.

rtn facebook/opt-1.3b asym

Benchmark:
`python run_generation.py --model facebook/opt-1.3b --woq --woq_weight_dtype "int4_clip" --woq_scheme "asym" --benchmark --batch_size 1`

Output: ['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun.\n\nOnce upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have']

Accuracy:
`python run_generation.py --model facebook/opt-1.3b --woq --woq_weight_dtype "int4_clip" --woq_scheme "asym" --accuracy --batch_size 56`

Running loglikelihood requests: 100% | 5151/5151 [09:32<00:00, 9.00it/s]

| Task | Version | Metric | Value | | Stderr |
|----------------|--------:|--------|-------:|---|-------:|
| lambada_openai | 0 | ppl | 8.1232 | ± | 0.2400 |
| | | acc | 0.5420 | ± | 0.0069 |

Accuracy for lambada_openai is: 0.5420143605666602

gptq facebook/opt-125m sym

Benchmark:
`python run_generation.py --model facebook/opt-125m --woq --woq_algo "GPTQ" --gptq_pad_max_length 128 --gptq_use_max_length --gptq_block_size 16 --woq_weight_dtype "int4_clip" --output_dir "gptqq" --benchmark --batch_size 1`

Output: ['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. She loved to travel. She loved to travel. She loved to travel. She loved to travel. She loved to travel. She loved to travel. She loved']

Accuracy:
`python run_generation.py --model facebook/opt-125m --woq --woq_algo "GPTQ" --gptq_pad_max_length 128 --gptq_use_max_length --gptq_block_size 16 --woq_weight_dtype "int4_clip" --output_dir "gptqq" --accuracy --batch_size 56`

| Task | Version | Metric | Value | | Stderr |
|----------------|--------:|--------|--------:|---|-------:|
| lambada_openai | 0 | ppl | 31.5093 | ± | 1.1838 |
| | | acc | 0.3588 | ± | 0.0067 |

Accuracy for lambada_openai is: 0.35882010479332427

gptq facebook/opt-125m asym

Benchmark:
`python run_generation.py --model facebook/opt-125m --woq --woq_algo "GPTQ" --woq_scheme "asym" --gptq_pad_max_length 128 --gptq_use_max_length --gptq_block_size 16 --woq_weight_dtype "int4_clip" --output_dir "gptqq" --benchmark --batch_size 1`

Output: ['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. She liked to go to the movies, and she liked to have fun. She liked to go to the beach, and she liked to have fun. She liked']

Accuracy:
`python run_generation.py --model facebook/opt-125m --woq --woq_algo "GPTQ" --woq_scheme "asym" --gptq_pad_max_length 128 --gptq_use_max_length --gptq_block_size 16 --woq_weight_dtype "int4_clip" --output_dir "gptqq" --accuracy --batch_size 56`

| Task | Version | Metric | Value | | Stderr |
|----------------|--------:|--------|--------:|---|-------:|
| lambada_openai | 0 | ppl | 27.0044 | ± | 0.9937 |
| | | acc | 0.3755 | ± | 0.0067 |

Accuracy for lambada_openai is: 0.3755094119930138

Description

detail description
JIRA ticket: xxx

Expected Behavior & Potential Risk

The expected behavior triggered by this PR.

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

Signed-off-by: changwangss <chang1.wang@intel.com>
@zhewang1-intc (Contributor) left a comment:

LGTM

@changwangss changed the title from "[LLM] WOQ support scheme asym" to "[LLM] Support WOQ scheme asym" on Feb 7, 2024
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: Wang, Chang <chang1.wang@intel.com>
Signed-off-by: Wang, Chang <chang1.wang@intel.com>
@chensuyue merged commit c7f0b70 into main on Feb 8, 2024
@chensuyue deleted the wangchang/asym branch on February 8, 2024 06:27