This repository was archived by the owner on Oct 25, 2024. It is now read-only.

Conversation

@changwangss (Contributor) commented Feb 7, 2024

Type of Change

Add a limitation check in the Python config.py, because the kernel only supports the asym scheme in certain cases (when asym is enabled, compute_dtype must not be int8, the weight dtype must be an integer type, and the scale dtype must be fp32).
Update the other code to derive the scheme automatically.
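
For context, here is a minimal sketch of the kind of guard described above, assuming a config object that stores these dtypes as strings. The class name, attribute names, and fall-back-to-sym behavior are illustrative assumptions, not the exact implementation in config.py:

```python
# Illustrative sketch only: real class/attribute names in config.py may differ.
class WeightOnlyQuantConfigSketch:
    """Holds weight-only quantization settings and enforces the asym limitation."""

    def __init__(self, weight_dtype="int4_clip", scale_dtype="fp32",
                 compute_dtype="fp32", scheme="sym"):
        self.weight_dtype = weight_dtype
        self.scale_dtype = scale_dtype
        self.compute_dtype = compute_dtype
        self.scheme = scheme
        self._check_asym_support()

    def _check_asym_support(self):
        # The kernel only supports scheme="asym" when compute_dtype is not int8,
        # the weight dtype is an integer type, and the scale dtype is fp32.
        if self.scheme != "asym":
            return
        if (self.compute_dtype == "int8"
                or not self.weight_dtype.startswith("int")
                or self.scale_dtype != "fp32"):
            # Fall back to sym instead of passing an unsupported combination
            # to the kernel (hypothetical handling for this sketch).
            print("asym is not supported for this dtype combination; "
                  "falling back to sym.")
            self.scheme = "sym"


# Example: int8 compute with asym is rejected, so the scheme falls back to sym.
cfg = WeightOnlyQuantConfigSketch(compute_dtype="int8", scheme="asym")
assert cfg.scheme == "sym"
```

Whether the real check warns and falls back or raises an error is a design choice left to config.py; the sketch only illustrates the dtype conditions.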

Here are the local test results.

rtn facebook/opt-1.3b asym

Benchmark:
`python run_generation.py --model facebook/opt-1.3b --woq --woq_weight_dtype "int4_clip" --woq_scheme "asym" --benchmark --batch_size 1`

Output: ['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun.\n\nOnce upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have']

Accuracy:
`python run_generation.py --model facebook/opt-1.3b --woq --woq_weight_dtype "int4_clip" --woq_scheme "asym" --accuracy --batch_size 56`

Running loglikelihood requests: 100% | 5151/5151 [09:32<00:00, 9.00it/s]

| Task | Version | Metric | Value | | Stderr |
|----------------|--------:|--------|-------:|---|-------:|
| lambada_openai | 0 | ppl | 8.1232 | ± | 0.2400 |
| | | acc | 0.5420 | ± | 0.0069 |

Accuracy for lambada_openai is: 0.5420143605666602

gptq facebook/opt-125m sym

Benchmark:
`python run_generation.py --model facebook/opt-125m --woq --woq_algo "GPTQ" --gptq_pad_max_length 128 --gptq_use_max_length --gptq_block_size 16 --woq_weight_dtype "int4_clip" --output_dir "gptqq" --benchmark --batch_size 1`

Output: ['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. She loved to travel. She loved to travel. She loved to travel. She loved to travel. She loved to travel. She loved to travel. She loved']

Accuracy:
`python run_generation.py --model facebook/opt-125m --woq --woq_algo "GPTQ" --gptq_pad_max_length 128 --gptq_use_max_length --gptq_block_size 16 --woq_weight_dtype "int4_clip" --output_dir "gptqq" --accuracy --batch_size 56`

| Task | Version | Metric | Value | | Stderr |
|----------------|--------:|--------|--------:|---|-------:|
| lambada_openai | 0 | ppl | 31.5093 | ± | 1.1838 |
| | | acc | 0.3588 | ± | 0.0067 |

Accuracy for lambada_openai is: 0.35882010479332427

gptq facebook/opt-125m asym

Benchmark:
`python run_generation.py --model facebook/opt-125m --woq --woq_algo "GPTQ" --woq_scheme "asym" --gptq_pad_max_length 128 --gptq_use_max_length --gptq_block_size 16 --woq_weight_dtype "int4_clip" --output_dir "gptqq" --benchmark --batch_size 1`

Output: ['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. She liked to go to the movies, and she liked to have fun. She liked to go to the beach, and she liked to have fun. She liked']

Accuracy:
`python run_generation.py --model facebook/opt-125m --woq --woq_algo "GPTQ" --woq_scheme "asym" --gptq_pad_max_length 128 --gptq_use_max_length --gptq_block_size 16 --woq_weight_dtype "int4_clip" --output_dir "gptqq" --accuracy --batch_size 56`

| Task | Version | Metric | Value | | Stderr |
|----------------|--------:|--------|--------:|---|-------:|
| lambada_openai | 0 | ppl | 27.0044 | ± | 0.9937 |
| | | acc | 0.3755 | ± | 0.0067 |

Accuracy for lambada_openai is: 0.3755094119930138

Description

detail description
JIRA ticket: xxx

Expected Behavior & Potential Risk

The expected behavior triggered by this PR.

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

Signed-off-by: changwangss <chang1.wang@intel.com>
@zhewang1-intc (Contributor) left a comment:

LGTM

@changwangss changed the title from "[LLM] WOQ support scheme asym" to "[LLM] Support WOQ scheme asym" on Feb 7, 2024
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: Wang, Chang <chang1.wang@intel.com>
Signed-off-by: Wang, Chang <chang1.wang@intel.com>
@chensuyue merged commit c7f0b70 into main on Feb 8, 2024
@chensuyue deleted the wangchang/asym branch on February 8, 2024 06:27