This repository was archived by the owner on Oct 25, 2024. It is now read-only.

Conversation

@luoyu-intel (Contributor) commented on Aug 23, 2023

Type of Change

Upgrade jblas: a more flexible dynamic-quant interface.

Description

  1. Synchronize jblas code.
  2. Remove the high gcc version requirement.
  3. Auto-fusion: fused kernels are selected based on the weight type and the ISA support detected at runtime (see the sketch below).
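
As a rough illustration of item 3, here is a minimal C++ sketch of the dispatch idea: pick a fused kernel from the weight type and the ISA detected at runtime. Every name in it (CpuDevice, WeightType, the ffn_silu_* functions) is a hypothetical placeholder for illustration, not the actual jblas interface.

    #include <cstdio>

    // Hypothetical ISA probe; in jblas this information would come from a
    // runtime CPU-feature check (cf. the cd->AVX512F() call quoted later
    // in this thread).
    struct CpuDevice {
      bool avx512f;
      bool avx2;
    };

    enum class WeightType { kInt4, kInt8 };

    using FfnKernel = void (*)();

    void ffn_silu_avx512_int4() { std::puts("fused ffn_silu: AVX512F, int4"); }
    void ffn_silu_avx2_int4()   { std::puts("fused ffn_silu: AVX2, int4"); }
    void ffn_silu_reference()   { std::puts("unfused reference path"); }

    // Auto-fusion idea: return the fused kernel that matches both the
    // weight type and the best ISA the machine supports; otherwise fall
    // back to an unfused reference implementation.
    FfnKernel select_ffn_kernel(const CpuDevice& cd, WeightType wt) {
      if (wt == WeightType::kInt4) {
        if (cd.avx512f) return ffn_silu_avx512_int4;
        if (cd.avx2)    return ffn_silu_avx2_int4;
      }
      return ffn_silu_reference;
    }

    int main() {
      CpuDevice cd{true, false};                   // pretend AVX512F is available
      select_ffn_kernel(cd, WeightType::kInt4)();  // prints the AVX512F path
    }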

How has this PR been tested?

Tested with the gptj-6B and llama-7B models.

@luoyu-intel force-pushed the graph-int8-quantization branch 2 times, most recently from c9146ed to a31b888, on August 24, 2023 03:47
Squashed commit message:
    rebase with main
    add fusion support for llama
    add fp8 ffn_silu fusion
    update to jblas 8ad9853
    fix hasISA issue
    fix gcc9 compile
    fix bug of fp16 weight's quant
    fix 4bit size
    add fusion support for gemm_add
    enable ffn_gelu_add
    sync jblas, pass compilation
    fix gcc error
    fix bug. remove lm_head from non_quant
    fix mha
    sync with QBits updates. fix f4 scale
    Signed-off-by: Ding, Yi1 <yi1.ding@intel.com>
    Co-authored-by: Ding, Yi1 <yi1.ding@intel.com>
    Co-authored-by: Wang, Zhe1 <zhe1.wang@intel.com>
@luoyu-intel force-pushed the graph-int8-quantization branch from a31b888 to 7c92d1a on August 24, 2023 05:13
@DDEle (Contributor) left a comment:


LGTM

luoyu-intel and others added 2 commits August 24, 2023 13:57
Remove the high gcc version requirement.
auto-fusion: depends on weight type and runtime ISA support.
Signed-off-by: luoyu-intel <yu.luo@intel.com>
Co-authored-by: Ding, Yi1 <yi1.ding@intel.com>
Co-authored-by: Wang, Zhe1 <zhe1.wang@intel.com>
@zhewang1-intc (Contributor) left a comment:


LGTM

@airMeng enabled auto-merge (squash) on August 24, 2023 08:41
@airMeng merged commit ff7af86 into main on Aug 24, 2023
@airMeng deleted the graph-int8-quantization branch on August 24, 2023 08:41
lvliang-intel pushed a commit that referenced this pull request on Aug 24, 2023:
    add fusion support for llama
    add fp8 ffn_silu fusion
    fix hasISA issue
    fix gcc9 compile
    fix bug of fp16 weight's quant
    fix 4bit size
    add fusion support for gemm_add
    enable ffn_gelu_add
    sync jblas, pass compilation
    fix gcc error
    fix bug. remove lm_head from non_quant
    fix mha
    sync with QBits updates. fix f4 scale
    Synchronize jblas code.
    Remove the high gcc version requirement.
    auto-fusion: depends on weight type and runtime ISA support.
    ---------
    Signed-off-by: luoyu-intel <yu.luo@intel.com>
    Co-authored-by: Ding, Yi1 <yi1.ding@intel.com>
    Co-authored-by: Wang, Zhe1 <zhe1.wang@intel.com>
    Signed-off-by: lvliang-intel <liang1.lv@intel.com>

    static GemmKernel kernel;
    assert(cd->AVX512F());  // runtime ISA check: this path requires AVX512F
    packedw = kernel.getWeightPtr()->compressWeightTranspose(n, k, f32ptr, k, params.block_size, type);
    if (params.scale_dtype == quant_sdtype::fp32) {
A contributor commented on the snippet above:

The else branch is gone: nothing handles the case where params.scale_dtype is not fp32 now.
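
For context, a minimal sketch of the dispatch shape the comment is pointing at, assuming a quant_sdtype enum with an fp32 member as in the quoted lines; the bf16 member and the assert are illustrative assumptions, not the actual code:

    if (params.scale_dtype == quant_sdtype::fp32) {
      // fp32-scale quantization path, as in the quoted snippet
    } else if (params.scale_dtype == quant_sdtype::bf16) {  // assumed member
      // alternative scale-dtype path
    } else {
      // the reviewer's point: without an else, unsupported scale dtypes
      // fall through silently instead of failing loudly
      assert(false && "unsupported scale_dtype");
    }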

