This repository was archived by the owner on Oct 25, 2024. It is now read-only.

Conversation

@luoyu-intel (Contributor) commented on Aug 23, 2023

Type of Change

Upgrade jblas: a more flexible dynamic-quant interface.

Description

  1. Synchronize jblas code.
  2. Remove the high gcc version requirement.
  3. Auto-fusion: fused kernels are selected based on the weight type and the ISA support detected at runtime (see the sketch below).
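
As a rough illustration of item 3, here is a minimal C++ sketch of the dispatch idea: pick a fused kernel from the weight type and the ISA detected at runtime. Every name in it (CpuDevice, WeightType, the ffn_silu_* functions) is a hypothetical placeholder for illustration, not the actual jblas interface.

    #include <cstdio>

    // Hypothetical ISA probe; in jblas this information would come from a
    // runtime CPU-feature check (cf. the cd->AVX512F() call quoted later
    // in this thread).
    struct CpuDevice {
      bool avx512f;
      bool avx2;
    };

    enum class WeightType { kInt4, kInt8 };

    using FfnKernel = void (*)();

    void ffn_silu_avx512_int4() { std::puts("fused ffn_silu: AVX512F, int4"); }
    void ffn_silu_avx2_int4()   { std::puts("fused ffn_silu: AVX2, int4"); }
    void ffn_silu_reference()   { std::puts("unfused reference path"); }

    // Auto-fusion idea: return the fused kernel that matches both the
    // weight type and the best ISA the machine supports; otherwise fall
    // back to an unfused reference implementation.
    FfnKernel select_ffn_kernel(const CpuDevice& cd, WeightType wt) {
      if (wt == WeightType::kInt4) {
        if (cd.avx512f) return ffn_silu_avx512_int4;
        if (cd.avx2)    return ffn_silu_avx2_int4;
      }
      return ffn_silu_reference;
    }

    int main() {
      CpuDevice cd{true, false};                   // pretend AVX512F is available
      select_ffn_kernel(cd, WeightType::kInt4)();  // prints the AVX512F path
    }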

How has this PR been tested?

Tested with the gptj-6B and llama-7B models.

@luoyu-intel force-pushed the graph-int8-quantization branch 2 times, most recently from c9146ed to a31b888, on August 24, 2023 03:47
Squashed commit message:
    rebase with main
    add fusion support for llama
    add fp8 ffn_silu fusion
    update to jblas 8ad9853
    fix hasISA issue
    fix gcc9 compile
    fix bug of fp16 weight's quant
    fix 4bit size
    add fusion support for gemm_add
    enable ffn_gelu_add
    sync jblas, pass compilation
    fix gcc error
    fix bug. remove lm_head from non_quant
    fix mha
    sync with QBits updates. fix f4 scale
    Signed-off-by: Ding, Yi1 <yi1.ding@intel.com>
    Co-authored-by: Ding, Yi1 <yi1.ding@intel.com>
    Co-authored-by: Wang, Zhe1 <zhe1.wang@intel.com>
@luoyu-intel force-pushed the graph-int8-quantization branch from a31b888 to 7c92d1a on August 24, 2023 05:13
@DDEle (Contributor) left a comment:


LGTM

luoyu-intel and others added 2 commits August 24, 2023 13:57
Remove the high gcc version requirement.
auto-fusion: depends on weight type and runtime ISA support.
Signed-off-by: luoyu-intel <yu.luo@intel.com>
Co-authored-by: Ding, Yi1 <yi1.ding@intel.com>
Co-authored-by: Wang, Zhe1 <zhe1.wang@intel.com>
@zhewang1-intc (Contributor) left a comment:


LGTM

@airMeng enabled auto-merge (squash) on August 24, 2023 08:41
@airMeng merged commit ff7af86 into main on Aug 24, 2023
@airMeng deleted the graph-int8-quantization branch on August 24, 2023 08:41
lvliang-intel pushed a commit that referenced this pull request on Aug 24, 2023:
    add fusion support for llama
    add fp8 ffn_silu fusion
    fix hasISA issue
    fix gcc9 compile
    fix bug of fp16 weight's quant
    fix 4bit size
    add fusion support for gemm_add
    enable ffn_gelu_add
    sync jblas, pass compilation
    fix gcc error
    fix bug. remove lm_head from non_quant
    fix mha
    sync with QBits updates. fix f4 scale
    Synchronize jblas code.
    Remove the high gcc version requirement.
    auto-fusion: depends on weight type and runtime ISA support.
    ---------
    Signed-off-by: luoyu-intel <yu.luo@intel.com>
    Co-authored-by: Ding, Yi1 <yi1.ding@intel.com>
    Co-authored-by: Wang, Zhe1 <zhe1.wang@intel.com>
    Signed-off-by: lvliang-intel <liang1.lv@intel.com>

    static GemmKernel kernel;
    assert(cd->AVX512F());  // runtime ISA check: this path requires AVX512F
    packedw = kernel.getWeightPtr()->compressWeightTranspose(n, k, f32ptr, k, params.block_size, type);
    if (params.scale_dtype == quant_sdtype::fp32) {
A contributor commented on the snippet above:

The else branch is gone: nothing handles the case where params.scale_dtype is not fp32 now.
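
For context, a minimal sketch of the dispatch shape the comment is pointing at, assuming a quant_sdtype enum with an fp32 member as in the quoted lines; the bf16 member and the assert are illustrative assumptions, not the actual code:

    if (params.scale_dtype == quant_sdtype::fp32) {
      // fp32-scale quantization path, as in the quoted snippet
    } else if (params.scale_dtype == quant_sdtype::bf16) {  // assumed member
      // alternative scale-dtype path
    } else {
      // the reviewer's point: without an else, unsupported scale dtypes
      // fall through silently instead of failing loudly
      assert(false && "unsupported scale_dtype");
    }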

