@Wangzheee
Contributor

PR types

Others

PR changes

Others

Describe

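Adds an inference op_convert and plugin for int8-quantized matmul. The plugin uses the Tensor Cores on NVIDIA GPUs to speed up matrix multiplication and has int8, fp16, and fp32 implementations. alpha is passed into the plugin and applied together with the matrix multiplication, which fuses matmul + scale and speeds up inference. This PR also adds a dynload implementation that dynamically loads libcublasLt.so, and adds unit tests for the quantized path.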
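To make the matmul + scale fusion concrete, here is a minimal CPU reference sketch, not the plugin's actual code. It assumes row-major int8 inputs, int32 accumulation, and a float output; the function name `MatmulInt8FusedScale` is hypothetical. The point it illustrates is that alpha is applied in the same pass that produces each output element, so no second pass over the result is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference, for illustration only:
// C = alpha * (A x B), where A is m x k and B is k x n, both row-major.
// Inputs are int8, products are accumulated in int32, and alpha is applied
// once per output element as that element is written.
std::vector<float> MatmulInt8FusedScale(const std::vector<int8_t>& a,
                                        const std::vector<int8_t>& b,
                                        int m, int k, int n, float alpha) {
  assert(a.size() == static_cast<std::size_t>(m) * k);
  assert(b.size() == static_cast<std::size_t>(k) * n);
  std::vector<float> c(static_cast<std::size_t>(m) * n);
  for (int i = 0; i < m; ++i) {
    for (int j = 0; j < n; ++j) {
      int32_t acc = 0;  // int32 accumulator avoids int8 overflow
      for (int p = 0; p < k; ++p) {
        acc += static_cast<int32_t>(a[i * k + p]) *
               static_cast<int32_t>(b[p * n + j]);
      }
      // Fused scale: no separate pass over C.
      c[i * n + j] = alpha * static_cast<float>(acc);
    }
  }
  return c;
}
```

For example, with A = [[1, 2], [3, 4]], B = [[5, 6], [7, 8]], and alpha = 0.5, the result is [[9.5, 11], [21.5, 25]]. An unfused version would write the int32 product to memory first and then read it back to multiply by alpha.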

Performance test: A(1, 28, 256, 1024) * B(1, 28, 1024, 256)

Kernel execution time (matmul and scale fused):

| matmul int8 layer | matmul half layer | matmul float32 layer |
| --- | --- | --- |
| 0.027 ms | 0.123 ms | 0.751 ms |

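Execution time of a single-op network (matmul and scale fused). The int8 matmul has to re-lay out its input data to use Tensor Cores, which adds time, so the speedup from the matrix computation only shows when the matrices are very large. This op's implementation can pre-analyze the tensors and automatically select whichever of the int8, fp16, and fp32 plugins performs best.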
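The PR text does not spell out how the pre-analysis chooses a plugin, so the following is only a sketch of one possible heuristic. Everything in it is an assumption: the names `MatmulPrecision`, `MatmulShape`, and `ChoosePrecision`, the use of the multiply-accumulate count as the size measure, and the caller-supplied threshold `int8_min_macs`.

```cpp
#include <cstdint>

// Hypothetical types and heuristic, not taken from the PR.
enum class MatmulPrecision { kInt8, kHalf, kFloat32 };

struct MatmulShape {
  int64_t batch;  // product of the leading batch dimensions
  int64_t m;
  int64_t k;
  int64_t n;
};

// Picks int8 only when quantization scales are available and the problem is
// large enough that the input re-layout cost is likely to be amortized.
// `int8_min_macs` is an assumed tuning knob; a real value would have to come
// from measurement on the target GPU.
MatmulPrecision ChoosePrecision(const MatmulShape& s, bool has_int8_scales,
                                bool fp16_enabled, int64_t int8_min_macs) {
  const int64_t macs = s.batch * s.m * s.k * s.n;
  if (has_int8_scales && macs >= int8_min_macs) {
    return MatmulPrecision::kInt8;
  }
  return fp16_enabled ? MatmulPrecision::kHalf : MatmulPrecision::kFloat32;
}
```

With the benchmark shape from this PR (batch 28, m = 256, k = 1024, n = 256) the count is 28 * 256 * 1024 * 256 multiply-accumulates, so a threshold of 2^30 would select int8, while a tiny 8 x 8 x 8 problem would fall back to fp16 or fp32.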

| matmul int8 op | matmul half op | matmul float32 op |
| --- | --- | --- |
| 37.2 ms | 35.6 ms | 57.1 ms |

Kernel execution time (matmul and scale as separate layers):

| matmul int8 layer + scale layer | matmul half layer + scale layer | matmul float32 layer + scale layer |
| --- | --- | --- |
| 0.053 ms | 0.152 ms | 0.815 ms |

Execution time of a single-op network (matmul and scale as separate ops):

| matmul int8 op + scale op | matmul half op + scale op | matmul float32 op + scale op |
| --- | --- | --- |
| 41.2 ms | 39.1 ms | 65.1 ms |

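Summary: the matmul int8 op gives a clear speedup when the matrices are large, and fusing the scale op adds a further clear speedup.
Note: matmul int8 also reduces GPU memory usage slightly, by about 5%.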
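The ratios behind this summary can be recomputed from the timings reported above. The sketch below is only arithmetic on those numbers; the struct and function names are made up for illustration.

```cpp
#include <cstdio>

// Timings copied from the measurements in this PR description (milliseconds).
struct Timing {
  const char* name;
  double fused_ms;    // matmul with scale fused
  double unfused_ms;  // matmul followed by a separate scale
};

inline double Speedup(double baseline_ms, double optimized_ms) {
  return baseline_ms / optimized_ms;
}

const Timing kKernelTimings[] = {
    {"int8", 0.027, 0.053},
    {"half", 0.123, 0.152},
    {"float32", 0.751, 0.815},
};

const Timing kNetworkTimings[] = {
    {"int8", 37.2, 41.2},
    {"half", 35.6, 39.1},
    {"float32", 57.1, 65.1},
};

// Prints the speedup that fusion gives for each precision.
inline void PrintFusionSpeedups(const Timing* rows, int count) {
  for (int i = 0; i < count; ++i) {
    std::printf("%-8s fused vs. unfused: %.2fx\n", rows[i].name,
                Speedup(rows[i].unfused_ms, rows[i].fused_ms));
  }
}
```

At the kernel level, fusion roughly halves the int8 time (0.053 / 0.027 is about 1.96x), and the fused int8 kernel is about 27.8x faster than the fused float32 kernel (0.751 / 0.027). At the network level the fused int8 op is about 1.11x faster than the unfused one (41.2 / 37.2), and the fused half op (35.6 ms) is still slightly faster than the fused int8 op (37.2 ms), consistent with the re-layout overhead noted above.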

@paddle-bot-old

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

Member

@shangzhizhou shangzhizhou left a comment

LGTM

Contributor

@Superjomn Superjomn left a comment

LGTM

Contributor

@chenwhql chenwhql left a comment

LGTM for PADDLE_ENFORCE

int32_t pos, nvinfer1::PluginTensorDesc const* inOut, int32_t nbInputs,
int32_t nbOutputs) const TRT_NOEXCEPT {
PADDLE_ENFORCE_EQ(nbInputs, 2,
platform::errors::InvalidArgument("Must have 2 inputs, "
Contributor

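I suggest including some context in the error message. As written, users may not know in which scenario, or at which location, 2 inputs are required. This can be added in a follow-up.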
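One way to act on this suggestion is sketched below in plain C++, without Paddle's macros, so it should be read as an illustration rather than the eventual fix. The helper names `InputCountError` and `CheckInputCount` and the location string are hypothetical; the idea is simply that the message states where the check ran, what was expected, and what was received.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: builds an error message that names the location of
// the check and reports both the expected and the actual input count.
std::string InputCountError(const std::string& where, int expected,
                            int actual) {
  std::ostringstream os;
  os << where << ": expected " << expected << " inputs, but received "
     << actual << ".";
  return os.str();
}

// Hypothetical check: throws std::invalid_argument with the contextual
// message when the input count does not match.
void CheckInputCount(const std::string& where, int nb_inputs, int expected) {
  if (nb_inputs != expected) {
    throw std::invalid_argument(InputCountError(where, expected, nb_inputs));
  }
}
```

Called as `CheckInputCount("matmul int8 plugin, format check", nbInputs, 2)`, a mismatch of 3 inputs would produce "matmul int8 plugin, format check: expected 2 inputs, but received 3." instead of a bare "Must have 2 inputs".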

Contributor Author

Sure, I'll add that in the next PR. Thanks!

@shangzhizhou shangzhizhou merged commit 1659079 into PaddlePaddle:develop Nov 24, 2021
Zjq9409 pushed a commit to Zjq9409/Paddle that referenced this pull request Dec 10, 2021
…7285) * matmul_convert_int8 * matmul_convert_int8 * matmulconvert_int8 * Matmul_int8_convert: tensor*tensor * Matmul_int8_convert: tensor*tensor * Matmul_int8_convert: tensor*tensor
5 participants