Skip to content

Conversation

@zyfncg
Copy link
Contributor

@zyfncg zyfncg commented Dec 8, 2021

PR types

Performance optimization

PR changes

Others

Describe

PTen中使用可变参数模板函数作为注册kernel来代替原先使用KernelContext的kernel注册函数,以消除KernelContext数据结构创建所带来的性能开销。

image
如上图所示:经过test_scale_benchmark性能对比测试,使用了可变参数模板函数的新形式kernel(variadic_args_kernel_fn)调用性能相比原来使用KernelContext的kernel调用性能提升了大约25%。同时可变参数模板函数相比于直接使用switch-case的kernel调用性能相差不大(已接近理论性能最优),但代码更为简洁。

@paddle-bot-old
Copy link

paddle-bot-old bot commented Dec 8, 2021

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.


private:
KernelFn fn_{nullptr};
void* variadic_args_kernel_fn_ = nullptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

命名对齐?variadic_args_kernel_fn_ -> variadic_args_fn_

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. 修改为variadic_fn_

void operator()(KernelContext* ctx) const { fn_(ctx); }

template <typename Fn>
Fn get_kernel_fn() const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

取值方法和成员名一致,get_kernel_fn -> variadic_args_fn

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. 由于不完全是取值方法,修改为GetVariadicKernelFn

return func;
}

void set_variadic_args_kernel_fn(void* fn) { variadic_args_kernel_fn_ = fn; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

同上,设值方法为set+成员名

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

将成员变量variadic_fn_改由构造函数进行赋值,set方法删除

struct KernelCallHelper<TypeTag<T>> {
template <int dev_ctx_idx, int in_idx, int attr_idx, int out_idx>
static void Compute(KernelContext* ctx, Args&... args) {
static void Compute(KernelContext* ctx, DevCtx dev_ctx, Args&... args) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里改动的原因是什么

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里是为了将DeviceContext参数提取出来给可变参数kernel进行转换

void* variadic_args_kernel_fn) {
KernelName kernel_name(kernel_name_cstr);
KernelKey kernel_key(backend, layout, dtype);
Kernel kernel(kernel_fn);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里要不要限制,kernel时必须提供两个输入才能构造,更改构造函数是不是比如后续set更好

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. set方法已删除,构造函数改为两个参数输入

YuanRisheng
YuanRisheng previously approved these changes Dec 10, 2021
@zyfncg zyfncg merged commit b76ef04 into PaddlePaddle:develop Dec 13, 2021
@zyfncg zyfncg deleted the unboxed_kernel branch December 13, 2021 11:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

4 participants