@JamesLim-sy (Contributor) commented Dec 21, 2021

PR types

New features

PR changes

OPs

Describe

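  1. Features and caveats:
  • Adds multi-output support to elementwise kernels, which removes the launch overhead of calling LaunchBroadcastElementwiseCudaKernel several times in computations such as:
    out_1 = x + y * z; out_2 = x / z;
  • Note that out_1 and out_2 must have the same dimensions, and the two functors must use the same axis setting.
  2. Usage:
  • Use paddle::framework::Array as the data type of the functor's return value, and write the functor as follows:
    template <typename InT, typename OutT>
    struct Functor {
      HOSTDEVICE paddle::framework::Array<OutT, 2> operator()(InT x, InT y, InT z) {
        paddle::framework::Array<OutT, 2> outs;
        outs[0] = x + y * z;
        outs[1] = x / z;
        return outs;
      }
    };
  • When calling LaunchBroadcastElementwiseCudaKernel, the template parameters change from <InT, OutT, functor> to <InT, OutT, functor, NumOuts>, where NumOuts is the number of outputs the functor produces (2 in this example). NumOuts defaults to 1, so existing call sites that use a single-output functor remain valid.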
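The multi-output functor pattern under Usage can be tried outside Paddle with a small host-only sketch. Everything here except the functor body is an assumption made for illustration: `std::array` stands in for `paddle::framework::Array`, and `ApplyTernary` is a hypothetical CPU driver, not the signature of LaunchBroadcastElementwiseCudaKernel. It only shows how one pass over the inputs can fill NumOuts output buffers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for paddle::framework::Array<T, N> (assumption: std::array is used
// only so that this sketch compiles without Paddle).
template <typename T, std::size_t N>
using Array = std::array<T, N>;

// Multi-output functor with the same body as the one in the PR description.
template <typename InT, typename OutT>
struct MultiOutFunctor {
  Array<OutT, 2> operator()(InT x, InT y, InT z) const {
    Array<OutT, 2> outs;
    outs[0] = static_cast<OutT>(x + y * z);
    outs[1] = static_cast<OutT>(x / z);
    return outs;
  }
};

// Hypothetical host-side driver (not a Paddle API). A single loop over the
// inputs writes all NumOuts outputs, instead of one loop per output.
template <typename InT, typename OutT, typename Functor, std::size_t NumOuts>
std::array<std::vector<OutT>, NumOuts> ApplyTernary(const std::vector<InT>& x,
                                                    const std::vector<InT>& y,
                                                    const std::vector<InT>& z,
                                                    Functor func) {
  assert(x.size() == y.size() && y.size() == z.size());
  std::array<std::vector<OutT>, NumOuts> outs;
  for (auto& o : outs) o.resize(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    const auto r = func(x[i], y[i], z[i]);  // all outputs for element i
    for (std::size_t k = 0; k < NumOuts; ++k) outs[k][i] = r[k];
  }
  return outs;
}
```

Calling `ApplyTernary<float, float, MultiOutFunctor<float, float>, 2>(x, y, z, MultiOutFunctor<float, float>())` returns two vectors, one per output. Unlike the real kernel, this sketch always expects an array-like return value, so it does not model the NumOuts = 1 default for scalar-returning functors.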
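  3. Performance:
  • Elementwise performance is essentially unchanged before and after adding multi-output support:

  • Fig. 1: before multi-output support
    image

  • Fig. 2: after multi-output support
    image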

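  4. Remaining items:
  • The template parameter NumOuts was introduced to support the multi-output case. In that case, however, the ReturnType template parameter in Function_traits is itself of type paddle::framework::Array<OutT, NumOuts>. I have been trying to derive NumOuts from ReturnType, but have not found a working way to do so yet.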
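One possible direction for the remaining item, sketched in plain host C++ and not verified against Paddle's Function_traits or under nvcc: a trait that is partially specialized on the array type can read the length out of the return type. `NumOutputs` and `ReturnTypeOf` are hypothetical names, `std::array` again stands in for `paddle::framework::Array`, and the sketch assumes the real class has a `<typename T, size_t N>` parameter list.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>
#include <utility>

// Stand-in for paddle::framework::Array<T, N> (assumption).
template <typename T, std::size_t N>
using Array = std::array<T, N>;

// Hypothetical trait: number of outputs encoded in a functor's return type.
// Any non-array return type counts as a single output.
template <typename ReturnT>
struct NumOutputs : std::integral_constant<std::size_t, 1> {};

// Array-like return type: the number of outputs is the array length.
template <typename T, std::size_t N>
struct NumOutputs<Array<T, N>> : std::integral_constant<std::size_t, N> {};

// Return type of calling Functor with arguments of types Args...
template <typename Functor, typename... Args>
using ReturnTypeOf =
    decltype(std::declval<Functor>()(std::declval<Args>()...));

struct AddFunctor {
  float operator()(float x, float y) const { return x + y; }
};

struct TwoOutFunctor {
  Array<float, 2> operator()(float x, float y, float z) const {
    return {{x + y * z, x / z}};
  }
};

static_assert(NumOutputs<ReturnTypeOf<AddFunctor, float, float>>::value == 1,
              "scalar return type means one output");
static_assert(
    NumOutputs<ReturnTypeOf<TwoOutFunctor, float, float, float>>::value == 2,
    "array return type carries the output count");
```

Whether this removes the need for an explicit NumOuts in the kernel launch depends on how Function_traits exposes ReturnType, which this sketch does not model.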

@JamesLim-sy JamesLim-sy changed the title Support Multiple output for elementwise and broadcast ops Support Multiple outputs for elementwise and broadcast ops Dec 21, 2021
@JamesLim-sy JamesLim-sy changed the title Support Multiple outputs for elementwise and broadcast ops Support Multiple outputs feature for elementwise and broadcast ops Dec 21, 2021
@JamesLim-sy JamesLim-sy force-pushed the support_multi-output_for_broadcast branch from 3d77a4c to 474b13c Compare December 22, 2021 13:20
@JamesLim-sy JamesLim-sy force-pushed the support_multi-output_for_broadcast branch from 474b13c to ffccae7 Compare December 22, 2021 18:44
Functor func) {
InT args[Arity][VecSize];
OutT result[VecSize];
ScalarType<OutT> vec_result[NumOuts][VecSize];
Contributor

This vec_result could be defined inside the WriteData function at L199 instead.

Contributor Author

Changed as requested.

@JamesLim-sy JamesLim-sy changed the title Support Multiple outputs feature for elementwise and broadcast ops Support multi-outputs feature for broadcast ops Dec 24, 2021
@JamesLim-sy JamesLim-sy requested a review from Xreki December 24, 2021 05:53
// Vector type
const int kVectorSize = (NX % 4 == 0) ? 4 : (NX % 2 == 0) ? 2 : 1;
const int kVectorsPerThread = NX / kVectorSize;
constexpr int kVectorSize = (NX % 4 == 0) ? 4 : (NX % 2 == 0) ? 2 : 1;
Contributor

Does constexpr have any performance impact (C++14 supports it)? In theory the compiler also evaluates a const at compile time.

Contributor Author @JamesLim-sy commented Dec 24, 2021

This change is about readability: switching to constexpr states explicitly that the value is computed at compile time.
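The readability point can be illustrated with a minimal sketch. `VectorConfig` and `RuntimeValue` are hypothetical names used only here; the kVectorSize expression is the one from the diff. A `const int` may silently fall back to run-time initialization, while `constexpr` makes the compiler reject any initializer that is not a constant expression.

```cpp
// Hypothetical wrapper around the expression from the diff.
template <int NX>
struct VectorConfig {
  static constexpr int kVectorSize = (NX % 4 == 0) ? 4 : (NX % 2 == 0) ? 2 : 1;
  static constexpr int kVectorsPerThread = NX / kVectorSize;
};

// constexpr values can be used wherever a constant expression is required.
static_assert(VectorConfig<8>::kVectorSize == 4, "8 splits into vectors of 4");
static_assert(VectorConfig<8>::kVectorsPerThread == 2, "two vectors per thread");
static_assert(VectorConfig<6>::kVectorSize == 2, "6 splits into vectors of 2");
static_assert(VectorConfig<3>::kVectorSize == 1, "3 falls back to scalars");

inline int RuntimeValue() { return 4; }

// Compiles, but the value is only initialized at run time and cannot be used
// in a static_assert or as an array bound.
const int kMaybeRuntime = RuntimeValue();

// The constexpr form of the same declaration would not compile, which is the
// explicit guarantee mentioned in the reply:
// constexpr int kRejected = RuntimeValue();
```

For an initializer such as the NX expression, which is already a constant expression, both spellings yield a compile-time constant; the difference is that constexpr documents and enforces it.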

};

template <typename T, int VecSize, int Rank, bool IsBoundary = false>
template <typename T, int VecSize, int Rank, bool IsBoundary>
Contributor

Why remove the default value?

Contributor Author @JamesLim-sy commented Dec 24, 2021

While making the change I noticed that LoadData is only called from ElementwiseBroadcastKernelImpl, which already sets bool IsBoundary = false, so I dropped the default template argument here.

Reverted to the original state.

T *dst,
const T *__restrict__ src,
uint32_t block_offset,
int block_offset,
Contributor

Why change this from uint to int?

Contributor Author @JamesLim-sy commented Dec 24, 2021

Same reason as the previous comment: LoadData is only called from ElementwiseBroadcastKernelImpl, whose parameter list declares int block_offset, so I changed this one to match. However, I see that KP uses uint32_t, so I will push another commit to change it back.

Reverted to the original uint32_t.

* size: The current block needs to load size elements continuously.
*/

#if defined(__NVCC__)
Contributor

Are these two newly added? Was the previous functionality incomplete?

Contributor Author

The functionality was already complete. The purpose of this change is to separate, at compile time, the computation branches taken for the two IsBoundary cases. In essence it tries to reproduce the following C++17 logic:

    if constexpr (condition) { }
Contributor Author

After checking, the partial-specialization change here caused a performance regression, so the code has been restored to its original form.
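For readers unfamiliar with the technique discussed in this thread, here is a host-only sketch of selecting the IsBoundary branch at compile time through partial specialization, with the C++17 `if constexpr` form alongside it when the compiler supports it. `Loader` and `LoadIfConstexpr` are hypothetical names and are not Paddle's LoadData. The sketch illustrates the idea only and says nothing about GPU performance, where the PR measured a regression for the specialization approach.

```cpp
#include <cassert>

// Primary template: IsBoundary == false, the block is full, so all NX
// elements are copied without a bounds check.
template <typename T, int NX, bool IsBoundary>
struct Loader {
  static void Apply(T* dst, const T* src, int /*num*/) {
    for (int i = 0; i < NX; ++i) dst[i] = src[i];
  }
};

// Partial specialization for the boundary block: only the first `num`
// elements are valid, so every access is guarded.
template <typename T, int NX>
struct Loader<T, NX, true> {
  static void Apply(T* dst, const T* src, int num) {
    assert(num >= 0);
    for (int i = 0; i < NX && i < num; ++i) dst[i] = src[i];
  }
};

#if __cplusplus >= 201703L
// C++17 equivalent: one function, with the discarded branch never
// instantiated for a given IsBoundary.
template <typename T, int NX, bool IsBoundary>
void LoadIfConstexpr(T* dst, const T* src, int num) {
  if constexpr (IsBoundary) {
    for (int i = 0; i < NX && i < num; ++i) dst[i] = src[i];
  } else {
    for (int i = 0; i < NX; ++i) dst[i] = src[i];
  }
}
#endif
```

The original code, which the PR returned to, keeps a single template and branches on IsBoundary inside the function body.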

Contributor @AnnaTrainingG left a comment

LGTM

@JamesLim-sy JamesLim-sy merged commit 89d38f5 into PaddlePaddle:develop Dec 27, 2021
