
Conversation

@Zjq9409 Zjq9409 (Contributor) commented Dec 16, 2021

PR types

Performance optimization

PR changes

OPs

Describe

Use elementwise to optimize the GPU forward computation of the gelu operator. Performance data for the forward operator is as follows:
[image: forward performance comparison]
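For readers skimming the thread, the core of the change is to express GELU as a plain elementwise functor and hand it to Paddle's vectorized elementwise CUDA launcher instead of the previous Eigen-based path. Below is a minimal sketch of the tanh-approximation functor; the trait name MPTypeTrait matches the diff context quoted later in this thread, but the exact body is an illustration rather than the merged code.

// Sketch only: an elementwise functor computing the tanh approximation of GELU.
// Arithmetic is done in MPType (e.g. float for fp16 inputs) for accuracy.
template <typename T>
struct GeluWithApproximateFunctor {
  using MPType = typename details::MPTypeTrait<T>::Type;
  inline HOSTDEVICE T operator()(T arg_x) {
    MPType x = static_cast<MPType>(arg_x);
    MPType one = static_cast<MPType>(1);
    MPType half = static_cast<MPType>(0.5);
    // sqrt(2 / pi) expressed via <cmath> constants: M_2_SQRTPI * M_SQRT1_2
    MPType kAlpha = static_cast<MPType>(M_2_SQRTPI * M_SQRT1_2);
    MPType tanh_out =
        tanh(kAlpha * x * (one + static_cast<MPType>(0.044715) * x * x));
    return static_cast<T>(x * half * (one + tanh_out));
  }
};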

@paddle-bot-old commented

Thanks for your contribution!
Please wait for the result of CI first. See the Paddle CI Manual for details.

@Zjq9409 Zjq9409 changed the title relu forward opt Use elementwise to optimize gelu implementation on GPU Dec 16, 2021
Contributor

The vector can be initialized when it is created.

Contributor Author

Done.
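For illustration, the suggestion amounts to constructing the input/output pointer vectors with an initializer list rather than calling push_back afterwards; the variable names below are hypothetical.

// Hypothetical example: initialize at construction instead of push_back.
std::vector<const framework::Tensor*> ins = {in};
std::vector<framework::Tensor*> outs = {out};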

@Zjq9409 Zjq9409 changed the title Use elementwise to optimize gelu implementation on GPU Use elementwise to optimize gelu forward implementation on GPU Dec 20, 2021
};

template <typename T>
struct GeluNoApproximateFunctor {
Contributor

Please rename this to match the functor above, i.e. use "Without" instead of "No".

Contributor Author

Done.

template <typename DeviceContext, typename T>
typename std::enable_if<
    std::is_same<DeviceContext, platform::CUDADeviceContext>::value>::type
default_gelu_fw(const framework::ExecutionContext& ctx,
Contributor

There is no need to write a new function; just specialize a CUDA version of GeluKernel directly.

Contributor Author

Done.

  using MT = typename details::MPTypeTrait<T>::Type;
  inline HOSTDEVICE T operator()(T x) {
    // this function is tanh approximation of gelu
    MT mx = static_cast<MT>(x);
Contributor

The naming here can follow the naming convention used in activation_op.cu.

Contributor Author

Done.
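As a reference for the naming suggestion, here is a sketch of the exact (non-approximate) functor written in the style of activation_op.cu, where the raw argument and the higher-precision working value get distinct names (arg_x / x). The exact identifiers in the merged code may differ.

// Sketch only: exact GELU via erf, with activation_op.cu-style naming.
template <typename T>
struct GeluWithoutApproximateFunctor {
  using MPType = typename details::MPTypeTrait<T>::Type;
  inline HOSTDEVICE T operator()(T arg_x) {
    // gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2)))
    MPType x = static_cast<MPType>(arg_x);
    MPType erf_out = erf(x * static_cast<MPType>(M_SQRT1_2));
    return static_cast<T>(x * static_cast<MPType>(0.5) *
                          (static_cast<MPType>(1) + erf_out));
  }
};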

@Zjq9409 Zjq9409 changed the title Use elementwise to optimize gelu forward implementation on GPU use elementwise to optimize gelu forward implementation on GPU Dec 20, 2021
@Zjq9409 Zjq9409 force-pushed the gelu_opt branch 2 times, most recently from 107f8ed to dac385f on December 20, 2021 at 11:55
Contributor

For the specialization you don't need to change the name; keep GeluKernel and just use CUDADeviceContext for the DeviceContext template parameter.

Contributor Author

Done.
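Putting the review suggestions together, the forward kernel becomes a partial specialization of GeluKernel for CUDADeviceContext that dispatches one of the two functors through the elementwise launcher. The sketch below assumes a launcher along the lines of LaunchElementwiseCudaKernel with an ElementwiseType tag and a broadcast axis, as used by other Paddle CUDA kernels of that era; the merged code may not match it verbatim.

// Sketch: CUDA specialization of GeluKernel dispatching the elementwise functors.
template <typename T>
class GeluKernel<platform::CUDADeviceContext, T>
    : public framework::OpKernel<T> {
 public:
  void Compute(const framework::ExecutionContext& context) const override {
    auto* in = context.Input<framework::Tensor>("X");
    auto* out = context.Output<framework::Tensor>("Out");
    auto approximate = context.Attr<bool>("approximate");
    out->mutable_data<T>(in->place());

    std::vector<const framework::Tensor*> ins = {in};
    std::vector<framework::Tensor*> outs = {out};
    const auto& dev_ctx =
        context.template device_context<platform::CUDADeviceContext>();
    if (approximate) {
      LaunchElementwiseCudaKernel<ElementwiseType::kUnary, T, T>(
          dev_ctx, ins, &outs, 0, GeluWithApproximateFunctor<T>());
    } else {
      LaunchElementwiseCudaKernel<ElementwiseType::kUnary, T, T>(
          dev_ctx, ins, &outs, 0, GeluWithoutApproximateFunctor<T>());
    }
  }
};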

@ZzSean ZzSean (Contributor) left a comment

LGTM

@ZzSean ZzSean merged commit aff4368 into PaddlePaddle:develop Dec 21, 2021
