@AshburnLee AshburnLee commented Nov 30, 2021

PR types

Performance optimization

PR changes

OPs

Describe

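Function

This PR uses the new interface to optimize the backward computation of elementwise_mul.

opbenchmark results

[Screenshot: 2021-12-30 14:53:57]

About the 3 configurations where CI-opbenchmark performance regressed relative to develop: after optimizing with the reduce interface, the first 3 configurations were about 15% slower than develop. After adapting to the multi-output code optimization, those 3 configurations are on par with develop, and the remaining configurations either exceed it (1.85x~12.16x) or are on par. Overall, performance is no worse than develop or competing frameworks.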

@paddle-bot-old

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

std::vector<int> reduce_dims = GetReduceDim(x->dims(), out->dims(), axis);
gpuStream_t stream = ctx.cuda_device_context().stream();

framework::Tensor wayto_dx;
Contributor

It would be better to rename this variable.

Contributor Author

Done

wayto_dx.Resize(dout->dims());
default_elementwise_mul<DeviceContext, T>(ctx, dout, y, &wayto_dx);

const framework::Tensor* const_to_dx =
Contributor

This statement can be merged with the one below.

Contributor Author

Done

};

template <typename T>
struct MulFunctor<paddle::platform::complex<T>> {
Contributor

This function can be removed. When the type is complex, construct y_conj(y.real, -y.imag) from the original y(y.real, y.imag) and pass it to the multiplication.

Contributor Author

Agreed, this function does not belong here; it conflicts with the semantics of MulFunctor. Changed.
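As a rough illustration of the reviewer's suggestion above, here is a minimal host-side sketch that builds a conjugated copy of y and then passes it to an ordinary elementwise multiplication. It uses std::complex and std::vector as stand-ins for Paddle's complex type and tensors; MakeConj and ElementwiseMul are hypothetical names, not functions from the PR.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: build y_conj(y.real, -y.imag) element by element.
template <typename T>
std::vector<std::complex<T>> MakeConj(const std::vector<std::complex<T>>& y) {
  std::vector<std::complex<T>> y_conj;
  y_conj.reserve(y.size());
  for (const auto& v : y) {
    y_conj.emplace_back(v.real(), -v.imag());
  }
  return y_conj;
}

// Plain elementwise multiplication; it knows nothing about conjugation,
// so the multiply functor keeps its ordinary meaning.
template <typename T>
std::vector<T> ElementwiseMul(const std::vector<T>& a,
                              const std::vector<T>& b) {
  assert(a.size() == b.size());
  std::vector<T> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    out[i] = a[i] * b[i];
  }
  return out;
}
```

With these helpers, the x gradient for complex inputs would be computed as ElementwiseMul(dout, MakeConj(y)). The trade-off in this sketch is an extra temporary for y_conj in exchange for leaving the multiplication itself untouched.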

inline HOSTDEVICE T operator()(const T& a, const T& b) const { return a * b; }
};
template <typename T>
struct MulDxDyFunctor<paddle::platform::complex<T>> {
Contributor Author

@AshburnLee AshburnLee Dec 10, 2021

The function MulGradDY takes 4 parameters. To use it, the caller would have to be changed to ElemwiseGradCompute, which ultimately calls the pre-optimization kernel and also has a different meaning: the functionality needed here is elemwiseCompute, not ElemwiseGradCompute. So I believe MulGradDY cannot be reused here, which is why MulDxDyFunctor is provided.
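To make the argument above concrete, the following is a hedged sketch of a two-argument functor that plugs into a forward-style elementwise routine. The names MulGradSketchFunctor and ElementwiseCompute are hypothetical, std::complex stands in for paddle::platform::complex, and the body of the complex specialization (conjugating the second operand) is an assumption inferred from the kernel lines quoted later in this review, not a copy of the PR's code.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical two-argument functor: the primary template is a plain product.
template <typename T>
struct MulGradSketchFunctor {
  T operator()(const T& x, const T& y) const { return x * y; }
};

// Assumed complex specialization: multiply by the conjugate of the
// second operand, so the conjugation is fused into the multiply.
template <typename T>
struct MulGradSketchFunctor<std::complex<T>> {
  std::complex<T> operator()(const std::complex<T>& x,
                             const std::complex<T>& y) const {
    return x * std::conj(y);
  }
};

// Forward-style elementwise routine: it only needs a binary functor,
// unlike a grad-style routine that expects a four-argument functor.
template <typename T, typename Functor>
std::vector<T> ElementwiseCompute(const std::vector<T>& a,
                                  const std::vector<T>& b, Functor f) {
  assert(a.size() == b.size());
  std::vector<T> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    out[i] = f(a[i], b[i]);
  }
  return out;
}
```

In this sketch the same call site serves real and complex types; only the functor specialization differs.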

inline HOSTDEVICE T operator()(const T& a, const T& b) const { return a * b; }
};
template <typename T>
struct MulDxDyFunctor<paddle::platform::complex<T>> {
Contributor

The name could be changed, and the functor should be moved into the shared file elementwise_functor.h.

std::vector<int> reduce_dims = GetReduceDim(x->dims(), out->dims(), axis);
gpuStream_t stream = ctx.cuda_device_context().stream();

framework::Tensor dx_tmp;
Contributor

Suggest changing the naming.

Contributor Author

Done. Renamed to dx_origin_dims, which denotes the dx result before the reduce.
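The role of the pre-reduce dx tensor can be sketched on the host as two steps: an elementwise product at the full output shape, followed by a sum over the broadcast dimension. This is a simplified illustration for real-valued data, assuming x has shape [cols] and is broadcast against y and dout of shape [rows, cols]; MulAtOutputShape and ReduceSumRows are hypothetical names and do not correspond to Paddle's reduce interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Step 1: elementwise product at the full output shape [rows, cols].
// The result plays the role of the dx value before the reduce.
std::vector<float> MulAtOutputShape(const std::vector<float>& dout,
                                    const std::vector<float>& y) {
  assert(dout.size() == y.size());
  std::vector<float> full(dout.size());
  for (std::size_t i = 0; i < dout.size(); ++i) {
    full[i] = dout[i] * y[i];
  }
  return full;
}

// Step 2: sum over the broadcast (row) dimension so the gradient
// matches the shape [cols] of x.
std::vector<float> ReduceSumRows(const std::vector<float>& full,
                                 std::size_t rows, std::size_t cols) {
  assert(full.size() == rows * cols);
  std::vector<float> dx(cols, 0.0f);
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      dx[c] += full[r * cols + c];
    }
  }
  return dx;
}
```

For example, with dout filled with ones and y = {1, 2, 3, 4, 5, 6} viewed as a 2x3 matrix, the reduced gradient for x is {5, 7, 9}.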

template <typename DeviceContext, typename T>
typename std::enable_if<
std::is_same<DeviceContext, platform::CPUDeviceContext>::value>::type
default_elementwise_mul_grad(const framework::ExecutionContext& ctx,
Contributor

The code in default_elementwise_mul_grad and elementwise_mul_grad is duplicated.

Contributor Author

Done.


template <typename T>
struct MulDxDyFunctor {
inline HOSTDEVICE T operator()(const T& a, const T& b) const { return a * b; }
Contributor

The parameters a, b here are inconsistent with the parameters x, y below.

Contributor Author

Done

}
}
}
*/
Contributor

Delete the obsolete comment.

Contributor Author

Done.

template <typename DeviceContext, typename T>
typename std::enable_if<
std::is_same<DeviceContext, platform::CUDADeviceContext>::value>::type
default_elementwise_mul_grad(const framework::ExecutionContext& ctx,
Contributor

Update this part of the code according to Zjq9409's most recently merged PR.

Contributor Author

Done.

dx[col] = plat::complex<double>(y[col].real, -y[col].imag) * o;
dy[col] = plat::complex<double>(x[col].real, -x[col].imag) * o;
col += blockDim.x * gridDim.x;
}
Contributor

The SimpleElemwiseMulGradCUDAKernel function can be deleted.

Contributor Author

Done

Contributor

@JamesLim-sy JamesLim-sy left a comment

I agree with this PR. If the other reviewers also agree, it can be merged.

Contributor

@Zjq9409 Zjq9409 left a comment

#include "paddle/fluid/operators/elementwise/elementwise_op_broadcast.cu.h"
#include "paddle/fluid/platform/complex.h"
#include "paddle/fluid/platform/float16.h"

The headers above can be removed from elementwise_mul_op.cu; this can be done in the next PR.

@JamesLim-sy JamesLim-sy merged commit 36a102f into PaddlePaddle:develop Jan 5, 2022
@AshburnLee AshburnLee deleted the elem_mul_grad branch January 5, 2022 14:03