
Conversation

@dzhwinter
Contributor

fix #7862

@dzhwinter dzhwinter changed the title "add reduce functor" "accelerate elementwise_add_grad, add reduce functor" Jan 30, 2018
@tonyyang-svail

@dzhwinter could you briefly explain why the original version is slow?

@dzhwinter
Contributor Author

dzhwinter commented Feb 2, 2018

The previous version used broadcast, which is a quite inefficient operation.
After the enhancement, the profile below shows that elementwise_add_grad now costs less than convolution.

-------------------------> Profiling Report <-------------------------

Place: CPU &nbsp; Total Time: 41762.5ms &nbsp; Total Memory: 17689.2MB &nbsp; Sorted by total time in descending order in the same thread

| Event | Calls | Total | Min. | Max. | Ave. | Total Memory | Min Memory | Max Memory |
|---|---|---|---|---|---|---|---|---|
| thread0::conv2d_grad | 13 | 15602.1 | 276.188 | 5248.83 | 1200.16 | 9069.18 | 0.0078125 | 392.148 |
| thread0::conv2d | 13 | 8975.5 | 219.889 | 3366.09 | 690.423 | 338.152 | 12.2539 | 392.004 |
| thread0::dropout | 10 | 3036.16 | 0.300329 | 1213.68 | 303.616 | 1906.18 | 0.132812 | 784.008 |
| thread0::elementwise_add_grad | 16 | 2279.28 | 0.030858 | 544.775 | 142.455 | 8960.14 | 0.0195312 | 392.008 |
| thread0::batch_norm_grad | 14 | 1926.31 | 0.390424 | 462.51 | 137.594 | 8961.79 | 0.0742188 | 392.012 |
| thread0::relu_grad | 14 | 1689.34 | 0.087101 | 397.12 | 120.667 | 8961.71 | 0.0664062 | 392.004 |
| thread0::pool2d_grad | 5 | 1247.38 | 22.8108 | 617.521 | 249.475 | 9020.14 | 12.2539 | 392.004 |
| thread0::batch_norm | 14 | 1125.46 | 0.435152 | 264.698 | 80.39 | 1122.16 | 0.0742188 | 392.012 |
| thread0::elementwise_add | 16 | 991.63 | 0.031354 | 248.922 | 61.9769 | 730.156 | 0.015625 | 392.004 |
| thread0::dropout_grad | 10 | 808.702 | 0.048284 | 379.893 | 80.8702 | 8961.64 | 0.0664062 | 392.004 |
| thread0::relu | 14 | 795.466 | 0.042214 | 189.24 | 56.819 | 1514.17 | 0.0664062 | 392.004 |
| thread0::adam | 60 | 516.043 | 0.013363 | 238.108 | 8.60072 | 17688.7 | 0 | 0 |
| thread0::fill_zeros_like | 66 | 445.392 | 0.003328 | 185.057 | 6.74837 | 8961.57 | 0.00390625 | 392.004 |
| thread0::pool2d | 5 | 342.792 | 7.17049 | 180.343 | 68.5584 | 4258.21 | 3.06641 | 98.0039 |
| thread0::mul_grad | 3 | 74.9344 | 0.593411 | 71.8969 | 24.9781 | 8960.16 | 0.269531 | 52.0703 |
| thread0::mul | 3 | 44.4806 | 0.180829 | 43.4023 | 14.8269 | 8959.48 | 0.015625 | 0.0664062 |
| thread0::elementwise_mul | 60 | 0.363544 | 0.004298 | 0.022641 | 0.00605907 | 17688.7 | 0.00390625 | 0.00390625 |
| thread0::softmax | 1 | 0.278477 | 0.278477 | 0.278477 | 0.278477 | 8960.05 | 0.015625 | 0.015625 |
| thread0::fill_constant | 61 | 0.260437 | 0.002431 | 0.025904 | 0.00426946 | 8960.11 | 0.00390625 | 0.00390625 |
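For context on why a reduce primitive is needed here: when elementwise_add broadcasts the smaller input up to the larger shape, the backward pass for the smaller input is a sum (reduce) over the broadcast axes. A minimal NumPy sketch of that gradient rule (function and variable names are illustrative, not PaddlePaddle's actual API):

```python
import numpy as np

def elementwise_add_grad(dout, x_shape, y_shape):
    """Gradients of z = x + y, where y was broadcast up to x's shape.

    dX is simply dOut. dY is dOut reduced (summed) over the axes that
    broadcasting expanded -- this reduction is the hot spot a dedicated
    reduce functor can accelerate versus a broadcast-based formulation.
    """
    dx = dout                                     # same shape as x
    # Left-pad y's shape with 1s, then find the axes that were expanded.
    padded = (1,) * (dout.ndim - len(y_shape)) + tuple(y_shape)
    axes = tuple(i for i, d in enumerate(padded)
                 if d == 1 and dout.shape[i] != 1)
    dy = dout.sum(axis=axes).reshape(y_shape)     # reduce over broadcast axes
    return dx, dy

dout = np.ones((2, 3, 4))
dx, dy = elementwise_add_grad(dout, (2, 3, 4), (3, 4))
print(dx.shape, dy.shape)  # (2, 3, 4) (3, 4)
print(dy[0, 0])            # 2.0: the two broadcast rows were summed
```

On CPU this reduction is a simple loop; the GPU version is the hard part, as discussed below in the thread.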
@dzhwinter
Contributor Author

dzhwinter commented Feb 24, 2018

This solution relies on the reduce primitive. Its CPU implementation is quite easy, and the test results shown above demonstrate the performance gain well. The GPU kernel, however, is hard to implement because of the GPU thread-overwriting problem (multiple threads racing to accumulate into the same output element).

I have dug into the reduce kernel in https://github.com/zchee/cuda-sample/blob/master/6_Advanced/reduction/reduction_kernel.cu, but on my local machine the reduce kernel is still not implemented correctly, which is why this PR has been delayed for such a long time.
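For readers unfamiliar with that sample: the tree-based (pairwise) reduction it implements avoids the overwriting problem by making every active thread write a distinct slot in each synchronized step. A small Python sketch of the access pattern (illustrative only, not the actual CUDA kernel):

```python
def tree_reduce(values):
    """Pairwise (tree) reduction as in NVIDIA's reduction_kernel.cu sample.

    In CUDA, each iteration of the while-loop corresponds to one
    __syncthreads()-separated step. Within a step, every active thread
    owns a distinct slot buf[i], so no two threads ever write the same
    location concurrently -- which is exactly the property a correct
    GPU reduce kernel must guarantee.
    """
    buf = list(values)
    stride = 1
    while stride < len(buf):
        # Active "threads" i = 0, 2*stride, 4*stride, ... each fold in
        # their partner buf[i + stride]; all writes go to distinct i.
        for i in range(0, len(buf) - stride, 2 * stride):
            buf[i] += buf[i + stride]
        stride *= 2
    return buf[0]

print(tree_reduce([1, 2, 3, 4, 5, 6, 7, 8]))  # 36
```

The sequential inner loop stands in for what the GPU does in parallel; the point is the indexing scheme, which keeps concurrent writes disjoint.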

Currently, this issue has been partly fixed without using the reduce kernel. Please see the details in #8402.

@paddle-bot-old paddle-bot-old bot closed this May 22, 2020
@paddle-bot-old

Since you haven't replied for a long time, we have closed this issue/PR.
If the problem is not solved or there is a follow-up, please reopen it at any time and we will continue to follow up.

