elementwise_add_grad should be optimized  #7862

@wanghaoshuang

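elementwise_add_grad should be optimized to avoid the overhead introduced by Eigen. In the profiling report below it is the most expensive op: 176 calls take 7243.75 ms in total (41.16 ms per call on average), whereas the forward elementwise_add takes 89.54 ms over 576 calls (0.16 ms per call).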

-------------------------> Profiling Report <-------------------------

Place: CUDA
Time unit: ms
Sorted by total time in descending order in the same thread

Event                            Calls   Total       Min.        Max.        Ave.
thread0::elementwise_add_grad    176     7243.75     0.04736     168.678     41.1577
thread0::warpctc                 16      7202.06     1.28614     7180.67     450.129
thread0::conv2d_grad             128     752.047     2.50992     17.4375     5.87536
thread0::conv2d                  256     491.638     1.08138     4.74362     1.92046
thread0::batch_norm              256     301.695     0.076608    5.48499     1.17849
thread0::gru                     64      299.851     4.54563     5.25917     4.68517
thread0::batch_norm_grad         128     287.493     0.343488    6.05075     2.24604
thread0::gru_grad                32      192.225     5.83638     6.24419     6.00702
thread0::elementwise_add         576     89.5377     0.009984    0.579264    0.155447
thread0::relu                    256     57.8738     0.062848    0.482432    0.22607
thread0::mul                     128     52.3439     0.191296    0.6896      0.408937
thread0::relu_grad               128     40.9168     0.086784    0.685856    0.319662
thread0::pool2d_grad             64      30.8992     0.135456    0.989888    0.4828
thread0::pool2d                  128     24.5201     0.057664    0.394848    0.191563
thread0::mul_grad                64      21.6162     0.043584    0.628768    0.337753
thread0::momentum                688     17.9174     0.008928    0.317664    0.0260427
thread0::im2sequence             32      16.2841     0.504608    0.515488    0.508877
thread0::ctc_align               32      16.0577     0.469184    0.665344    0.501804
thread0::warpctc_grad            16      15.68       0.943936    1.09715     0.98
thread0::top_k                   32      9.99485     0.218304    1.05782     0.312339
thread0::im2sequence_grad        16      8.83674     0.54448     0.565408    0.552296
thread0::edit_distance           32      8.6968      0.244896    0.601024    0.271775
thread0::sum                     112     8.36666     0.019328    0.235168    0.0747023
thread0::scale                   256     7.41229     0.00928     0.1416      0.0289542
thread0::clip                    224     6.79677     0.009984    0.141408    0.0303427
thread0::cast                    36      6.31331     0.048736    0.229376    0.17537
thread0::feed                    64      4.81792     0.048608    0.194784    0.07528
thread0::fill_zeros_like         512     4.63562     0.00736     0.010624    0.00905394
thread0::fetch                   36      1.31088     0.024704    0.075424    0.0364133
thread0::reduce_sum              32      1.00794     0.026752    0.078976    0.031498
thread0::fill_constant           24      0.594816    0.017376    0.054784    0.024784
thread0::mean                    16      0.541664    0.027392    0.0888      0.033854
thread0::elementwise_div         4       0.285696    0.042336    0.099936    0.071424
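
For context, here is a minimal sketch of what the backward pass of a broadcast add computes, assuming Out = X + Y where X has shape N x C and Y is a length-C vector broadcast over the rows. The function name, argument names, and shapes are illustrative assumptions, not Paddle's actual kernel. The gradient for X is a copy of dOut, while the gradient for Y needs a reduction over the broadcast axis. That reduction is a plausible candidate for a hand-written kernel in place of the Eigen expression, although the issue itself does not say which part is slow.

```python
def elementwise_add_grad_sketch(dout, y_len):
    """Reference gradients for Out = X + Y with Y broadcast over rows.

    dout:  list of N rows, each a list of y_len floats (gradient w.r.t. Out)
    y_len: length C of the broadcast vector Y (hypothetical parameter)
    Returns (dx, dy): dx has the shape of dout, dy has length y_len.
    """
    # dX is dOut unchanged (copied so the caller's data is not aliased).
    dx = [row[:] for row in dout]

    # dY sums dOut over the broadcast (row) axis.
    dy = [0.0] * y_len
    for row in dout:
        for j, g in enumerate(row):
            dy[j] += g
    return dx, dy


dout = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
dx, dy = elementwise_add_grad_sketch(dout, 3)
print(dx)
print(dy)
```

When X and Y have the same shape, no reduction is needed and both gradients are plain copies of dOut, so the cost should be comparable to the forward elementwise_add.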
