@liuruyan liuruyan commented Apr 13, 2025

PR Category

Performance Optimization

PR Types

Improvements

Description

Testing shows that ks = 2 is better suited to the NCHW layout.

avg:
h/w=64, c=16, ks=2, NCHW time: 0.322881, NHWC time: 0.276481
h/w=64, c=16, ks=4, NCHW time: 0.315269, NHWC time: 0.267400
h/w=64, c=16, ks=6, NCHW time: 0.401431, NHWC time: 0.278184
h/w=64, c=32, ks=2, NCHW time: 0.325447, NHWC time: 0.496813
h/w=64, c=32, ks=4, NCHW time: 0.448367, NHWC time: 0.380514
h/w=64, c=32, ks=6, NCHW time: 0.630459, NHWC time: 0.514417
h/w=64, c=64, ks=2, NCHW time: 0.505829, NHWC time: 1.189571
h/w=64, c=64, ks=4, NCHW time: 0.741714, NHWC time: 0.827513
h/w=64, c=64, ks=6, NCHW time: 1.107598, NHWC time: 1.158299
h/w=64, c=128, ks=2, NCHW time: 0.856688, NHWC time: 2.207321
h/w=64, c=128, ks=4, NCHW time: 1.376678, NHWC time: 1.541309
h/w=64, c=128, ks=6, NCHW time: 2.109928, NHWC time: 2.206076
h/w=128, c=16, ks=2, NCHW time: 0.565740, NHWC time: 0.722815
h/w=128, c=16, ks=4, NCHW time: 0.834940, NHWC time: 0.548335
h/w=128, c=16, ks=6, NCHW time: 1.198223, NHWC time: 0.745800
h/w=128, c=32, ks=2, NCHW time: 0.847783, NHWC time: 1.612132
h/w=128, c=32, ks=4, NCHW time: 1.382709, NHWC time: 1.151279
h/w=128, c=32, ks=6, NCHW time: 2.134933, NHWC time: 1.682639
h/w=128, c=64, ks=2, NCHW time: 1.604333, NHWC time: 4.435482
h/w=128, c=64, ks=4, NCHW time: 2.663228, NHWC time: 3.057011
h/w=128, c=64, ks=6, NCHW time: 4.045143, NHWC time: 4.304280
h/w=128, c=128, ks=2, NCHW time: 3.035852, NHWC time: 8.518385
h/w=128, c=128, ks=4, NCHW time: 5.156362, NHWC time: 5.948716
h/w=128, c=128, ks=6, NCHW time: 8.138292, NHWC time: 8.653350
h/w=256, c=16, ks=2, NCHW time: 1.805336, NHWC time: 2.501686
h/w=256, c=16, ks=4, NCHW time: 2.898034, NHWC time: 1.794420
h/w=256, c=16, ks=6, NCHW time: 4.396132, NHWC time: 2.661232
h/w=256, c=32, ks=2, NCHW time: 3.015651, NHWC time: 6.210996
h/w=256, c=32, ks=4, NCHW time: 5.158682, NHWC time: 4.517505
h/w=256, c=32, ks=6, NCHW time: 8.198866, NHWC time: 6.722461
h/w=256, c=64, ks=2, NCHW time: 6.205530, NHWC time: 22.979516
h/w=256, c=64, ks=4, NCHW time: 10.481331, NHWC time: 13.238024
h/w=256, c=64, ks=6, NCHW time: 16.565631, NHWC time: 19.431427
h/w=256, c=128, ks=2, NCHW time: 12.794841, NHWC time: 61.503816
h/w=256, c=128, ks=4, NCHW time: 21.239624, NHWC time: 27.016336
h/w=256, c=128, ks=6, NCHW time: 33.395503, NHWC time: 39.011861

max:
h/w=64, c=16, ks=2, NCHW time: 0.282521, NHWC time: 0.254951
h/w=64, c=16, ks=4, NCHW time: 0.297331, NHWC time: 0.399692
h/w=64, c=16, ks=6, NCHW time: 0.431402, NHWC time: 0.622851
h/w=64, c=32, ks=2, NCHW time: 0.297867, NHWC time: 0.449826
h/w=64, c=32, ks=4, NCHW time: 0.440076, NHWC time: 0.479654
h/w=64, c=32, ks=6, NCHW time: 0.696664, NHWC time: 0.776529
h/w=64, c=64, ks=2, NCHW time: 0.484779, NHWC time: 0.896816
h/w=64, c=64, ks=4, NCHW time: 0.764992, NHWC time: 0.872529
h/w=64, c=64, ks=6, NCHW time: 1.256456, NHWC time: 1.456123
h/w=64, c=128, ks=2, NCHW time: 0.855667, NHWC time: 1.665445
h/w=64, c=128, ks=4, NCHW time: 1.416875, NHWC time: 1.652546
h/w=64, c=128, ks=6, NCHW time: 2.380813, NHWC time: 2.802986
h/w=128, c=16, ks=2, NCHW time: 0.748676, NHWC time: 0.685990
h/w=128, c=16, ks=4, NCHW time: 1.423692, NHWC time: 1.190178
h/w=128, c=16, ks=6, NCHW time: 2.238465, NHWC time: 2.264225
h/w=128, c=32, ks=2, NCHW time: 1.200919, NHWC time: 1.360761
h/w=128, c=32, ks=4, NCHW time: 2.533128, NHWC time: 1.513031
h/w=128, c=32, ks=6, NCHW time: 4.172381, NHWC time: 2.911870
h/w=128, c=64, ks=2, NCHW time: 2.275403, NHWC time: 3.808403
h/w=128, c=64, ks=4, NCHW time: 4.973764, NHWC time: 2.998720
h/w=128, c=64, ks=6, NCHW time: 8.136851, NHWC time: 5.671319
h/w=128, c=128, ks=2, NCHW time: 4.351615, NHWC time: 8.032251
h/w=128, c=128, ks=4, NCHW time: 9.767418, NHWC time: 5.822402
h/w=128, c=128, ks=6, NCHW time: 16.317776, NHWC time: 11.386560
h/w=256, c=16, ks=2, NCHW time: 2.430539, NHWC time: 2.868182
h/w=256, c=16, ks=4, NCHW time: 5.242343, NHWC time: 4.383137
h/w=256, c=16, ks=6, NCHW time: 8.603761, NHWC time: 8.832022
h/w=256, c=32, ks=2, NCHW time: 4.265874, NHWC time: 17.374619
h/w=256, c=32, ks=4, NCHW time: 9.845596, NHWC time: 5.709220
h/w=256, c=32, ks=6, NCHW time: 16.663032, NHWC time: 11.539901
h/w=256, c=64, ks=2, NCHW time: 8.693331, NHWC time: 46.872488
h/w=256, c=64, ks=4, NCHW time: 19.828383, NHWC time: 11.862891
h/w=256, c=64, ks=6, NCHW time: 33.400356, NHWC time: 23.418960
h/w=256, c=128, ks=2, NCHW time: 17.776977, NHWC time: 119.517524
h/w=256, c=128, ks=4, NCHW time: 39.915949, NHWC time: 24.002529
h/w=256, c=128, ks=6, NCHW time: 67.089656, NHWC time: 47.105176
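
For anyone skimming the timings above, one rough way to compare the two layouts is the NHWC/NCHW time ratio per configuration (a ratio above 1 means NCHW was faster). The sketch below is illustrative only and is not part of this PR: `parse_rows` and `SAMPLE` are hypothetical names, and the three sample rows are copied from the avg results for h/w=256, c=128.

```python
import re

# Three rows copied verbatim from the "avg" results (h/w=256, c=128).
SAMPLE = """\
h/w=256, c=128, ks=2, NCHW time: 12.794841, NHWC time: 61.503816
h/w=256, c=128, ks=4, NCHW time: 21.239624, NHWC time: 27.016336
h/w=256, c=128, ks=6, NCHW time: 33.395503, NHWC time: 39.011861
"""

# Matches one benchmark record in the format used in this PR description.
ROW = re.compile(
    r"h/w=(\d+), c=(\d+), ks=(\d+), "
    r"NCHW time: ([\d.]+), NHWC time: ([\d.]+)"
)


def parse_rows(text):
    """Hypothetical helper: parse records and attach the NHWC/NCHW ratio."""
    rows = []
    for m in ROW.finditer(text):
        nchw, nhwc = float(m.group(4)), float(m.group(5))
        rows.append({
            "hw": int(m.group(1)),
            "c": int(m.group(2)),
            "ks": int(m.group(3)),
            "nchw": nchw,
            "nhwc": nhwc,
            "ratio": nhwc / nchw,  # > 1 means NCHW was faster
        })
    return rows


if __name__ == "__main__":
    for r in parse_rows(SAMPLE):
        faster = "NCHW" if r["ratio"] > 1 else "NHWC"
        print(f"ks={r['ks']}: NHWC/NCHW = {r['ratio']:.2f} ({faster} faster)")
```

Because the pattern is matched with `finditer`, the same helper should also work on the full results pasted as a single block of text.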

Pcard-67164

paddle-bot bot commented Apr 13, 2025

Your PR has been submitted. Thanks for your contribution to this open-source project!
Please wait for the CI results first. See the Paddle CI Manual for details.

@liuruyan
Contributor Author

/re-run sot

@liuruyan
Contributor Author

/re-run all-failed

@zyfncg zyfncg merged commit aa36199 into PaddlePaddle:develop Apr 14, 2025
37 of 40 checks passed
YqGe585 pushed a commit to YqGe585/Paddle that referenced this pull request May 7, 2025