Closed
Some general feedback from Nvidia on profiling the fluid ConvNet (#6179):
- cuDNN convolution is not used (I am not sure whether this is intended). The cuDNN convolution operator is not exposed in fluid. #6089
- For profiling, we normally exclude the first minibatch, or the first several minibatches, from the benchmark result, because they are slow due to memory allocation and algorithm tuning. Doing the same here makes it easier to compare our results with other frameworks and see how well we are doing.
- Data pipeline: part of it does not run in parallel with the GPU. It is also slow, and it becomes the bottleneck once GPU performance is reasonable.
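
To illustrate the data pipeline point, here is a minimal sketch of overlapping batch loading with compute by reading ahead in a background thread. It is plain Python, not fluid's reader API; the `prefetch` helper and its `buffer_size` parameter are hypothetical names used only for this example.

```python
import queue
import threading


def prefetch(batch_iter, buffer_size=4):
    """Yield batches from batch_iter while a background thread reads ahead.

    Hypothetical helper: the consumer (the training step) and the producer
    (data loading) run concurrently, coupled by a bounded queue.
    """
    q = queue.Queue(maxsize=buffer_size)
    sentinel = object()  # marks the end of the stream

    def worker():
        # Note: this sketch does not forward exceptions raised by batch_iter.
        for batch in batch_iter:
            q.put(batch)  # blocks when the buffer is full
        q.put(sentinel)

    threading.Thread(target=worker, daemon=True).start()

    while True:
        batch = q.get()  # blocks only if the loader has fallen behind
        if batch is sentinel:
            return
        yield batch
```

Wrapping an existing batch iterator, as in `for batch in prefetch(batches): ...`, keeps the training loop unchanged. Whether this helps depends on the loader releasing the GIL (I/O or numpy-heavy work); a CPU-bound pure-Python loader would need processes instead.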
After changing the three things above (using cuDNN, feeding fake numpy data, and measuring speed only over minibatches 10-50), the Titan Xp performance increased from 53 img/sec to ~108 img/sec.
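
The measurement approach described above can be sketched roughly as follows. This is not the actual benchmark script: `fake_batch`, `images_per_sec`, and the `train_step` callable are hypothetical names, and the sketch assumes `train_step` blocks until the GPU work for that minibatch has finished (for example by fetching the loss), since otherwise the timing would only cover kernel launches.

```python
import time

import numpy as np


def fake_batch(batch_size=32, shape=(3, 224, 224), num_classes=1000, seed=0):
    """Build one random minibatch so the data pipeline is out of the picture."""
    rng = np.random.RandomState(seed)
    images = rng.rand(batch_size, *shape).astype("float32")
    labels = rng.randint(0, num_classes, size=(batch_size, 1)).astype("int64")
    return images, labels


def images_per_sec(train_step, batch, skip=10, measure=40):
    """Time `measure` minibatches after discarding the first `skip`.

    With the defaults this times minibatches 10-50. `train_step` is assumed
    to be synchronous (see the note above).
    """
    batch_size = batch[0].shape[0]

    # Warm-up: allocation and algorithm tuning happen here and are not timed.
    for _ in range(skip):
        train_step(*batch)

    start = time.perf_counter()
    for _ in range(measure):
        train_step(*batch)
    elapsed = time.perf_counter() - start

    return batch_size * measure / elapsed
```

Reusing the same fake batch for every step is deliberate: it isolates the compute path, so the number is an upper bound on what a real data pipeline can reach.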
Another bug was also caught in #6320.
After changing all four things above, we got a further ~40% speedup, to 150 img/sec on my Titan.