
Fluid distributed training performance is terrible using GPU #8119

@typhoonzero

Description


Running vgg16 on the cifar10 dataset. I used kubectl to submit a Fluid cluster job with 5 pservers and 5 trainers; each trainer requests 1 GPU via `alpha.kubernetes.io/nvidia-gpu: 1`.
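For reference, the per-trainer GPU request above corresponds to a resource limit in each trainer's pod spec. The sketch below builds such a manifest as a plain Python dict. It assumes a bare Pod rather than whatever job template the cluster actually uses, and the pod name, container name (`trainer`), and image are hypothetical placeholders. `hostNetwork: True` mirrors the HostNetwork setting in the environment list.

```python
import json


def trainer_pod_spec(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal trainer Pod manifest as a plain dict.

    Hypothetical: the pod name, the image, and the container name "trainer".
    Only the GPU resource name and host networking come from the issue.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Host networking, as used by the jobs in this issue.
            "hostNetwork": True,
            "containers": [
                {
                    "name": "trainer",
                    "image": image,
                    "resources": {
                        # Legacy GPU resource name used in the issue.
                        "limits": {"alpha.kubernetes.io/nvidia-gpu": gpus},
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    spec = trainer_pod_spec("fluid-trainer-0", "example/paddle-fluid:gpu")
    print(json.dumps(spec, indent=2))
```

The printed JSON could be saved to a file and submitted with `kubectl create -f <file>`, since kubectl accepts JSON manifests as well as YAML.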

- CUDA: 8
- cuDNN: 5
- Driver version: 375.26
- GPU: P40
- HostNetwork

Additional information: CPU usage inside the container stays at up to 100% for long periods. Could the CPU be the bottleneck?

Per-mini-batch time is around 60 s with GPU, compared with around 10 s when running on CPU only.

Profiling report (time unit: ms; sorted by total time in descending order within the same thread):

| Event | Calls | Total | Min. | Max. | Ave. |
|---|---:|---:|---:|---:|---:|
| thread0::split | 5865 | 1.11765e+07 | 2.63047 | 6652.4 | 1905.63 |
| thread0::concat | 5865 | 1.09052e+07 | 2.61659 | 6175.19 | 1859.36 |
| thread0::send | 391 | 2.29786e+06 | 327.89 | 13663.4 | 5876.87 |
| thread0::conv2d_grad | 5083 | 893.141 | 0.065567 | 104.159 | 0.175711 |
| thread0::conv2d | 5083 | 807.148 | 0.051993 | 11.0981 | 0.158794 |
| thread0::fill_zeros_like | 25806 | 562.583 | 0.012788 | 11.0516 | 0.0218005 |
| thread0::batch_norm | 5474 | 525.538 | 0.055927 | 6.09994 | 0.0960062 |
| thread0::batch_norm_grad | 5474 | 346.792 | 0.044622 | 9.09123 | 0.0633526 |
| thread0::elementwise_add_grad | 6256 | 341.264 | 0.037377 | 8.06849 | 0.0545499 |
| thread0::elementwise_add | 6256 | 295.606 | 0.024 | 7.93195 | 0.0472516 |
| thread0::dropout | 3910 | 191.713 | 0.033447 | 6.07088 | 0.0490315 |
| thread0::pool2d | 1955 | 183.506 | 0.036702 | 9.3676 | 0.0938649 |
| thread0::mul | 1173 | 158.41 | 0.035665 | 8.38415 | 0.135047 |
| thread0::pool2d_grad | 1955 | 151.755 | 0.041505 | 8.08374 | 0.0776243 |
| thread0::relu | 5474 | 143.952 | 0.015749 | 5.04012 | 0.0262974 |
| thread0::dropout_grad | 3910 | 131.019 | 0.022926 | 5.03294 | 0.0335086 |
| thread0::relu_grad | 5474 | 130.262 | 0.016308 | 0.196264 | 0.0237965 |
| thread0::mul_grad | 1173 | 125.795 | 0.055516 | 3.11973 | 0.107242 |
| thread0::cast | 782 | 34.6569 | 0.019949 | 0.660779 | 0.0443183 |
| thread0::softmax | 391 | 33.7846 | 0.043393 | 0.707031 | 0.0864057 |
| thread0::fetch | 782 | 27.9014 | 0.02143 | 0.06991 | 0.0356795 |
| thread0::elementwise_mul | 391 | 22.6289 | 0.029586 | 0.658594 | 0.0578745 |
| thread0::sum | 782 | 21.7157 | 0.015956 | 0.057544 | 0.0277695 |
| thread0::mean | 391 | 20.8824 | 0.017462 | 0.680731 | 0.0534077 |
| thread0::cross_entropy | 391 | 18.9823 | 0.023271 | 7.04633 | 0.0485482 |
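To put the report in perspective, the sketch below sums the "Total" column and compares the split/concat/send events with everything else. It rests on two assumptions that are not stated in the report: that there is one `send` call per mini-batch (so 391 calls correspond to 391 mini-batches, which matches the 391 calls of `softmax`, `mean`, and `cross_entropy`), and that summed event times in thread0 approximate wall-clock time. Grouping split/concat/send together is also this sketch's own choice. Under those assumptions, split/concat/send account for over 99.9% of profiled time and the profile works out to roughly 62 s per mini-batch, close to the ~60 s observed, while the model's compute ops add up to only about 13 ms per mini-batch.

```python
from typing import Dict, List, Tuple

# (event, calls, total_ms) copied from the profiling report (thread0).
PROFILE: List[Tuple[str, int, float]] = [
    ("split", 5865, 1.11765e07),
    ("concat", 5865, 1.09052e07),
    ("send", 391, 2.29786e06),
    ("conv2d_grad", 5083, 893.141),
    ("conv2d", 5083, 807.148),
    ("fill_zeros_like", 25806, 562.583),
    ("batch_norm", 5474, 525.538),
    ("batch_norm_grad", 5474, 346.792),
    ("elementwise_add_grad", 6256, 341.264),
    ("elementwise_add", 6256, 295.606),
    ("dropout", 3910, 191.713),
    ("pool2d", 1955, 183.506),
    ("mul", 1173, 158.41),
    ("pool2d_grad", 1955, 151.755),
    ("relu", 5474, 143.952),
    ("dropout_grad", 3910, 131.019),
    ("relu_grad", 5474, 130.262),
    ("mul_grad", 1173, 125.795),
    ("cast", 782, 34.6569),
    ("softmax", 391, 33.7846),
    ("fetch", 782, 27.9014),
    ("elementwise_mul", 391, 22.6289),
    ("sum", 782, 21.7157),
    ("mean", 391, 20.8824),
    ("cross_entropy", 391, 18.9823),
]

# Assumption: split/concat/send are grouped as the parameter-exchange path;
# every other event is treated as model compute.
TRANSFER_OPS = {"split", "concat", "send"}


def summarize(rows: List[Tuple[str, int, float]], num_batches: int) -> Tuple[float, float, float]:
    """Return (transfer share of total time, total ms per batch, compute ms per batch)."""
    total_ms = sum(total for _, _, total in rows)
    transfer_ms = sum(total for name, _, total in rows if name in TRANSFER_OPS)
    compute_ms = total_ms - transfer_ms
    return transfer_ms / total_ms, total_ms / num_batches, compute_ms / num_batches


if __name__ == "__main__":
    calls: Dict[str, int] = {name: n for name, n, _ in PROFILE}
    # Assumption: one `send` per mini-batch.
    num_batches = calls["send"]
    share, per_batch_ms, compute_per_batch_ms = summarize(PROFILE, num_batches)
    print(f"split+concat+send share of profiled time: {share:.4%}")
    print(f"profiled time per mini-batch: {per_batch_ms / 1000:.1f} s")
    print(f"compute-op time per mini-batch: {compute_per_batch_ms:.1f} ms")
```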
