Hi,
Not sure if you have plans to upgrade the DLRM code to ipex 1.10. I tried to upgrade the DLRM code to ipex 1.10 based on the patch from https://github.com/intel/intel-extension-for-pytorch/blob/0.2/torch_patches/models/0001-enable-dlrm-distributed-training-for-cpu.patch and noticed a performance regression.
A micro benchmark showed that all_to_all had 2x worse performance after upgrading to ipex 1.10. Any idea?
system config:
- torch ccl 1.10, pytorch 1.10, ipex 1.10
- single node, 2 ranks per node
all2all ipex v0.2:

all2all ipex 1.10:

test code:
```python
import torch
import os
import extend_distributed as ext_dist

if __name__ == "__main__":
    # Set up the CCL process group via the DLRM helper module.
    ext_dist.init_distributed(backend='ccl')

    # Two bf16 tensors, 262144 x 16 each, as the all-to-all payload.
    inputs = []
    tensor1 = torch.ones(262144, 16, dtype=torch.bfloat16)
    tensor2 = torch.ones(262144, 16, dtype=torch.bfloat16)
    inputs.append(tensor1)
    inputs.append(tensor2)

    # Profile 10 iterations of the asynchronous all-to-all exchange.
    with torch.autograd.profiler.profile(True) as prof:
        for _ in range(10):
            a2a_req = ext_dist.alltoall(inputs, None)
            ly_sparse = a2a_req.wait()

    print(prof.key_averages().table(sort_by="cpu_time_total"))
```

Thanks
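For comparison, below is a minimal sketch (not from the original benchmark) that drives the same collective through torch.distributed.all_to_all_single directly, bypassing the extend_distributed wrapper. The rendezvous environment variables and the torch_ccl import name are assumptions and may need adjusting for your launcher and torch-ccl release. If this variant shows the same ~2x gap between the v0.2 and 1.10 stacks, the regression is likely in the oneCCL all_to_all path itself rather than in the DLRM wrapper code.

```python
import os
import torch
import torch.distributed as dist
import torch_ccl  # noqa: F401 -- registers the "ccl" backend; import name assumed for torch-ccl 1.10

if __name__ == "__main__":
    # Rendezvous settings are assumptions; RANK/WORLD_SIZE are normally
    # provided by the launcher (e.g. mpirun), adjust as needed.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group(backend="ccl")

    # Roughly the same payload as the benchmark above: 262144 x 16 bf16 rows,
    # split evenly across ranks along dim 0 by all_to_all_single.
    send = torch.ones(262144, 16, dtype=torch.bfloat16)
    recv = torch.empty_like(send)

    # Profile 10 iterations of the raw all-to-all collective.
    with torch.autograd.profiler.profile(True) as prof:
        for _ in range(10):
            dist.all_to_all_single(recv, send)

    print(prof.key_averages().table(sort_by="cpu_time_total"))
```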