Hi,
Not sure if you have plans to upgrade the DLRM code to ipex 1.10. I tried to upgrade the DLRM code to ipex 1.10 based on the patch from https://github.com/intel/intel-extension-for-pytorch/blob/0.2/torch_patches/models/0001-enable-dlrm-distributed-training-for-cpu.patch and noticed a performance regression.
A micro benchmark showed that all_to_all had 2x worse performance after upgrading to ipex 1.10. Any idea?
system config:
- torch ccl 1.10, pytorch 1.10, ipex 1.10
- single node, 2 ranks per node
all2all ipex v0.2:

all2all ipex 1.10:

test code:
```python
import torch
import os
import extend_distributed as ext_dist

if __name__ == "__main__":
    # Set up the CCL process group via the DLRM helper module.
    ext_dist.init_distributed(backend='ccl')

    # Two bf16 tensors, 262144 x 16 each, as the all-to-all payload.
    inputs = []
    tensor1 = torch.ones(262144, 16, dtype=torch.bfloat16)
    tensor2 = torch.ones(262144, 16, dtype=torch.bfloat16)
    inputs.append(tensor1)
    inputs.append(tensor2)

    # Profile 10 iterations of the asynchronous all-to-all exchange.
    with torch.autograd.profiler.profile(True) as prof:
        for _ in range(10):
            a2a_req = ext_dist.alltoall(inputs, None)
            ly_sparse = a2a_req.wait()

    print(prof.key_averages().table(sort_by="cpu_time_total"))
```

Thanks
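For comparison, below is a minimal sketch (not from the original benchmark) that drives the same collective through torch.distributed.all_to_all_single directly, bypassing the extend_distributed wrapper. The rendezvous environment variables and the torch_ccl import name are assumptions and may need adjusting for your launcher and torch-ccl release. If this variant shows the same ~2x gap between the v0.2 and 1.10 stacks, the regression is likely in the oneCCL all_to_all path itself rather than in the DLRM wrapper code.

```python
import os
import torch
import torch.distributed as dist
import torch_ccl  # noqa: F401 -- registers the "ccl" backend; import name assumed for torch-ccl 1.10

if __name__ == "__main__":
    # Rendezvous settings are assumptions; RANK/WORLD_SIZE are normally
    # provided by the launcher (e.g. mpirun), adjust as needed.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group(backend="ccl")

    # Roughly the same payload as the benchmark above: 262144 x 16 bf16 rows,
    # split evenly across ranks along dim 0 by all_to_all_single.
    send = torch.ones(262144, 16, dtype=torch.bfloat16)
    recv = torch.empty_like(send)

    # Profile 10 iterations of the raw all-to-all collective.
    with torch.autograd.profiler.profile(True) as prof:
        for _ in range(10):
            dist.all_to_all_single(recv, send)

    print(prof.key_averages().table(sort_by="cpu_time_total"))
```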