Device to Device transfers don't work with OpenMPI + LinkX provider on AMD GPUs #13048

@angainor

Description

OpenMPI 5.0.6 with the `shm+cxi:lnx` (LinkX) provider fails to perform device-to-device transfers on the LUMI system (AMD GPUs) when running the OSU benchmarks. Host-to-host transfers work as expected, both intra-node and inter-node. For device-to-device transfers, OpenMPI fails with:

```
export FI_LNX_PROV_LINKS=shm+cxi
mpirun --mca opal_common_ofi_provider_include "shm+cxi:lnx" -np 2 -map-by numa ./osu_bibw -m 131072: D D

# OSU MPI-ROCM Bi-Directional Bandwidth Test v7.4
# Datatype: MPI_CHAR.
# Size      Bandwidth (MB/s)
--------------------------------------------------------------------------
Open MPI failed to register your buffer.
This error is fatal, your job will abort

  Buffer Type: rocm
  Buffer Address: 0x154beaa00000
  Buffer Length: 131072
  Error: Required key not available (4294967030)
--------------------------------------------------------------------------
```
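The number in the error line looks like a negative error code printed as an unsigned 32-bit integer. Assuming that interpretation, subtracting it from 2^32 recovers the magnitude: 4294967296 - 4294967030 = 266, so the underlying code would be -266. If the constants in libfabric's `fi_errno.h` are as assumed here, 266 is `FI_ENOKEY`, whose message is "Required key not available", matching the text Open MPI prints. This mapping is an assumption worth checking against the headers of the installed libfabric; the sketch below only does the arithmetic.

```shell
# Value printed in the Open MPI error message (assumed to be an unsigned 32-bit integer).
reported=4294967030

# If it is a negative code stored in a 32-bit unsigned field, 2^32 minus the
# reported value gives the magnitude of the original negative code.
magnitude=$(( (1 << 32) - reported ))

# Assumption: libfabric defines FI_ENOKEY as 266 ("Required key not available").
echo "reported ${reported} corresponds to -${magnitude} as a signed 32-bit value"
```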

@hppritcha identified the problem as related to #11076. A fix for this issue exists in #12290, but it was not merged into the 5.x branch.
