Skip to content

Conversation

@naughtont3
Copy link
Contributor

No description provided.

Signed-off-by: Thomas Naughton <naughtont@ornl.gov> Signed-off-by: Amir Shehata <shehataa@ornl.gov>
@wenduwan
Copy link
Contributor

Running AWS CI

@hppritcha
Copy link
Member

@janjust could someone from nvidia check out what's going on with nvidia ci?

@hppritcha
Copy link
Member

@wenduwan someone should check if this change is going to work with EFA provider

@hppritcha hppritcha requested a review from wenduwan January 30, 2024 19:23
@wenduwan
Copy link
Contributor

This change broke AWS CI. I need to understand the root cause.

@hppritcha
Copy link
Member

i think this PR is incorrect for several providers.

@hppritcha
Copy link
Member

@BrendanCunningham could you check if this works with OPX provider?

@BrendanCunningham
Copy link
Member

BrendanCunningham commented Jan 30, 2024

@BrendanCunningham could you check if this works with OPX provider?

@tmh97 Can you take a look at this?

@hppritcha I don't work on OPX myself; I maintain PSM2. Thomas Huber works on OPX.

@wenduwan
Copy link
Contributor

I figured the problem with EFA provider is that it requries FI_MR_PROV_KEY in mr_mode. Therefore this change effectively disabled EFA in btl/ofi.

That said, it's not clear to me how to get what you want. I need to ask the libfabric community.

@wenduwan
Copy link
Contributor

@amirshehataornl @naughtont3 Can you attest to ofiwg/libfabric#9777 ? Is that what you want?

@j-xiong
Copy link

j-xiong commented Feb 1, 2024

What is the intention of this change? Does the ofi btl now always wants to supply the mr keys instead of letting the provider choose the key? If it can work either way the mode bit should remain. Providers that don't require the mode will clear the flag and MPI should check the flag returned from fi_getinfo() to determine how the keys should be supplied.

@hppritcha
Copy link
Member

i see no evidence this is needed by the CXI provider and since it breaks support for EFA not sure we could consider merging this, certainly not as is.

@wenduwan
Copy link
Contributor

Closing PR due to inactivity. Please reopen if you still need it.

@wenduwan wenduwan closed this Mar 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

5 participants