Conversation

@naughtont3
Contributor

No description provided.

Signed-off-by: Thomas Naughton <naughtont@ornl.gov>
Signed-off-by: Amir Shehata <shehataa@ornl.gov>
@wenduwan
Contributor

@naughtont3 I'm curious. We already request the caps here - isn't that enough?

@amirshehataornl
Contributor

MCA_BTL_OFI_ONE_SIDED_REQUIRED_CAPS is only set in hints.caps.

The CXI provider explicitly checks tx_attr.caps for these capabilities, so if they are not set there, the operation fails.

One update to the patch would be to use the define instead of spelling out the caps explicitly.
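
For illustration, here is a minimal sketch of what using the define on the transmit attributes could look like. It is not the patch itself: the `sketch_` and `SKETCH_` names are hypothetical stand-ins so the snippet compiles without libfabric, the bit values are placeholders rather than the real ones from `<rdma/fabric.h>`, and it assumes the define expands to RMA plus atomics, as the diff under review suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit values for illustration only; real code would take
 * FI_RMA and FI_ATOMIC from <rdma/fabric.h>. */
#define SKETCH_FI_RMA    (1ULL << 0)
#define SKETCH_FI_ATOMIC (1ULL << 1)

/* Stand-in for MCA_BTL_OFI_ONE_SIDED_REQUIRED_CAPS
 * (assumed here to be RMA plus atomics). */
#define SKETCH_ONE_SIDED_REQUIRED_CAPS (SKETCH_FI_RMA | SKETCH_FI_ATOMIC)

/* Reduced stand-in for struct fi_tx_attr: only the caps field is modeled. */
struct sketch_tx_attr {
    uint64_t caps;
};

/* Request the one-sided capabilities through the shared define, so the
 * value used for hints.caps and for tx_attr.caps cannot drift apart. */
void sketch_set_one_sided_caps(struct sketch_tx_attr *tx_attr)
{
    assert(tx_attr != NULL);
    tx_attr->caps = SKETCH_ONE_SIDED_REQUIRED_CAPS;
}
```

Going through a single define keeps the capability list in one place if the one-sided requirements ever change.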

Comment on lines +147 to +148
/* Add RMA/ATOMIC for one-sided to work */
tx_attr.caps = (FI_RMA | FI_ATOMIC);
Contributor

Related to libfabric API

caps - Capabilities
The requested capabilities of the context. The capabilities must be a subset of those requested of the associated endpoint. See the CAPABILITIES section of fi_getinfo(3) for capability details. If the caps field is 0 on input to fi_getinfo(3), the applicable capability bits from the fi_info structure will be used.
The following capabilities apply to the transmit attributes: FI_MSG, FI_RMA, FI_TAGGED, FI_ATOMIC, FI_READ, FI_WRITE, FI_SEND, FI_HMEM, FI_TRIGGER, FI_FENCE, FI_MULTICAST, FI_RMA_PMEM, FI_NAMED_RX_CTX, FI_COLLECTIVE, and FI_XPU.
Many applications will be able to ignore this field and rely solely on the fi_info::caps field. Use of this field provides fine grained control over the transmit capabilities associated with an endpoint. It is useful when handling scalable endpoints, with multiple transmit contexts, for example, and allows configuring a specific transmit context with fewer capabilities than that supported by the endpoint or other transmit contexts.

It reads to me that the intended use case for fi_tx_attr is to scope down capabilities that were set at the endpoint level. But I'm not sure if the PR is motivated by this.

Otherwise I somehow smell a bug in the libfabric provider that requires this change - why would it require setting FI_RMA | FI_ATOMIC again?

Did I miss anything?
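
As a rough model of the two rules in the quoted man page text (context caps must be a subset of the endpoint caps, and a zero caps field falls back to the fi_info capability bits), something like the following could be written. All `sketch_` and `SKETCH_` names and the bit values are hypothetical; this is a reading of the documentation, not libfabric or provider code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder capability bits; not the values from <rdma/fabric.h>. */
#define SKETCH_FI_MSG    (1ULL << 0)
#define SKETCH_FI_RMA    (1ULL << 1)
#define SKETCH_FI_ATOMIC (1ULL << 2)

/* True when every bit requested for the context is also set on the endpoint. */
bool sketch_caps_is_subset(uint64_t ctx_caps, uint64_t ep_caps)
{
    return (ctx_caps & ~ep_caps) == 0;
}

/* Models the documented fallback: a caps value of 0 means
 * "use the applicable bits from the fi_info structure". */
uint64_t sketch_effective_tx_caps(uint64_t tx_caps, uint64_t info_caps)
{
    return tx_caps != 0 ? tx_caps : info_caps;
}

/* Small usage example of the two helpers. */
void sketch_demo(void)
{
    uint64_t ep_caps = SKETCH_FI_RMA | SKETCH_FI_ATOMIC;

    /* Scoping a context down to RMA only is allowed. */
    assert(sketch_caps_is_subset(SKETCH_FI_RMA, ep_caps));
    /* Leaving tx caps at 0 inherits the endpoint-level request. */
    assert(sketch_effective_tx_caps(0, ep_caps) == ep_caps);
}
```

Under this reading, leaving tx_attr.caps at 0 would already carry FI_RMA | FI_ATOMIC through from the endpoint-level request.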

Member

I agree, @wenduwan. Something's off with a provider that needs this change.

Member

The CXI provider works fine with mpi-onesided without this change.

@amirshehataornl
Contributor

It could be that I'm running with older CXI code. In the version I run with, the context is allocated with the caps passed in from the MPI fi_tx_context() call, which are 0. The internal capabilities stored for the CXI transmit context are therefore 0. When fi_read() is called, it checks that the RMA/ATOMIC capabilities are set, and because they are not, fi_read() fails.
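
The failure mode described above can be modeled with a small, self-contained sketch. Everything here is a hypothetical stand-in (the `sketch_` names, the placeholder bit value, the error code); it only mirrors the description of the older provider behavior and is not taken from the CXI source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder capability bit and error code, for illustration only. */
#define SKETCH_FI_RMA      (1ULL << 0)
#define SKETCH_ERR_NO_CAPS (-1)

/* Reduced stand-in for a provider's transmit context. */
struct sketch_tx_ctx {
    uint64_t caps;
};

/* Models context creation that stores the caller-supplied caps verbatim,
 * with no fallback to the endpoint-level capabilities. */
void sketch_tx_context_init(struct sketch_tx_ctx *ctx, uint64_t attr_caps)
{
    assert(ctx != NULL);
    ctx->caps = attr_caps;
}

/* Models an RMA read that is rejected unless the context carries the
 * RMA capability. Returns 0 on success. */
int sketch_rma_read(const struct sketch_tx_ctx *ctx)
{
    assert(ctx != NULL);
    if ((ctx->caps & SKETCH_FI_RMA) == 0) {
        return SKETCH_ERR_NO_CAPS;
    }
    return 0;
}
```

In this model, a context initialized with caps of 0 fails every read, while one initialized with the RMA bit succeeds, which matches the behavior the patch works around.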

I'm looking at the code they have in a PR in the libfabric repo, and it's very different from what I have.

How did you verify that the CXI provider works with one-sided? I.e., which code did you use?

@hppritcha
Member

OSU 5.8.0 one-sided unit tests on Crusher

@hppritcha
Member

between two nodes

@amirshehataornl
Contributor

OK. I brought the issue I saw to the attention of the CXI developers. We'll see what they say.

@hppritcha
Member

Can we close this? It is trying to work around problems in releases of the CXI provider older than HPE SS11 2.2.x.

@amirshehataornl
Contributor

If we are sure it's something that has already been fixed in the CXI provider, then I'm good with closing it. I haven't heard back from them about it.

@wenduwan
Contributor

Closing due to inactivity. Please reopen if you still need it.

@wenduwan wenduwan closed this Mar 21, 2024

4 participants