Skip to content

Conversation

steffenlarsen
Copy link
Contributor

This commit adds a new extension for inter-process communicable SYCL object handles. As part of the initial version of this extension, only inter-process communicable memory is exposed.

@steffenlarsen steffenlarsen force-pushed the ext_oneapi_inter_process_communication branch from cae8ee3 to 14415fe Compare September 10, 2025 08:16
@steffenlarsen steffenlarsen force-pushed the ext_oneapi_inter_process_communication branch from 14415fe to 89f495c Compare September 10, 2025 11:03
@steffenlarsen steffenlarsen force-pushed the ext_oneapi_inter_process_communication branch from 89f495c to 5f443ab Compare September 10, 2025 12:03
@steffenlarsen steffenlarsen force-pushed the ext_oneapi_inter_process_communication branch from 5f443ab to b7d2052 Compare September 10, 2025 12:53
@steffenlarsen steffenlarsen force-pushed the ext_oneapi_inter_process_communication branch from b7d2052 to 07f4355 Compare September 10, 2025 15:57
@steffenlarsen steffenlarsen force-pushed the ext_oneapi_inter_process_communication branch from 07f4355 to 7364b75 Compare September 11, 2025 05:40
This commit adds a new extension for inter-process communicable SYCL object handles. As part of the initial version of this extension, only inter-process communicable memory is exposed. Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
@steffenlarsen
Copy link
Contributor Author

@intel/dpcpp-tools-reviewers & @intel/llvm-reviewers-runtime - This should be ready for review!

Copy link
Contributor

@againull againull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation LGTM overall

Copy link
Contributor

@YuriPlyakhin YuriPlyakhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No flags from tools perspective (just one nit).

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
* An exception with the `errc::feature_not_supported` error code if device
`dev` does not have `aspect::ext_oneapi_ipc_memory`.
* An exception with the `errc::invalid` error code if the handle data
`handle_data` has an unexpected number of bytes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the number of bytes is correct, but the content is garbage? Can we broaden this exception like this:

  • An exception with the errc::invalid error code if the handle data handle_data does not represent a valid IPC handle for USM memory on this host system.

That would also cover the case when the handle had the wrong number of bytes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am reluctant to promise an error code if handle data is corrupt. The byte size is something we can check at a SYCL/UR level, but the validity of the data is up to UMF to decide and the API documentation doesn't specify which error is returned if the data is off. Maybe a precondition?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should push this requirement down to the UMF and down to Level Zero if necessary. They should be able to define an error code for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will open an issue against UMF, but I think it might be tricky. What happens if the data looks correct to UMF, but using it causes UB or invalid access in the implementation? I suppose that's something they can look into though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UMF shouldn't look at the handle content at all. It should just pass it to the underlying driver API, and that API should return an error code if the handle content is invalid. This assumes that each driver is able to diagnose an error in such a case, but that seems quite reasonable to me. For example, cudaIpcOpenMemHandle is defined to return cudaErrorInvalidResourceHandle in this case. I don't see an error code listed for Level Zero, but that seems like a bug. They should be able to return an error code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be some wrapper information in the UMF IPC handle, but I don't know the exact details. Maybe @vinser52 has some insight into this?

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
[_Note:_ The `open` function can be called multiple times on the same handle
within the same process. The number of calls to the `close` function must be
equal to the number of calls to the `open` function to free the memory pointer.
_{endnote}_]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While looking at the Level Zero documentation, I saw that it says this:

Multiple calls to this function with the same IPC handle will return unique pointers

Therefore, I think we should change this note to say:

[Note: The open function can be called multiple times on the same handle
within the same process. Each call to open may return a unique pointer value
even for the same handle, therefore each call to open must have a matching call
to close.
{endnote}]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vinser52 - Does UMF let the calls go through to Level Zero each time, or will repeat calls to open return the same UMF handle?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it matter what UMF currently does? Even if UMF "squashes" the repeated zeMemOpenIpcHandle calls now, it might not in the future. The proposed wording gives us the freedom to implement it either way.

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

7 participants