Skip to content

Conversation

@lslusarczyk
Copy link
Contributor

@lslusarczyk lslusarczyk commented Jul 2, 2025

By default, the Level Zero adapter makes all device allocations resident on all "peer" devices, even if the application doesn't use those peer devices at all. USM residency adds significant overhead.

For v2 unfied-runtime adapter we switch in this PR to not calling zeContextMakeMemoryResident on all devices in the context, just the one that was specified during allocation and those specified using ext_oneapi_enable_peer_access.

When ext_oneapi_enable_peer_access / ext_oneapi_disable_peer_access is called, we retroactively call zeContextMakeMemoryResident or zeContextEvictMemory respectively, on all the active pages in the heap. This is done by calling ctl call in UMF drafted here: oneapi-src/unified-memory-framework@e6d5043

UMF part was done in oneapi-src/unified-memory-framework#1517

ZeOffsetToImageHandleMap;

// devices which user enabled p2p access by urUsmP2P(Enable|Disable)PeerAccessExp
std::unordered_set<DeviceId> p2pDeviceIds;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not a vector?

{
std::scoped_lock<ur_shared_mutex> Lock(Platform->ContextsMutex);
auto It = std::find(Contexts.begin(), Contexts.end(), hContext);
UR_ASSERT(It != Contexts.end(), UR_RESULT_ERROR_INVALID_CONTEXT);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this happen? should this be just a regular assert?


void cleanupPools();
void cleanupPoolsForQueue(void *hQueue);
void addResidentDevice(ur_device_handle_t hDevice, ur_device_handle_t peerDevice);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this function called from?

continue;
}

if (!p2pDevicesEnabledByUser.has_value())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't this outside of the loop?

{
std::scoped_lock<ur_shared_mutex> Lock(Platform->ContextsMutex);
for (auto Context : Platform->Contexts) {
Context->addResidentDevice(commandDevice, peerDevice);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if some context does not contain commandDevice?


assert(0 = std::count_if(
std::begin(pDevices), std::end(pDevices),
[&](const auto pDevice) { return newPeerDevice->Id == pDevice->Id; }));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cuCtxEnablePeerAccess returns CUDA_ERROR_PEER_ACCESS_ALREADY_ENABLED. Could we have this error mapped to a UR error? It would help make the peer-to-peer SYCL API more consistent (See #19787.)

sarnex pushed a commit that referenced this pull request Oct 10, 2025
github-actions bot pushed a commit to oneapi-src/unified-runtime that referenced this pull request Oct 14, 2025
@lukaszstolarczuk
Copy link
Contributor

FYI, when you start requiring the new function from UMF v1.1.x please remember to update min. required version -> find_package(umf 1.0.0 QUIET) should be changed to find_package(umf 1.1.0 QUIET)

ref. #20334

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

4 participants