
Commit cf97b03

mm: guard against double pin and unpin explicitly (comfyanonymous#10672)
As commented, if you let CUDA be the one to detect double pinning/unpinning, it actually creates an async GPU error.
1 parent eb1c42f

File tree

1 file changed: +12 −0 lines changed

comfy/model_management.py

Lines changed: 12 additions & 0 deletions
@@ -1103,6 +1103,12 @@ def pin_memory(tensor):
     if not is_device_cpu(tensor.device):
         return False
 
+    if tensor.is_pinned():
+        # NOTE: CUDA does detect when a tensor is already pinned and would
+        # error below, but there are proven cases where this also queues an
+        # error on the GPU asynchronously. So don't trust the CUDA API and
+        # guard here.
+        return False
+
     size = tensor.numel() * tensor.element_size()
     if (TOTAL_PINNED_MEMORY + size) > MAX_PINNED_MEMORY:
         return False
@@ -1123,6 +1129,12 @@ def unpin_memory(tensor):
     if not is_device_cpu(tensor.device):
         return False
 
+    if not tensor.is_pinned():
+        # NOTE: CUDA does detect when a tensor is not pinned and would
+        # error below, but there are proven cases where this also queues an
+        # error on the GPU asynchronously. So don't trust the CUDA API and
+        # guard here.
+        return False
+
     ptr = tensor.data_ptr()
     if torch.cuda.cudart().cudaHostUnregister(ptr) == 0:
         TOTAL_PINNED_MEMORY -= PINNED_MEMORY.pop(ptr)
