Description
I am trying to use llama.cpp with SYCL and when running with default settings I'm getting a "Bus error" (SIGBUS) when loading models:

    $ ./bin/llama-bench -m models/phi-4-Q3_K_M.gguf
    WARNING: Small BAR detected for device 0000:03:00.0
    | model                          |       size |     params | backend    | ngl |            test |                  t/s |
    | ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
    Bus error (core dumped) ./bin/llama-bench -m models/phi-4-Q3_K_M.gguf

That run uses the level_zero device by default. When I select the OpenCL device with ONEAPI_DEVICE_SELECTOR, the code works fine:

    $ ONEAPI_DEVICE_SELECTOR=opencl:gpu ./bin/llama-bench -m models/phi-4-Q3_K_M.gguf
    WARNING: Small BAR detected for device 0000:03:00.0
    | model                          |       size |     params | backend    | ngl |            test |                  t/s |
    | ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
    | llama 13B Q3_K - Medium        |   6.69 GiB |    14.66 B | SYCL       |  99 |           pp512 |       333.50 ± 20.98 |
    ...

I'm aware of the "small bar" warning as I'm running on older hardware (Asus Z170-A + Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz) with Arc A750:

    $ sycl-ls
    WARNING: Small BAR detected for device 0000:03:00.0
    WARNING: Small BAR detected for device 0000:03:00.0
    [level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Arc(TM) A750 Graphics 12.55.8 [1.6.34666]
    [opencl:fpga][opencl:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2024.18.12.0.05_160000]
    [opencl:cpu][opencl:1] Intel(R) OpenCL, Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz OpenCL 3.0 (Build 0) [2024.18.12.0.05_160000]
    [opencl:gpu][opencl:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A750 Graphics OpenCL 3.0 NEO [25.31.34666]

I'm using Arch Linux and the latest version of intel-compute-runtime I could find:

    $ uname -a
    Linux hostname 6.16.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Fri, 15 Aug 2025 16:04:43 +0000 x86_64 GNU/Linux
    $ pacman -Q intel-compute-runtime
    intel-compute-runtime 25.31.34666.3-1

Some more crash details with GDB:

    $ gdb --args ./bin/llama-bench -m models/phi-4-Q3_K_M.gguf
    ...
    Thread 1 "llama-bench" received signal SIGBUS, Bus error.
    (gdb) bt full
    #0 0x00007ffff636e087 in ?? () from /usr/lib/libc.so.6
    No symbol table info available.
    #1 0x00007fffdcab128c in memcpy_s (dst=0x7ffcfaa60000, destSize=<optimized out>, src=0x5d556b0, count=<optimized out>)
        at /src/arch/intel-compute-runtime/src/compute-runtime-25.31.34666.3/shared/source/helpers/string.h:71
    No locals.
    #2 L0::CommandListCoreFamilyImmediate<(GFXCORE_FAMILY)3079>::performCpuMemcpy (this=this@entry=0x5d5a6c0, cpuMemCopyInfo=..., hSignalEvent=hSignalEvent@entry=0x2cd9e18, numWaitEvents=numWaitEvents@entry=0, phWaitEvents=phWaitEvents@entry=0x0)
        at /src/arch/intel-compute-runtime/src/compute-runtime-25.31.34666.3/level_zero/core/source/cmdlist/cmdlist_hw_immediate.inl:1444
            lockingFailed = false
            srcLockPointer = <optimized out>
            dstLockPointer = <optimized out>
            signalEvent = 0x2cd9e10
            cpuMemcpySrcPtr = 0x5d556b0
            cpuMemcpyDstPtr = 0x7ffcfaa60000
    #3 0x00007fffdcabf240 in L0::CommandListCoreFamilyImmediate<(GFXCORE_FAMILY)3079>::appendMemoryCopy (this=0x5d5a6c0, dstptr=0xffffd556aaa00000, srcptr=0x5d556b0, size=20480, hSignalEvent=0x2cd9e18, numWaitEvents=0, phWaitEvents=0x0, memoryCopyParams=...)
        at /src/arch/intel-compute-runtime/src/compute-runtime-25.31.34666.3/level_zero/core/source/cmdlist/cmdlist_hw_immediate.inl:683
            estimatedSize = <optimized out>
            hasStallindCmds = false
            ret = <optimized out>
            cpuMemCopyInfo = {dstPtr = 0xffffd556aaa00000, srcPtr = 0x5d556b0, size = 20480, dstAllocData = 0x5c78bc0, srcAllocData = 0x0, dstIsImportedHostPtr = false, srcIsImportedHostPtr = false}
            direction = 32767
            isSplitNeeded = <optimized out>
    #4 0x00007fffdc8e35f2 in L0::zeCommandListAppendMemoryCopy (hCommandList=<optimized out>, dstptr=<optimized out>, srcptr=<optimized out>, size=20480, hSignalEvent=0x2cd9e18, numWaitEvents=<optimized out>, phWaitEvents=0x0)
        at /src/arch/intel-compute-runtime/src/compute-runtime-25.31.34666.3/level_zero/api/core/ze_copy_api_entrypoints.h:32
            cmdList = 0x5d5a6c0
            ret = ZE_RESULT_ERROR_NOT_AVAILABLE
            memoryCopyParams = {relaxedOrderingDispatch = false, forceDisableCopyOnlyInOrderSignaling = false, copyOffloadAllowed = false}
    #5 0x00007fffee78237b in enqueueMemCopyHelper(ur_command_t, ur_queue_handle_legacy_t_*, void*, unsigned char, unsigned long, void const*, unsigned int, ur_event_handle_t_* const*, ur_event_handle_t_**, bool) () from /opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_level_zero.so.0
    No symbol table info available.
    #6 0x00007fffee78bf87 in ur_queue_handle_legacy_t_::enqueueUSMMemcpy(bool, void*, void const*, unsigned long, unsigned int, ur_event_handle_t_* const*, ur_event_handle_t_**) () from /opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_level_zero.so.0
    No symbol table info available.
    #7 0x00007fffe0cf4db7 in ur_loader::urEnqueueUSMMemcpy(ur_queue_handle_t_*, bool, void*, void const*, unsigned long, unsigned int, ur_event_handle_t_* const*, ur_event_handle_t_**) () from /opt/intel/oneapi/compiler/2025.0/lib/libur_loader.so.0
    No symbol table info available.
    #8 0x00007fffe0d07eff in urEnqueueUSMMemcpy () from /opt/intel/oneapi/compiler/2025.0/lib/libur_loader.so.0
    No symbol table info available.
    #9 0x00007fffe1c45aa9 in sycl::_V1::detail::MemoryManager::copy_usm(void const*, std::shared_ptr<sycl::_V1::detail::queue_impl>, unsigned long, void*, std::vector<ur_event_handle_t_*, std::allocator<ur_event_handle_t_*> >, ur_event_handle_t_**, std::shared_ptr<sycl::_V1::detail::event_impl> const&) () from /opt/intel/oneapi/compiler/2025.0/lib/libsycl.so.8
    No symbol table info available.
    #10 0x00007fffe1c8b83d in sycl::_V1::detail::queue_impl::memcpy(std::shared_ptr<sycl::_V1::detail::queue_impl> const&, void*, void const*, unsigned long, std::vector<sycl::_V1::event, std::allocator<sycl::_V1::event> > const&, bool, sycl::_V1::detail::code_location const&) () from /opt/intel/oneapi/compiler/2025.0/lib/libsycl.so.8
    No symbol table info available.
    #11 0x00007fffe1d36421 in sycl::_V1::queue::memcpy(void*, void const*, unsigned long, sycl::_V1::detail::code_location const&) () from /opt/intel/oneapi/compiler/2025.0/lib/libsycl.so.8
    No symbol table info available.
    #12 0x00007ffff6a37f9d in ggml_backend_sycl_buffer_set_tensor(ggml_backend_buffer*, ggml_tensor*, void const*, unsigned long, unsigned long)::{lambda()#2}::operator()() const (this=<optimized out>)
        at /src/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:399
            e = <optimized out>
    (gdb) list
    ...
    1444 memcpy_s(cpuMemcpyDstPtr, cpuMemCopyInfo.size, cpuMemcpySrcPtr, cpuMemCopyInfo.size);
    ...
    (gdb) p cpuMemCopyInfo
    $7 = (const L0::CpuMemCopyInfo &) @0x7fffffffaea0: {dstPtr = 0xffffd556aaa00000, srcPtr = 0x5d556b0, size = 20480, dstAllocData = 0x5c78bc0, srcAllocData = 0x0, dstIsImportedHostPtr = false, srcIsImportedHostPtr = false}
    (gdb) p cpuMemcpyDstPtr
    $8 = (void *) 0x7ffcfaa60000
    (gdb) info proc mappings
    Mapped address spaces:
    Start Addr         End Addr           Size        Offset      Perms File
    ...
    0x00000000004d5000 0x0000000005d6a000 0x5895000   0x0         rw-p  [heap]
    0x00007ffcfaa60000 0x00007ffdf9000000 0xfe5a0000  0x1e929a000 rw-s  anon_inode:i915.gem
    0x00007ffdf9000000 0x00007fffa59b7000 0x1ac9b7000 0x0         r--s  /data/llama-models/phi-4-Q3_K_M.gguf
    0x00007fffa5a00000 0x00007fffa5a3f000 0x3f000     0x0         r--p  /usr/lib/libopencl-clang.so.15
    ...

While I understand the issue might be due to the "small BAR" error, I would appreciate a helpful error message rather than a SIGBUS that requires rebuilding the intel-compute-runtime with debug symbols to understand where the issue is coming from. Even better - make level_zero work with small BAR, even if with reduced performance.
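
To make the reading of the GDB output explicit, here is a minimal sketch that checks which of the mappings listed above contain the two pointers passed to `memcpy_s`. It is only an illustration: the mapping rows and pointer values are copied from the session above, while the helper names (`Mapping`, `parse_mappings`, `find_mapping`) are made up for this example and are not part of llama.cpp, SYCL or intel-compute-runtime.

```python
from typing import List, NamedTuple, Optional

# Rows copied verbatim from the `info proc mappings` output in this report.
MAPPINGS_TEXT = """\
0x00000000004d5000 0x0000000005d6a000 0x5895000 0x0 rw-p [heap]
0x00007ffcfaa60000 0x00007ffdf9000000 0xfe5a0000 0x1e929a000 rw-s anon_inode:i915.gem
0x00007ffdf9000000 0x00007fffa59b7000 0x1ac9b7000 0x0 r--s /data/llama-models/phi-4-Q3_K_M.gguf
0x00007fffa5a00000 0x00007fffa5a3f000 0x3f000 0x0 r--p /usr/lib/libopencl-clang.so.15
"""


class Mapping(NamedTuple):
    """One row of `info proc mappings` (hypothetical helper type)."""
    start: int  # inclusive start address
    end: int    # exclusive end address
    perms: str
    name: str


def parse_mappings(text: str) -> List[Mapping]:
    """Parse rows of the form: start end size offset perms [file]."""
    rows = []
    for line in text.splitlines():
        fields = line.split(maxsplit=5)
        # Skip headers, "..." placeholders and anything that is not a row.
        if len(fields) < 5 or not fields[0].startswith("0x"):
            continue
        name = fields[5] if len(fields) > 5 else ""
        rows.append(Mapping(int(fields[0], 16), int(fields[1], 16), fields[4], name))
    return rows


def find_mapping(rows: List[Mapping], addr: int) -> Optional[Mapping]:
    """Return the mapping that contains addr, or None if there is none."""
    for row in rows:
        if row.start <= addr < row.end:
            return row
    return None


if __name__ == "__main__":
    rows = parse_mappings(MAPPINGS_TEXT)
    pointers = {
        "cpuMemcpyDstPtr": 0x7FFCFAA60000,  # from `p cpuMemcpyDstPtr`
        "cpuMemcpySrcPtr": 0x5D556B0,       # from frame #2 locals
    }
    for label, addr in pointers.items():
        row = find_mapping(rows, addr)
        if row is None:
            print(f"{label} = {addr:#x}: not in any listed mapping")
        else:
            print(f"{label} = {addr:#x}: {row.name} (+{addr - row.start:#x}, {row.perms})")
```

With the values above, the destination pointer resolves to the very first byte of the `anon_inode:i915.gem` mapping and the source pointer to `[heap]`. That is consistent with, though not proof of, the SIGBUS being raised by a CPU write into the GEM mapping rather than by a bad source buffer.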