Description
A generated executable occasionally fails to launch when built with the rustc options -Ctarget-feature=+avx -Copt-level=2 -Clto.
I tried this code:
fn main(){}
Compiled with the following shell script:
#!/bin/sh
rustc main.rs -Ctarget-feature=+avx -C opt-level=3 -Clto -g
When I ran the generated executable main repeatedly, the program stalled 5 out of 100 times: it neither terminated nor printed anything, and it never even entered the main function.
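The repeated runs described above can be automated. The following is a minimal sketch, not the script used for this report: the helper name `count_stalls`, the 2-second timeout, and the 5 ms polling interval are all assumptions chosen for illustration. It treats any run that is still alive after the timeout as a stall, kills it, and counts it.

```rust
use std::process::{Command, Stdio};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `program` with `args` `runs` times and returns
/// how many runs were still alive after `timeout`. Those runs are killed
/// and counted as stalls.
fn count_stalls(
    program: &str,
    args: &[&str],
    runs: u32,
    timeout: Duration,
) -> std::io::Result<u32> {
    let mut stalls = 0;
    for _ in 0..runs {
        let mut child = Command::new(program)
            .args(args)
            .stdout(Stdio::null())
            .stderr(Stdio::null())
            .spawn()?;
        let started = Instant::now();
        loop {
            if child.try_wait()?.is_some() {
                break; // the process exited on its own
            }
            if started.elapsed() >= timeout {
                child.kill()?; // assumed stalled: terminate it
                child.wait()?; // reap the killed process
                stalls += 1;
                break;
            }
            sleep(Duration::from_millis(5));
        }
    }
    Ok(stalls)
}

fn main() -> std::io::Result<()> {
    // Path of the executable under test, e.g. ./main
    let program = match std::env::args().nth(1) {
        Some(p) => p,
        None => {
            eprintln!("usage: pass the path of the executable to test");
            return Ok(());
        }
    };
    // The 2-second timeout is an assumption; an empty main should exit
    // almost immediately.
    let stalls = count_stalls(&program, &[], 100, Duration::from_secs(2))?;
    println!("{} of 100 runs stalled", stalls);
    Ok(())
}
```

Because the failure is intermittent, the count will vary between invocations of the harness.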
When I ran the executable from lldb, I could see that EXC_BAD_ACCESS had occurred because the program attempted to load a 32-byte block from an unaligned address using vmovdqa (which requires the operand address to be 32-byte aligned).
- thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x0000000100000bf6 main`main + 518
main`main:
->  0x100000bf6 <+518>: vmovdqa (%rax), %ymm0
    0x100000bfa <+522>: movl $0x1, %ecx
    0x100000bff <+527>: vmovq %rcx, %xmm1
    0x100000c04 <+532>: vmovdqa %ymm1, (%rax)
(lldb) register read
General Purpose Registers:
       rax = 0x0000000100300470
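To make the misalignment concrete, here is a small sketch that checks the rax value reported by lldb against 16-byte and 32-byte boundaries. The helper name `misalignment` is hypothetical and used only for illustration.

```rust
/// Hypothetical helper: returns how many bytes `addr` lies past the
/// previous `align`-byte boundary. `align` must be a power of two.
fn misalignment(addr: u64, align: u64) -> u64 {
    assert!(align.is_power_of_two());
    addr & (align - 1)
}

fn main() {
    // Value of rax taken from the lldb session in this report.
    let rax: u64 = 0x0000_0001_0030_0470;
    println!("rax mod 16 = {}", misalignment(rax, 16));
    println!("rax mod 32 = {}", misalignment(rax, 32));
}
```

The address is a multiple of 16 but sits 16 bytes past the previous 32-byte boundary, so an aligned 256-bit load such as vmovdqa with a ymm register faults on it.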
Meta
rustc --version --verbose:
rustc 1.21.0-nightly (469a6f9bd 2017-08-22)
binary: rustc
commit-hash: 469a6f9bd9aef394c5cff6b3bc41b8c520f9515b
commit-date: 2017-08-22
host: x86_64-apple-darwin
release: 1.21.0-nightly
LLVM version: 4.0
The output of sample (a tool that comes with macOS) when the program is stalled:
Call graph:
    2721 Thread_15178881 DispatchQueue_1: com.apple.main-thread (serial)
      2721 start (in libdyld.dylib) + 1 [0x7fffa220d235]
        2721 0x0
          2721 _sigtramp (in libsystem_platform.dylib) + 26 [0x7fffa241cb3a]
            2721 std::sys::imp::stack_overflow::imp::signal_handler (in main) + 125 [0x105c58b7d] mem.rs:609
Analysis
The offending instruction appears to be part of libcore::ptr::swap_nonoverlapping_bytes, which is called during libstd::thread::local::LocalKey::init, which in turn is called while the runtime is being initialized.
#[inline]
unsafe fn swap_nonoverlapping_bytes(x: *mut u8, y: *mut u8, len: usize) {
    // <snip>
    #[cfg_attr(not(any(target_os = "emscripten", target_os = "redox",
                       target_endian = "big")),
               repr(simd))]
    struct Block(u64, u64, u64, u64);
    // <snip>
        // Swap a block of bytes of x & y, using t as a temporary buffer
        // This should be optimized into efficient SIMD operations where available
        copy_nonoverlapping(x, t, block_size); // <--- HERE
    // <snip>
}
After optimization, this call to the intrinsic function copy_nonoverlapping is translated into the following LLVM instruction:
%t.0.copyload.i.i.i.i.i.i.i.i.i = load <4 x i64>, <4 x i64>* bitcast ({ { { i64, [32 x i8] } }, { { i1 } }, { { i1 } }, [6 x i8] }* @_ZN3std10sys_common11thread_info11THREAD_INFO7__getit5__KEY17h80e4cdc49b84860aE to <4 x i64>*), align 32, !dbg !3742, !noalias !3762
This is translated into the following x86_64 instruction:
vmovdqa (%rax), %ymm0
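The load above is marked align 32, yet the widest field in the global's IR type is an i64. The sketch below illustrates that gap with a hand-written mirror of the IR type. `KeyLayout` and `Block32` are hypothetical names, `repr(C, align(32))` is a stand-in for the unstable `repr(simd)` attribute used in libcore, and the numbers assume x86_64. It also assumes the global is given only the natural alignment of its fields, which the report does not show directly.

```rust
use std::mem::{align_of, size_of};

// Hypothetical mirror of the IR type
// { { { i64, [32 x i8] } }, { { i1 } }, { { i1 } }, [6 x i8] }.
#[allow(dead_code)]
#[repr(C)]
struct KeyLayout {
    word: u64,
    bytes: [u8; 32],
    flag_a: bool,
    flag_b: bool,
    tail: [u8; 6],
}

// Stand-in for the SIMD Block(u64, u64, u64, u64): 32 bytes in size,
// with the 32-byte alignment that the generated load takes for granted.
#[allow(dead_code)]
#[repr(C, align(32))]
struct Block32([u64; 4]);

fn main() {
    println!(
        "KeyLayout: size {}, align {}",
        size_of::<KeyLayout>(),
        align_of::<KeyLayout>()
    );
    println!(
        "Block32:   size {}, align {}",
        size_of::<Block32>(),
        align_of::<Block32>()
    );
}
```

Under these assumptions the key is only guaranteed to be 8-byte aligned, which is consistent with the 16-byte-aligned address seen in rax. A load that assumes 32-byte alignment would then succeed or fault depending on where the static happens to be placed.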