
Compiled executable fails to launch when built with AVX and LTO enabled #44056

@yvt

Description


A generated executable occasionally fails to launch when built with the rustc options -Ctarget-feature=+avx -Copt-level=2 -Clto.

I tried this code:

```rust
fn main(){}
```

Compiled with the following shell script:

```sh
#!/bin/sh
rustc main.rs -Ctarget-feature=+avx -C opt-level=3 -Clto -g
```

When I ran the generated executable main repeatedly, it stalled 5 out of 100 times: it neither terminated nor printed anything, and it never even entered the main function.

When I ran the executable under lldb, I could see that an EXC_BAD_ACCESS had occurred: the program attempted to load a 32-byte block from unaligned memory using vmovdqa, which requires its memory operand to be 32-byte aligned. The address in rax, 0x0000000100300470, is 16-byte aligned but not 32-byte aligned.

```
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x0000000100000bf6 main`main + 518
main`main:
->  0x100000bf6 <+518>: vmovdqa (%rax), %ymm0
    0x100000bfa <+522>: movl   $0x1, %ecx
    0x100000bff <+527>: vmovq  %rcx, %xmm1
    0x100000c04 <+532>: vmovdqa %ymm1, (%rax)
(lldb) register read
General Purpose Registers:
       rax = 0x0000000100300470
```
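The alignment arithmetic behind the fault can be checked directly. The sketch below is illustrative only: `misalignment`, `is_aligned_for` and `Avx256` are hypothetical names, not part of any library, and `RAX` is simply the register value copied from the lldb session.

```rust
use std::mem::align_of;

/// Value of `rax` copied from the lldb session above.
const RAX: u64 = 0x0000_0001_0030_0470;

/// Hypothetical 256-bit payload with the 32-byte alignment that a
/// `vmovdqa` load into a ymm register requires.
#[repr(align(32))]
struct Avx256([u64; 4]);

/// Hypothetical helper: number of bytes `addr` lies past the previous
/// multiple of `align`. `align` must be a power of two, as every x86
/// alignment requirement is.
fn misalignment(addr: u64, align: u64) -> u64 {
    assert!(align.is_power_of_two(), "alignment must be a power of two");
    addr & (align - 1)
}

/// True if `p` satisfies the alignment Rust requires for `T`.
fn is_aligned_for<T>(p: *const T) -> bool {
    (p as usize) % align_of::<T>() == 0
}
```

With these definitions, `misalignment(RAX, 16)` is 0 while `misalignment(RAX, 32)` is 16: a 16-byte aligned access at that address would be fine, but a 32-byte `vmovdqa` faults.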

Meta

rustc --version --verbose:

```
rustc 1.21.0-nightly (469a6f9bd 2017-08-22)
binary: rustc
commit-hash: 469a6f9bd9aef394c5cff6b3bc41b8c520f9515b
commit-date: 2017-08-22
host: x86_64-apple-darwin
release: 1.21.0-nightly
LLVM version: 4.0
```

The output of sample (a tool that comes with macOS) when the program is stalled:

```
Call graph:
    2721 Thread_15178881   DispatchQueue_1: com.apple.main-thread  (serial)
      2721 start  (in libdyld.dylib) + 1  [0x7fffa220d235]
        2721 0x0
          2721 _sigtramp  (in libsystem_platform.dylib) + 26  [0x7fffa241cb3a]
            2721 std::sys::imp::stack_overflow::imp::signal_handler  (in main) + 125  [0x105c58b7d]  mem.rs:609
```

Analysis

The offending instruction appears to be part of libcore::ptr::swap_nonoverlapping_bytes, which is called from libstd::thread::local::LocalKey::init while the runtime is being initialized.

```rust
#[inline]
unsafe fn swap_nonoverlapping_bytes(x: *mut u8, y: *mut u8, len: usize) {
    // <snip>
    #[cfg_attr(not(any(target_os = "emscripten", target_os = "redox",
                       target_endian = "big")),
               repr(simd))]
    struct Block(u64, u64, u64, u64);
    // <snip>
        // Swap a block of bytes of x & y, using t as a temporary buffer
        // This should be optimized into efficient SIMD operations where available
        copy_nonoverlapping(x, t, block_size); // <--- HERE
    // <snip>
}
```
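For readers unfamiliar with the pattern in the excerpt, here is a minimal sketch of a block-wise swap through a temporary buffer. It is not libcore's implementation: the real `Block` uses the unstable `repr(simd)`, whereas this hypothetical version uses `repr(align(32))` to get a 32-byte, 32-byte-aligned temporary, and `swap_via` is an illustrative helper name. Only the temporary is guaranteed to be aligned; `x` and `y` are accessed through `*mut u8`, so the copies assume nothing beyond 1-byte alignment.

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr;

/// Hypothetical stand-in for libcore's `Block` (which uses `repr(simd)`).
#[repr(C, align(32))]
struct Block([u64; 4]);

/// Sketch: exchange `len` bytes between `x` and `y`, one 32-byte block at a
/// time through an aligned stack temporary, then the remaining tail.
///
/// Safety: `x` and `y` must each be valid for reads and writes of `len`
/// bytes and must not overlap. They need not be aligned.
unsafe fn swap_nonoverlapping_bytes(x: *mut u8, y: *mut u8, len: usize) {
    let block_size = size_of::<Block>(); // 32
    // Temporary buffer `t`: aligned, uninitialized, always written before read.
    let mut tmp = MaybeUninit::<Block>::uninit();
    let t = tmp.as_mut_ptr() as *mut u8;

    let mut i = 0;
    while i + block_size <= len {
        unsafe { swap_via(x.add(i), y.add(i), t, block_size) };
        i += block_size;
    }
    if i < len {
        // Tail shorter than one block: reuse the same temporary.
        unsafe { swap_via(x.add(i), y.add(i), t, len - i) };
    }
}

/// Swap `n` bytes of `x` and `y` using `t` as scratch space
/// (the three-copy pattern from the excerpt).
unsafe fn swap_via(x: *mut u8, y: *mut u8, t: *mut u8, n: usize) {
    unsafe {
        ptr::copy_nonoverlapping(x, t, n);
        ptr::copy_nonoverlapping(y, x, n);
        ptr::copy_nonoverlapping(t, y, n);
    }
}
```

Called on buffers offset by one byte, the sketch still swaps correctly, because nothing in it promises the compiler that `x` or `y` is 32-byte aligned.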

After optimization, this call to the copy_nonoverlapping intrinsic is lowered to the following LLVM instruction:

```llvm
%t.0.copyload.i.i.i.i.i.i.i.i.i = load <4 x i64>, <4 x i64>* bitcast ({ { { i64, [32 x i8] } }, { { i1 } }, { { i1 } }, [6 x i8] }* @_ZN3std10sys_common11thread_info11THREAD_INFO7__getit5__KEY17h80e4cdc49b84860aE to <4 x i64>*), align 32, !dbg !3742, !noalias !3762
```
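The load above reads directly from THREAD_INFO's thread-local key while claiming `align 32`. One way to probe whether a target's thread-local storage honors an over-aligned type is to place such a type in a `thread_local!` and check the address each thread sees. This is a hypothetical diagnostic: `Aligned32`, `KEY`, `tls_key_alignment` and `all_threads_aligned` are illustrative names. A correct toolchain should always report `true`; on a toolchain affected by a TLS alignment problem, a check like this might report a misaligned address.

```rust
use std::mem::align_of;
use std::thread;

/// Hypothetical over-aligned payload, 32-byte aligned like a 256-bit AVX vector.
#[repr(align(32))]
struct Aligned32([u64; 4]);

thread_local! {
    // Hypothetical key standing in for THREAD_INFO's TLS slot.
    static KEY: Aligned32 = Aligned32([0; 4]);
}

/// Address of the calling thread's `KEY`, and whether it meets
/// `align_of::<Aligned32>()` (32 bytes).
fn tls_key_alignment() -> (usize, bool) {
    KEY.with(|k| {
        let addr = k as *const Aligned32 as usize;
        (addr, addr % align_of::<Aligned32>() == 0)
    })
}

/// Run the check on the current thread plus `n` freshly spawned threads.
fn all_threads_aligned(n: usize) -> bool {
    let here = tls_key_alignment().1;
    let handles: Vec<_> = (0..n)
        .map(|_| thread::spawn(|| tls_key_alignment().1))
        .collect();
    let spawned = handles.into_iter().all(|h| h.join().unwrap());
    here && spawned
}
```

Checking several threads matters because each thread gets its own TLS block, so one aligned address on the main thread says nothing about the others.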

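This load is then lowered to the following x86_64 instruction:

```
vmovdqa (%rax), %ymm0
```

Because the load is annotated align 32, the backend is allowed to select the aligned form vmovdqa instead of the unaligned vmovdqu; the address it receives at run time, however, is not 32-byte aligned, so the instruction faults.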
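In `std::arch` terms, vmovdqa on a ymm register corresponds to `_mm256_load_si256`, which requires a 32-byte-aligned pointer, while `_mm256_loadu_si256` (vmovdqu) accepts any address. The sketch below copies 32 bytes with the unaligned variant when AVX is detected at run time and falls back to a plain slice copy otherwise. `copy32` and `copy32_avx_unaligned` are hypothetical helpers that illustrate the instruction difference; they are not a fix for this issue.

```rust
/// Copy 32 bytes with an AVX unaligned load/store (vmovdqu semantics).
/// Safety: the caller must ensure AVX is available and that both pointers
/// are valid for 32 bytes.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn copy32_avx_unaligned(src: *const u8, dst: *mut u8) {
    use std::arch::x86_64::{__m256i, _mm256_loadu_si256, _mm256_storeu_si256};
    unsafe {
        // `_mm256_load_si256` would require `src` to be 32-byte aligned
        // (the vmovdqa case); the `u` variants accept any address.
        let v = _mm256_loadu_si256(src as *const __m256i);
        _mm256_storeu_si256(dst as *mut __m256i, v);
    }
}

/// Hypothetical helper: copy 32 bytes, using AVX when the CPU supports it.
fn copy32(src: &[u8; 32], dst: &mut [u8; 32]) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX was detected at run time; both arrays hold exactly 32 bytes.
            unsafe { copy32_avx_unaligned(src.as_ptr(), dst.as_mut_ptr()) };
            return;
        }
    }
    // Portable fallback, also used on non-x86_64 targets.
    dst.copy_from_slice(src);
}
```

Since `[u8; 32]` only guarantees 1-byte alignment, the aligned intrinsic would be unsound here; the unaligned one is correct for any address.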

Metadata


    Labels: A-codegen, C-bug, I-crash, T-compiler
