(Windows) ARM division libcall handling broken - result registers clobbered? #33454

@mstorsjo

Bugzilla Link 34107
Resolution FIXED
Resolved on Aug 24, 2017 14:05
Version 5.0
OS All
Blocks #33196
Attachments A C code snippet of the relevant piece of code, LLVM IR of the code snippet that triggers the issue, LLVM MIR from compiling the test piece of code
CC @compnerd,@rovka,@efriedma-quic,@zmodem,@MatzeB,@qcolombet,@rengolin,@smithp35

Extended Description

Since SVN r305625, code that uses libcalls such as __rt_udiv64 (64-bit division on Windows on ARM), which return the quotient in r0-r1 and the remainder in r2-r3, is occasionally miscompiled.

I've bisected this regression down to the following commit:

RegScavenging: Add scavengeRegisterBackwards()

Re-apply r276044/r279124/r305516. Fixed a problem where we would refuse to place spills as the very first instruction of a basic block and thus artificially increase pressure (test in test/CodeGen/PowerPC/scavenging.mir:spill_at_begin)

This is a variant of scavengeRegister() that works for enterBasicBlockEnd()/backward(). The benefit of the backward mode is that it is not affected by incomplete kill flags.

This patch also changes PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register scavenger in backwards mode.

Differential Revision: http://reviews.llvm.org/D21885

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305625 91177308-0d34-0410-b5e6-96231b3b80d8

So far I've been able to reproduce the issue with a pretty big file.

The following disassembly is a fragment of the generated code that I believe exhibits the issue:

    3136: 00 f0 00 f8 bl #0
    00003136: IMAGE_REL_ARM_BLX23T __rt_udiv64   <-- output in r0-r1
    313a: 0d f5 86 61 add.w r1, sp, #1072   <-- clobbering r1 with an unrelated pointer
    313e: 4f f0 01 0e mov.w lr, #1
    3142: 01 f5 89 2c add.w r12, r1, #280576
    3146: 11 eb d0 72 adds.w r2, r1, r0, lsr #31   <-- using r1 from the __rt_udiv64 call

The diff in the generated code for this segment from before and after this commit is as follows:

      00 f0 00 f8 bl #0
      IMAGE_REL_ARM_BLX23T __rt_udiv64 IMAGE_REL_ARM_BLX23T __rt_udiv64
  •  0d f5 86 61 add.w r1, sp, #1072
      4f f0 01 0e mov.w lr, #1
  •  cd f8 00 e0 str.w lr, [sp]
  •  0d f5 86 6e add.w lr, sp, #1072
  •  01 f5 89 2c add.w r12, r1, #280576
      11 eb d0 72 adds.w r2, r1, r0, lsr #31
      16 bf itet ne

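Thus, this clearly looks broken: r1, which holds the upper half of the __rt_udiv64 result, is overwritten with a stack address before the adds.w instruction reads it.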

The output from the __rt_udiv64 call gets passed to the following function:

static av_always_inline av_const int32_t av_clipl_int32_arm(int64_t a)
{
    int x, y;
    asm ("adds   %1, %R2, %Q2, lsr #31  \n\t"
         "itet   ne                     \n\t"
         "mvnne  %1, #1<<31             \n\t"
         "moveq  %0, %Q2                \n\t"
         "eorne  %0, %1,  %R2, asr #31  \n\t"
         : "=r"(x), "=&r"(y) : "r"(a) : "cc");
    return x;
}
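
For reference, here is a rough portable C sketch of what the inline assembly computes, written to make it visible that both 32-bit halves of the 64-bit input are read. The function name `clipl_int32_ref` is made up for illustration and is not part of the original code. It assumes that %Q2 names the low word and %R2 the high word of operand 2, and that converting an out-of-range uint32_t to int32_t wraps (implementation-defined in C, but what GCC and Clang do).

```c
#include <stdint.h>

/* Hypothetical reference version of av_clipl_int32_arm; mirrors the
 * inline assembly step by step. */
static int32_t clipl_int32_ref(int64_t a)
{
    uint32_t lo = (uint32_t)a;                     /* %Q2: low half  */
    uint32_t hi = (uint32_t)((uint64_t)a >> 32);   /* %R2: high half */

    /* adds %1, %R2, %Q2, lsr #31: wraps modulo 2^32; the sum is zero
     * exactly when the value already fits in a signed 32-bit integer. */
    uint32_t y = hi + (lo >> 31);

    if (y == 0)
        return (int32_t)lo;                        /* moveq %0, %Q2 */

    /* mvnne %1, #1<<31 gives 0x7FFFFFFF; eorne with (hi asr #31) keeps
     * it for positive inputs and flips it to 0x80000000 for negative. */
    uint32_t sign = (hi & 0x80000000u) ? 0xFFFFFFFFu : 0u;
    return (int32_t)(0x7FFFFFFFu ^ sign);
}
```

In the disassembly above, r0 and r1 play the roles of `lo` and `hi`, so a clobbered r1 corrupts both the range check and the sign used for clipping.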

Is this an issue with the division libcall handling itself, failing to flag that all of these result registers are actually used? Or is the inline assembly somehow losing track of the fact that both halves of the 64-bit variable are used? (The variable is passed to the inline assembly snippet as an "r" operand and accessed via the %R2/%Q2 names.)
