What would LLVM need to do this optimization?

Edward_Lee1 · November 10, 2008, 11:39pm

I have a simple test .ll file that does something like..

int i = 0;
if (argc < 1) i++; else i = foo(i);
if (argc < 2) i++; else i = foo(i);
if (argc < 3) i++; else i = foo(i);
if (argc < 4) i++; else i = foo(i);
return i;

It gets optimized to just..

return 4;

opt -atomic-region-clone -mem2reg -predsimplify -simplifycfg -instcombine

.. where -atomic-region-clone is a pass I've written (-mem2reg to
clean up the PHIs I reg2mem'd). My pass inserts a custom atomic_begin
instruction and a couple intrinsics to help start/rollback a region of
code.

The optimized .ll code is roughly..

entry:
  ; start an atomic region at %atomic, rollback to %original_code
  atomic_begin label %atomic, %original_code
atomic:
  %c = icmp slt i32 %argc, 1
  br i1 %c, label %done, label %abort
done:
  call void @llvm.atomic.end( )
  ret i32 4
abort:
  ; undo all work since atomic_begin and continue at %original_code
  call void @llvm.atomic.abort( )
  unreachable
original_code:
  ; the original compiled method

A few things happening with each optimization pass:
1) -atomic-region-clone duplicates the method and removes untaken
branches (the foo() cases)
2) -predsimplify propagates the "argc must be less than 1" information
and gets rid of the later comparisons
3) -simplifycfg eliminates the newly created "br i1 true" branches
4) -instcombine sees a sequence of 0 + 1 + 1 + 1 + 1 and makes it 4
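
As a rough illustration of the four steps above, here is a C-level sketch of the hot path at each stage. It is not output from the pass: the body of foo() is a made-up stand-in (the post never shows it), argc is passed as a plain parameter, and abort() only models the rollback edge, which in the real IR is the llvm.atomic.abort call.

```c
#include <stdlib.h>

/* Hypothetical stand-in for foo(); the post does not show its body. */
int foo(int i) { return i * 2 + 7; }

/* The test function as posted, with argc taken as a parameter. */
int original(int argc) {
    int i = 0;
    if (argc < 1) i++; else i = foo(i);
    if (argc < 2) i++; else i = foo(i);
    if (argc < 3) i++; else i = foo(i);
    if (argc < 4) i++; else i = foo(i);
    return i;
}

/* After step 1: the untaken foo() branches become abort edges.
   abort() here only models "roll back and run the original code". */
int after_clone(int argc) {
    int i = 0;
    if (argc < 1) i++; else abort();
    if (argc < 2) i++; else abort();
    if (argc < 3) i++; else abort();
    if (argc < 4) i++; else abort();
    return i;
}

/* After steps 2-4: argc < 1 implies the other three comparisons,
   so they fold away and 0 + 1 + 1 + 1 + 1 becomes the constant 4. */
int after_simplify(int argc) {
    if (!(argc < 1)) abort();
    return 4;
}
```

For any argc below 1, all three functions return the same value, which is why the atomic region shrinks to a single compare and branch.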

The optimizations are run on both original_code and the atomic code,
but there's still a huge mess in original_code. (I've attached the
original and optimized output for those curious.) Just for fun, I ran
things with -std-compile-opts and it wasn't much better.

But back to the original question: what would LLVM need in order to do
this without the -atomic-region-clone pass, which relies on hardware
support for rollback? The resulting optimized code doesn't have very
much in the atomic region, just a compare and branch. (Arguably the
compare could be shared across the atomic and original_code paths,
perhaps by PRE.) So the hardware isn't really being used to undo
much, if any, work in this case.
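
To make the question concrete, here is a minimal C sketch of what a software-only result might look like. It assumes the guard can be evaluated before any side effects happen, so there is nothing to roll back. The names guarded() and original_code() and the body of foo() are hypothetical; this is not something an existing LLVM pass is claimed to produce.

```c
/* Hypothetical stand-in for foo(); the post does not show its body. */
int foo(int i) { return i * 2 + 7; }

/* The unspecialized function, standing in for %original_code. */
int original_code(int argc) {
    int i = 0;
    if (argc < 1) i++; else i = foo(i);
    if (argc < 2) i++; else i = foo(i);
    if (argc < 3) i++; else i = foo(i);
    if (argc < 4) i++; else i = foo(i);
    return i;
}

/* Guarded specialization: the compare runs first, so the fast path
   needs no rollback; every other input takes the original code. */
int guarded(int argc) {
    if (argc < 1)
        return 4;                /* folded result of the hot path */
    return original_code(argc);  /* fallback, same as the abort target */
}
```

The design point is that the fast path commits nothing before the check, so a plain branch plays the role that atomic_begin and the abort edge play in the IR above.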

Is there current work towards something that can do this kind of
optimization? Seems like something to do with the profiling interface
for feedback directed optimizations, but last I heard there hasn't
been much activity there.. ?

Ed

Edward_Lee1 · November 10, 2008, 11:53pm

Oops, I was silly and forgot to attach the files. I've
additionally attached the intermediate optimized .ll file after
-atomic-region-clone -mem2reg if you want a different way to look at
what the pass does.

Ed

true.ll (1.49 KB)

true.int.ll (4.24 KB)

true.opt.ll (1.85 KB)
