I have a simple test .ll file that does something like..
int i = 0;
 if (argc < 1) i++; else i = foo(i);
 if (argc < 2) i++; else i = foo(i);
 if (argc < 3) i++; else i = foo(i);
 if (argc < 4) i++; else i = foo(i);
 return i;
It gets optimized to just..
return 4;
when run through:
opt -atomic-region-clone -mem2reg -predsimplify -simplifycfg -instcombine
.. where -atomic-region-clone is a pass I've written (-mem2reg is there
 to clean up the PHIs I reg2mem'd). My pass inserts a custom atomic_begin
 instruction and a couple of intrinsics to help start/rollback a region of
 code.
The optimized .ll code is roughly..
entry:
   ; start an atomic region at %atomic, rollback to %original_code
   atomic_begin label %atomic, %original_code
 atomic:
   %c = icmp slt i32 %argc, 1
   br i1 %c, label %done, label %abort
 done:
   call void @llvm.atomic.end( )
   ret i32 4
 abort:
   ; undo all work since atomic_begin and continue at %original_code
   call void @llvm.atomic.abort( )
   unreachable
 original_code:
   ; the original compiled method
A few things happening with each optimization pass:
 1) -atomic-region-clone duplicates the method and removes untaken
 branches (the foo() cases)
 2) -predsimplify propagates the "argc must be less than 1" information
 and gets rid of the now-redundant comparisons
 3) -simplifycfg eliminates the newly created "br i1 true" branches
 4) -instcombine sees a sequence of 0 + 1 + 1 + 1 + 1 and makes it 4
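
To make the four steps concrete, here is a rough source-level model of what the atomic copy looks like after each stage. It is only a sketch: foo() is a made-up stand-in (my test doesn't care what it does), the `aborted` flag is just a way to model the jump to %abort in plain C, and the function names are invented for illustration rather than anything the passes produce.

```c
#include <stdbool.h>

/* Hypothetical stand-in for foo(); its real body is not part of the post. */
int foo(int i) { return 2 * i + 7; }

/* The test function as written at the top of the post. */
int original(int argc) {
    int i = 0;
    if (argc < 1) i++; else i = foo(i);
    if (argc < 2) i++; else i = foo(i);
    if (argc < 3) i++; else i = foo(i);
    if (argc < 4) i++; else i = foo(i);
    return i;
}

/* After step 1: the clone keeps the i++ arms; each foo() arm becomes an abort.
   Setting *aborted models the branch to %abort (return value is then unused). */
int after_clone(int argc, bool *aborted) {
    int i = 0;
    *aborted = false;
    if (argc < 1) i++; else { *aborted = true; return 0; }
    if (argc < 2) i++; else { *aborted = true; return 0; }
    if (argc < 3) i++; else { *aborted = true; return 0; }
    if (argc < 4) i++; else { *aborted = true; return 0; }
    return i;
}

/* After steps 2 and 3: argc < 1 implies argc < 2, 3, 4, so only the first
   compare survives and the "br i1 true" branches are gone. */
int after_simplify(int argc, bool *aborted) {
    *aborted = !(argc < 1);
    if (*aborted) return 0;
    return 0 + 1 + 1 + 1 + 1;
}

/* After step 4: the chain of additions is folded into a constant. */
int after_instcombine(int argc, bool *aborted) {
    *aborted = !(argc < 1);
    if (*aborted) return 0;
    return 4;
}
```

In this model, every stage aborts exactly when argc >= 1, and whenever it does not abort it returns the same value as original().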
The optimizations are run on both original_code and the atomic code,
 but there's still a huge mess in original_code. (I've attached the
 original and optimized output for those curious.) Just for fun, I ran
 things with -std-compile-opts and it wasn't much better.
But back to the original question.. what would LLVM need in order to do
 this without the -atomic-region-clone pass, which relies on hardware
 support for rollback? The resulting optimized code doesn't have very much
 in the atomic region -- just a compare and branch. (Arguably the compare
 could be shared across the atomic region and original_code, perhaps by
 PRE.) So the hardware isn't really being used to undo much, if any, work
 in this case.
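
For illustration, this is roughly the shape I'd hope for at the source level when no rollback is needed: one compare, shared by a specialized path and the general path. This is a hand-written sketch, not output from any existing pass; foo() is again a made-up stand-in, and `reference` / `guarded` are names I invented for the example.

```c
/* Hypothetical stand-in for foo(); its real body is not part of the post. */
int foo(int i) { return 3 * i + 5; }

/* Reference version: the test function exactly as written in the post. */
int reference(int argc) {
    int i = 0;
    if (argc < 1) i++; else i = foo(i);
    if (argc < 2) i++; else i = foo(i);
    if (argc < 3) i++; else i = foo(i);
    if (argc < 4) i++; else i = foo(i);
    return i;
}

/* Guarded version: the compare is evaluated once and reused by both paths.
   Nothing happens before the compare, so there is no work to undo. */
int guarded(int argc) {
    int c = argc < 1;   /* the single shared compare */
    if (c)
        return 4;       /* specialized path: 0 + 1 + 1 + 1 + 1 */

    /* General path: argc >= 1 here, so the first test is known to fail. */
    int i = foo(0);
    if (argc < 2) i++; else i = foo(i);
    if (argc < 3) i++; else i = foo(i);
    if (argc < 4) i++; else i = foo(i);
    return i;
}
```

The calls to foo() happen in the same order and with the same arguments as in the reference version, so the two should agree even if foo() had side effects.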
Is there current work towards something that can do this kind of
 optimization? Seems like something to do with the profiling interface
 for feedback directed optimizations, but last I heard there hasn't
 been much activity there.. ?
Ed