I am involved in a project where one of the aims is to study the effects of different parallelization strategies on efficiency, power consumption, etc. The idea is to do automated design-space exploration by varying some parameters (e.g. the number of tasks) and measuring their effect.
Since we are already using LLVM for other purposes, we thought about using LLVM for analysis and then OpenMP for compilation. The idea was to use the LLVM back end to spit out C code with OpenMP directives. However, looking at the C code that llc produces, it seems that this might be a non-starter, as loops have already been turned into gotos in the generated C.
I suspect we may need to do something else, but if anyone has any bright ideas on how to use LLVM for this purpose, I'd be very grateful.
If you're using llvm-gcc-4.0 to get llvm bitcode for analysis, then the front end is ignoring the OpenMP pragmas. If you're trying to use llvm-gcc-4.2, then most likely it won't work, because the llvm bitcode emitter has not been updated to handle OpenMP. If you're getting llvm bitcode (without going through llvm-gcc) and somehow annotating it with OpenMP instructions (I don't know how), then the CBackend, which is used by llc, won't be able to handle it at the moment either. In short, LLVM is not OpenMP-ready yet.
The easiest route would be to update the llvm bitcode converter in llvm-gcc-4.2 so that it operates on GCC trees after the OpenMP lowering phase has been executed.
If you're interested, we welcome contributions on the OpenMP front.
I have already taken a brief look at OpenMP in llvm-gcc 4.2.
Internally, OpenMP lowering in gcc is split into two phases. Funny, but the llvm converter is run between these phases. It looks like the second phase uses some gcc analysis passes (like dominance info, etc.).
In order to support OpenMP in llvm-gcc 4.2 we should:
1. Decide at which level we should handle openmp trees in llvm-gcc.
2. Define a set of openmp builtins, which will need to be supported in llc codegen.
3. Something I'm not aware of.
If anyone wants to work on bringing openmp to llvm-gcc, I'll be happy to help.
> If you're using llvm-gcc-4.0 to get llvm bitcode for analysis, then the front end is ignoring the OpenMP pragmas. If you're trying to use llvm-gcc-4.2, then most likely it won't work, because the llvm bitcode emitter has not been updated to handle OpenMP. If you're getting llvm bitcode (without going through llvm-gcc) and somehow annotating it with OpenMP instructions (I don't know how), then the CBackend, which is used by llc, won't be able to handle it at the moment either.
Thanks for the information. The vague idea was to annotate the bitcode with OpenMP instructions, or maybe to modify OpenMP instructions present in the original source. But it seems that this would not be possible, at least not in a straightforward fashion.
> If you're interested, we welcome contributions on the OpenMP front.
We'd be happy to contribute, if something eventually comes out.
To begin with, I'd like to say I am not very familiar with (llvm-)gcc internals, so some of my assumptions and guesses may be wrong. Please correct me if so.
Anton Korobeynikov wrote:
> Internally, OpenMP lowering in gcc is split into two phases. Funny, but the llvm converter is run between these phases. It looks like the second phase uses some gcc analysis passes (like dominance info, etc.).
>
> In order to support OpenMP in llvm-gcc 4.2 we should:
> 1. Decide at which level we should handle openmp trees in llvm-gcc.
> 2. Define a set of openmp builtins, which will need to be supported in llc codegen.
> 3. Something I'm not aware of.
I've found an interesting paper on the design and implementation of OpenMP in GCC: http://people.redhat.com/dnovillo/Papers/gcc2006.pdf It describes how and when OpenMP pragmas are lowered. It seems the second phase that Anton wrote about is the most important one. I believe it is the "pass_expand_omp" mentioned in the paper. Its aim is to extract parallel sections into new functions, expand directives into built-in function calls, insert code to calculate the iteration space, etc. I guess no OpenMP-related GIMPLE codes are left after this pass. If it is possible to do the conversion to LLVM IR at this stage, the only thing left to do is to handle the OpenMP built-ins. The simplest route would be to replace them with calls to libgomp (an LGPL library accompanying GCC), which is what GCC itself seems to do.
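To make that lowering concrete, here is a hand-written sketch of the shape of the code that comes out (GOMP_parallel_start and GOMP_parallel_end are real libgomp entry points; the struct and function names are made up for illustration):

  /* Sketch: what a parallel region looks like after pass_expand_omp.
     The region body has been outlined into its own function, and the
     directive has become calls into libgomp. */

  extern "C" void GOMP_parallel_start(void (*fn)(void *), void *data,
                                      unsigned num_threads);
  extern "C" void GOMP_parallel_end(void);

  struct omp_data_s { int n; int *a; };      /* shared vars, packed up   */

  static void foo_omp_fn_0(void *arg) {      /* the outlined region      */
      struct omp_data_s *d = (struct omp_data_s *)arg;
      /* per-thread iteration-space computation and the region body */
  }

  void foo(int n, int *a) {
      struct omp_data_s d = { n, a };
      GOMP_parallel_start(foo_omp_fn_0, &d, 0); /* 0 = runtime decides    */
      foo_omp_fn_0(&d);                         /* master thread joins in */
      GOMP_parallel_end();
  }

So once llvm-convert runs after this pass, the only OpenMP-specific things left in the trees are ordinary calls like these (plus the atomic/sync builtins).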
The second option is to allow OpenMP directives (or some general multiprocessing directives that OpenMP ones could be lowered to) to be expressed directly in the LLVM IR. Then a pass could be written that handles them in a fashion similar to "pass_expand_omp" in GCC (see above). Any frontend (e.g. clang) would then have a simpler job supporting OpenMP. That said, I admit I have no idea how to express these directives. For example: how do you annotate a loop to say that it should be turned into a parallel one? I believe keeping the IR as simple as possible is an important goal, so we would need some lightweight mechanism, transparent to the existing passes.
I'm personally interested in the subject, because I am trying to develop an automatic loop parallelization pass for LLVM. Initially, I thought I would write everything from scratch. However, if there is a decision to include some support for OpenMP (and similar techniques) lowering in the LLVM core, I'm ready to help.
> Since we are already using LLVM for other purposes, we thought about using LLVM for analysis and then OpenMP for compilation. The idea was to use the LLVM back end to spit out C code with OpenMP directives. However, looking at the C code that llc produces, it seems that this might be a non-starter, as loops have already been turned into gotos in the generated C.
>
> I suspect we may need to do something else, but if anyone has any bright ideas on how to use LLVM for this purpose, I'd be very grateful.
I have some ideas, but I'm not sure if they are bright...
As you have noticed, loops aren't represented directly in the LLVM IR. However, there are analysis passes which may be helpful in "reconstructing" them. For example, the LoopInfo pass detects natural loops (as sets of basic blocks) and the ScalarEvolution pass finds loop induction variables (among many other things). Using them, plus some custom analysis, it should be possible to detect most loops that are potential candidates for parallelization. Is this what you would like to achieve?
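For illustration, a rough sketch of a pass that queries both analyses (written against the classic FunctionPass API; the exact constructor and registration details differ between LLVM versions, so treat the names here as approximate):

  // Sketch: ask LoopInfo for natural loops and ScalarEvolution for trip
  // counts -- the raw material for spotting parallelization candidates.
  #include "llvm/Pass.h"
  #include "llvm/Analysis/LoopInfo.h"
  #include "llvm/Analysis/ScalarEvolution.h"
  using namespace llvm;

  namespace {
  struct LoopCandidates : public FunctionPass {
      static char ID;
      LoopCandidates() : FunctionPass(ID) {}  // older releases pass &ID

      virtual bool runOnFunction(Function &F) {
          LoopInfo &LI = getAnalysis<LoopInfo>();
          ScalarEvolution &SE = getAnalysis<ScalarEvolution>();
          // Walk the top-level natural loops LoopInfo found.
          for (LoopInfo::iterator I = LI.begin(), E = LI.end(); I != E; ++I) {
              Loop *L = *I;
              // A canonical induction variable (starts at 0, steps by 1)
              // and a computable trip count make a loop an easy candidate.
              if (L->getCanonicalInductionVariable() &&
                  SE.hasLoopInvariantBackedgeTakenCount(L)) {
                  // record/report L as a parallelization candidate here
              }
          }
          return false;  // analysis only: the IR is not changed
      }

      virtual void getAnalysisUsage(AnalysisUsage &AU) const {
          AU.addRequired<LoopInfo>();
          AU.addRequired<ScalarEvolution>();
          AU.setPreservesAll();
      }
  };
  }
  char LoopCandidates::ID = 0;
  static RegisterPass<LoopCandidates> X("loop-candidates",
                                        "Find loops that look parallelizable");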
If you know a loop is parallelizable, it is possible, in general, to extract it (its blocks) into a new function in LLVM. In place of the original loop, code can be inserted that spawns several threads calling this new function, with each call getting a different iteration range for the loop. If you're interested, I can send you a very simplified but working example that demonstrates this (it's about 500 lines).
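To show the shape of the transformed code, here is a minimal hand-written equivalent using pthreads (the real pass would of course emit this as LLVM IR; all names here are illustrative):

  // The loop body lives in an extracted function; each spawned thread
  // gets a slice of the iteration space.
  #include <pthread.h>

  struct Slice { int begin, end; double *a; };

  static void *extracted_loop(void *arg) {       // was: the loop's blocks
      Slice *s = (Slice *)arg;
      for (int i = s->begin; i < s->end; ++i)
          s->a[i] = s->a[i] * 2.0;               // original loop body
      return 0;
  }

  void parallel_loop(double *a, int n, int nthreads) {  // nthreads <= 16
      pthread_t tid[16];
      Slice slice[16];
      int chunk = (n + nthreads - 1) / nthreads;
      for (int t = 0; t < nthreads; ++t) {       // each call gets a
          slice[t].begin = t * chunk;            // different iteration range
          slice[t].end   = (t + 1) * chunk < n ? (t + 1) * chunk : n;
          slice[t].a     = a;
          pthread_create(&tid[t], 0, extracted_loop, &slice[t]);
      }
      for (int t = 0; t < nthreads; ++t)
          pthread_join(tid[t], 0);
  }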
Right now, one big missing piece in this puzzle is dependence analysis.
Right. I was only trying to say that it shouldn't be very difficult to find those for/do loops which are interesting from the parallelization perspective (in general, not all for/do loops can be reconstructed).
As for the dependence analysis, I need this piece for my project, and I am preparing myself to write it. I am currently studying some papers, but haven't yet decided on the method. Maybe some of you have experience in the area and could offer some advice? In the next few days I am going to take a look at the methods based on the chains-of-recurrences algebra. They seem particularly interesting because chrecs are already implemented in the scalar evolution pass. Given the limited time I have for the project, I don't know how precise (and buggy ;)) my implementation will be. However, if it appears to be usable, I'll be happy to contribute.
Actually, gcc expands things _and_ makes calls into libgomp, so lowering omp first would be best - otherwise you'll need to handle the language structures.
I've just committed a patch to llvm-gcc 4.2 which moves the openmp lowering stuff to run a little bit earlier, so llvm-convert will catch its result. It now looks like the gcc atomic & sync builtins should be introduced to llvm as the remaining ingredient.
The example program from Diego's paper now compiles to:
@.str = internal constant [10 x i8] c"sum = %d\0A\00" ; <[10 x i8]*> [#uses=1]
Thanks, Anton! The compiled code seems to be OK. IIRC, there is a project aiming at introducing the gcc atomic and sync builtins. It is temporarily suspended, but when it is eventually finished, OpenMP will be a nice thing to test it with. :)
> As for the dependence analysis, I need this piece for my project, and I am preparing myself to write it.
Great!
> I am currently studying some papers, but haven't yet decided on the method. Maybe some of you have experience in the area and could offer some advice? In the next few days I am going to take a look at the methods based on the chains-of-recurrences algebra. They seem particularly interesting because chrecs are already implemented in the scalar evolution pass.
Yes, I'd strongly suggest implementing one of the approaches based on the chrec analysis we have in the scev pass.
> I've just committed a patch to llvm-gcc 4.2 which moves the openmp lowering stuff to run a little bit earlier, so llvm-convert will catch its result.
Very nice Anton!!
-Chris
> As for the dependence analysis, I need this piece for my project, and I am preparing myself to write it. I am currently studying some papers, but haven't yet decided on the method. Maybe some of you have experience in the area and could offer some advice?
I recently read Allen & Kennedy's book Optimizing Compilers for Modern Architectures: A Dependence-Based Approach. It is extremely useful if you are interested in analysis and transformations for parallelization (ILP, vectorization, multiprocessing).
> Yes, I'd strongly suggest implementing one of the approaches based on the chrec analysis we have in the scev pass.
Yup, just get the array accesses into scev form and set up a constraint system from them. Once you have set up the constraint systems for the dependence distance vectors, you're almost done. You can look at how this is set up in gcc trunk's tree-data-ref.c, and search for omega.
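For instance, a small hand-worked example of that setup (mine, not taken from tree-data-ref.c): in for (i = 0; i < N; i++) A[i+1] = A[i] + 1; the store A[i+1] has the scev {1,+,1} and the load A[i] has {0,+,1}. A dependence between a store in iteration i1 and a load in iteration i2 requires i1 + 1 = i2 with 0 <= i1, i2 < N, which is satisfiable with distance i2 - i1 = 1: a loop-carried flow dependence, so this loop is not parallelizable as written.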
Regardless of which solver you use, you should keep a clean interface between your constraint representation and the solver. I like Omega a lot, and it is a fine choice for the first solver. Although it is very powerful, even it cannot handle symbolic loop strides. Perhaps more importantly for common uses, for many trivial cases (e.g. A[x] and A[x+1]), setting up the Omega data structures to invoke the solver can be too heavyweight. It could be worthwhile to add one or more simple solvers (see the Allen and Kennedy textbook for examples) before falling back on Omega.
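As an example of such a simple solver, here is a minimal sketch of the classic GCD test from the Allen and Kennedy book (my illustration, not code from any existing pass): for accesses A[a*i + b] and A[c*i + d] in the same loop, the dependence equation a*i1 + b = c*i2 + d has an integer solution only if gcd(a, c) divides d - b.

  // Minimal GCD test: can A[a*i + b] and A[c*i + d] ever touch the same
  // element?  "Maybe" means a stronger solver (e.g. Omega) still has to
  // decide; "Independent" is a proof of no dependence.
  static int gcd(int x, int y) {
      if (x < 0) x = -x;
      if (y < 0) y = -y;
      while (y) { int t = x % y; x = y; y = t; }
      return x;
  }

  enum DepResult { Independent, MaybeDependent };

  DepResult gcdTest(int a, int b, int c, int d) {
      int g = gcd(a, c);
      if (g == 0)                                 // both strides zero
          return b == d ? MaybeDependent : Independent;
      return (d - b) % g == 0 ? MaybeDependent : Independent;
  }

For example, gcdTest(2, 0, 2, 1) proves A[2*i] and A[2*i+1] independent (gcd(2,2) = 2 does not divide 1), with no need to fire up Omega.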
Sebastian, Vikram, thank you for the advice.
My initial plan was to implement the methods from the Allen and Kennedy book. I'll see if I can find time to add the Omega solver as a fallback, too. I am not very familiar with GCC internals, but having taken a look at tree-data-ref.c, it seems it would be the same approach as the one used there.
That makes sense to me.
If you can add a dependence analysis capability to LLVM, I think it will fill a really important gap, because I've seen at least two research projects decide not to use LLVM, reportedly due to the lack of dependence analysis.