I’d like to add my 2 cents to the recent online discussion between Ruud van der Pas (Sun Microsystems) and Prof Charles Leiserson (CILK ARTS) in
1. http://www.cilk.com/multicore-blog/bid/6518/The-OpenMP-Concurrency-Platform
2. http://blogs.sun.com/ruud/entry/demystifying_persistent_openmp_myths_part
and
3. http://www.cilk.com/multicore-blog/bid/8752/Debunking-an-OpenMP-Demystifier
which to my regret got a bit too emotional.
Hopefully, I am able to clarify.
I am myself not trying to sell anything. I am on the user side leading a small team of people trying to help engineers and scientists of our university in getting their applications running in parallel.
My motivation in participating in the OpenMP Language Committee is to contribute the user’s perspective into the definition process of OpenMP.
As OpenMP is currently defined by the 16 members of the OpenMP ARB, compromises have to be made, and of course I have my own points of criticism (this is where I add “unfortunately” in my explanations below).
In (1) Charles Leiserson gave a short introduction to OpenMP (including some criticism), and of course such an introduction involves simplifications. Still, it was a pity that this introduction from September last year did not take into account the latest OpenMP specification, which was released in May 2008. With this version 3.0, OpenMP supports tasking as a major enhancement, and it also improves the support for nested parallelization.
My numbering below follows the last reply from Charles Leiserson (3).
1. It is true that "OpenMP does not attempt to determine whether there are dependencies between loop iterations." But there are tools available to detect data races, like the one which CILK ARTS is promoting for their product Cilk++. Also, several OpenMP compilers print out warnings if they are able to detect problems at compile time. As it is quite easy to run into data races when parallelizing real codes, I always recommend using a data race detection tool before putting an OpenMP code into production. The same seems to be true for Cilk++ codes.
2. If the interpretation of OpenMP directives is turned off at compile time, these directives no longer affect the serial correctness of the code. But if OpenMP support is activated at compile time, these directives may have an impact on the behavior of the code, even if the parallel region is deactivated (by an if clause evaluating to false) or if the parallel region is executed with one thread only.
3. The overhead of a static distribution of iterations of a parallelized loop is really cheap, because each thread is able to figure out its chunk independently.
The example which is considered in (3) contains a combined parallel region and for worksharing construct with a reduction clause. The overhead of a parallel region is by no means negligible, nor is that of a for worksharing construct, even with a static schedule, because it implicitly contains a barrier at the end. The reduction in the example also adds some overhead.
The for worksharing construct itself, without any barrier (after adding a "nowait" clause) and with a static schedule should come almost for free. But be aware! Unfortunately, the default loop schedule is implementation dependent. So add a schedule(static) clause to be on the safe side.
4. – 8. Nested parallelization was specified from the very beginning in 1997, but there clearly were gaps in the specification which have been filled with OpenMP 3.0. In the beginning it was not completely clear how to control the number of threads of inner parallel regions, and in fact compilers implemented nesting quite differently. There was also no standard way to limit the nesting level or the total number of threads; OpenMP 3.0 now clearly defines both. As a consequence, it used to be quite easy to blow up a machine. Still, unfortunately, the default number of threads per parallel region and the default pool size are implementation dependent. Therefore the user still has to pay attention in order not to exceed any resource limits.
I would like to add that, despite any criticism, OpenMP has been in productive use for about 10 years now. Plenty of publications about successful OpenMP parallelization projects have been presented at the OpenMP workshops and elsewhere since then. As the most prominent example I would like to mention the Gaussian chemistry package, which is the most frequently used ISV code at our site.
We also were able to successfully employ nested parallelization for large application codes. Even the early implementations of nested parallelization with OpenMP were not useless. But some care had to be taken.
I do not think that the capabilities of the early versions of OpenMP were as limited as the summary of the short introduction by Charles Leiserson (3) suggested.
Anyhow, I hope that both approaches, OpenMP and Cilk++, will help to move industry and science forward to the benefit of mankind.
