OpenMP Nested Loops Collapse Directive

General OpenMP discussion

OpenMP Nested Loops Collapse Directive

Postby Horatio Alger » Sat Nov 16, 2013 10:41 am

Hy there,
I apologize in advance for the silly question but I would like to know how does actually work the Collapse directive applied to a nest of do loops.

For just to make clearer what I am looking for, what I would like to understand is how the way OpenMP collapses nested loops relates to, for instance, Lamport hyperplanes, see Lamport (1974), or other more recent algorithmic solutions to parallelization of nested loops, see Lotfi et al (2009).
In a nutshell, why algorithmics students keep doing research on (perfectly) nested loops parallelization if OpenMP does provide a compiler optimization approach to nested loop parallelization ?
But after saying that, how does it work "inside" OpenMP collapse of nested loops ? how does it make the actual collapse of nested loops ?
Are the two approaches to nested loops optimization really unrelated one another ? or OpenMP simply entails the previous algorithmic modelling inside a compiler directive, e.g. collapse ?

Thanks in advance for your reply. :?

References:
Lamport, Leslie. "The parallel execution of DO loops." Communications of the ACM 17.2 (1974): 83-93.
Lotfi, Shahriar, and Saeed Parsa. "Parallel loop generation and scheduling." The Journal of Supercomputing 50.3 (2009): 289-306.
Horatio Alger
 
Posts: 3
Joined: Sat Nov 16, 2013 10:00 am

Re: OpenMP Nested Loops Collapse Directive

Postby MarkB » Sat Nov 16, 2013 12:17 pm

The OpenMP collapse clause only treats the very simple (but common) case of perfectly nested rectangular loops with no dependencies.
It transforms the loop nest into a single loop respecting the sequential order of iterations and then parallelises that.

The papers you quote treat more difficult cases such as non-rectangular nests, or nests with loop carried dependencies.

Hope that helps,
Mark.
MarkB
 
Posts: 428
Joined: Thu Jan 08, 2009 10:12 am

Re: OpenMP Nested Loops Collapse Directive

Postby Horatio Alger » Sun Nov 17, 2013 2:16 am

Thanks MarkB for your prompt reply.

It is my fault I did not manage to make clear my query. :oops:
I do know that collapse Directive in OpenMP applies only to perfecly nested rectangular loops.
I do know that the other literature mentioned applies also to more difficult cases.

My query concerns, instead, how does OpenMP actually perform this collapse. :?
What does happen inside OpenMP ?
how the serial execution of nested loops is transformed into a parallel execution of the same operations ? :?:

From an algorithmic point of view, both Lamport (1974) or Lofti et al (2009) do explain what they are doing.
Instead, I do not manage to find a reference about OpenMP collapse directive which explains how does happen the loop transformation. :cry:

Thank you very much in advance for your help in these matters, :(
Horatio Alger
 
Posts: 3
Joined: Sat Nov 16, 2013 10:00 am

Re: OpenMP Nested Loops Collapse Directive

Postby MarkB » Sun Nov 17, 2013 5:22 am

I'm sorry if I misunderstood your question! The OpenMP specification says that the iterations of all the loops are collapsed into one large loop where the sequential execution of the iterations in the original loops determines the order of the iterations in the collapsed loop. The iterations of the collapsed loop are then assigned to threads according the specified schedule clause, if present, or else the (implementation defined) default schedule.

Exactly how this is done is up to the implementation, but the concept is fairly straightforward, and there isn't anything very complicated to do. So for example

Code: Select all
#pragma omp for schedule(static,1) collapse(2)
for (i=0; i=n; i++){
   for (j=0;j<m;j++){
....


is transformed to

Code: Select all
#pragma omp for schedule(static,1)
for (ij = 0; ij < n*m; ij++){
   i= ij/m;
   j= mod(ij,m);
....


and then the implementation's mechanism for distributing a single loop across threads is applied.

The general case has to treat an arbitrary number of collapsed loops, with different start bounds and strides, but I believe there is always a straightforward closed expression to recover the original indices.

Is that a better answer? :D
MarkB
 
Posts: 428
Joined: Thu Jan 08, 2009 10:12 am

Re: OpenMP Nested Loops Collapse Directive

Postby Horatio Alger » Sun Nov 17, 2013 10:21 am

Thanks to MarkB for the prompt and smart answer.

The solution you propose for n=2 nested loops is great :idea: . Although that is true, it should be slightly modified in order to work properly, i.e. index "i" should be dealt with a bit differently lest it would produce fractionary figures, see the attached spreadsheet. Assuming I have implemented it correctly.

After saying that, I have the gut feeling -- only guts remain to me to tackle these problems -- that the case of n=2 nested loops is one on its own. Again, consider the Mickey Mouse example in the spreasheet for n=3. It is easy to spot a kind of cyclical behavior of indexes triples which I find personally difficult to express in closed forms. The alternative would be using some conditional branching but I believe it becomes cumbersome for a growing n, a sort of curse of dimensionality.

Anyway, the questions remain:

1) how does OpenMP directive collapse operate a nested loops transformation ? :cry:
I guess that developers of OpenMP Parallel programming model do know what they have entailed into implementation 3 and above. I would be very grateful to them if they could tell me what it is actually doing the collapse directive so than I can set order in my mind between these two strands of literature which strive towards optimizing nested loops, i.e. the algorithmic and the compilers optimization ones. :?

2) The example by MarkB is very interesting, but even more interesting would be to read some references about the general closed forms to generate n-uples of indexes in the sort of one collapsed loop transformation MarkB suggests for the case n=2.
Do they exist and they have been published or are they a well kept secret ? :cry: :(
Attachments
OpenMP Forum Collapsing Nested Loops case n=2.zip
Mickey Mouse Example which implements MarkB algorithm for the case n=2 nested loops, slightly modified by me. It's an Excel file.
(2.76 KiB) Downloaded 170 times
Horatio Alger
 
Posts: 3
Joined: Sat Nov 16, 2013 10:00 am

Re: OpenMP Nested Loops Collapse Directive

Postby MarkB » Sun Nov 17, 2013 9:38 pm

Horatio Alger wrote:The solution you propose for n=2 nested loops is great :idea: . Although that is true, it should be slightly modified in order to work properly, i.e. index "i" should be dealt with a bit differently lest it would produce fractionary figures, see the attached spreadsheet. Assuming I have implemented it correctly.


Integer division in C truncates towards zero. In Excel, you can use the QUOTIENT function.

The general case is

Code: Select all
for (i1 = 0; i1< n1; i1++){
    for (i2 = 0; i2< n2; i2++){
        for (i3 < 0; i3< n3; i3++){
               ....
            for (ik < 0; ik< nk; ik++){
 


becomes

Code: Select all
for (i=0; i<n1*n2*n3*...*nk; i++){
          i1 = mod(i/n2*n3*...*nk, n1)
          i2 = mod(i/n3*n4*...*nk, n2)
          i3 = mod(i/n4*n5*...*nk, n3)
          .....
          ik = mod(i, nk)
MarkB
 
Posts: 428
Joined: Thu Jan 08, 2009 10:12 am


Return to Using OpenMP

Who is online

Users browsing this forum: No registered users and 4 guests

cron