STL parallel for -- the single nowait trick


Postby doob » Tue Jan 22, 2008 3:03 pm

Hi,

I'm new to OpenMP. One thing I instantly ran into is the problem of how to parallelize iterations over STL containers like this:
Code: Select all
for(it= list1.begin(); it!= list1.end(); it++)
{
   it->compute();
}

Among others, I frequently find the following code snippet proposed as a solution to this. However, I can't understand why it is supposed to execute in parallel.
Code: Select all
#pragma omp parallel private (it)
{
   for(it= list1.begin(); it!= list1.end(); it++)
   {
      #pragma omp single nowait
      {
         it->compute();
      }
   } // end for
} // end ompparallel

Everything I have read so far about OpenMP suggests that the previous code does the following:
  • Every thread gets its own copy of the for loop.
  • The iterator is private to each thread, i.e. each thread runs over the complete list.
  • Only one thread at a time can execute the section where the iterator is dereferenced, but the nowait means all other threads continue to loop over their own copies of the list iterator.
So to me that means none of the real work is done in parallel, while the rest of the code loops n*num_threads times over the list. Am I completely wrong? Did I miss something important? Please help me understand this.

Thanks in advance and kind regards,

peter
doob
 
Posts: 3
Joined: Tue Jan 22, 2008 2:37 pm

Re: STL parallel for -- the single nowait trick

Postby ejd » Tue Jan 22, 2008 4:07 pm

I am not a C++ expert, but I will try to answer your question. The answer is that you are correct:
  • Every thread gets its own copy of the for loop.
  • The iterator is private to each thread.
  • Only one thread at a time can access the section where the iterator is accessed, but the nowait says all other threads continue to loop over their own copy of the list iterator.
However, real work can be done in parallel, depending on how much work there is in the computation inside the single. While one thread is busy with it, the other threads continue (nowait) to the next iteration, and a new thread enters the single with a new "it" value. This is of course not as efficient as other methods.

For example, with the STL vector you could use std::vector::size() to write the loop in canonical form, rather than using iterators:

Code: Select all
#pragma omp parallel for
for(int i = 0; i < (int)xVect.size(); ++i)
  process(xVect[i]);

This has much lower overhead and is the recommended approach for things like arrays and vectors (and other containers that know their size).

It is also worth noting that in the new OpenMP spec V3.0, random access iterators will be added to the canonical form.
ejd
 
Posts: 1025
Joined: Wed Jan 16, 2008 7:21 am

Re: STL parallel for -- the single nowait trick

Postby doob » Wed Jan 23, 2008 4:35 am

Hmm, I'm afraid I still don't get it. Does this mean that the single section determines which instance of *it compute() is run on?

I thought single means one thread among the whole team. So even if compute() has a lot of work to do, how can another thread work on another iteration of compute() in parallel while that part of the code is inside a single section? As far as I understand it, nowait lets the rest of the team continue to iterate, but doesn't that also mean that all but one thread jump over the single section?

Sorry if I'm being a blockhead. Thanks for the proposed solution; I'm just trying to understand how it is supposed to work with the single construct.

Re: STL parallel for -- the single nowait trick

Postby ejd » Wed Jan 23, 2008 6:47 am

You are correct - single is a worksharing construct and allows one thread of a team to execute the associated structured block. So each thread executes the for loop. On encountering the single, one thread will execute it. Because of the nowait, instead of waiting for the single to complete, the rest of the threads of the team will continue to the next iteration of the loop. When they encounter the single again, one thread will again execute it, but with the new value of "it" and the rest of the threads will continue. When the first thread finishes the single, it will continue with the for loop. When it encounters the single, it is just catching up to the rest of the team and ignores it, because another thread in the team is executing (or has executed) it.

Think of it as if the loop were unrolled:

Code: Select all
#pragma omp parallel private (it)
{
  it = list1.begin();
  #pragma omp single nowait
  {
    it->compute();
  }
  it++;
  #pragma omp single nowait
  {
    it->compute();
  }
  it++;
...
}

Re: STL parallel for -- the single nowait trick

Postby doob » Wed Jan 23, 2008 6:57 am

Thanks!

Re: STL parallel for -- the single nowait trick

Postby lfm » Wed Jan 23, 2008 1:08 pm

In addition to the other comments, OpenMP 3.0 will also have tasks, which would work nicely in a case where you had a true linked list, for example:
Code: Select all
#pragma omp parallel
#pragma omp single
{
   for (node *p = first; p != 0; p = p->next)
#pragma omp task
      process(p);
}


In this case the tasks are spawned off by one thread, and executed at some time before or during the barrier at the end of the single construct by the other threads in the team.

For a random-access iterator, the original formulation is probably more efficient if the time for each computation is roughly equal.

-- Larry
lfm
 
Posts: 135
Joined: Sun Oct 21, 2007 4:58 pm
Location: OpenMP ARB

