critical sections embedded within parallel for

General OpenMP discussion

Postby armstrhu » Mon Apr 14, 2014 2:05 pm

Hello forum,

I have a question about embedding critical functions, which can be executed in any order, within a parallel for loop. I have something similar to the following:

Code:
void dostuff()
{
   dostuffA();
   dostuffB();
   dostuffC();
   dostuffD();
}


However, each of the dostuffX() functions modifies different shared variables, and so each must be placed in a critical section:

Code:
void dostuff()
{
   #pragma omp critical (A)
   {
      dostuffA();
   }
   #pragma omp critical (B)
   {
      dostuffB();
   }
   #pragma omp critical (C)
   {
      dostuffC();
   }
   #pragma omp critical (D)
   {
      dostuffD();
   }
}


But the order in which they are executed doesn't matter, so I would like to surround them with a sections directive so that a thread calling dostuff() doesn't have to wait for another thread to finish its critical section. I would like the thread to call dostuff() and work on whichever section is available...

Code:
void dostuff()
{
#pragma omp sections
{
   #pragma omp section
   {
      #pragma omp critical (A)
      {
         dostuffA();
      }
   }
   #pragma omp section
   {
      #pragma omp critical (B)
      {
         dostuffB();
      }
   }
   #pragma omp section
   {
      #pragma omp critical (C)
      {
         dostuffC();
      }
   }
   #pragma omp section
   {
      #pragma omp critical (D)
      {
         dostuffD();
      }
   }
}
}


But now, the main call to dostuff() is embedded within a parallel for:

Code:
#pragma omp parallel for
for (int i = 0; i < n; ++i)
{
   // ...
   dostuff();
}


I would like multiple threads to be able to call dostuff(), but once a thread has called it, to work the sections in an arbitrary order based upon what the other threads are working on. The above code doesn't seem to be working. What it seems like is that a thread will call dostuff(), do one section, then exit the function dostuff() (maybe expecting a different thread to do the remainder of the sections?). Ultimately, what I need is for each thread in the parallel for to call dostuff() and execute all the sections, with the order in which a thread executes the sections determined by what the other threads are currently working on. Any help on this would be greatly appreciated. Thanks
armstrhu
 
Posts: 9
Joined: Mon Apr 14, 2014 10:56 am

Re: critical sections embedded within parallel for

Postby armstrhu » Mon Apr 14, 2014 2:26 pm

So I seem to have fixed it by defining a new parallel region at the beginning of dostuff() with num_threads(1). The idea was to force the calling thread to form a team of one thread and do all the sections...

Code:
void dostuff()
{
#pragma omp parallel num_threads(1)
{
    #pragma omp sections
    {
       #pragma omp section
       {
          #pragma omp critical (A)
          {
             dostuffA();
          }
       }
       #pragma omp section
       {
          #pragma omp critical (B)
          {
             dostuffB();
          }
       }
       #pragma omp section
       {
          #pragma omp critical (C)
          {
             dostuffC();
          }
       }
       #pragma omp section
       {
          #pragma omp critical (D)
          {
             dostuffD();
          }
       }
    }
}
}


That seems to work, but is a bit strange. Any help on a better way to implement this would be appreciated. Thanks

Re: critical sections embedded within parallel for

Postby ftinetti » Tue Apr 15, 2014 1:37 am

Hi,

At least the first code violates the following restriction (Spec. 4.0, page 53):
The following restrictions apply to worksharing constructs:
• Each worksharing region must be encountered by all threads in a team or by none at
all, unless cancellation has been requested for the innermost enclosing parallel
region.

since both the loop construct and the sections construct are defined as worksharing constructs.

From your description:
Ultimately, what I need is that each thread in the parallel for will call dostuff(), and each thread will do all sections. However, the order the thread executes the sections will be based upon what other threads are currently working on. Any help on this would be greatly appreciated.

I think no more than 4 threads will be running at any time, right? If that's the case, then I would define just a parallel sections with those 4 dostuffX(), for X = A, B, C, and D.

HTH,

Fernando.
ftinetti
 
Posts: 582
Joined: Wed Feb 10, 2010 2:44 pm

Re: critical sections embedded within parallel for

Postby MarkB » Tue Apr 15, 2014 6:36 am

I don't think the num_threads(1) version is doing what you want: the inner parallel region only has one thread, and the sections directive binds to it and not to the outer parallel region, so the effect is the same as just using criticals (except with added overhead).

You could achieve what you want by using lock routines instead of critical constructs, since the omp_test_lock() routine will return false if it failed to acquire the lock, though you need to keep track of which functions have actually been called. It's a bit messy, so you should convince yourself carefully that waiting to enter critical regions is really a significant bottleneck first.

Here's a sketch of some code for 2 functions:

Code:
void dostuff()
{

   int Atodo, Btodo;

   Atodo = Btodo = 1;

   while ( Atodo || Btodo ) {

    if (Atodo) { 
        if (omp_test_lock(&lockA) ){
           dostuffA();
           omp_unset_lock(&lockA);
           Atodo = 0;
        }
    }

    if (Btodo) {
         if (omp_test_lock(&lockB) ){
           dostuffB();
           omp_unset_lock(&lockB);
           Btodo = 0;
        }
    }

   } //end while


}


Note that the lock variables lockA and lockB must be shared between threads, and initialised before the parallel for region.
MarkB
 
Posts: 487
Joined: Thu Jan 08, 2009 10:12 am
Location: EPCC, University of Edinburgh

Re: critical sections embedded within parallel for

Postby armstrhu » Tue Apr 15, 2014 10:09 am

Hey guys, thanks for the replies. You've given me some good stuff to look at. I'll try to make what I am doing a bit more concrete. I am reading a bunch of datafiles and, for each file, adding the data to common variables. In theory, there could be 1 datafile or hundreds. Since each datafile is independent, I read them in with a parallel for loop so that one thread handles one datafile. After reading a file in, I want to update the common variables with the new data: the mean, max, min values, etc. Currently I am calculating 6 statistics, and that number can and probably will grow in the future. So if multiple threads have finished reading their datafiles, and thread 0 is updating the mean, I don't want thread 1 waiting around when it could be updating the max value, thread 2 updating the min value, etc. Below is a clearer version of what I am trying to do...

Code:
#pragma omp parallel for
for (int i=0; i<num_datafiles; ++i)
{
   // read in datafile

   updatedata();
}

void updatedata()
{
    #pragma omp sections
    {
       #pragma omp section
       {
          #pragma omp critical (mean)
          {
             updatemean();
          }
       }
       #pragma omp section
       {
          #pragma omp critical (max)
          {
             updatemax();
          }
       }
       #pragma omp section
       {
          #pragma omp critical (min)
          {
             updatemin();
          }
       }
    }
}


Any other pointers would be greatly appreciated. But again, the above code doesn't work: when a thread calls updatedata(), it seems to do only one of the sections and then exit. When I add
Code:
#pragma omp parallel num_threads(1)

to the updatedata() function, it SEEMS to force the calling thread to execute all the sections. I'll check out the lock routines and see if I can get them to do what I need. Thanks

Re: critical sections embedded within parallel for

Postby MarkB » Tue Apr 15, 2014 2:19 pm

armstrhu wrote:After reading it in, I want to update the common variables with the new data. These would be updating the mean, max, min values, etc... currently, I have 6 statistics that I am calculating, and that can and probably will grow in the future.


For mean, max and min it would be much more efficient for each thread to compute the sum, count, max and min of the values it reads, and then to combine the values from each thread together at the end. OpenMP has a feature for doing this for you, by listing the variables in reduction clauses on the parallel loop. Depending on what your other stats are, you can probably treat them in the same way.

As Fernando pointed out, the code with the sections construct is not legal OpenMP unless you add the
Code:
#pragma omp parallel num_threads(1)

so anything might happen!

With the inner parallel region added, the sections should indeed execute one after another - but you are then using nested parallelism, which is almost certainly not what you intend!

Re: critical sections embedded within parallel for

Postby armstrhu » Tue Apr 15, 2014 2:56 pm

Mark,
Thanks for the reply. You are right, nested parallelism definitely isn't my goal. The situation is a little more complicated than what I posted: each datafile has data for ~18 variables (position data for ~18 components), so I am calculating 6 statistics for each of 18 components over all data files. Also, some data files can be small (~kB) and some large (~MBs). If I were to read all the data first, aggregate it, and then calculate the final statistics on the aggregate, then a thread that reads a small datafile will be stuck waiting for the thread reading the largest datafile.

It seems like you are saying that adding the inner #pragma omp parallel num_threads(1) will get each thread to do each section, but that it is kind of a hack. You said it would do "one section after another": did you mean that it will do each section in order (not what I want/need), or that each section will be done but the order can vary? I tried printing out the thread number and a flag to see which section each thread is working on to figure out the order, but I guess because of the nested parallelism the thread id is always given as 0. With the nested parallelism, does the critical construct only work at the current nesting level (so it would only block multiple threads if I used num_threads(n) rather than num_threads(1) on the inner parallel region), or does it work globally? Finally, I know this may be hard to answer, but if this method does work, how much overhead does the nested parallelism have? Do you think the lock method would reduce the overhead enough to be worth porting over to the harder-to-read locking version? Thanks again

edit: I'll take a look at the reduction functionality and see if that will do what I need. Another statistic I am calculating is the variance, and updating the variance is a little trickier than the mean, max, and min, since each datapoint added modifies the mean. I am storing the mean and variance explicitly and updating those, rather than sum(x) and sum(x^2), because, although I don't think I'll hit it, I didn't want to have to worry about overflow in the sum(x^2) term... maybe I'll take a look at changing that.

Re: critical sections embedded within parallel for

Postby MarkB » Tue Apr 15, 2014 3:37 pm

armstrhu wrote:Also, some data files can be small (~kB) and some large (~MBs). If I were to read all the data first and aggregate them then calculate a final statistic on the aggregate, then a thread which reads a small datafile will be stuck waiting for the thread reading the largest datafile.


That's going to be a problem in your current version anyway. The best you can do without splitting the files is to use schedule(dynamic,1) on the parallel for loop to balance the load as well as possible.

Criticals are global. Adding the nested parallelism and the sections doesn't change anything: each iteration of the parallel loop will still execute the critical regions in the order they appear in the code, though criticals from different iterations can be interleaved in time. Within a call to updatedata(), the sections construct does nothing because there is only one thread in the relevant (inner) team - it does not schedule different sections to different threads in the outer team. Most likely, though, the extra overhead is not much compared to the overhead of the criticals.

If you are doing a one-pass mean and variance calculation, then you can combine partial results from different threads using the expression here:
http://en.wikipedia.org/wiki/Algorithms ... _algorithm

However, all this may well be fairly useless: the performance will likely be completely determined by the I/O bandwidth, which on most systems will not improve by using multiple threads at a time.

Re: critical sections embedded within parallel for

Postby armstrhu » Tue Apr 15, 2014 3:47 pm

Mark,
Thanks a million, your explanations are very clear. I can see what I need to do now, just a matter of deciding which path I want to take....

Re: critical sections embedded within parallel for

Postby MarkB » Tue Apr 15, 2014 3:50 pm

armstrhu wrote:Thanks a million


You're very welcome!
