question about "parallel"

General OpenMP discussion


Postby fanbin » Wed Jul 16, 2008 6:24 pm

Here is the code:
#include <omp.h>
#include <stdio.h>

int main(int argc, char** args)
{
  #pragma omp parallel sections
  {
    #pragma omp section
    #pragma omp parallel for
    for(int i=0;i<4;i++)
    {
      printf("first for, i=%d, Threadid: %d\n", i, omp_get_thread_num());
    }
    #pragma omp section
    #pragma omp parallel for
    for(int j=0;j<4;j++)
    {
      printf("second for, j=%d, Threadid, %d\n", j, omp_get_thread_num());
    }
    #pragma omp section
    printf("no-for Threadid: %d\n", omp_get_thread_num());
  }
  return 0;
}

What I expect is that all iterations of the first and second for loops can execute in parallel.
Here are the relevant environment variables:
export OMP_NUM_THREADS=8
export OMP_DYNAMIC=FALSE
export OMP_NESTED=TRUE

Here is the output of one run:
no-for Threadid: 2
first for, i=2, Threadid: 2
first for, i=0, Threadid: 0
first for, i=3, Threadid: 3
second for, j=1, Threadid, 1
second for, j=3, Threadid, 3
second for, j=2, Threadid, 2
first for, i=1, Threadid: 1
second for, j=0, Threadid, 0

The eight iterations of the first and second for loops are interleaved, which means that these iterations do execute in parallel. But the thread IDs show at most four threads, not eight. Is this a contradiction? Any ideas? Thanks!

Re: question about "parallel"

Postby djo35 » Fri Jul 18, 2008 6:55 am

Your for loop only has four iterations, so OMP will only split the loop among four threads. Try changing

for(int j=0;j<4;j++)

to

for(int j=0;j<8;j++)

Dan

Re: question about "parallel"

Postby ejd » Mon Jul 21, 2008 7:00 am

The "problem" is that when you use nested parallelism, thread IDs start at zero in each parallel region. In your example, if you had printed the thread ID that executed each section, you would most likely have seen thread 0 executing section 1, thread 1 executing section 2, and thread 2 executing section 3.

When thread 0 saw the "parallel for", it would create a team of 8 threads (since OMP_NUM_THREADS was set to 8 and OMP_DYNAMIC was FALSE) to execute the for loop. This new team would have threads numbered 0 to 7, with the old thread 0 becoming thread 0 (the master) of the new team. Since the for loop only had 4 iterations, this was a larger team than was needed. The default schedule in this implementation was static, so each of the first 4 threads executed one iteration of the loop.

When thread 1 saw the "parallel for" in the second section, it would create another team of 8 threads to execute its loop. This team would also have threads numbered 0 to 7, with the old thread 1 becoming thread 0 (the master) of the new team. As above, the loop only had 4 iterations, so the team was larger than needed, and with the static default schedule each of the first 4 threads executed one iteration. That is why your output shows thread numbers 0 to 3 twice: once in each inner team.

Here is a modified version of your code to try and show the thread numbering.
#include <omp.h>
#include <stdio.h>

int main(int argc, char** args)
{
  int thrd;

  omp_set_dynamic(0);     /* same effect as OMP_DYNAMIC=FALSE */
  omp_set_num_threads(8); /* same effect as OMP_NUM_THREADS=8 */
  omp_set_nested(1);      /* same effect as OMP_NESTED=TRUE */

  #pragma omp parallel sections private(thrd)
  {
    #pragma omp section
    {
      thrd = omp_get_thread_num();  /* this thread's number in the outer team */
      #pragma omp parallel for num_threads(4)
      for(int i=0;i<4;i++)
      {
        printf("first for, i=%d, anc: %i  Threadid: %d\n", i, thrd, omp_get_thread_num());
      }
    }

    #pragma omp section
    {
      thrd = omp_get_thread_num();  /* this thread's number in the outer team */
      #pragma omp parallel for num_threads(4)
      for(int j=0;j<4;j++)
      {
        printf("second for, j=%d, anc: %i  Threadid, %d\n", j, thrd, omp_get_thread_num());
      }
    }

    #pragma omp section
      printf("no-for Threadid: %d\n", omp_get_thread_num());
  }
  return 0;
}


And here is the output. You will see that the "ancestor" for the "first for" is thread 0 and the ancestor for the "second for" is thread 1.
no-for Threadid: 2
first for, i=1, anc: 0  Threadid: 1
second for, j=0, anc: 1  Threadid, 0
first for, i=0, anc: 0  Threadid: 0
first for, i=2, anc: 0  Threadid: 2
second for, j=3, anc: 1  Threadid, 3
second for, j=2, anc: 1  Threadid, 2
first for, i=3, anc: 0  Threadid: 3
second for, j=1, anc: 1  Threadid, 1


OpenMP 3.0 adds a new routine, omp_get_ancestor_thread_num, to help people understand how threads are being used in their programs.

As a side note, you should try to use only the number of threads that you need. This will reduce the overhead of creating threads and waiting on threads that are not necessary. In the example code above, you will see that I added a num_threads clause to the inner parallel regions to reduce the number of threads created by the inner teams.

