Consideration about parallelization

Use this forum to discuss the book Using OpenMP - Portable Shared Memory Parallel Programming, by Barbara Chapman, Gabriele Jost, and Ruud van der Pas. Read viewtopic.php?f=8&t=465 for book info and to download the examples. Post your feedback about the book and the examples to this forum.

Consideration about parallelization

Postby nicola.montini » Fri Aug 26, 2011 6:48 am

Hi, I have some questions about the book. First of all, I want to quote some parts of it:
1- PAGE 244: A data race condition exists when two threads may concurrently access the same shared variable between synchronization points, without holding any common locks and with at least one thread modifying the variable. The order of these accesses is nondeterministic. The thread reading the value might get the old value or the updated one, or some other erroneous value if the update requires more than one store operation. This usually leads to indeterministic behavior, with the program producing different results from run to run.
2- PAGE 271: "VERIFICATION OF THE SEQUENTIAL VERSION"[...] Run the loops parallelized with OpenMP backwards. If the result is wrong, the loop(s) cannot be executed in parallel. The reverse is not true. If the result is okay, it does not automatically mean the loop can be parallelized.
3- PAGE 272: "VERIFICATION OF THE PARALLEL CODE": [...] It is also good practice to find the lowest compiler optimization level for which the bug occurs. [...] Several scenarios are now worth exploring:
• Run the OpenMP version of the program on one thread. If the error shows up then, there is most likely a basic error in the code.
• Selectively enable/disable OpenMP directives to zoom in on the part of the program where the error originates.
• Check that the libraries used are thread-safe in case one or more of their functions are called within a parallel region.

Now, with these considerations in mind, I want to ask you about some doubts concerning a for loop that I'm trying to parallelize:
1- Inside my for loop there are two other for loops in which the program essentially reads data from some arrays and writes to two arrays. I listed all the variables used in the loop, and these are the results:
    - Only two variables are modified by the loop. The first one, call it "mpts_sfrt_a", is an array used to hold some support data that are read from another array and written into "mpts_sfrt_a". "mpts_sfrt_a" is initialized outside the parallel region, so I declared it firstprivate, because otherwise it would be undefined. Declaring it firstprivate also avoids a possible data race, because if it were shared, multiple threads would write to the same memory locations of this array.
    - The second variable modified inside the loop is a matrix that is filled in and then used for subsequent computations outside the parallel region. The matrix is implemented row-wise as an array, and I'm pretty sure that each thread writes to different memory locations. The matrix is accessed via the indexes of the for loops: the two threads each get one chunk of the row indexes (the indexes of the outermost, parallel loop), the first thread gets the first chunk and the second thread gets the second chunk, so every thread accesses a distinct set of rows of the matrix.
    - The other variables are either only read or are local variables created inside the parallel region. Can a data race occur in this situation? The book says that data races can happen only if at least one thread modifies a variable; therefore, if two threads read the same memory location concurrently there shouldn't be any problem, right?
2- I ran the loop backwards in sequential mode and it works fine (I'm sure about that). When I run it with two threads, it starts filling the matrix with some wrong values in a nondeterministic manner.
3- I'm compiling without optimization to try to find the bug.

Am I saying something wrong? Based on what you say in your book the loop should work, but it doesn't.

Here is the code of the loop:
Code:
#pragma omp parallel num_threads(2) default(shared)
   {
#pragma omp for firstprivate(mpts_sfrt_a)
      for (int i = 0; i < (int)cxs_number; i++) {
         for (int j = 0; j < (int)cys_number; j++) {
            int p = 0;

            double cx = cxs_a[i] - 1;
            double cy = cys_a[j] - 1;
            for (int k = 0; k < (int)cols_number; k++) { // problem is probably here but not sure
               mpts_sfrt_a[k] = mpts_sfr_a[k] + cx;
               mpts_sfrt_a[k + cols_number] = mpts_sfr_a[k + cols_number] + cy;
            }

            double corr_value = 0;
            int valid_index = 0;

            for (int l = 0; l < (int)cols_number; l++) {
               int x = (int) mpts_sfrt_a[l];
               int y = (int) mpts_sfrt_a[l + cols_number];
               bool valid = (x >= 1) && (x <= gradmod->cols) && (y >= 1) && (y <= gradmod->rows);
               if (!valid) continue;
               int xyind = (y - 1) * gradmod->cols + (x - 1);
               double gradmod_value = (double) gradmod_a[xyind];
               double arg0 = graddir_a[xyind] - dirsmc_a[l];
               corr_value += gradmod_value * fabs(cos(arg0));
               if (gradmod_value > 0) {
                  valid_index++;
               }
            }
            double val = corr_value * valid_index / (cols_number * cols_number);
            corr[i * (int)cys_number + j] = corr_value * valid_index / (cols_number * cols_number);
         }
      }
   } // end of parallel region

Re: Consideration about parallelization

Postby nicola.montini » Fri Aug 26, 2011 6:51 am

I forgot one thing: what do you mean by thread-safe libraries?

Re: Consideration about parallelization

Postby anv » Thu Sep 22, 2011 3:26 am

1. A library is thread-safe if it is safe to call it simultaneously from different threads and the results are always correct. Calling a thread-safe library does not require any explicit synchronization.
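
For example (this is just an illustration, not taken from your code): the C library function strtok keeps hidden static state between calls, so it is not thread-safe, while the POSIX reentrant variant strtok_r keeps that state in a caller-supplied pointer, so each thread can tokenize its own string safely inside a parallel region:

Code:
#include <stdio.h>
#include <string.h>
#include <omp.h>

int main(void)
{
   #pragma omp parallel num_threads(2)
   {
      char buf[] = "a,b,c";
      char *save = NULL;   // per-thread state is what makes strtok_r thread-safe
      for (char *tok = strtok_r(buf, ",", &save); tok != NULL;
           tok = strtok_r(NULL, ",", &save)) {
         // plain strtok() here would share one hidden internal buffer between the threads
         printf("thread %d: %s\n", omp_get_thread_num(), tok);
      }
   }
   return 0;
}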

2. The mistake in your code is most likely that you think mpts_sfrt_a is an array, but it is a pointer. So the firstprivate clause copies the pointer into each thread, but these different private pointers all reference the same shared memory. To check this, you can print the value of your private pointers inside the parallel loop. It is very easy to mix up arrays and pointers in C/C++.
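
A small test along these lines (just an illustration, not your actual code) shows the effect: both threads print the same address for their firstprivate copies of the pointer, so they still write to one shared buffer:

Code:
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void)
{
   double *data = (double *) malloc(100 * sizeof(double));   // one shared buffer

   #pragma omp parallel num_threads(2) firstprivate(data)
   {
      // each thread gets a private copy of the pointer, but both copies hold
      // the same address, so the underlying memory is still shared
      printf("thread %d: data = %p\n", omp_get_thread_num(), (void *) data);
   }

   free(data);
   return 0;
}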

Re: Consideration about parallelization

Postby ruud » Tue Nov 01, 2011 2:38 am

Hi Nicola,

I'm sorry for the long delay responding to your post.

It was already pointed out that using "firstprivate" on the pointer does not give each thread access to private storage. For that you could (for example) allocate a memory buffer inside the parallel region and have each thread access the memory through its own pointer. That will give you private storage. Something like this:

Code:
#pragma omp parallel
{

   double *p = (double *) malloc(....);   // each thread allocates its own buffer

   <use p to access the memory>

   free(p);   // release the per-thread buffer

} // End of parallel region


I must admit I didn't study your code in detail, but at first sight the rest seems to be correct from a parallel point of view. Could you perhaps try to address the use of firstprivate and let us know whether that solves the problem?
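
To be concrete, applied to your loop that could look roughly like the sketch below. I have not compiled it against your code; the buffer size of 2 * cols_number is only my guess from the way mpts_sfrt_a is indexed with k and k + cols_number, and mpts_sfrt_priv is just a name I picked for the per-thread copy:

Code:
#pragma omp parallel num_threads(2) default(shared)
{
   // Per-thread scratch buffer instead of firstprivate on the shared pointer.
   // The size 2 * cols_number is assumed from the k and k + cols_number indexing.
   double *mpts_sfrt_priv = (double *) malloc(2 * cols_number * sizeof(double));

#pragma omp for
   for (int i = 0; i < (int)cxs_number; i++) {
      for (int j = 0; j < (int)cys_number; j++) {
         // ... same loop body as before, but reading and writing mpts_sfrt_priv ...
      }
   }

   free(mpts_sfrt_priv);   // release the per-thread buffer

} // End of parallel region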

Getting back to your questions, you're right: read-only shared data cannot induce a race condition. I'm still somewhat confused by your comment "Because based on what you say in your book the loop should work, but it doesn't", though.

As we wrote in the book, if you run a loop backwards and get the right result, it doesn't mean the loop can be parallelized. The reverse is true: if you get the wrong result, you know you can't parallelize the loop. A data race is a case in point: if you run the loop backwards in sequential mode you won't see a data race.
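
To illustrate the backwards test with a toy example (this one is mine, not from the book's examples): the loop below has a loop-carried dependence, so running its iterations in reverse order gives a different answer, which is exactly what the test is designed to expose:

Code:
#include <stdio.h>

int main(void)
{
   double b[5] = {2, 2, 2, 2, 2};
   double a[5] = {1, 1, 1, 1, 1};
   double c[5] = {1, 1, 1, 1, 1};

   // Forward: each iteration reads the a[i-1] written by the previous iteration.
   for (int i = 1; i < 5; i++)
      a[i] = a[i - 1] + b[i];        // a ends up as 1, 3, 5, 7, 9

   // Backwards: c[i-1] still holds its original value, so the result differs,
   // revealing the dependence; such a loop cannot simply be parallelized.
   for (int i = 4; i >= 1; i--)
      c[i] = c[i - 1] + b[i];        // c ends up as 1, 3, 3, 3, 3

   printf("forward: a[4] = %g, backwards: c[4] = %g\n", a[4], c[4]);
   return 0;
}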

The suggestion not to use optimization was just to make your life easier :) Not only is debugging non-optimized code usually easier, but compiler optimizations can also make a data race seemingly disappear. I have an example of that.

Actually, if the above suggestion does not fix the problem, I would strongly suggest using a data race detection tool. There are several available, and it is astonishing how often they find a data race the developer was not aware of.

Kind regards, Ruud

