Reduction for arrays

General OpenMP discussion

Reduction for arrays

Postby Markus » Fri Jun 20, 2008 6:56 am

Hello!

I have just started to learn OpenMP by rewriting some finite element program in C. So far it works quite well except of one case where I am stuck.
Adding something to a scalar value in a loop can be done by

//--------------------------------------------------------
int i;
double sum, nn=1000;
double *f=new double[nn];

//write some values to f

#pragma omp parallel for shared(f) private(i) reduction(+:sum)
for(i=0; i<nn; i++)
sum+=f[i];
//--------------------------------------------------------

However, how can I do it in an elegant way if sum is an array itself and each entry of sum[i] might be accessed by different processors?
Is there any elegant standard way?

Many thanks for your help.
Markus
Markus
 

Re: Reduction for arrays

Postby ejd » Fri Jun 20, 2008 7:39 am

The usual methods of handling variables that can be accessed by multiple threads are the reduction clause, atomic, critical, and locks. If the variable is an array, then the reduction clause in C/C++ is not really of use. Currently OpenMP only defines array reduction for Fortran. The problem being that in C/C++ arrays are handled differently in the language depending on whether they are passed or they are locally declared. The other possibility, depending on the code, is to handle the reduction as a scalar and then assign the value into the array. This has the advantage of requiring less locking. If you would like to give an example of the code, then maybe someone might have a better suggestion.
ejd
 
Posts: 1025
Joined: Wed Jan 16, 2008 7:21 am

Re: Reduction for arrays

Postby Markus » Fri Jun 20, 2008 7:59 am

Unfortunately it is not possible to reduce it to scalars that are added afterwards.
Essentially I am using the following code that I want to parallelize.

//--------------------------------------------------------
int i;
int ne=1000; //number of elements
int EFT[9]; //element freedom table
double *f=new double[nn]; //force vector

for(i=0; i<ne; i++)
{
EFT[0..8]=/*entries of EFT are a function of i*/
f[EFT[i]]+=/*a value is added that is computed from existing databases. note that, for example, f[3] might be accessed several times during the loop*/
}
//--------------------------------------------------------

Thanks for your help. I appreciate.
Markus
Markus
 

Re: Reduction for arrays

Postby Markus » Fri Jun 20, 2008 8:08 am

Sorry, got a bug in the code :oops: .
Thats what I meant:

//--------------------------------------------------------
int i;
int ne=1000; //number of elements
int EFT[9]; //element freedom table
double *f=new double[nn]; //force vector

for(i=0; i<ne; i++)
{
EFT[0..8]=/*entries of EFT are a function of i*/

f[EFT[0]]+=/*a value is added that is computed from existing databases. note that, for example, f[3] might be accessed several times during the loop*/
f[EFT[1]]+=/*another value is added that is computed from existing databases*/
.
.
f[EFT[8]]+=/*another value is added that is computed from existing databases*/
}
//--------------------------------------------------------
Markus
 

Re: Reduction for arrays

Postby ejd » Fri Jun 20, 2008 9:19 am

I was afraid that it would look something like this. The problem is that you end up locking around the assignments to make sure that the array elements are updated without a race. This really limits you to using locks or critical, but the lock area is so large that you really don't get a performance improvement. You can do your own reduction - which is less efficient than most OpenMP implementations of reduction (if it worked for arrays) - BUT it will only be useful if the amount of work in the calculations offsets the overhead of the setup and summation. For example:

Code: Select all
int i, j;
int ne=1000; //number of elements
int EFT[9]; //element freedom table
double *f=new double[nn]; //force vector

double *fpf=new double[nn];
for (i = 0; i < nn; i++)
  fpf[i] = 0;

#pragma omp parallel firstprivate(fpf) private(j)
{
  #pragma omp for
  for (i = 0; i < ne; i++) {
    ...
    fpf[EFT[0]] += ...
    ...
    fpf[EFT[8]] += ...
    ...
  } // end of for loop
  #pragma omp critical
  {
    for (j = 0; j < ne; j++)
      f[j] += fpf[j];
  }
}

I am doing this quickly so I hope I did it right. The idea is that you set up another array and zero it. Then by declaring it firstprivate, each thread gets its own copy. You then don't care about what element the EFT array addresses because it is private to that thread. At the end, you then have a critical section to sum up the values calculated by each thread. Unfortunately this region has to be protected, but at least it is just an addition for each element.

Like I said though, this really depends on whether the calculations you haven't shown consist of a lot of work or not. If they are a lot of work, then doing this work in parallel may offset the overhead of the additional work - array setup (new, zeroing, and firstprivate) and summation.
ejd
 
Posts: 1025
Joined: Wed Jan 16, 2008 7:21 am

Re: Reduction for arrays

Postby Markus » Fri Jun 20, 2008 11:27 am

Hello ejd,

thanks for your help!
I had a look at the atomic clause. Are there any reasons why I should not use it for this particular problem?

Markus
Markus
 

Re: Reduction for arrays

Postby ejd » Fri Jun 20, 2008 12:03 pm

atomic applies only to the next statement. What you showed was:

Code: Select all
for(i=0; i<ne; i++)
{
  EFT[0..8]=/*entries of EFT are a function of i*/

  f[EFT[0]]+=/*a value is added that is computed from existing databases. note that, for example, f[3] might be accessed several times during the loop*/
  f[EFT[1]]+=/*another value is added that is computed from existing databases*/
   .
   .
  f[EFT[8]]+=/*another value is added that is computed from existing databases*/
}

So you would have to put an atomic around each of these statements, since EFT[0] could be equal EFT[1], or ..., EFT[8]. This would be almost as bad as a critical or lock for the whole group - and possibly worse. So unless I mis-understood your code (which is possible), I don't think it would help.

The other thing about atomic to note, is that an implementation may implement atomic using critical regions. They could also implement atomic with a compare and swap. The difference being that with a critical, essentially an update to the entire array is blocked. With a compare and swap type of implementation, only an array element needs to be locked. However, again depending on the hardware and implementation, that might speed up "uncontested" array element updates (i.e., different threads updating different elements), but slow down "contested" array element updates (i.e., multiple threads trying to update the same element).
ejd
 
Posts: 1025
Joined: Wed Jan 16, 2008 7:21 am

Re: Reduction for arrays

Postby Markus » Wed Aug 13, 2008 11:27 pm

You are right.
Just checked it and the speed up on an eight processor machine is just around 3.
Since my arrays are pretty large I will not be able to make eight copies of them. I guess I have no other choice than to use a domain decomposition.

Many thanks for your help.
Markus
Markus
 


Return to Using OpenMP

Who is online

Users browsing this forum: Google [Bot] and 10 guests