Parallelize a sum over certain parts of an array


Postby jgpallero » Thu Jun 26, 2014 3:05 am

Hello:

I need (well, the actual job is a bit more elaborate, but I simplify it as an example) to sum certain elements of an array into a certain number of groups. I think this could be parallelized via OpenMP, but I don't know how to approach the problem. First, here is the example code:
Code:
#include<stdio.h>
#if defined(_OPENMP)
#include<omp.h>
#endif
#define N 14
#define G 4
//group 0 sums 30.0, group 1 sums 7.0, group 2 sums 30.0, and group 3 sums 38.0
int main()
{
    size_t i=0,pos=0;
    size_t p[]={  0,  1,  0,  0,  1,  2,  2,  2,  2,   0,   3,   0,   3,   3};
    double a[]={1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0};
    double s[]={0.0,0.0,0.0,0.0};
#if defined(_OPENMP)
#pragma omp parallel for default(none) schedule(dynamic) \
shared(p,s,a) private(i,pos)
#endif
    for(i=0;i<N;i++){
        pos = p[i];
        s[pos] += a[i];
    }
    for(i=0;i<G;i++)
        fprintf(stdout,"%4.1lf\n",s[i]);
    return 0;
}


The array a contains the original data, which must be added in groups. The array p contains the group identifier (the position in the array s) to which each element in a belongs, and the array s contains the final sums for each group. The problem is solved via a for loop over a, where the group index is extracted from p.

Obviously, the OpenMP code in the example is wrong, because the element s[pos] can be updated concurrently by the worker threads: two or more threads could be working with the same group identifier at the same time. My first idea was to use the reduction() clause in the form
Code:
reduction(+:s[pos])


but this use of reduction() seems to be illegal:
Code:
$ gcc -fopenmp hola.c
hola.c: In function ‘main’:
hola.c:16:44: error: expected ‘)’ before ‘[’ token
  shared(p,s,a) private(i,pos) reduction(+:s[pos])
                                            ^
hola.c:16:40: error: user defined reduction not found for ‘s’
  shared(p,s,a) private(i,pos) reduction(+:s[pos])
                                        ^


How could I solve the problem?

UPDATE 1:

I think a possible solution could be to use the critical clause inside the for loop in the way
Code:
#pragma omp critical
s[pos] += a[i];


but will the code then be the same as the serial version? Can we still obtain some advantage?

UPDATE 2:

Another possibility could be the atomic clause as
Code:
#pragma omp atomic
s[pos] += a[i];


Is there any difference between the use of critical and atomic?

Thanks
jgpallero
 
Posts: 11
Joined: Fri May 11, 2012 1:47 am

Re: Parallelize a sum over certain parts of an array

Postby ftinetti » Thu Jun 26, 2014 8:41 am

Hi,

I think both of your solutions are fine (I do not recall whether any restriction on atomic applies... but the compiler should know better).

UPDATE 1:

I think a possible solution could be to use the critical clause inside the for loop in the way

Code:
#pragma omp critical
s[pos] += a[i];

but will the code then be the same as the serial version? Can we still obtain some advantage?


The code is not the same as the serial version: the threads synchronize only at

s[pos] += a[i];

and work asynchronously for the rest (right, there is not much else to do in this example, but maybe there are other things to do in the real code). I would not use schedule(dynamic) unless the real code contains some work dependency on the iteration number.

HTH,

Fernando.
ftinetti
 
Posts: 581
Joined: Wed Feb 10, 2010 2:44 pm

Re: Parallelize a sum over certain parts of an array

Postby MarkB » Thu Jun 26, 2014 10:00 am

Using atomic has potentially better performance than using critical, since it permits updates to different elements of s[] to be done in parallel, though whether this actually happens depends on the ability of the compiler to generate optimised code for this situation.

You cannot use C arrays in a reduction clause, but you can implement this "by hand" as follows:

Code:
#pragma omp parallel default(none) shared(p,s,a,stemp) private(i,j,id,pos,myid,nthreads)
{

    myid = omp_get_thread_num();
    nthreads= omp_get_num_threads();

    for (j=0;j<G;j++) {
        stemp[myid][j] = 0.0;
    }

#pragma omp for
    for(i=0;i<N;i++){
        pos = p[i];
        stemp[myid][pos] += a[i];
    }

#pragma omp for   
//  #pragma omp master might be faster if G is small
    for (j=0;j<G;j++) {
        for (id=0;id<nthreads;id++){
            s[j] += stemp[id][j];
        }
    }

}

where stemp is suitably allocated before the parallel region.
This may be faster than using atomics, especially if G is small compared to N.
MarkB
 
Posts: 434
Joined: Thu Jan 08, 2009 10:12 am

Re: Parallelize a sum over certain parts of an array

Postby jgpallero » Thu Jun 26, 2014 5:40 pm

ftinetti and MarkB, thank you very much for your comments and suggestions.
jgpallero
 
Posts: 11
Joined: Fri May 11, 2012 1:47 am

