## Parallelize a sum over certain parts of an array

General OpenMP discussion

### Parallelize a sum over certain parts of an array

Hello:

I need (well, the actual job is a bit more elaborated, but I simplify as example) to sum certain elements of an array in certain number of groups. I think it could be parallelized via OpenMP, but I don't know how to deal with the problem. First, I paste the example code:
```c
#include <stdio.h>
#if defined(_OPENMP)
#include <omp.h>
#endif

#define N 14
#define G 4

/* group 0 sums 30.0, group 1 sums 7.0, group 2 sums 30.0, and group 3 sums 38.0 */
int main(void)
{
    size_t i = 0, pos = 0;
    size_t p[] = {  0,  1,  0,  0,  1,  2,  2,  2,  2,   0,   3,   0,   3,   3};
    double a[] = {1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0};
    double s[] = {0.0,0.0,0.0,0.0};
#if defined(_OPENMP)
#pragma omp parallel for default(none) schedule(dynamic) \
        shared(p,s,a) private(i,pos)
#endif
    for(i = 0; i < N; i++){
        pos = p[i];
        s[pos] += a[i];
    }
    for(i = 0; i < G; i++)
        fprintf(stdout, "%4.1lf\n", s[i]);
    return 0;
}
```

The array a contains the original data, which must be added up in groups. The array p contains the group identifier (the position in the array s) to which each element of a belongs, and the array s holds the final sum for each group. The task is done with a for loop over a, where the group index is extracted from p.

Obviously, the OpenMP code in the example is wrong, because s[pos] can be updated independently by each worker thread, and two or more threads could be working with the same group identifier at the same time. My first idea was to use the reduction() clause in the form
```
reduction(+:s[pos])
```

but this use of reduction() seems to be illegal:
```
$ gcc -fopenmp hola.c
hola.c: In function ‘main’:
hola.c:16:44: error: expected ‘)’ before ‘[’ token
  shared(p,s,a) private(i,pos) reduction(+:s[pos])
                                            ^
hola.c:16:40: error: user defined reduction not found for ‘s’
  shared(p,s,a) private(i,pos) reduction(+:s[pos])
                                        ^
```

How could I solve the problem?

UPDATE 1:

I think a possible solution could be to use the critical construct inside the for loop:
```c
#pragma omp critical
s[pos] += a[i];
```

But would the code then be equivalent to the serial version? Would we still gain any advantage?

UPDATE 2:

Another possibility could be the atomic construct:
```c
#pragma omp atomic
s[pos] += a[i];
```

Is there any difference between the use of critical and atomic?

Thanks
jgpallero

Posts: 13
Joined: Fri May 11, 2012 1:47 am

### Re: Parallelize a sum over certain parts of an array

Hi,

I think both of your solutions are fine (I do not recall whether any restriction on atomic applies... but the compiler should know better).

> UPDATE 1:
>
> I think a possible solution could be to use the critical construct inside the for loop:
>
> ```c
> #pragma omp critical
> s[pos] += a[i];
> ```
>
> But would the code then be equivalent to the serial version? Would we still gain any advantage?

The code is not the same as the serial version: the threads synchronize only at `s[pos] += a[i];` and work asynchronously for the rest (granted, there is not much else to do in this example, but maybe there are other things to do in the real code). I would not use schedule(dynamic) unless the amount of work in the real code depends on the iteration number.

HTH,

Fernando.
ftinetti

Posts: 603
Joined: Wed Feb 10, 2010 2:44 pm

### Re: Parallelize a sum over certain parts of an array

Using atomic has potentially better performance than using critical, since it permits updates to different elements of s[] to proceed in parallel, though whether this actually happens depends on the ability of the compiler to generate optimised code for this situation.

You cannot use C arrays in a reduction clause, but you can implement this "by hand" as follows:

```c
#pragma omp parallel default(none) shared(p,s,a,stemp) private(i,j,id,pos,myid,nthreads)
{
    myid = omp_get_thread_num();
    nthreads = omp_get_num_threads();
    for (j=0;j<G;j++) {
        stemp[myid][j] = 0.0;
    }
#pragma omp for
    for(i=0;i<N;i++){
        pos = p[i];
        stemp[myid][pos] += a[i];
    }
#pragma omp for   /* #pragma omp master might be faster if G is small */
    for (j=0;j<G;j++) {
        for (id=0;id<nthreads;id++){
            s[j] += stemp[id][j];
        }
    }
}
```

where stemp is suitably allocated before the parallel region.
This may be faster than using atomics, especially if G is small compared to N.
MarkB

Posts: 672
Joined: Thu Jan 08, 2009 10:12 am
Location: EPCC, University of Edinburgh

### Re: Parallelize a sum over certain parts of an array

ftinetti and MarkB, thank you very much for your comments and suggestions.
jgpallero

Posts: 13
Joined: Fri May 11, 2012 1:47 am