I have a very strange problem that I'm trying to solve and understand. I have a nested for loop of the following form:
Code:
#pragma omp parallel for schedule(guided) shared(Array) collapse(3)
for (int i = istart; i < iend; i++)
    for (int j = jstart; j < jend; j++)
        for (int k = kstart; k < kend; k++)
        {
            int IJK = (i*(jend-jstart) + (j-jstart))*(kend-kstart) + (k-kstart);
            Array[3*IJK + 2] = /* an operation with some shared values */;
        }
There are three loops of this form, writing to Array[3*IJK], Array[3*IJK + 1] and Array[3*IJK + 2] respectively. Array is actually a shared pointer, and the value of IJK is really computed by a function call (inlined).
I first tried parallelizing all loops; the program runs through, but the results are different from my serial results.
Now come the strange parts.
The for loops with this same structure, but writing to Array[3*IJK + 1] and Array[3*IJK], produce correct results when parallelized (the other loops are kept serial in this case). But as soon as I parallelize the Array[3*IJK + 2] loop, I get different results.
Also, if I don't use collapse, or use collapse(2) instead of collapse(3), I get different results. Only with the #pragma statement exactly as above do I get correct results in the Array[3*IJK + 1] and Array[3*IJK] loops.
I thought it might have something to do with the order in which Array is written to, but even with an ordered clause and construct I still get wrong results.
Even with num_threads(1) I get wrong results.
What can be the cause of this?