Hi, I am trying to write a program of which its purpose is to find steady-state temperatures using jacobi iterative method in a 2D environment.

The core function in my program that I want it to run in parallel is shown below, it's written in C:

- Code: Select all
`int findSteadyState (double **u, double **w)`

{

double diff; /* Maximum temperature difference */

int its; /* Iteration count */

int i, j;

omp_set_num_threads(4);

for (its = 0; its < max_its; its++) {

diff = 0.0;

#pragma omp parallel for shared(diff,M,N,w,u) private(i,j) schedule(static,50)

for (i = 1; i < M-1; i++) {

for (j = 1; j < N-1; j++) {

w[i][j] = 0.25 * (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1]);

if (fabs(w[i][j] - u[i][j]) > diff)

diff = fabs(w[i][j] - u[i][j]);

}

}

#pragma omp parallel for shared(M,N,w,u) private(i,j) schedule(static,50)

for (i = 1; i < M-1; i++)

for (j = 1; j < N-1; j++)

u[i][j] = w[i][j];

/* Terminate if temperatures have converged */

if (diff <= 0.001)

break;

}

final_diff = diff;

return its;

}

Both M,N are equal to 200.

double **u is the array of temperatures calculated from the previous iteration.

double **w is the array of temperatures computed in the current iteration.

diff stores the largest temperature difference in the current iteration

if it has found that the largest temperature diff is smaller than 0.001

it's considered to have converged and then leave the loop and finally returns the total number of iterations.

To speed up the program a little bit, I tried to add some OpenMP directives into the functions.

As you can see, I added two work-sharing for-loop contructs.

But it turns out that the program sometimes gives correct results and sometimes wrong results, varying with different type schedule and chunk sizes.

For example, if I use the static schedule type with chunk size 50, it always gives correct results (It has been tested for more than 20 times).

However, if I use the dynamic schedule type with the chunk size 1, it always gives wrong results.

The correct results I mentioned are obtained from the sequential version of this function. And the sequential version of this function is in fact, just without the two work-sharing for-loop directives that I added.

I have been stuck for a few days and have no idea on what makes such a strange behaviour.

So would somebody please point me to the right direction?

Thanks so much for your help.