Tirple "for" bucle

General OpenMP discussion

Tirple "for" bucle

Hi all,

I am new on this forum and hope to learn a lot from it.
I was developing a software of calculation and I found that one of the biggest problem was the speed due to the non use of all the resources.
The piece of code is "slow" due to the code below that is a triple "for". As you can see, there is a "#pragma omp critical" commented. The result of the calculation is correct only one this line is not commented, but the speed increments high if I do it and the reduction in run-time is not good enough.
Someone knows why I have to use "#pragma omp critical" and if there is any way not to use it or a way to have clear reduction in run-time?
Code: Select all
void Projection(SunStruct Sun[], SunStruct Hel[], SunStruct Tar[], int sunAccu, int helAccu, int tarAccu, double tarEdge)
{

int iSun, jSun, iHel, jHel, iTar, jTar, iMin, jMin;

int chunk = 1;
#pragma omp parallel default(none) private(iSun,jSun,iHel,iMin,jMin,un,vn) shared(Sun,Hel,Tar,sunAccu,helAccu,tarAccu,tarEdge)
{

#pragma omp for nowait

for(iSun = 0; iSun < sunAccu; iSun++){
for(jSun = 0; jSun < sunAccu; jSun++){
for(iHel = 0; iHel < helAccu*helAccu; iHel++){
Vectorun(un, Sun[B(iSun,jSun,sunAccu)].coord, Hel[iHel].coord); //Calculates un vector
Vectorvn(vn, un, Hel[iHel].n); //Calculates vn vector

NearestPoint(iMin, jMin, iHel, Hel, Tar, vn, tarAccu, tarEdge); //Calculates iMin and jMin point "nearest point"
//#pragma omp critical
}
}
}
}
}  /* end of parallel section */

Thank you for any possible help.

Acherito
Acherito

Posts: 4
Joined: Mon Jun 25, 2012 7:39 am

Re: Tirple "for" bucle

You need the critical region because there must be a possibility that two or more threads try to update the same element of Tar concurrently, resulting a race condition.

You could try replacing the critical directive with an atomic directive: this may or may not help, depending on the compiler/hardware you are using.

Beyond this there are some other options which involve some rewriting of the code:

Add an OpenMP lock variable to the structure from which the Tar array is built, and surround each update to Tar with a lock/unlock of this.
or
Add a dimension of size the number of threads to the Tar array: use the thread ID to index into this array for the updates, and then sum
all the contributions together after the end of the parallel loop.
MarkB

Posts: 559
Joined: Thu Jan 08, 2009 10:12 am
Location: EPCC, University of Edinburgh

Re: Tirple "for" bucle

Thank you MarkB,

I was trying just exactly what you are saying so I wasn't totally wrong.
I have seen that with atomic directive works a little bit better, and about the second option I was doing something similar to the code above. I have a little proble with the memory allocation but it seems that it is a possibility. Do you thing that the following is a good way to continue? As you will see, I store the values in TarRad for each iSun array ID and I have also reduce one "for" of Sun array.

Code: Select all
void Projection(SunStruct Sun[], SunStruct Hel[], SunStruct Tar[], int sunAccu, int helAccu, int tarAccu, double tarEdge)
{

int iSun, iHel, iTar, jTar, iMin, jMin;

int chunk = 1;
#pragma omp parallel default(none) private(iSun,jSun,iHel,iMin,jMin,un,vn) shared(Sun,Hel,Tar,sunAccu,helAccu,tarAccu,tarEdge)
{

#pragma omp for nowait

for(iSun = 0; iSun < sunAccu*sunAccu; iSun++){
for(iHel = 0; iHel < helAccu*helAccu; iHel++){
Vectorun(un, Sun[iSun].coord, Hel[iHel].coord); //Calculates un vector
Vectorvn(vn, un, Hel[iHel].n); //Calculates vn vector

NearestPoint(iMin, jMin, iHel, Hel, Tar, vn, tarAccu, tarEdge); //Calculates iMin and jMin point "nearest point"
//#pragma omp atomic
}
}
}
}
}  /* end of parallel section */

for(iSun = 0; iSun < sunAccu*sunAccu; iSun++){
for(iHel = 0; iHel < helAccu*helAccu; iHel++){
for(iMin = 0; iMin < tarAccu; iMin++){
for(jMin = 0; jMin < tarAccu; jMin++){
}
}
}
}

Thank you.

Acherito
Acherito

Posts: 4
Joined: Mon Jun 25, 2012 7:39 am

Re: Tirple "for" bucle

Acherito wrote:Do you thing that the following is a good way to continue? As you will see, I store the values in TarRad for each iSun array ID and I have also reduce one "for" of Sun array.

I suspect you are using more memory than you need to (one copy of the Tar array per thread is enough), and that the extra (sequential) loop may consume quite a bit of time...
MarkB

Posts: 559
Joined: Thu Jan 08, 2009 10:12 am
Location: EPCC, University of Edinburgh

Re: Tirple "for" bucle

You are right. It consumes a little bit of time, but anyway it is faster doing it in this way (I have done a speed test).
And about the memory, this is a very specific software for a test purpose so I don't mind too much about memory in this case.
I don't have clear what you mean with "(one copy of the Tar array per thread is enough)". I haven't got clear how to do it. Can you explain me with a little examlpe?
Thank you.

Acherito
Acherito

Posts: 4
Joined: Mon Jun 25, 2012 7:39 am

Re: Tirple "for" bucle

Can you explain me with a little examlpe?

Sure: here's a sketch of how to do it.

Code: Select all
void Projection(SunStruct Sun[], SunStruct Hel[], SunStruct Tar[], int sunAccu, int helAccu, int tarAccu, double tarEdge)
{

int iSun, jSun, iHel, jHel, iTar, jTar, iMin, jMin;

int chunk = 1;
#pragma omp parallel default(none) private(iSun,jSun,iHel,iMin,jMin,un,vn) shared(Sun,Hel,Tar,TempTarRad,sunAccu,helAccu,tarAccu,tarEdge)
{
// initialise TempTarRad[myid][] to zero here...

#pragma omp for

for(iSun = 0; iSun < sunAccu; iSun++){
for(jSun = 0; jSun < sunAccu; jSun++){
for(iHel = 0; iHel < helAccu*helAccu; iHel++){
Vectorun(un, Sun[B(iSun,jSun,sunAccu)].coord, Hel[iHel].coord); //Calculates un vector
Vectorvn(vn, un, Hel[iHel].n); //Calculates vn vector

NearestPoint(iMin, jMin, iHel, Hel, Tar, vn, tarAccu, tarEdge); //Calculates iMin and jMin point "nearest point"

}
}
}
} // need barrier here, so no nowait!

#pragma omp for
for (int i=0; i<N; i++) {    // N is size of Tar[]
for (int j = 0; j<nth; j++){
}
}

}  /* end of parallel section */
MarkB

Posts: 559
Joined: Thu Jan 08, 2009 10:12 am
Location: EPCC, University of Edinburgh

Re: Tirple "for" bucle

Hi MarkB,

It works. I have done a test of run-time and without OpenMP it takes around 25 sec, with "#pragma omp atomic" as the previous code I show it takes around 16 sec and with this last code below that is what you cemmented in your las post it takes around 11 sec. So... good job and thank you very much. Not only for this case, now I understand clearly how OpenMP works with each thread.
I'm sure that still is possible to reduce more, but for this purpose I think that it is enough good.

Code: Select all
void Projection(SunStruct Sun[], SunStruct Hel[], SunStruct Tar[], int sunAccu, int helAccu, int tarAccu, double PowEfec, double tarEdge)
{
int i, j, iSun, iHel, jHel, iTar, jTar, iMin, jMin;
int numTar = tarAccu * tarAccu;

for (int i = 0; i < numTar; i++) {
for (int j = 0; j < nthreads; j++){
}
}

#pragma omp parallel shared(sunAccu,helAccu,tarAccu,tarEdge)
{

#pragma omp for private(iSun,iHel,iMin,jMin,un,vn) firstprivate(Sun,Hel,Tar)
//firstprivate = All variables in the list are initialized with the value the original object had before entering

for(iSun = 0; iSun < sunAccu*sunAccu; iSun++){
for(iHel = 0; iHel < helAccu*helAccu; iHel++){
Vectorun(un, Sun[iSun].coord, Hel[iHel].coord);
Vectorvn(vn, un, Hel[iHel].n);

NearestPoint(iMin, jMin, iHel, Hel, Tar, vn, tarAccu, tarEdge);

} // end of for iHel
} // end of if
} // end of for iSun

#pragma omp for
for (int i = 0; i < numTar; i++) {
for (int j = 0; j < nthreads; j++){
}
}
}  /* end of parallel section */

The final result is the same so the calculation is correct.

Acherito
Acherito

Posts: 4
Joined: Mon Jun 25, 2012 7:39 am

Re: Tirple "for" bucle

Acherito wrote:So... good job and thank you very much.

You're welcome!
MarkB

Posts: 559
Joined: Thu Jan 08, 2009 10:12 am
Location: EPCC, University of Edinburgh