Tirple "for" bucle

General OpenMP discussion

Tirple "for" bucle

Postby Acherito » Mon Jun 25, 2012 11:35 pm

Hi all,

I am new on this forum and hope to learn a lot from it.
I was developing a software of calculation and I found that one of the biggest problem was the speed due to the non use of all the resources.
The piece of code is "slow" due to the code below that is a triple "for". As you can see, there is a "#pragma omp critical" commented. The result of the calculation is correct only one this line is not commented, but the speed increments high if I do it and the reduction in run-time is not good enough.
Someone knows why I have to use "#pragma omp critical" and if there is any way not to use it or a way to have clear reduction in run-time?
Code: Select all
void Projection(SunStruct Sun[], SunStruct Hel[], SunStruct Tar[], int sunAccu, int helAccu, int tarAccu, double tarEdge)
{
Coordenada un, vn;

int iSun, jSun, iHel, jHel, iTar, jTar, iMin, jMin;

int chunk = 1;
#pragma omp parallel default(none) private(iSun,jSun,iHel,iMin,jMin,un,vn) shared(Sun,Hel,Tar,sunAccu,helAccu,tarAccu,tarEdge)
{

    #pragma omp for nowait

    for(iSun = 0; iSun < sunAccu; iSun++){
       for(jSun = 0; jSun < sunAccu; jSun++){
          if(Sun[B(iSun,jSun,sunAccu)].rad != 0){
             for(iHel = 0; iHel < helAccu*helAccu; iHel++){
                Vectorun(un, Sun[B(iSun,jSun,sunAccu)].coord, Hel[iHel].coord); //Calculates un vector
                Vectorvn(vn, un, Hel[iHel].n); //Calculates vn vector
              
                NearestPoint(iMin, jMin, iHel, Hel, Tar, vn, tarAccu, tarEdge); //Calculates iMin and jMin point "nearest point"
                //#pragma omp critical
                Tar[B(iMin,jMin,tarAccu)].rad  += Sun[B(iSun,jSun,sunAccu)].rad;
             }
          }
       }
    }
}  /* end of parallel section */


Thank you for any possible help.

Acherito
Acherito
 
Posts: 4
Joined: Mon Jun 25, 2012 7:39 am

Re: Tirple "for" bucle

Postby MarkB » Tue Jun 26, 2012 4:19 am

You need the critical region because there must be a possibility that two or more threads try to update the same element of Tar concurrently, resulting a race condition.

You could try replacing the critical directive with an atomic directive: this may or may not help, depending on the compiler/hardware you are using.

Beyond this there are some other options which involve some rewriting of the code:

Add an OpenMP lock variable to the structure from which the Tar array is built, and surround each update to Tar with a lock/unlock of this.
or
Add a dimension of size the number of threads to the Tar array: use the thread ID to index into this array for the updates, and then sum
all the contributions together after the end of the parallel loop.
MarkB
 
Posts: 428
Joined: Thu Jan 08, 2009 10:12 am

Re: Tirple "for" bucle

Postby Acherito » Tue Jun 26, 2012 6:58 am

Thank you MarkB,

I was trying just exactly what you are saying so I wasn't totally wrong.
I have seen that with atomic directive works a little bit better, and about the second option I was doing something similar to the code above. I have a little proble with the memory allocation but it seems that it is a possibility. Do you thing that the following is a good way to continue? As you will see, I store the values in TarRad for each iSun array ID and I have also reduce one "for" of Sun array.

Code: Select all
void Projection(SunStruct Sun[], SunStruct Hel[], SunStruct Tar[], int sunAccu, int helAccu, int tarAccu, double tarEdge)
    {
    Coordenada un, vn;

    int iSun, iHel, iTar, jTar, iMin, jMin;

   double *TarRad = (double *)malloc(sunAccu*sunAccu*helAccu*helAccu*tarAccu*tarAccu*sizeof(double));

    int chunk = 1;
    #pragma omp parallel default(none) private(iSun,jSun,iHel,iMin,jMin,un,vn) shared(Sun,Hel,Tar,sunAccu,helAccu,tarAccu,tarEdge)
    {

        #pragma omp for nowait

        for(iSun = 0; iSun < sunAccu*sunAccu; iSun++){
              if(Sun[iSun].rad != 0){
                 for(iHel = 0; iHel < helAccu*helAccu; iHel++){
                    Vectorun(un, Sun[iSun].coord, Hel[iHel].coord); //Calculates un vector
                    Vectorvn(vn, un, Hel[iHel].n); //Calculates vn vector
                 
                    NearestPoint(iMin, jMin, iHel, Hel, Tar, vn, tarAccu, tarEdge); //Calculates iMin and jMin point "nearest point"
                    //#pragma omp atomic
                    //Tar[B(iMin,jMin,tarAccu)].rad  += Sun[iSun].rad;
                    TarRad[A(iSun,iHel,iMin,jMin,sunAccu*sunAccu,helAccu*helAccu,tarAccu)] = Sun[iSun].rad;
                 }
              }
           }
        }
    }  /* end of parallel section */

        for(iSun = 0; iSun < sunAccu*sunAccu; iSun++){
      for(iHel = 0; iHel < helAccu*helAccu; iHel++){
         for(iMin = 0; iMin < tarAccu; iMin++){
            for(jMin = 0; jMin < tarAccu; jMin++){
               Tar[B(iMin,jMin,tarAccu)].rad  += TarRad[A(iSun,iHel,iMin,jMin,sunAccu*sunAccu,helAccu*helAccu,tarAccu)];
            }
         }
      }
   }



Thank you.

Acherito
Acherito
 
Posts: 4
Joined: Mon Jun 25, 2012 7:39 am

Re: Tirple "for" bucle

Postby MarkB » Tue Jun 26, 2012 8:42 am

Acherito wrote:Do you thing that the following is a good way to continue? As you will see, I store the values in TarRad for each iSun array ID and I have also reduce one "for" of Sun array.


I suspect you are using more memory than you need to (one copy of the Tar array per thread is enough), and that the extra (sequential) loop may consume quite a bit of time...
MarkB
 
Posts: 428
Joined: Thu Jan 08, 2009 10:12 am

Re: Tirple "for" bucle

Postby Acherito » Tue Jun 26, 2012 8:58 am

You are right. It consumes a little bit of time, but anyway it is faster doing it in this way (I have done a speed test).
And about the memory, this is a very specific software for a test purpose so I don't mind too much about memory in this case.
I don't have clear what you mean with "(one copy of the Tar array per thread is enough)". I haven't got clear how to do it. Can you explain me with a little examlpe?
Thank you.

Acherito
Acherito
 
Posts: 4
Joined: Mon Jun 25, 2012 7:39 am

Re: Tirple "for" bucle

Postby MarkB » Tue Jun 26, 2012 10:36 am

Can you explain me with a little examlpe?


Sure: here's a sketch of how to do it.

Code: Select all
void Projection(SunStruct Sun[], SunStruct Hel[], SunStruct Tar[], int sunAccu, int helAccu, int tarAccu, double tarEdge)
    {
    Coordenada un, vn;

    int iSun, jSun, iHel, jHel, iTar, jTar, iMin, jMin;

    int nthreads = omp_get_max_threads();
   //allocate TempTarRad of size nthreads * size of Tar[]  and of type whatever Tar[].rad is here......

    int chunk = 1;
    #pragma omp parallel default(none) private(iSun,jSun,iHel,iMin,jMin,un,vn) shared(Sun,Hel,Tar,TempTarRad,sunAccu,helAccu,tarAccu,tarEdge)
    {
         int myid = omp_get_thread_num();
         int nth = omp_get_num_threads();
         // initialise TempTarRad[myid][] to zero here...

        #pragma omp for

        for(iSun = 0; iSun < sunAccu; iSun++){
           for(jSun = 0; jSun < sunAccu; jSun++){
              if(Sun[B(iSun,jSun,sunAccu)].rad != 0){
                 for(iHel = 0; iHel < helAccu*helAccu; iHel++){
                    Vectorun(un, Sun[B(iSun,jSun,sunAccu)].coord, Hel[iHel].coord); //Calculates un vector
                    Vectorvn(vn, un, Hel[iHel].n); //Calculates vn vector
                 
                    NearestPoint(iMin, jMin, iHel, Hel, Tar, vn, tarAccu, tarEdge); //Calculates iMin and jMin point "nearest point"

                    TempTarRad[myid][B(iMin,jMin,tarAccu)]  += Sun[B(iSun,jSun,sunAccu)].rad;
                 }
              }
            }
          } // need barrier here, so no nowait!

     #pragma omp for
          for (int i=0; i<N; i++) {    // N is size of Tar[]
               for (int j = 0; j<nth; j++){
                Tar[i].rad  += TempTarRad[j][i];
               }
           }
       
    }  /* end of parallel section */
MarkB
 
Posts: 428
Joined: Thu Jan 08, 2009 10:12 am

Re: Tirple "for" bucle

Postby Acherito » Wed Jun 27, 2012 2:52 am

Hi MarkB,

It works. I have done a test of run-time and without OpenMP it takes around 25 sec, with "#pragma omp atomic" as the previous code I show it takes around 16 sec and with this last code below that is what you cemmented in your las post it takes around 11 sec. So... good job and thank you very much. Not only for this case, now I understand clearly how OpenMP works with each thread.
I'm sure that still is possible to reduce more, but for this purpose I think that it is enough good.

Code: Select all
void Projection(SunStruct Sun[], SunStruct Hel[], SunStruct Tar[], int sunAccu, int helAccu, int tarAccu, double PowEfec, double tarEdge)
{
   Coordenada un, vn;
   int i, j, iSun, iHel, jHel, iTar, jTar, iMin, jMin;
   int nthreads = omp_get_max_threads();
   int numTar = tarAccu * tarAccu;
   double TempTarRad[nthreads][numTar];

      for (int i = 0; i < numTar; i++) {
         for (int j = 0; j < nthreads; j++){
            TempTarRad[j][i] = 0;         // initialise TempTarRad[myid][] to zero here...
         }
      }

   #pragma omp parallel shared(sunAccu,helAccu,tarAccu,tarEdge)
   {
      int myid = omp_get_thread_num();

      #pragma omp for private(iSun,iHel,iMin,jMin,un,vn) firstprivate(Sun,Hel,Tar)
         //firstprivate = All variables in the list are initialized with the value the original object had before entering

         for(iSun = 0; iSun < sunAccu*sunAccu; iSun++){
            if(Sun[iSun].rad != 0){
               for(iHel = 0; iHel < helAccu*helAccu; iHel++){
                  Vectorun(un, Sun[iSun].coord, Hel[iHel].coord);
                  Vectorvn(vn, un, Hel[iHel].n);

                  NearestPoint(iMin, jMin, iHel, Hel, Tar, vn, tarAccu, tarEdge);

                  TempTarRad[myid][B(iMin,jMin,tarAccu)] += Sun[iSun].rad;

               } // end of for iHel
            } // end of if
         } // end of for iSun

      #pragma omp for
         for (int i = 0; i < numTar; i++) {
            for (int j = 0; j < nthreads; j++){
               Tar[i].rad  += TempTarRad[j][i];
            }
         }
   }  /* end of parallel section */


The final result is the same so the calculation is correct.

Acherito
Acherito
 
Posts: 4
Joined: Mon Jun 25, 2012 7:39 am

Re: Tirple "for" bucle

Postby MarkB » Wed Jun 27, 2012 3:51 am

Acherito wrote:So... good job and thank you very much.


You're welcome!
MarkB
 
Posts: 428
Joined: Thu Jan 08, 2009 10:12 am


Return to Using OpenMP

Who is online

Users browsing this forum: Yahoo [Bot] and 8 guests