What's the problem with this code?

General OpenMP discussion

What's the problem with this code?

Postby Philipp » Mon Jun 23, 2008 3:46 am

Hi!

Can anyone explain to me why this piece of code is slower than it is without the "#pragma..." line? In htop it looks like all the cores are only running at about 20% load in this function.
I compile with gcc 4.2.

Thanks for any help!

Code: Select all
void Calculate(double q, double ni, double Qt[], int No,
                       double* Gammahat0, double* Gammahat1)
{
   double sqrtni,Xiarg;
   double Xi0,Xi1,dXi0,dXi1;
   int i;
   sqrtni = sqrt(ni);

   *Gammahat0 = 0.0;
   *Gammahat1 = 0.0;

   double g0 = 0.0;
   double g1 = 0.0;

   /* Each thread accumulates into private copies of g0/g1; the reduction combines them after the loop. */
#pragma omp parallel for firstprivate(sqrtni, q) private(Xi0,Xi1,dXi0,dXi1,Xiarg) reduction(+ : g0,g1)
   for (i=0; i<RFPeriodIntervalNo; i++)
   {
      Xiarg = (q-Qt[i])/sqrtni;
      CalculateXi(Xiarg,&Xi0,&Xi1,&dXi0,&dXi1);
      g0     += Xi0;
      g1     += Xi1;
   }

   g0     *= sqrtni/RFPeriodIntervalNo;
   g1     /= RFPeriodIntervalNo;
   
   *Gammahat0 = g0;
   *Gammahat1 = g1;
}
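
For completeness, I compile it roughly like this (from memory, so the exact flags may differ; the file name here is just for illustration):
Code: Select all
gcc -fopenmp -O2 -c calculate.c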
Philipp
 

Re: What's the problem with this code?

Postby ejd » Mon Jun 23, 2008 4:50 am

You are going to have to give more information. What hardware and OS are you using? How much work is being done in the routine CalculateXi? How large is RFPeriodIntervalNo? With what little information you have given, it is impossible to even hazard a guess that would mean anything.
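
As a starting point, something like this quick check (a rough sketch, not tested) would show what the OpenMP runtime sees on your machine and let you time the routine itself:
Code: Select all
#include <stdio.h>
#include <omp.h>

int main(void)
{
   /* What the runtime has to work with. */
   printf("procs: %d, default team size: %d\n",
          omp_get_num_procs(), omp_get_max_threads());

   /* Time the interesting call with the portable OpenMP timer. */
   double t0 = omp_get_wtime();
   /* ... call Calculate(...) here with your real data ... */
   double t1 = omp_get_wtime();
   printf("elapsed: %f s\n", t1 - t0);
   return 0;
}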
ejd
 

Re: What's the problem with this code?

Postby Philipp » Mon Jun 23, 2008 4:54 am

Hi again,

RFPeriodIntervalNo is about 80-150.
I have two quad-core Xeons in the server, with 64-bit Ubuntu installed. How much work CalculateXi does I don't know, sorry.
What I don't understand is why the program can be even slower with OpenMP...
Philipp
 

Re: What's the problem with this code?

Postby ejd » Mon Jun 23, 2008 5:21 am

OpenMP uses a fork-join model for parallelism. Most implementations are built on top of the pthreads library. The pragma translates into code that calls the underlying library to set up the threads, partition the work across the threads, and set up the memory associated with the threads. At the end of the parallel region, there is overhead associated with the joining of the threads and for accumulating values (i.e., reduction). All of this is the overhead of using parallel. If the work that you are doing in a parallel region is small, then this overhead can be large enough to offset any gain that is made from using parallel.

There are also quite a number of other problems that can make using parallel more "expensive" than just doing the work sequentially. These include (but are not limited to) false sharing of the cache, cache thrashing, NUMA (memory) effects, etc. Your question is one of the reasons that parallel programming is quite a bit of fun. Hope that helps.
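
For a loop with only 80-150 cheap iterations, one thing to try is the if clause, which makes the runtime fork threads only when the trip count is large enough to pay for the overhead. A sketch (the threshold of 1000 is just a guess you would have to tune for your machine):
Code: Select all
/* Run the loop in parallel only when there is enough work;
   otherwise it executes sequentially with no fork-join cost. */
#pragma omp parallel for if(RFPeriodIntervalNo > 1000) \
        private(Xiarg, Xi0, Xi1, dXi0, dXi1) reduction(+ : g0, g1)
for (i = 0; i < RFPeriodIntervalNo; i++)
{
   Xiarg = (q - Qt[i]) / sqrtni;
   CalculateXi(Xiarg, &Xi0, &Xi1, &dXi0, &dXi1);
   g0 += Xi0;
   g1 += Xi1;
}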
ejd
 

Re: What's the problem with this code?

Postby Philipp » Tue Jun 24, 2008 1:44 am

Hi again,

Thank you, that's what I wanted to know.
I have another question about the same problem. I just read that the problem may be that I use nested parallelism: the code I showed you runs in parallel with another instance of the same code, so the "for" loop in the code is the second level of parallelism.
In my main.c I set
Code: Select all
omp_set_nested(1);

Any suggestions on what I can try in order to use that nested parallelism in the code example (something like omp_set_...)? For reference, the overall structure looks roughly like the sketch below.
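
A simplified sketch of the structure (OuterWork is just a placeholder for my real code):
Code: Select all
#include <omp.h>

void OuterWork(void)
{
   omp_set_nested(1);   /* allow parallel regions inside parallel regions */

   /* First level: two independent tasks run in parallel. */
   #pragma omp parallel sections num_threads(2)
   {
      #pragma omp section
      { /* ... eventually calls Calculate(), which contains the parallel for ... */ }
      #pragma omp section
      { /* ... same code on other data ... */ }
   }
   /* With nesting enabled, each section's inner parallel for forks its
      own team, so the total thread count multiplies across the levels. */
}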
Philipp
 

Re: What's the problem with this code?

Postby ejd » Tue Jun 24, 2008 5:36 pm

Any suggestions on what I can try in order to use that nested parallelism in the code example (something like omp_set_...)?

Sorry - but I am not sure I understand what you are asking. Please try again.
ejd
 

Re: What's the problem with this code?

Postby Philipp » Tue Jun 24, 2008 11:44 pm

Hi again.
Meanwhile I am totally confused.
I switched off all the parallel parts of my program and it became (much) faster.
So then I switched on only the "main" parallel section, where I solve a differential equation twice (and independently), because I thought that would be a nice way to make the program run faster.
The effect: the runtime rises from 54 s to 180 s.
Here is the part where i use openmp:
Code: Select all
void LoeseDGL(ergebnisstruct* ergebnis1, ergebnisstruct* ergebnis2)
{
   /* The two solves are independent, so each section runs on its own thread. */
#pragma omp parallel sections
   {
#pragma omp section
      {
         ACBerechneBohmpunkt(ergebnis1);                     
         ACloesen(ergebnis1);                        
      }
#pragma omp section
      {
         ACBerechneBohmpunkt(ergebnis2);
         ACloesen(ergebnis2);
      }
   }
}

The two structs ergebnis1 and ergebnis2 contain large arrays of doubles (about 1.5 GB each), but they are completely independent, so normally it should be fine. The two functions I call in the sections need a few seconds to run, so the waiting time at the end of the parallel section should be negligible compared to the runtime. To check that, each section could be timed on its own, as in the sketch below.
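Something along these lines (a sketch with the timing added; stdio.h and omp.h are assumed to be included):
Code: Select all
#pragma omp parallel sections
{
#pragma omp section
   {
      double ts = omp_get_wtime();
      ACBerechneBohmpunkt(ergebnis1);
      ACloesen(ergebnis1);
      /* How long this section really takes on its thread. */
      printf("section 1: %f s\n", omp_get_wtime() - ts);
   }
#pragma omp section
   {
      double ts = omp_get_wtime();
      ACBerechneBohmpunkt(ergebnis2);
      ACloesen(ergebnis2);
      printf("section 2: %f s\n", omp_get_wtime() - ts);
   }
}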
Sorry for the German words in the code and the bad English... :oops:
Philipp
 

Re: What's the problem with this code?

Postby Philipp » Wed Jun 25, 2008 4:19 am

Update:
Possibly the reallocs in those parallel regions cause problems, because the threads wait for each other when trying to get memory?!
Philipp
 

Re: What's the problem with this code?

Postby ejd » Wed Aug 13, 2008 11:20 pm

Unless your implementation uses a set of memory allocation routines designed for a parallel environment, it is quite likely that the realloc is causing the problem. You might try running the program without realloc and see what the performance looks like then. The other thing to do would be to look for a memory manager that scales better in a parallel programming environment and works with whatever compiler and runtime you are using.
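
For example, if you can bound the final sizes in advance, allocating once outside the parallel region avoids the threads fighting over the allocator (a sketch; max_elems is a placeholder for whatever bound applies to your data):
Code: Select all
#include <stdlib.h>

/* Allocate worst-case buffers up front, outside the parallel region,
   so neither section has to call realloc while the other is working. */
double *buf1 = malloc(max_elems * sizeof *buf1);
double *buf2 = malloc(max_elems * sizeof *buf2);
if (buf1 == NULL || buf2 == NULL) {
   /* handle allocation failure */
}

#pragma omp parallel sections
{
#pragma omp section
   { /* fill buf1 up to max_elems elements; no realloc needed */ }
#pragma omp section
   { /* fill buf2; no realloc needed */ }
}

free(buf1);
free(buf2);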
ejd
 

