Speedup with 1 Core better than with 4 Core

Post by Caroline » Wed Dec 07, 2011 7:10 am

Hi everyone,
I started using OpenMP a couple of days ago and I'm
seeing something strange. The goal is to calculate the variance of a 256x256 array.
With 1 core the elapsed time for the calculation is 0.39 ms, and with 4 cores it is 1.63 ms!
I must be doing something wrong, but I cannot figure out what. I'm using Visual Studio 2008 Pro
on a Windows XP machine.
Here is what my code looks like:

Code:
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <omp.h>
#include <iostream>
#include <cmath>

int main ()
{
   // Seed the random number generator
   srand(static_cast<unsigned>(time(NULL)));

   int n=256,x,y;
   double XMax=57.7, XMin= -10.3,Sum1=0.0,Sum2=0.0,Variance;
   double range=XMax-XMin;
   double U_0[256][256];

   // Populating array U_0
   ....
   // Measuring the time elapsed to calculate the variance
   double t=omp_get_wtime();
#pragma omp parallel private(x,y)
{
   #pragma omp for schedule(dynamic) reduction(+:Sum1,Sum2)
   for (x=0; x<n; x++)
   {
      for (y=0; y<n; y++)
      {
         #pragma omp atomic
         Sum1+= U_0[x][y] * U_0[x][y];
         #pragma omp atomic
         Sum2+= U_0[x][y];
      }
   }
}
   Variance=(Sum1/(n*n)) - ((Sum2/(n*n))*(Sum2/(n*n)));
   // Time needed for the calculation of the variance
   printf_s("Time elapsed in milliseconds: %2lf ms\n", 1000*(omp_get_wtime()-t));
   return 0;
}

Could you please give me a hint about what I might be doing wrong?
Thanks

Caroline
Caroline
 
Posts: 3
Joined: Wed Dec 07, 2011 4:57 am

Re: Speedup with 1 Core better than with 4 Core

Post by ftinetti » Wed Dec 07, 2011 7:30 am

Hi,

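You should not use atomic to synchronize updates of reduction variables: the reduction clause already gives each thread its own private copy of Sum1 and Sum2 and combines the copies at the end of the loop.

I would not use schedule(dynamic) for this kind of loop either. Every iteration costs the same, so dynamic scheduling only adds overhead.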

HTH.
ftinetti
 
Posts: 581
Joined: Wed Feb 10, 2010 2:44 pm

Re: Speedup with 1 Core better than with 4 Core

Post by Caroline » Wed Dec 07, 2011 7:44 am

I tried:
# pragma omp critical instead of # pragma omp critical
and the elapsed time was even worse (more than 80 ms) :( .
I've also tried other schedule settings, and even no schedule clause at all,
and the elapsed time is still more than 70 ms ...
Does anyone have an idea?

Thanks

Caroline

Re: Speedup with 1 Core better than with 4 Core

Post by ftinetti » Wed Dec 07, 2011 7:54 am

Hmmm... just try it without synchronizing, i.e.:

Code:
...
         for (y=0; y<n; y++)
          {
             Sum1+= U_0[x][y] * U_0[x][y];
             Sum2+= U_0[x][y];
          }
...


HTH.

Re: Speedup with 1 Core better than with 4 Core

Post by Caroline » Wed Dec 07, 2011 8:50 am

I meant in my previous post that I tried:
#pragma omp critical instead of #pragma omp atomic...

Without synchronising (and without using schedule) the elapsed time for the calculation is now 0.41 ms,
which means that the run using only 1 core is still faster (0.39 ms) ...
Temporarily storing Sum1 and Sum2 in local variables won't improve the time, right?
Thanks

Caroline

Re: Speedup with 1 Core better than with 4 Core

Post by ftinetti » Wed Dec 07, 2011 9:31 am

"Without synchronising (and without using schedule) the elapsed time for the calculation is now 0.41 ms,
which means that the run using only 1 core is still faster (0.39 ms) ..."

Yes, I think so. I had not noticed that the measurement is under a millisecond; at that scale, thread overhead penalizes performance too much. Just as an exercise, and in fact to verify the "rule", try a larger value of "n".

Some other, less important details that would help:
Processor-#cores:
OS:
RAM:
Compiler and compiler options:

Re: Speedup with 1 Core better than with 4 Core

Post by waseem » Sat Dec 17, 2011 10:33 pm

Same advice. Try with a larger value of n.

A point to remember is that not every parallelization yields a speedup. Both communication time (between threads and/or processors) and computation time need to be considered. In cases where communication time dominates, using a larger number of cores may be counterproductive.
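
As a rough illustration of that trade-off, here is a simple model applied to the timings reported earlier in this thread. It assumes the computation divides evenly over p threads and that all thread start-up, scheduling and reduction costs can be lumped into one overhead term T_ov; real behaviour (memory bandwidth, for instance) is more complicated, so treat the numbers as an estimate only.

```latex
\begin{align*}
T_p &\approx \frac{T_1}{p} + T_{\mathrm{ov}}, \qquad S_p = \frac{T_1}{T_p} \\
S_4 &= \frac{0.39\,\mathrm{ms}}{0.41\,\mathrm{ms}} \approx 0.95 \\
T_{\mathrm{ov}} &\approx T_4 - \frac{T_1}{4}
  = 0.41\,\mathrm{ms} - \frac{0.39\,\mathrm{ms}}{4} \approx 0.31\,\mathrm{ms} \\
S_p > 1 &\iff T_{\mathrm{ov}} < T_1\left(1 - \frac{1}{p}\right)
  = 0.39\,\mathrm{ms} \times 0.75 \approx 0.29\,\mathrm{ms}
\end{align*}
```

Under these assumptions the estimated overhead (about 0.31 ms) is slightly above the break-even value (about 0.29 ms), which fits the observation that 4 cores were marginally slower than 1. A larger n increases T_1 while the overhead stays roughly the same, which is why a speedup becomes possible.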
waseem
 
Posts: 3
Joined: Sat Sep 03, 2011 1:42 am
