variation in execution times

Post by kook » Wed Oct 26, 2011 2:01 pm

I've seen other posts regarding variability in execution times but it wasn't clear to me what the solution was.
1) using gcc version 4.1.2 on a RedHat Linux distribution 5.3
2) my code is making use of omp_get_wtime()
3) my code is below

#include <stdlib.h>
#include <stdio.h>
#include <omp.h>
#include <unistd.h>

#define N 100000000

int
main(int argc, char **argv)
{
    int i, *a;
    long long sum = 0;
    double t;

    a = (int*)malloc(N*sizeof(int));
    if (a == NULL) {            /* the ~400 MB request can fail */
        perror("malloc");
        return 1;
    }
    t = omp_get_wtime();

    /* initialize the array in parallel */
    #pragma omp parallel for
    for (i=0; i<N; i++) {
        a[i] = i;
    }

    /* sum the array in parallel */
    #pragma omp parallel for reduction(+:sum)
    for (i=0; i<N; i++) {
        sum += a[i];
    }
    printf ("sum = %lld, t = %f\n", sum, omp_get_wtime() - t);

    free(a);
    return 0;
}

The variability in execution times is shown below for a quad-core system
sum = 4999999950000000, t = 1.297301
sum = 4999999950000000, t = 0.701856
sum = 4999999950000000, t = 1.451137
sum = 4999999950000000, t = 0.697694
sum = 4999999950000000, t = 0.704821
sum = 4999999950000000, t = 1.502765
sum = 4999999950000000, t = 1.269791
sum = 4999999950000000, t = 1.226138

If I remove both pragma directives around the for loops, I get the following more stable execution times
sum = 4999999950000000, t = 1.734803
sum = 4999999950000000, t = 1.731349
sum = 4999999950000000, t = 1.727734
sum = 4999999950000000, t = 1.728106
sum = 4999999950000000, t = 1.720753
sum = 4999999950000000, t = 1.734948
sum = 4999999950000000, t = 1.734952
sum = 4999999950000000, t = 1.732308
sum = 4999999950000000, t = 1.723079
sum = 4999999950000000, t = 1.731947
sum = 4999999950000000, t = 1.726258

What am I missing?
Also, what is the underlying mechanism that determines how many cores are available and thereby creates the appropriate number of threads to handle the "embarrassingly parallel" for loop?

Thanks

-kook
kook
 
Posts: 2
Joined: Tue Oct 18, 2011 5:10 pm

Re: variation in execution times

Post by ftinetti » Thu Oct 27, 2011 4:42 am

Hi,

Compiler optimization options? I would suggest -O2...

Are there other processes running on the same computer? CPU contention would explain some change in timing, but I do not know if it can explain all of this. About 1.7 s seems to be the sequential runtime, so every run below that seems to be using more than one core, but not necessarily every available core.

Did you try with:
#pragma omp parallel
{
  #pragma omp for
  for (i=0; i<N; i++) {
    a[i] = i;
  }

  #pragma omp for reduction(+:sum)
  for (i=0; i<N; i++) {
    sum += a[i];
  }
} // omp parallel

? or just
#pragma omp parallel for reduction(+:sum)
for (i=0; i<N; i++) {
  a[i] = i;
  sum += a[i];
}


Also, maybe the waiting policy is adding extra overhead at the end of the first omp for... or I'm missing something...

HTH.
ftinetti
 
Posts: 567
Joined: Wed Feb 10, 2010 2:44 pm

Re: variation in execution times

Post by kook » Thu Oct 27, 2011 8:37 am

Adding the additional pragma line (as you suggested) didn't help with the inconsistent times.
What did help (as you also suggested) was using the -O switch. I used -O3 and the times became much more consistent as well as faster.
See below
sum = 4999999950000000, t = 0.496907
sum = 4999999950000000, t = 0.493213
sum = 4999999950000000, t = 0.491053
sum = 4999999950000000, t = 0.492295
sum = 4999999950000000, t = 0.495296
sum = 4999999950000000, t = 0.494597
sum = 4999999950000000, t = 0.491131
sum = 4999999950000000, t = 0.493650
sum = 4999999950000000, t = 0.491728
sum = 4999999950000000, t = 0.495924
sum = 4999999950000000, t = 0.491275
sum = 4999999950000000, t = 0.490204
sum = 4999999950000000, t = 0.496643
sum = 4999999950000000, t = 0.493061
sum = 4999999950000000, t = 0.493016

I do recall (in other posts) the suggestion about using the -O switch, but at the time I didn't think that would make a difference.
But it does for some reason. Not sure why. At least now I can justify doing further research on OpenMP to help speed up my project using multicore.

Thanks for your time and suggestions.

-kook

Re: variation in execution times

Post by ftinetti » Thu Oct 27, 2011 8:51 am

Hi,

kook wrote:
I do recall (in other posts) the suggestion about using the -O switch but at the time I didn't think that would make a difference.
But it does for some reason. Not sure why.

My guess is that debug information (which is not included with the -O switch) could lead to "strange" overheads, like the one you have shown in the initial post. But maybe I'm missing something...

