more threads - longer execution time (Fibonacci example)

General OpenMP discussion

more threads - longer execution time (Fibonacci example)

Postby sjaranowski » Fri Apr 12, 2013 2:54 am

I'm running well-known Fibonacci number calculation example (included in Spec. 3.1); the code:
Code: Select all
#include <stdio.h>
#include <omp.h>

int fib (int n)
{
  int fnm1, fnm2;
   if (n < 2)
      {
         return n;
      }
   /*
  if (n < 20)
      {
         return (fib (n - 1) + fib (n - 2));
      }
   */
  else
    {
#pragma omp task shared(fnm1)
         fnm1 = fib (n - 1);

#pragma omp task shared(fnm2)
         fnm2 = fib (n - 2);

#pragma omp taskwait
         return (fnm1 + fnm2);
    }
}

int main (void)
{
  int n = 35;

#pragma omp parallel shared(n)
  {
#pragma omp single nowait
    printf ("fib (%d) = %d\n", n, fib (n));
  }

   return (0);
}

When I'm doing task cutoff (as described in "Fibonacci with tasks" topic - see commented block in code above), so tasks don't became very small (and amount of them too large) the whole program does scale more or less as described in presentation "Overview of OpenMP" by Ruud van der Pas @ Tutorial IWOMP 2010.

Now I try to run it without the cutoff - just to see how it doesn't scale. The results are somehow wierd:
Code: Select all
spj@nb-spj ~/omp/par $ gcc-4.7.2 -fopenmp fib-par2.c
spj@nb-spj ~/omp/par $ time env OMP_NUM_THREADS=1 ./a.out
fib (35) = 9227465

real    0m4.327s
user    0m4.308s
sys     0m0.003s
spj@nb-spj ~/omp/par $ time env OMP_NUM_THREADS=2 ./a.out
fib (35) = 9227465

real    0m10.426s
user    0m19.645s
sys     0m0.003s
spj@nb-spj ~/omp/par $ time env OMP_NUM_THREADS=4 ./a.out
fib (35) = 9227465

real    0m12.528s
user    0m32.342s
sys     0m0.009s

So the more threads - the longer time of execution. Should four-threaded execution work at least as fast as single-threaded? The amount of work (amount of tasks) is the same and more threads are generating/executing tasks. I'm confused with this example. Can anyone try to explain me what's going in? Is it some kind of implicit synchronization?

PS.
I've also tried untied clauses for task directives, as well as older GCC 4.6.3 and XL commercial compilers. Looks like it is not compiler issue.
sjaranowski
 
Posts: 1
Joined: Thu Apr 11, 2013 9:30 am

Re: more threads - longer execution time (Fibonacci example)

Postby MarkB » Fri Apr 12, 2013 6:47 am

Can anyone try to explain me what's going in? Is it some kind of implicit synchronization?


Almost certainly, yes!

When a task is created it will be inserted into some shared data structure (a queue, or tree of queues, etc.) in the runtime, from which other threads can extract the task and execute it. This requires some form of synchronisation, and, when there are large numbers of tiny tasks, the contention for the lock(s) protecting this data structure can become a big overhead, far exceeding the time taken to actually execute the tasks. On a single thread, there is no contention, and tasks could be executed immediately without queuing them up, though there may still be some overhead for creating the data environment for each task.

Hope that helps,
Mark.
MarkB
 
Posts: 450
Joined: Thu Jan 08, 2009 10:12 am
Location: EPCC, University of Edinburgh


Return to Using OpenMP

Who is online

Users browsing this forum: No registered users and 5 guests