General OpenMP discussion

Hi everyone,

Did anyone try to run the fibonacci sequence computation with OpenMP tasks like stated in the IWOMP 2010 tutorial.

I've tried to run the code below but no speedup achieved and the program hangs for n = 40; The Iwomp 2010 material ( slide 124 of OpenMP overview) states that there's a approximatively x2 speedup.

Can someone help me explain this ??
Thanks
Code: Select all
`/* File  : fibonacci.c * Computes fibonacci numbers * Last modified  : 2011-04-25 */#include <stdio.h>//#define NUM_THREADS 2/* function that uses recursivity * to compute fibonacci numbers */long comp_fib_numbers(int n){  // Basic algorithm: f(n) = f(n-1) + f(n-2)  long fnm1, fnm2, fn;  if ( n == 0 || n == 1 ) return(n);#pragma omp task shared(fnm1)  fnm1 = comp_fib_numbers(n-1);#pragma omp task shared(fnm2)  fnm2 = comp_fib_numbers(n-2);#pragma omp taskwait  fn = fnm1 + fnm2;  return(fn);}int main(int argc, char **argv){  int n;  long result;  if(argc<2){    printf("usage ./fibonacci <number>\n");    exit(1);  }  n = atoi(argv[1]);  //#pragma omp parallel num_threads(NUM_THREADS)#pragma omp parallel  {#pragma omp single nowait    {      result = comp_fib_numbers(n);    } // end of single region  } // end of parallel region  printf("finonacci(%d) = %ld\n", n, result);  return 0;}`
hsi544

Hi,

I'm sorry it has taken so long to get back to you on this.

Your code is technically correct, but to get good performance you need to use a cutoff value for "n". Otherwise, too many small tasks will be generated. Once the value of "n" gets below this threshold it is best to simply execute the serial version without tasking.

I modified your code to implement this:

Code: Select all
`  if ( n == 0 || n == 1 ) return(n);  // In case the sequence gets too short, execute the serial version  if ( n < 20 )  {     return(comp_fib_numbers(n-1)+comp_fib_numbers(n-2));  }  else  {     #pragma omp task shared(fnm1)       fnm1 = comp_fib_numbers(n-1);     #pragma omp task shared(fnm2)       fnm2 = comp_fib_numbers(n-2);     #pragma omp taskwait       fn = fnm1 + fnm2;       return(fn);   }`

I compiled and ran this code on my MacBook Pro. I used VirtualBox with a virtual machine running Solaris 10. I used our Oracle Studio compilers. This is the result I get:

Code: Select all
`\$ cc -fast -xopenmp fib.c\$ export OMP_NUM_THREADS=1\$ ptime ./a.out 40finonacci(40) = 102334155real        2.790user        2.767sys         0.005\$ export OMP_NUM_THREADS=2\$ ptime ./a.out 40finonacci(40) = 102334155real        1.459user        2.872sys         0.005\$`

It seems that this fix makes the code scale quite well. By the way, this fix can also be found in the same tutorial. The relevant code fragment is shown on slide 123.

Note that the cutoff value depends on what system and OpenMP run time system you use. You may want to experiment to see what works best for you. I actually didn't spend much time on this. I picked some values and found 20 to work well, but I don't exclude the optimal value on the configuration I used may be less.

I also wanted to point out that I have a more recent version of the tutorial. You can download it from the IWOMP 2011 web site:
http://www.ncsa.illinois.edu/Conference ... rials.html

Last, but not least, you mention the code hangs for n=40. I don't see this happening with our compilers and it is hard to diagnose without knowing more. A wild guess is that you may run into some call stack limitation, nesting calls too deeply.

Kind regards, Ruud
ruud

Dear Ruud,

I have also difficulties to get sustain speedup with this example,
And I do not think that the main problem is due to "too many tasks" used.

I'm using Intel compiler C++ on Windows version 12.0 on a dual-core machine.

First, when you run this example with (n=40, cutoff=20) the number of tasks created is not so high. If you restricted to so little tasks then it is really a problem.

Second, you can run the example with (n=20, cutoff=0) and you create the same number of tasks but you will not get any speedup at all.

Third, if you change the cutoff code as below you will not get the speedup you expected no matter what is value of n or the value of the cutoff:

if ( n < 20 )
{
return(comp_fib_numbers(n-1)+comp_fib_numbers(n-2));
}

replaced by:

if ( n < 20 )
{
return(Fibonacci(n));
}

where Fibonacci is :

unsigned int Fibonacci (unsigned int n)
{
int previous = -1;
int result = 1;
for (unsigned int i = 0; i <= n; ++i)
{
int const sum = result + previous;
previous = result;
result = sum;
}
return result;
}
amimar

Posts: 2
Joined: Mon Jun 30, 2008 7:17 am