OpenMP makes programs the same speed

Postby Nevsky1 » Sat Jun 21, 2008 9:52 am

This is my first experience with OpenMP, and as usual with first experiences in computer science, it is not going well. I have this numerical integration code:
#include "stdafx.h"
#include "time.h"
#include "stdio.h"

/* you can adjust this to get reasonable running time on your computer */
#define NUM_STEPS 70000000

int main(int argc, char *argv[])
{
double start, stop; /* for keeping track of running time */
double sum;
double x;
int i;

/* time starts now */
start = clock();

sum = 0;
for (i = 0; i<NUM_STEPS; i++) {
x = 2.0 * (double)i / (double)(NUM_STEPS); /* value of x */
sum += x * x / NUM_STEPS;
}

/* we're done so stop the timer */
stop = clock();

printf("found result %f in %.3f seconds\n", sum, (stop-start)/1000);

return 0;
}

Then with OpenMP:

#include "time.h"
#include "stdio.h"
#include "omp.h"
/* you can adjust this to get reasonable running time on your computer */
#define NUM_STEPS 70000000

int main(int argc, char *argv[])
{
double start, stop; /* for keeping track of running time */
double sum;
double x;
int i;

/* time starts now */
start = clock();

sum = 0;
#pragma omp parallel for private(x) reduction(+:sum)
for (i = 0; i<NUM_STEPS; i++) {
x = 2.0 * (double)i / (double)(NUM_STEPS); /* value of x */
sum += x * x / NUM_STEPS;
}

/* we're done so stop the timer */
stop = clock();

printf("found result %f in %.3f seconds\n", sum, (stop-start)/1000);

return 0;
}

The time taken is roughly the same; sometimes the OpenMP version is even slightly slower. What is going on? I thought this kind of reduction was supposed to speed things up.
Nevsky1
 

Re: OpenMP makes programs the same speed

Postby ejd » Sat Jun 21, 2008 1:28 pm

First, you have to look at how long your program is taking to run. clock returns the amount of CPU time your process has used, not elapsed (wall-clock) time. On my slow system your sequential version takes only about 2.6 seconds to execute. Comparing run times this short is hard, so I increased NUM_STEPS by a factor of 10. The following is based on this increase.

Code:
% cc -xO3 n1.c
% time a.out
found result 1.333333 in 26.510 seconds
26.0u 0.0s 0:26 98% 0+0k 0+0io 0pf+0w

% cc -xopenmp -xO3 n2.c
% setenv OMP_NUM_THREADS 1
% time a.out
found result 1.333333 in 26.510 seconds
26.0u 0.0s 0:26 98% 0+0k 0+0io 0pf+0w
% setenv OMP_NUM_THREADS 2
% time a.out
found result 1.333333 in 26.450 seconds
26.0u 0.0s 0:13 196% 0+0k 0+0io 0pf+0w
% setenv OMP_NUM_THREADS 3
% time a.out
found result 1.333333 in 26.460 seconds
26.0u 0.0s 0:08 293% 0+0k 0+0io 0pf+0w
% setenv OMP_NUM_THREADS 4
% time a.out
found result 1.333333 in 26.540 seconds
26.0u 0.0s 0:06 389% 0+0k 0+0io 0pf+0w

The first run is your sequential version, and it took about 26.5 seconds. I am using Unix csh, whose time command shows:
  • Number of seconds of CPU time devoted to the user's process.
  • Number of seconds of CPU time consumed by the kernel on behalf of the user's process.
  • Elapsed (wallclock) time for the command.
  • Total CPU time - U (user) plus S (system) - as a percentage of E (elapsed) time.
The following runs are for OMP_NUM_THREADS set to 1 through 4. As you can see, the number of seconds of CPU time is about the same across the runs. This is expected, since you are doing about the same amount of work in every run. However, if you look at the third number, the elapsed wall-clock time, you see that the sequential version and the 1-thread run take the same amount of time (26 seconds), but the 2-thread run takes half the time of the sequential run (13 seconds). The 3-thread run takes about 8 seconds and the 4-thread run about 6 seconds. So you are getting a rather nice, almost linear, speedup.
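If you want the number the program itself prints to show the speedup as well, time the loop with elapsed time rather than CPU time; omp_get_wtime() is the usual way. A minimal sketch of that change (my rewording of the posted loop, not code from the original post):

Code:

#include <stdio.h>
#include <omp.h>

#define NUM_STEPS 70000000

int main(void)
{
    double sum = 0.0;
    double start = omp_get_wtime();   /* elapsed (wall-clock) seconds */

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < NUM_STEPS; i++) {
        /* x is declared inside the loop, so it is automatically private
           and the private(x) clause is no longer needed */
        double x = 2.0 * (double)i / (double)NUM_STEPS;
        sum += x * x / NUM_STEPS;
    }

    printf("found result %f in %.3f seconds\n", sum, omp_get_wtime() - start);
    return 0;
}

With this version the printed time should drop as you add threads, while the CPU time reported by the shell stays roughly constant.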

So the reduction is doing quite nicely on my system (a Sun box using Sun Studio 12). I don't know what type of hardware or software you are using, but you should be able to see something similar.

I have two other comments, though. First, make sure that you specify the same optimization levels for the versions you are comparing; some compilers change the optimization level when you turn on OpenMP. Second, OpenMP reduction implementations vary a great deal: with a simplistic implementation the reduction clause can be slower than what a user can code by hand (roughly the sketch below), while with a good implementation you shouldn't see much of a slowdown for this problem. Hope that helps.
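A hand-coded reduction would look something like this (a sketch only, not code from either post): each thread accumulates into a private partial sum and folds it into the shared total exactly once, under an atomic.

Code:

#include <omp.h>

/* Hand-coded reduction: one shared update per thread instead of
   relying on the reduction(+:sum) clause. */
double integrate(int num_steps)
{
    double sum = 0.0;

    #pragma omp parallel
    {
        double local = 0.0;                 /* private partial sum */

        #pragma omp for
        for (int i = 0; i < num_steps; i++) {
            double x = 2.0 * (double)i / (double)num_steps;
            local += x * x / num_steps;
        }

        #pragma omp atomic                  /* fold into the shared total once */
        sum += local;
    }

    return sum;
}

A decent implementation of the reduction clause should perform about as well as this, so the clause is normally the simpler choice.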
ejd
 

Re: OpenMP makes programs the same speed

Postby Hello World » Wed Aug 13, 2008 11:15 pm

OpenMP (or any multithreaded approach) does not guarantee a speedup; execution can even get slower. Efficiency depends on the parallelism in the problem, not merely on parallelizing do-loops or sections. Cache coherency traffic can also kill performance (see the sketch at the end of this post). An efficient parallel program reduces elapsed time, not CPU time. For example (copied from http://www.Equation.com):

CPUs   elapsed time   speedup   efficiency
  1       18.33         1.00      100.0%
  2        9.29         1.97       98.7%
  3        6.29         2.91       97.1%
  4        4.80         3.81       95.5%
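(Here speedup is the 1-CPU elapsed time divided by the p-CPU elapsed time, and efficiency is speedup divided by the number of CPUs: for 2 CPUs, 18.33 / 9.29 ≈ 1.97 and 1.97 / 2 ≈ 98.7%.)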

If you want to see a speedup, you need to find the parallelism in your problem and then parallelize it.
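To illustrate the cache-coherency point above, here is a sketch (mine, not from the quoted source) of false sharing: each thread accumulates into its own element of a shared array, but neighbouring elements live on the same cache line, so the line ping-pongs between the cores and the loop may run no faster than the serial version. Using a private local variable, or padding each slot out to a full cache line, avoids the problem.

Code:

#include <stdio.h>
#include <omp.h>

#define NUM_STEPS 70000000
#define MAX_THREADS 64

int main(void)
{
    double partial[MAX_THREADS] = {0};   /* adjacent doubles share cache lines */
    double start = omp_get_wtime();

    #pragma omp parallel
    {
        int tid = omp_get_thread_num();

        #pragma omp for
        for (int i = 0; i < NUM_STEPS; i++) {
            double x = 2.0 * (double)i / (double)NUM_STEPS;
            /* every write below invalidates the other threads' copy of the line */
            partial[tid] += x * x / NUM_STEPS;
        }
    }

    double sum = 0.0;
    for (int t = 0; t < MAX_THREADS; t++)
        sum += partial[t];

    printf("found result %f in %.3f seconds\n", sum, omp_get_wtime() - start);
    return 0;
}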
Hello World
 

