[Omp] Does omp really work?
Andreas Gajda
ag899934 at relict.urz.tu-dresden.de
Thu Jan 26 02:08:12 PST 2006
Hi!
Maybe I can help.
The measured serial time grows about tenfold when the array is ten times
larger, so I don't think the compiler optimizes the array away.
Anyway, the behavior of the program is quite normal, for the following
reasons:
1. An OpenMP program is expected to create its threads when it first reaches
a parallel region. After leaving the parallel region it is free to keep
the threads until the end of the program or to discard them. In this case it
keeps them, which is useful when, for instance, the same parallel
region sits inside a loop.
2. Consider the parallelization overhead. The program needs time to create the
other threads, set up the shared variables, initialize the private ones, etc.
3. Depending on the system and compiler, a program may create threads
but run all of them on the same processor. In that case the
execution is effectively serial (plus the overhead).
4. Your loop reads each input element twice and writes each output
element once. This causes so many cache misses that the computation itself
hardly matters, and you only measure the speed of the cache-line transfers.
This has nothing to do with OpenMP, but in such cases you can't expect any
speedup.
5. If possible, please don't use any Windows system for parallel
programs. Nobody really knows how Windows schedules the threads, I guess.
Try it on other systems first.
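Points 1 and 2 can be separated from the loop time with a warm-up region before the measurement. What follows is only a minimal sketch, not tested on the original machine: the function names are made up for illustration, omp_get_wtime() stands in for GetTickCount(), and the checksum is there so the compiler cannot drop the loop. Without OpenMP support the pragmas are ignored and the code runs serially.

```c
#include <assert.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Seconds from the OpenMP wall clock; without OpenMP this falls back
   to CPU time from clock(), which is only a rough substitute. */
double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Enters a parallel region once so that the runtime creates its threads
   before anything is timed. Returns the number of threads that took part
   (1 when compiled without OpenMP). */
int warm_up_threads(void)
{
    int count = 0;
    #pragma omp parallel
    {
        #pragma omp atomic
        count++;
    }
    return count;
}

/* Runs the loop from the original program `repeats` times and returns the
   best time in seconds. One element of each result is added to *checksum,
   so the results are actually used. Requires n >= 3. */
double time_stencil(const long *in, long *out, long n, int repeats,
                    long *checksum)
{
    double best = -1.0;
    long sum = 0;

    assert(n >= 3 && repeats >= 1);
    for (int r = 0; r < repeats; r++) {
        double t0 = wall_seconds();
        #pragma omp parallel for
        for (long k = 1; k < n - 1; k++)
            out[k] = in[k - 1] + in[k + 1];
        double t1 = wall_seconds();

        if (best < 0.0 || t1 - t0 < best)
            best = t1 - t0;
        sum += out[n / 2];
    }
    *checksum = sum;
    return best;
}
```

Call warm_up_threads() once, then time_stencil() with a handful of repeats, and print both the best time and the checksum. If the first repeat is much slower than the others, that difference is mostly the overhead from points 1 and 2 plus paging in the arrays.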
In the end this means: make sure your program really executes on
both cores at the same time!
(It's no use running the threads on both cores one after the other; that is
still serial.)
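A first check is to ask the OpenMP runtime how many threads and processors it sees. This is a small sketch with hypothetical function names; omp_get_num_threads() and omp_get_num_procs() are standard OpenMP calls. Note that it only shows what the runtime reports. It does not prove that the threads run on different cores at the same time; only a real drop in wall-clock time shows that.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Number of threads that take part in a parallel region
   (1 when compiled without OpenMP support). */
int team_size(void)
{
    int threads = 1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        /* One thread of the team records the team size. */
        #pragma omp single
        threads = omp_get_num_threads();
    }
#endif
    return threads;
}

/* Number of processors the runtime reports as available
   (1 when compiled without OpenMP support). */
int processors_available(void)
{
#ifdef _OPENMP
    return omp_get_num_procs();
#else
    return 1;
#endif
}

/* Prints both numbers; returns 1 if there is more than one thread
   and more than one processor, otherwise 0. */
int report_parallel_setup(void)
{
    int t = team_size();
    int p = processors_available();
    printf("threads in team: %d, processors available: %d\n", t, p);
    return (t > 1 && p > 1) ? 1 : 0;
}
```

If this reports one thread or one processor on the dual core machine, the program cannot get faster, whatever the loop looks like.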
best wishes
Andreas Gajda
On Wed, 25 Jan 2006, Eugene Loh wrote:
> Mircea Tonceanu wrote:
>
> I am dealing with simulations for n-body systems and I was very happy learning about omp. Anyway, after trying to implement it
> in my programs I was forced to test it in the simplest program possible.
>
> Test it in the simplest program first. Then use it in your programs.
>
> Anyhow, it appears that you don't do anything with the results of the
> timed operation. Maybe the compiler can optimize the operation away
> in certain cases. And, maybe the time of the operation when it is performed
> is dominated by the cost of paging in memory.
>
> For timing experiments:
>
> 1) Make sure results of operations are used in sufficiently nontrivial
> ways that the compiler cannot optimize the operations away.
>
> 2) Repeat the timing loop multiple times to make sure you're getting
> meaningful timings. In particular, do not just time a simple loop once.
>
> Those are just guesses... I didn't actually play with the code.
>
> Then I found out that something went wrong. I mean timing. Here you can see an extra simplified program I am concerned about:
>
> #include <conio.h>
> #include <stdio.h>
> #include <windows.h>
> #include <omp.h>
>
>
> const long n = 5000000; // 5 million particles
> long arr[n];
> long arrb[n];
> DWORD time_init;
> DWORD time_final;
> DWORD time_elapsed;
>
> int main()
> {
> time_init = GetTickCount();
>
> #pragma omp parallel for
> for ( long k=1; k<n-1; k++)
> { arrb[k] = arr[k-1] + arr[k+1];
> }
>
> time_final = GetTickCount();
> time_elapsed = time_final - time_init;
> printf("Initial time:\t %lu \n", time_init);
> printf("Final time:\t %lu \n", time_final);
> printf("Elapsed time:\t %lu \n", time_elapsed);
>
> _getch();
> return 0;
> }
>
>
> Conditions:
> - AMD x64 dual core 3800+;
> - 1 GB RAM;
> - Win XP Pro X64;
> - the program was compiled for 32 bits;
> - with pragma enabled, task manager shows 2 threads and without, 1 thread;
> - Visual Studio 2005 's compiler
>
> The results are:
>
> 1. With pragma directive enabled:
>
> - n = 5 million -> time_elapsed = 47 (meaning 47* 15 millisec which is the resolution of my timer);
> - n = 50 million -> time_elapsed = 344;
> - n< 5 million -> time_elapsed = 16;
>
>
> 2. With pragma directive disabled:
>
> - n = 5 million -> time_elapsed = 31 ;
> - n = 50 million -> time_elapsed = 344;
> - n< 5 million -> time_elapsed = 0 (less than 15 ms).
>
>
> In all other cases, time_elapsed is always greater with #pragma omp enabled, which is not good.
>
>
>
>