[Omp] slow performance
andrew wang
mcwang88 at hotmail.com
Wed Dec 15 18:19:05 PST 2004
Hi All,
Sorry, I forgot to include the system info:
Compaq AlphaServer SC45 with 44 nodes, each node comprising four 1 GHz
Alpha processors with 1 GB of memory. I am using only one node with different
thread counts (1-3). The Compaq C compiler supports OpenMP spec 1.0. The OS
should be Tru64 UNIX.
I also tried compiling the same program with the Intel C compiler 8.0 and
running it on a two-processor Win2k server. Here are the results:
D:\omp\test>try 2
omp_get_num_procs=2
Parallel region time=12 seconds
Total time = 14 seconds
D:\omp\test>try 1
omp_get_num_procs=2
Parallel region time=12 seconds
Total time = 14 seconds
There does not seem to be much difference; same problem.
As somebody pointed out, my program actually does not do much inside the
parallel region, so I increased the inner loop count from 50 to 500:
...
for (kk = 0; kk < 500; kk++) {
    x = (kk + 0.5) * step;
    sum += 4.0 / (1.0 + x * x);  // more complicated calculation here.
}
...
Here is the result:
d:\omp\test>try 1
omp_get_num_procs=2
Parallel region time=83 seconds
Total time = 87 seconds
D:\omp\test>try 2
omp_get_num_procs=2
Parallel region time=66 seconds
Total time = 66 seconds
So the performance improved with 2 threads. If this is the case, how
should I parallelize such a program? In my real program, I can
parallelize only this particular region.
Thanks
Andrew
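One common way to approach the question above, sketched here under assumptions rather than taken from the original program: if the parallel region sits inside loops that run many times, the thread team can be forked once around those loops, with only the inner loop shared out by a worksharing `for`. The function names, the `outer`/`inner` parameters, and the use of a `reduction` clause are hypothetical illustrations; the real program's loop structure is not shown in this thread.

```c
/* Baseline (hypothetical): a parallel region is forked and joined on
   every outer iteration, so the fork-join cost is paid 'outer' times. */
double sum_fork_each_time(int outer, int inner, double step)
{
    double sum = 0.0;
    for (int it = 0; it < outer; it++) {
        #pragma omp parallel for reduction(+:sum)
        for (int kk = 0; kk < inner; kk++) {
            double x = (kk + 0.5) * step;
            sum += 4.0 / (1.0 + x * x);
        }
    }
    return sum;
}

/* Sketch: fork the team once; every thread runs the outer loop, and
   only the inner loop's iterations are divided among the threads. */
double sum_fork_once(int outer, int inner, double step)
{
    double sum = 0.0;
    #pragma omp parallel
    {
        for (int it = 0; it < outer; it++) {
            #pragma omp for reduction(+:sum)
            for (int kk = 0; kk < inner; kk++) {
                double x = (kk + 0.5) * step;
                sum += 4.0 / (1.0 + x * x);
            }
            /* Implicit barrier at the end of the 'omp for': threads still
               synchronize here, but no new team is created. */
        }
    }
    return sum;
}
```

Both functions compute the same sum (up to floating-point summation order when threads are used). The second still pays for a barrier per outer iteration, so whether it helps depends on how expensive the inner loop is relative to that barrier. Without OpenMP enabled at compile time the pragmas are ignored and both run serially.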
>From: Nils Smeds <smeds at pdc.kth.se>
>Reply-To: smeds at pdc.kth.se
>To: "andrew wang" <mcwang88 at hotmail.com>
>CC: omp at openmp.org
>Subject: Re: [Omp] slow performance
>Date: Wed, 15 Dec 2004 17:21:50 +0100
>
>
>mcwang88 at hotmail.com said:
> > But to my big surprise, I see that the result is quite different from what I
> > can imagine. The more threads I have, the slower the calculation is.
>
>You need to tell us more about the platform you are running on. How many
>processors are available? How many processors are in use? Are there any other
>processes running that may interfere with your application? What kind of
>processors? Operating system?
>
>You enter and exit a parallel region 16200*50 times. The 39 second overhead
>then divides into 39s/(16200*50) = 48µs per fork-join, which sounds a little
>high on a modern system, but it is not outrageously high.
>
>/Nils
>
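The per-fork-join estimate in the reply above is a simple division, written out here as a small helper. The function name and parameters are hypothetical; the figures 39 seconds, 16200 and 50 are the ones quoted in the reply.

```c
/* Average cost of one fork-join in microseconds, given the extra
   wall-clock time attributed to threading and the number of times the
   parallel region was entered (outer_iters * inner_iters). */
double fork_join_cost_us(double overhead_seconds,
                         long outer_iters, long inner_iters)
{
    double entries = (double)outer_iters * (double)inner_iters;
    return overhead_seconds / entries * 1e6;  /* seconds -> microseconds */
}
```

With the numbers from the reply, fork_join_cost_us(39.0, 16200, 50) divides 39 s by 810000 region entries, which comes to roughly 48 microseconds each.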