[Omp] slow performance

andrew wang mcwang88 at hotmail.com
Wed Dec 15 18:19:05 PST 2004


Hi All,

Sorry, I forgot to include the system info:

Compaq AlphaServer SC45 with 44 nodes, each node comprising four 1GHz 
Alpha processors with 1GB memory. I am using only one node, with a varying 
number of threads (1-3). The Compaq C compiler supports OpenMP spec 1.0. The 
OS should be Tru64 UNIX.


I also tried compiling the same program with the Intel C compiler 8.0 and 
running it on a two-processor Win2k server. Here are the results:

D:\omp\test>try 2
omp_get_num_procs=2
Parallel region time=12 seconds
Total time = 14 seconds
D:\omp\test>try 1
omp_get_num_procs=2
Parallel region time=12 seconds
Total time = 14 seconds

There is not much difference between the two runs, so it is the same problem.


As somebody pointed out, my program does not actually do much inside the 
parallel region, so I increased the inner loop from 50 to 500 iterations:

...
  for (kk = 0; kk < 500; kk++) {
      x = (kk + 0.5) * step;
      sum += 4.0 / (1.0 + x * x);   // more complicated calculation here.
  }
...


Here is the result:


d:\omp\test>try 1
omp_get_num_procs=2
Parallel region time=83 seconds
Total time = 87 seconds

D:\omp\test>try 2
omp_get_num_procs=2
Parallel region time=66 seconds
Total time = 66 seconds

So performance improved with 2 threads. If this is the case, how should I 
parallelize such a program? In my real program, I can parallelize only this 
particular region.
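
One possible direction, given only as a sketch: the real program is not shown in this thread, so everything below is an assumption. The function names, the `passes` and `n` parameters, and `step = 1.0 / n` are hypothetical. The first variant forks and joins a parallel region on every pass; the second opens one parallel region around the pass loop and work-shares the inner loop with `#pragma omp for`, so the region is entered only once. Any serial work between passes would have to be wrapped in `#pragma omp single`. Whether this helps depends on how expensive fork-join is in the OpenMP runtime being used.

```c
#include <assert.h>

/* Variant 1 (assumed current structure): a parallel region is
   forked and joined once per pass. */
double sum_fork_per_pass(int passes, int n)
{
    double step = 1.0 / n;   /* assumption: unit interval split into n slices */
    double total = 0.0;
    int pass, kk;

    assert(passes > 0 && n > 0);
    for (pass = 0; pass < passes; pass++) {
        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (kk = 0; kk < n; kk++) {
            double x = (kk + 0.5) * step;
            sum += 4.0 / (1.0 + x * x);
        }
        total += sum * step;
    }
    return total;
}

/* Variant 2: one parallel region encloses the pass loop. Every thread
   runs the pass loop itself; only the inner loop is work-shared. */
double sum_single_region(int passes, int n)
{
    double step = 1.0 / n;
    double total = 0.0;

    assert(passes > 0 && n > 0);
    #pragma omp parallel reduction(+:total)
    {
        int pass, kk;   /* declared inside the region, so private per thread */
        for (pass = 0; pass < passes; pass++) {
            #pragma omp for
            for (kk = 0; kk < n; kk++) {
                double x = (kk + 0.5) * step;
                total += 4.0 / (1.0 + x * x) * step;
            }
            /* implicit barrier here: threads still synchronize once per pass */
        }
    }
    return total;
}
```

Both functions return roughly `passes` times pi, differing only by floating-point rounding. The code uses pragmas only and no `omp_*` calls, so it also compiles and runs serially when OpenMP is switched off.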


Thanks
Andrew

>From: Nils Smeds <smeds at pdc.kth.se>
>Reply-To: smeds at pdc.kth.se
>To: "andrew wang" <mcwang88 at hotmail.com>
>CC: omp at openmp.org
>Subject: Re: [Omp] slow performance
>Date: Wed, 15 Dec 2004 17:21:50 +0100
>
>
>mcwang88 at hotmail.com said:
> > But to my big surprise, I see that the result is quite different from
> > what I can imagine. The more threads I have, the slower the calculation is.
>
>You need to tell us more about the platform you are running on. How many
>processors are available? How many processors are in use? Are there any other
>processes running that may interfere with your application? What kind of
>processors? Operating system?
>
>You enter and exit a parallel region 16200*50 times. The 39 second overhead
>then divides into 39s/(16200*50) = 48µs per fork-join, which sounds a little
>high on a modern system, but it is not outrageously high.
>
>/Nils
>
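
The per-fork-join estimate in the quoted reply can be reproduced with a few lines of C. This is only a sketch of the arithmetic; the function name `fork_join_cost_us` is hypothetical and is not part of any OpenMP API.

```c
#include <assert.h>

/* Average cost of one fork-join in microseconds, given the total
   overhead in seconds and the number of parallel regions entered. */
double fork_join_cost_us(double overhead_s, long regions)
{
    assert(regions > 0);
    return overhead_s / (double)regions * 1.0e6;
}
```

Calling `fork_join_cost_us(39.0, 16200L * 50L)` divides 39 s by 810000 region entries, which gives about 48 microseconds and matches the figure above.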





