Overhead cost

General OpenMP discussion

Re: Overhead cost

Post by ftinetti » Fri Nov 30, 2012 8:56 am

Hi DD,

dondilworth wrote:
I have Windows 7 64 bit, and I run from Visual Studio 2010 Professional.

Hmmm... I've not used VS, but I think you could open a command prompt window and

set OMP_WAIT_POLICY=TRUE
<run your .exe>

dondilworth wrote:
Also, why would 1 core (which goes through the OpenMP loop once per trip) run faster than 0 cores, which does the same thing but not inside an OMP construct?

Oops! I hadn't noticed the times for the CORE values 0 and 1... I can't see how the values 0 and 1 relate to the number of cores used at run time...

Fernando.

Re: Overhead cost

Post by Oldboy » Wed Dec 05, 2012 3:30 am

This is one example of the overhead cost. The code is from John Burkardt's home page (prime_openmp and prime_serial). It is written in Fortran 90 (C and C++ versions are also available).
AMD FX-8120 and 32-bit Linux from 2011. Compiled with gfortran -fopenmp -O3.

PRIME_NUMBER_OPENMP
FORTRAN90/OpenMP version

Number of processors available = 8
Number of threads = 8

PRIME_NUMBER_SWEEP_OPENMP
Call PRIME_NUMBER to count the primes from 1 to N.

       N      Pi          Time

       1       0  0.595245E-03
       2       1  0.219340E-04
       4       2  0.176070E-04
       8       4  0.144850E-04
      16       6  0.157930E-04
      32      11  0.164540E-04
      64      18  0.171540E-04
     128      31  0.274600E-04
     256      54  0.535070E-04
     512      97  0.156159E-03
    1024     172  0.556322E-03
    2048     309  0.302083E-02
    4096     564  0.740395E-02
    8192    1028  0.127263E-01
   16384    1900  0.332342E-01
   32768    3512      0.115615
   65536    6542      0.438407
  131072   12251       1.72663

PRIME_NUMBER_SWEEP_OPENMP
Call PRIME_NUMBER to count the primes from 1 to N.

       N      Pi          Time

       5       3  0.801100E-05
      50      15  0.901800E-05
     500      95  0.786350E-04
    5000     669  0.573093E-02
   50000    5133      0.266357
  500000   41538       22.7875

PRIME_NUMBER_OPENMP
Normal end of execution.
..........................................
PRIME_NUMBER_SERIAL
FORTRAN90


PRIME_NUMBER_SWEEP
Call PRIME_NUMBER to count the primes from 1 to N.

       N      Pi          Time

       1       0       0.00000
       2       1       0.00000
       4       2       0.00000
       8       4       0.00000
      16       6       0.00000
      32      11  0.100000E-02
      64      18       0.00000
     128      31       0.00000
     256      54       0.00000
     512      97       0.00000
    1024     172  0.199900E-02
    2048     309  0.200000E-02
    4096     564  0.899900E-02
    8192    1028  0.309950E-01
   16384    1900      0.117982
   32768    3512      0.448932
   65536    6542       1.73574
  131072   12251       6.76697

PRIME_NUMBER_SWEEP
Call PRIME_NUMBER to count the primes from 1 to N.

       N      Pi          Time

       5       3       0.00000
      50      15       0.00000
     500      95       0.00000
    5000     669  0.129980E-01
   50000    5133       1.02984
  500000   41538       94.2797

PRIME_NUMBER_SERIAL
Normal end of execution.

Re: Overhead cost

Post by dondilworth » Sun Dec 16, 2012 11:47 am

This has been very instructive! I was especially interested in the fact that the multicore routines, even with only a single core activated, ran much faster than the single-thread routines. The single-thread routines do the same thing, but they receive their input through an argument list of 12 items, whereas the MP version keeps everything in indexed arrays in a common block. Suspecting that passing those arguments in and out was also costing overhead, I reprogrammed both modes to transfer data via common blocks rather than an argument list. Presto! Now the single-thread and single-core routines take about the same time. Whew! I wonder how many programmers know about that overhead.

Re: Overhead cost

Post by ftinetti » Mon Dec 17, 2012 4:04 am

Hi,

dondilworth wrote:
Whew! I wonder how many programmers know about that overhead.

Do you have a complete example showing this behavior? I would like to run some tests on my computers, if possible.

Fernando.

Re: Overhead cost

Post by dondilworth » Tue Dec 18, 2012 5:50 am

Hmmm... What exactly is a "complete example"? If you mean whether I can send you the source code for my commercial software, of course I must decline. A simpler example should not be a problem, and you can create one as easily as I. Write a Fortran program that calls a subroutine about 1000 times. Put 12 arguments in a call list, all double-precision floating-point variables, and time the execution.

Then delete the argument list and instead load the variables into a named common block, and run the test again. Let me know what you find, please.

BTW: I tried your suggestion

set OMP_WAIT_POLICY=TRUE
<run your .exe>


and the program ran just slightly slower than before. Thank you for the suggestion anyway.

DD

Re: Overhead cost

Post by ftinetti » Tue Dec 18, 2012 9:14 am

Hi,

dondilworth wrote:
Hmmm... What exactly is a "complete example"? If you mean whether I can send you the source code for my commercial software,

No.

dondilworth wrote:
A simpler example should not be a problem, and you can create one as easily as I. Write a Fortran program that calls a subroutine about 1000 times. Put 12 arguments in a call list, all double-precision floating-point variables, and time the execution.

Then delete the argument list and instead load the variables into a named common block, and run the test again. Let me know what you find, please.


Well... if it is as easy as you explain, you should already know what I would find... or I have missed something...

Fernando.

Re: Overhead cost

Post by dondilworth » Mon Dec 24, 2012 3:45 pm

I thought my suggestion made sense: you asked for an example that shows the increased overhead of the argument list. So I described how to make an example. That way you have the example you requested. What was not clear?