OpenMP on 32 cores

Postby mnabi » Wed Jan 09, 2013 12:49 am

Hello,

I have an AMD machine with 2 CPUs, each with 16 cores (32 cores in total). Task Manager shows 4 CPUs with 8 cores each.

I use Fortran with OpenMP. After measuring the execution time of one of my codes, I found the following efficiencies:

4 cores (on 1 CPU) => 83% efficiency
8 cores (on 1 CPU) => 59%
16 cores (on 2 CPUs) => 32%
32 cores (all cores together) => 22%

Efficiency decreases as I use more cores. I know that is expected, but the drop seems very steep: on 32 cores I get only 22% (only about 7 times faster).
My question: is there any way to increase the efficiency, especially on 16 or 32 cores?
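The figures above convert directly into speedups, because efficiency is speedup divided by the number of cores (0.22 x 32 is about 7, the "7 times faster" mentioned). A minimal C sketch of that arithmetic, plus the Karp-Flatt metric, a standard diagnostic that is not mentioned in this thread and is added here as an assumption-free formula for estimating an "effective" serial fraction from measured speedups:

```c
#include <assert.h>

/* Speedup on p cores from parallel efficiency E(p):  S(p) = E(p) * p. */
double speedup_from_efficiency(double efficiency, int cores)
{
    assert(cores >= 1);
    return efficiency * cores;
}

/* Karp-Flatt metric (experimentally determined serial fraction), p > 1:
 *     e = (1/S - 1/p) / (1 - 1/p)
 * A roughly constant e as p grows points to a fixed sequential part;
 * an e that grows with p points more towards parallel overhead. */
double karp_flatt(double speedup, int cores)
{
    assert(cores > 1 && speedup > 0.0);
    double inv_p = 1.0 / cores;
    return (1.0 / speedup - inv_p) / (1.0 - inv_p);
}
```

For the four measurements above, karp_flatt gives roughly 0.07, 0.10, 0.14 and 0.11 for 4, 8, 16 and 32 cores. If most of the run time really is spent in parallel loops, values in that range would suggest that parallel overhead, rather than sequential code, is the main limit.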

Thanks
mnabi
 
Posts: 2
Joined: Wed Jan 09, 2013 12:22 am

Re: OpenMP on 32 cores

Postby ftinetti » Wed Jan 09, 2013 7:49 am

Hi,

mnabi wrote:
I know that is expected, but the drop seems very steep: on 32 cores I get only 22% (only about 7 times faster).
My question: is there any way to increase the efficiency, especially on 16 or 32 cores?

There are many reasons why performance can suffer. Some guesses:
  • Small workload: with more threads, the work per thread shrinks. This can be solved by giving each thread more work... easy to write but hard to do.
  • Too much synchronization: if you have a lot of shared data whose assignments must be synchronized, maybe you can use some thread-local data and aggregate the results at the end...
  • Memory contention: adding threads means more simultaneous memory requests from the cores. Tiling could mitigate the problem, up to a point.
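The "local data, aggregate at the end" idea in the second point is what OpenMP's reduction clause automates. A minimal C sketch contrasting it with a critical section on a shared accumulator; the function names are hypothetical, and in Fortran the clause is written `!$omp parallel do reduction(+:sum)`:

```c
#include <assert.h>
#include <stddef.h>

/* Shared accumulator guarded by a critical section: every iteration waits
 * for exclusive access, so threads spend most of their time queuing. */
double sum_squares_critical(const double *x, size_t n)
{
    assert(x != NULL || n == 0);
    double sum = 0.0;
    #pragma omp parallel for
    for (long i = 0; i < (long)n; ++i) {
        double v = x[i] * x[i];
        #pragma omp critical
        sum += v;
    }
    return sum;
}

/* reduction(+:sum) gives each thread a private partial sum and combines
 * the partial sums once, when the loop finishes. */
double sum_squares_reduction(const double *x, size_t n)
{
    assert(x != NULL || n == 0);
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; ++i)
        sum += x[i] * x[i];
    return sum;
}
```

Both functions return the same value, but the critical version serialises every update, so it typically gets slower, not faster, as threads are added.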
But nothing is certain, of course. Can you describe your computer in a bit more detail?
CPU Model:
RAM:
OS:
Compiler and compiler options:

HTH,

Fernando.
ftinetti
 
Posts: 558
Joined: Wed Feb 10, 2010 2:44 pm

Re: OpenMP on 32 cores

Postby MarkB » Wed Jan 09, 2013 8:22 am

As Fernando says, there are lots of possible causes of low efficiency.

These include:

Sequential code (i.e. time spent in parts of the code which are not parallelised with OpenMP).
Load imbalance (threads have different amounts of work to do, and the time is dictated by the slowest one).
Communication (cache misses caused by different threads accessing the same data, including false sharing).
Synchronisation (overhead of parallel regions, implicit and explicit barriers, criticals, locks, etc.).
Resource contention (threads contending for memory bandwidth or cache space).
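False sharing, listed under Communication, happens when threads write to different variables that live in the same cache line. A minimal C sketch of the usual fix, padding per-thread data to a full line; the 64-byte line size and the MAX_THREADS limit are assumptions, and count_even is a hypothetical example:

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_THREADS 64  /* assumed upper bound on the thread count */
#define CACHE_LINE  64  /* assumed cache-line size in bytes */

/* One counter per thread, padded to a full cache line.  Consecutive
 * counters are CACHE_LINE bytes apart, so no two share a line. */
typedef struct {
    long count;
    char pad[CACHE_LINE - sizeof(long)];
} padded_counter;

/* Hypothetical example: count the even entries of x.  Each thread only
 * writes its own slot, so no cache line bounces between cores. */
long count_even(const int *x, long n)
{
    padded_counter slot[MAX_THREADS] = {{0}};

    #pragma omp parallel
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        assert(tid < MAX_THREADS);
        #pragma omp for
        for (long i = 0; i < n; ++i)
            if (x[i] % 2 == 0)
                slot[tid].count++;
    }

    long total = 0;                      /* unused slots stay at zero */
    for (int t = 0; t < MAX_THREADS; ++t)
        total += slot[t].count;
    return total;
}
```

In real code a reduction clause would avoid the shared array altogether; the padded array only illustrates the memory-layout issue.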

As a first step, I suggest you use omp_get_wtime() to time every parallel region: this will give you some idea where to start looking.
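A minimal C sketch of that timing approach, accumulating wall-clock time per solver phase. The phase names, scale_array and solver_step are hypothetical placeholders, and the POSIX clock_gettime fallback is only there so the file still builds without OpenMP; in Fortran, omp_get_wtime() is available through `use omp_lib`.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
#include <time.h>
#endif

/* Wall-clock seconds: omp_get_wtime() with OpenMP, POSIX clock otherwise. */
static double wall_time(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
#endif
}

/* Hypothetical phases of one solver step and their accumulated times. */
enum { PHASE_MOMENTUM, PHASE_PRESSURE, N_PHASES };
static const char *phase_name[N_PHASES] = { "momentum", "pressure" };
static double phase_time[N_PHASES];

/* Placeholder parallel loop standing in for real solver work. */
static void scale_array(double *a, long n, double s)
{
    assert(a != NULL);
    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
        a[i] *= s;
}

/* One hypothetical solver step: each parallel region is bracketed by timers. */
void solver_step(double *u, double *p, long n)
{
    double t0 = wall_time();
    scale_array(u, n, 0.5);             /* stands in for the momentum update */
    phase_time[PHASE_MOMENTUM] += wall_time() - t0;

    t0 = wall_time();
    scale_array(p, n, 2.0);             /* stands in for the pressure solve */
    phase_time[PHASE_PRESSURE] += wall_time() - t0;
}

void print_phase_times(void)
{
    for (int k = 0; k < N_PHASES; ++k)
        printf("%-10s %10.6f s\n", phase_name[k], phase_time[k]);
}
```

Comparing the printed times for runs with different thread counts shows which region stops scaling first.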
MarkB
 
Posts: 408
Joined: Thu Jan 08, 2009 10:12 am

Re: OpenMP on 32 cores

Postby Oldboy » Wed Jan 09, 2013 9:16 am

Since 32-core machines are not available to everyone, it would be very interesting if you could test some of the programs on this page:
http://people.sc.fsu.edu/~jburkardt/index.html
I believe MD and MD_OMP are the most interesting. Fortran is faster than C and C++ (though the speed difference is smaller with 64-bit code).
You can compare MD (serial) and MD_OMP.
The standard parameters are 3, 1000 and 400 (use the same ones for the serial and OpenMP versions).
But with 32 cores it would also be interesting to try a longer run, with parameters like 3, 5000 and 400, to see whether the efficiency stays the same.
My 64-bit results with gfortran are better than the 32-bit ones for the OpenMP version, but the serial results are very close (FX-8120, no overclocking).
Oldboy
 
Posts: 17
Joined: Wed Oct 31, 2012 2:39 am

Re: OpenMP on 32 cores

Postby Oldboy » Thu Jan 10, 2013 8:49 am

In fact, a four-core Phenom 9550 runs the 64-bit MD (molecular dynamics) program 420% faster in parallel than in serial. The program is not small, so the cache (L2 = 512 KB, L3 = 2048 KB) could be too small for one thread.

Re: OpenMP on 32 cores

Postby mnabi » Fri Jan 11, 2013 12:10 am

Thank you for your help.
Actually, I don't parallelize by blocks. I just use "Parallel Do" to run the loops in parallel, so I think I have no control over load imbalance.

My PC is:
CPU: AMD Opteron 6284 SE, G34 socket, 2.7 GHz
RAM: 32 GB
Motherboard: ASUS KGPE-D16

The main parts of the code are DO loops (99%), so most of the code is already parallelised.
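If the 99% refers to run time (not just to lines of code), Amdahl's law gives an upper bound on what 32 cores could achieve. A minimal C sketch of the formula, assuming a perfectly parallel fraction f and no overhead at all:

```c
#include <assert.h>

/* Amdahl's law: upper bound on the speedup on p cores when a fraction f of
 * the serial run time parallelises perfectly and (1 - f) stays sequential.
 * It ignores all parallel overhead, so real speedups are lower. */
double amdahl_speedup(double f, int p)
{
    assert(f >= 0.0 && f <= 1.0 && p >= 1);
    return 1.0 / ((1.0 - f) + f / p);
}
```

amdahl_speedup(0.99, 32) is about 24.4 (roughly 76% efficiency), so a measured speedup of about 7 would suggest that most of the loss comes from overhead inside the parallel loops rather than from the remaining 1% of sequential code.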

Is there any extra command I can use to increase the efficiency? Or is there a specific method?
My code solves the 3D Navier-Stokes (NS) equations: momentum is treated explicitly, and the pressure correction uses multigrid with an SOR smoother.

Thank you for your advice.

Re: OpenMP on 32 cores

Postby ftinetti » Fri Jan 11, 2013 10:10 am

Hi again,

mnabi wrote:
Actually, I don't parallelize by blocks. I just use "Parallel Do" to run the loops in parallel, so I think I have no control over load imbalance.

Hmmm... blocking is almost always good for sequential as well as parallel performance. I would start by following Mark's suggestion. You might also experiment with different schedules for the DO loops. Can you describe your code in a bit more detail? I don't know the details of this part:
My code solves the 3D Navier-Stokes (NS) equations: momentum is treated explicitly, and the pressure correction uses multigrid with an SOR smoother.
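A minimal C sketch combining both suggestions, blocking and trying different schedules. The grid size, the tile size and the Jacobi-style stencil are assumptions standing in for one smoothing sweep; it is not the SOR smoother described in the post. `schedule(runtime)` defers the choice to the OMP_SCHEDULE environment variable, so values such as `static`, `dynamic,4` or `guided` can be compared without recompiling. The Fortran form is `!$omp parallel do collapse(2) schedule(runtime)`.

```c
#include <assert.h>

#define NX   64   /* assumed grid size */
#define NY   64
#define TILE 16   /* assumed tile size; worth tuning against the cache sizes */

/* One Jacobi-style sweep that replaces each interior point by the average of
 * its four neighbours, processed in TILE x TILE blocks so the rows a block
 * touches stay in cache.  collapse(2) spreads the blocks (not just block rows)
 * over the threads, and schedule(runtime) reads the schedule from OMP_SCHEDULE. */
void smooth_tiled(double out[NY][NX], double in[NY][NX])
{
    assert(out != in);                  /* Jacobi needs separate arrays */
    #pragma omp parallel for collapse(2) schedule(runtime)
    for (int jb = 1; jb < NY - 1; jb += TILE)
        for (int ib = 1; ib < NX - 1; ib += TILE)
            for (int j = jb; j < jb + TILE && j < NY - 1; ++j)
                for (int i = ib; i < ib + TILE && i < NX - 1; ++i)
                    out[j][i] = 0.25 * (in[j][i - 1] + in[j][i + 1]
                                      + in[j - 1][i] + in[j + 1][i]);
}
```

Whether tiling pays off depends on how the grid compares with the cache sizes, so the tile size is worth timing as well.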

And thank you so much for sending the details of your computer.

Fernando.

Re: OpenMP on 32 cores

Postby Oldboy » Sat Jan 12, 2013 12:58 am

One example from SPEC:

http://www.spec.org/cpu2006/results/res ... 23741.html

Sometimes it is a good idea to spend time testing compiler directives. Some SPEC peak directives can give extreme performance, but there are quite a few of them...

Re: OpenMP on 32 cores

Postby kazempour » Wed Feb 20, 2013 2:18 pm

Hi there,

I am running into much the same problem; see:
viewtopic.php?f=3&t=1527

Maybe there is a missing idea in our implementations. I'll let you know as soon as I find a solution.
By the way, check whether you have OMP_NESTED enabled; you could also look into Hyper-Threading. I don't have much experience with AMD processors, but I think the logic is the same as for Intel processors.
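A minimal C sketch for checking these settings from inside the program; it only reports, it changes nothing. It prints OMP_NESTED as set in the environment, the thread and processor counts the runtime sees, and the maximum number of active nested levels (OpenMP 5.0 and later deprecate OMP_NESTED in favour of OMP_MAX_ACTIVE_LEVELS).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Prints the settings that most often explain surprising scaling and
 * returns the number of threads a parallel region would use. */
int print_omp_settings(void)
{
    const char *nested = getenv("OMP_NESTED");
    printf("OMP_NESTED        : %s\n", nested ? nested : "(not set)");
#ifdef _OPENMP
    int threads = omp_get_max_threads();
    printf("max threads       : %d\n", threads);
    printf("processors        : %d\n", omp_get_num_procs());
    printf("max active levels : %d\n", omp_get_max_active_levels());
#else
    int threads = 1;
    printf("compiled without OpenMP: one thread only\n");
#endif
    assert(threads >= 1);
    return threads;
}
```

If the thread count exceeds the processor count, or nested regions are active when the code does not need them, oversubscription alone can cost efficiency.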

Regards,
Mahdi
kazempour
 
Posts: 13
Joined: Wed Jul 25, 2012 4:11 am

Re: OpenMP on 32 cores

Postby Robert Webber » Wed Oct 23, 2013 5:07 am

I am facing the very same problem. My code has a lot of loops, so there is a lot of parallelisation, yet the efficiency is minimal. I hope kazempour will post here if he finds a solution.

Robert Webber
 
Posts: 1
Joined: Tue Oct 22, 2013 9:52 pm
