Unstable OpenMP code on one computer only

Postby johnanairn » Wed Apr 17, 2013 1:10 pm

I am converting a large computational mechanics code to use OpenMP. I am pretty confident the parts that are done are correct (mostly C++ loops with careful code revisions to ensure each iteration of the loop is independent). I have developed or tested on four different machines:

1. Mac Desktop, MacOS 10.6.x, 2 X 3 GHz Dual-Core Intel Xeon
2. Mac Laptop (a colleague's), Mac OS 10.8.x, 2 X Quad-Core Intel i7 (I think)
3. Dell PowerEdge T610 (Linux) with 2 X Xeon 5600 six-core processors
4. MacBook Pro Laptop, MacOS 10.7.5, 1 X 2.8 GHz Intel Core 2 Duo

All Mac codes are compiled with XCode 4.x using GCC 4.2, OpenMP 2.5
Linux code compiled using GCC 4.3, OpenMP 2.5

When I run on #1 through #3, the code always works and can use any number of available cores on each machine (1 to 4 on #1, 1 to 8 on #2, and 1 to 12 on #3). But I like to work at home too. When I run on machine #4 with 2 cores, the identical code almost always fails before it is done. The code consists of a series of identical time steps (to solve a dynamic mechanics problem) that have to be done in sequence, and machine #4 just quits after some random number of code-identical steps. If it does finish, the results are identical to those from the other machines. I think it always fails in or around an omp block, but which block fails is also random. Furthermore, none of those blocks ever fail on the other three machines.
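A minimal sketch of the structure described above, assuming hypothetical names (`Node`, `advanceNodes`, `runTimeSteps`) that are not taken from NairnMPM: the outer time-step loop runs serially, and only the loop inside each step is divided among threads.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-node state; not the actual NairnMPM data structures.
struct Node {
    double mass;
    double force;
    double velocity;
};

// Work inside one time step. Each iteration reads and writes only nodes[i],
// so iterations are independent and can be divided among threads.
void advanceNodes(std::vector<Node>& nodes, double dt) {
    // OpenMP 2.5 requires a signed integer loop variable.
    const long n = static_cast<long>(nodes.size());
#pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        Node& nd = nodes[i];
        nd.force = -nd.mass * 9.81;              // hypothetical load (gravity)
        nd.velocity += dt * nd.force / nd.mass;
    }
    // Implicit barrier: every thread finishes before this function returns.
}

// The time steps themselves depend on each other and run one after another.
void runTimeSteps(std::vector<Node>& nodes, int steps, double dt) {
    assert(steps >= 0 && dt > 0.0);
    for (int s = 0; s < steps; ++s)
        advanceNodes(nodes, dt);
}
```

Compiled without -fopenmp, GCC ignores the pragma and the same code runs serially, which gives a convenient reference run to compare against the threaded build.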

Could this be a code bug that can be fixed (and how?) or is the Laptop a defective computer/OS/compiler/chip system?

Re: Unstable OpenMP code on one computer only

Postby ftinetti » Thu Apr 18, 2013 4:29 am

Hi,

johnanairn wrote: "machine #4 just quits after some random number of code-identical steps."

Is there any error message/s? Are you compiling/linking separately on each computer? Did you try to copy everything (data, executable, etc.) from #1 to #4 and see what happens?

HTH,

Fernando.

Re: Unstable OpenMP code on one computer only

Postby MarkB » Tue Apr 23, 2013 3:27 am

You might be running out of memory, or running out of stack space.
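One common way to run out of stack in OpenMP code is a large automatic (local) array declared inside a parallel region: every thread gets its own copy on its own stack, and worker-thread stacks are often smaller than the main thread's (on Mac OS X, secondary threads default to 512 KB). A minimal sketch, assuming a hypothetical `processElements` routine that needs scratch space for each element, keeps that buffer on the heap instead:

```cpp
#include <cassert>
#include <vector>

// Hypothetical element loop in which every element needs bufSize doubles of
// scratch space. A large local array declared inside the parallel region,
// e.g. `double buf[1000000];` (about 8 MB), would be placed on each thread's
// own stack; a std::vector keeps the storage on the heap instead.
double processElements(int numElements, int bufSize) {
    assert(numElements >= 0 && bufSize >= 0);
    double total = 0.0;
#pragma omp parallel reduction(+:total)
    {
        std::vector<double> buf(bufSize);   // one heap buffer per thread, reused
#pragma omp for
        for (int e = 0; e < numElements; ++e) {
            for (int k = 0; k < bufSize; ++k)
                buf[k] = e + k;             // hypothetical element computation
            double s = 0.0;
            for (int k = 0; k < bufSize; ++k)
                s += buf[k];
            total += s;
        }
    }
    return total;
}
```

Allocating once per thread, rather than once per element, keeps the heap traffic low while still keeping the large buffer off the thread stacks.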

Re: Unstable OpenMP code on one computer only

Postby johnanairn » Wed Apr 24, 2013 9:57 pm

The source code and input data are identical, and it should not be running out of memory (although the computer where it fails has only 4GB of memory while the others have much more). The problems I am running do not need much memory, although perhaps fragmentation could occur.

Perhaps stack space is a problem. I do not get any error messages that I can see. I probably had some "throws" in threads (which should not be done), but I now catch them all and still see no errors. I will look some more. Is there a good way to trap stack space errors?
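On the question of throws in threads: the OpenMP specification requires that an exception thrown inside a parallel region be caught within that same region, by the same thread that threw it; an exception that escapes a worker thread typically terminates the program. A minimal sketch of the catch-and-report pattern, using a hypothetical `processItem` function that may throw:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical work function that may throw.
double processItem(double x) {
    if (x < 0.0)
        throw std::runtime_error("negative input");
    return 2.0 * x;
}

// Processes every item in parallel. Returns true on success; on failure it
// returns false and errorMessage holds the message of the first error recorded.
bool processAll(std::vector<double>& data, std::string& errorMessage) {
    assert(errorMessage.empty());           // caller passes an empty string
    bool failed = false;
    const long n = static_cast<long>(data.size());
#pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        try {
            data[i] = processItem(data[i]);
        } catch (const std::exception& e) {
            // Caught by the thread that threw it, still inside the region.
#pragma omp critical(record_error)
            {
                if (!failed) {
                    failed = true;
                    errorMessage = e.what();
                }
            }
        }
    }
    // Outside the region it is safe to report the error or throw.
    return !failed;
}
```

"First error recorded" means the first thread to enter the critical section, which is not necessarily the lowest failing index.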

Re: Unstable OpenMP code on one computer only

Postby johnanairn » Wed Apr 24, 2013 9:58 pm

I have not tried copying the executable to #4. I will try that too.

Re: Unstable OpenMP code on one computer only

Postby ftinetti » Thu Apr 25, 2013 2:21 am

Hi,

johnanairn wrote: "I do not get any error messages that I can see."

Please enable every compiler option for runtime checks.

johnanairn wrote: "Is there a good way to trap stack space errors?"

-fstack-check
There are many more options for runtime checking, e.g. -fbounds-check (array bounds checking).
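For the C++ parts of the code, note that GCC documents -fbounds-check only for its Fortran and Java front ends, so it does not check C++ array indexing. One substitute while debugging is to index through std::vector::at(), which throws std::out_of_range on a bad index; libstdc++ also has a debug mode (-D_GLIBCXX_DEBUG) that adds checks to operator[]. A minimal sketch with a hypothetical gather routine:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical gather step: sum the nodal values referenced by an element's
// connectivity list. at() checks every index and throws std::out_of_range on
// a bad one, where operator[] would silently read past the end of the array.
double gatherNodal(const std::vector<double>& nodal,
                   const std::vector<std::size_t>& connectivity) {
    double sum = 0.0;
    for (std::size_t k = 0; k < connectivity.size(); ++k)
        sum += nodal.at(connectivity[k]);
    return sum;
}
```

If a routine like this is called inside a parallel region, the std::out_of_range it throws must still be caught inside that region.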

Please send the current compile/link command (gcc compiler/linker options, actually).

Fernando.

Re: Unstable OpenMP code on one computer only

Postby MarkB » Thu Apr 25, 2013 5:52 am

You could try using ulimit -s to see if the available stack space on #4 is smaller than on the other systems, and if so, increase it.
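ulimit -s reports the soft limit on the main thread's stack, in kilobytes. A program can read the same value with the POSIX getrlimit() call, which makes it easy to log at startup on each machine. This limit applies to the main thread; OpenMP worker threads get their stacks from the runtime (OpenMP 3.0 added the OMP_STACKSIZE environment variable to control that size). A minimal sketch with a hypothetical function name:

```cpp
#include <cassert>
#include <sys/resource.h>

// Returns the soft stack-size limit for the main thread in kilobytes
// (the unit that `ulimit -s` prints), or -1 if it is unlimited or unreadable.
long mainStackLimitKB() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return -1;
    return static_cast<long>(rl.rlim_cur / 1024);
}
```

A return value of 8192, for example, corresponds to an 8 MB main-thread stack.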

Re: Unstable OpenMP code on one computer only

Postby johnanairn » Thu Apr 25, 2013 9:36 am

I found error messages in the Mac crash log (given below). The main thread is crashing in OpenMP code in __psynch_cvwait. Everything from the GridForcesTask::Execute() frame down is my code. Which task in my code triggers the crash seems to be random.

Other checks:

1. ulimit -s is 8192 on all computers being used

2. The compile and link commands are generated by XCode and are rather complicated, but I have included them below anyway.

Thanks,
John Nairn

CRASH LOG

Date/Time: 2013-04-25 08:41:07.585 -0700
OS Version: Mac OS X 10.7.5 (11G63)
Report Version: 9

Crashed Thread: 0 Dispatch queue: com.apple.main-thread

Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000

Application Specific Information:
objc[51159]: garbage collection is OFF

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libsystem_kernel.dylib 0x00007fff91990bca __psynch_cvwait + 10
1 libsystem_c.dylib 0x00007fff8b603274 _pthread_cond_wait + 840
2 NairnMPM 0x0000000107d4570b gomp_sem_wait + 59
3 NairnMPM 0x0000000107d457c9 gomp_barrier_wait_end + 73
4 NairnMPM 0x0000000107d4550c gomp_team_end + 44
5 NairnMPM 0x0000000107dab618 GridForcesTask::Execute() + 102 (GridForcesTask.cpp:111)
6 NairnMPM 0x0000000107d46fec NairnMPM::MPMStep() + 112 (NairnMPM.cpp:758)
7 NairnMPM 0x0000000107d47392 NairnMPM::MPMAnalysis(bool) + 840 (NairnMPM.cpp:233)
8 NairnMPM 0x0000000107d45bb3 main + 626 (main.cpp:124)
9 NairnMPM 0x0000000107d44534 start + 52


COMPILE COMMAND

CompileC /Users/nairnj/Library/Developer/Xcode/DerivedData/NairnMPM-hfdzeidinbugrvarbletulxfusdv/Build/Intermediates/NairnMPM.build/Default/NairnMPM.build/Objects-normal/x86_64/main.o ../System/main.cpp normal x86_64 c++ com.apple.compilers.llvmgcc42
cd /Users/nairnj/Programming/Cocoa_Projects/nairn-mpm-fea-OpenMP/Common/Projects
setenv LANG en_US.US-ASCII
/Applications/Xcode.app/Contents/Developer/usr/bin/llvm-gcc-4.2 -x c++ -arch x86_64 -fmessage-length=0 -pipe -fopenmp -Wno-trigraphs -fpascal-strings -Os -fasm-blocks -gdwarf-2 -fvisibility=hidden -fvisibility-inlines-hidden -I/Users/nairnj/Library/Developer/Xcode/DerivedData/NairnMPM-hfdzeidinbugrvarbletulxfusdv/Build/Intermediates/NairnMPM.build/Default/NairnMPM.build/NairnMPM.hmap -I/Users/nairnj/Library/Developer/Xcode/DerivedData/NairnMPM-hfdzeidinbugrvarbletulxfusdv/Build/Products/Default/include -I../../NairnMPM/src -I.. -I/usr/local/include -I/usr/llvm-gcc-4.2/lib/gcc/i686-apple-darwin11/4.2.1/include -I/Users/nairnj/Library/Developer/Xcode/DerivedData/NairnMPM-hfdzeidinbugrvarbletulxfusdv/Build/Intermediates/NairnMPM.build/Default/NairnMPM.build/DerivedSources/x86_64 -I/Users/nairnj/Library/Developer/Xcode/DerivedData/NairnMPM-hfdzeidinbugrvarbletulxfusdv/Build/Intermediates/NairnMPM.build/Default/NairnMPM.build/DerivedSources -Wmost -F/Users/nairnj/Library/Developer/Xcode/DerivedData/NairnMPM-hfdzeidinbugrvarbletulxfusdv/Build/Products/Default -include /Users/nairnj/Programming/Cocoa_Projects/nairn-mpm-fea-OpenMP/Common/Projects/../../NairnMPM/src/System/MPMPrefix.hpp -c /Users/nairnj/Programming/Cocoa_Projects/nairn-mpm-fea-OpenMP/Common/Projects/../System/main.cpp -o /Users/nairnj/Library/Developer/Xcode/DerivedData/NairnMPM-hfdzeidinbugrvarbletulxfusdv/Build/Intermediates/NairnMPM.build/Default/NairnMPM.build/Objects-normal/x86_64/main.o


LINK COMMAND

Ld /Users/nairnj/Library/Developer/Xcode/DerivedData/NairnMPM-hfdzeidinbugrvarbletulxfusdv/Build/Products/Default/NairnMPM normal x86_64
cd /Users/nairnj/Programming/Cocoa_Projects/nairn-mpm-fea-OpenMP/Common/Projects
/Applications/Xcode.app/Contents/Developer/usr/bin/llvm-g++-4.2 -arch x86_64 -L/Users/nairnj/Library/Developer/Xcode/DerivedData/NairnMPM-hfdzeidinbugrvarbletulxfusdv/Build/Products/Default -F/Users/nairnj/Library/Developer/Xcode/DerivedData/NairnMPM-hfdzeidinbugrvarbletulxfusdv/Build/Products/Default -filelist /Users/nairnj/Library/Developer/Xcode/DerivedData/NairnMPM-hfdzeidinbugrvarbletulxfusdv/Build/Intermediates/NairnMPM.build/Default/NairnMPM.build/Objects-normal/x86_64/NairnMPM.LinkFileList -Xlinker -rpath -Xlinker "@loader_path/lib" -fopenmp -lxerces-c-3.1 -o /Users/nairnj/Library/Developer/Xcode/DerivedData/NairnMPM-hfdzeidinbugrvarbletulxfusdv/Build/Products/Default/NairnMPM

Re: Unstable OpenMP code on one computer only

Postby MarkB » Thu Apr 25, 2013 10:04 am

Looks like a compiler/OS bug to me, possibly the same as reported here: https://discussions.apple.com/thread/3786045
It may only affect MacOS 10.7.x.

gcc 4.2 is ancient history as far as OpenMP implementations go: have you considered abandoning Xcode in favour of a more recent version of gcc?
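Because the OpenMP version a compiler implements matters here, one way to check it is the standard `_OPENMP` macro, which is defined as the yyyymm date of the supported specification whenever OpenMP is enabled (200505 for OpenMP 2.5, 200805 for 3.0). A minimal sketch with a hypothetical function name:

```cpp
#include <cassert>

// Returns the value of _OPENMP (the yyyymm date of the supported OpenMP
// specification, e.g. 200505 for 2.5 or 200805 for 3.0), or 0 when the
// file was compiled without OpenMP enabled.
long openmpSpecDate() {
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}
```

With GCC 4.2 and -fopenmp this should report 200505; GCC 4.4 was the first release to support OpenMP 3.0.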
Last edited by MarkB on Mon Apr 29, 2013 3:31 am, edited 1 time in total.

Re: Unstable OpenMP code on one computer only

Postby johnanairn » Sat Apr 27, 2013 9:28 am

Yes, that must be the OS issue, and that helps me a lot. I thought I had broken a large code. It is also a good reason for a new laptop (with more cores), or at least to upgrade MacOS (which seems to me to be more painful for developers than it used to be).

I use XCode because a lot of my programming is MacOS coding in Objective-C. I am only using OpenMP on one project (although that one is a large part of my research).
