Help with inconsistent results

Postby ikingrma » Sun Nov 24, 2013 2:30 pm

I am running a large Intel Fortran code that uses OpenMP parallel regions and also the parallel version of the PARDISO equation solver. They operate in different parts of the code: OpenMP is used during equation building. I have been running the model this way on multiple computers for more than 2 years with no problems. Since OpenMP was retrofitted to an older version of the code, I was very careful to ensure that local variables were treated consistently, and I tested this using a debugger.

Now I have a problem. When the model is run on two Toshiba laptops and a Dell Latitude (all multicore i7), it apparently randomly produces erroneous solutions. All computers are running Windows 7. Simulations involve multiple steps of these solutions, and the code stops at totally different points from run to run. The model does OK (no failures despite multiple runs) on my Dell Inspiron laptop. When OpenMP is disabled at compile time, the code runs with no problems. Similarly, when I disable parallel PARDISO, the model performs OK. So everything points to some inconsistency between the two parallelizations. One point is that at no time do I set a limit on, call for information about, or check the number of threads.

I am out of ideas for diagnosing the problem. Has anybody experienced this issue, or can anybody suggest a way to work out what is going wrong? Do I need to do something with the threads?

Thanks in advance
ikingrma
 
Posts: 7
Joined: Wed Sep 28, 2011 8:37 pm

Re: Help with inconsistent results

Postby MarkB » Mon Nov 25, 2013 2:48 pm

ikingrma wrote: One point is that at no time do I set a limit on, call for information about, or check the number of threads.


I'm not sure exactly what you mean here, but the PARDISO manual suggests that you must set the OMP_NUM_THREADS environment variable and pass this value to the library in IPARM(3). Is that what you are doing?
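If it helps to see what the OpenMP runtime is actually doing on each machine, here is a minimal sketch that prints the thread-related settings at start-up. It uses only the standard `omp_lib` routines and the Fortran 2003 `get_environment_variable` intrinsic. The program name is made up for illustration, and it does not touch PARDISO or IPARM at all.

```fortran
program show_threads
  use omp_lib                     ! standard OpenMP module for Fortran
  implicit none
  character(len=32) :: env_value
  integer :: env_status

  ! Status 1 means the variable does not exist in the environment.
  call get_environment_variable('OMP_NUM_THREADS', env_value, status=env_status)
  if (env_status == 1) then
     print '(a)', 'OMP_NUM_THREADS is not set'
  else
     print '(a,a)', 'OMP_NUM_THREADS       = ', trim(env_value)
  end if

  ! Processors visible to the runtime, and the team size a parallel
  ! region would get by default.
  print '(a,i0)', 'omp_get_num_procs()   = ', omp_get_num_procs()
  print '(a,i0)', 'omp_get_max_threads() = ', omp_get_max_threads()
end program show_threads
```

It needs to be built with OpenMP enabled. Running it on a machine that fails and on one that works would at least show whether the default thread counts differ.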
MarkB
 
Posts: 456
Joined: Thu Jan 08, 2009 10:12 am
Location: EPCC, University of Edinburgh

Re: Help with inconsistent results

Postby ikingrma » Mon Nov 25, 2013 3:59 pm

Thanks for the thought, Mark, but the Intel MKL documentation now specifically says IPARM(3) is reserved and should be set to 0. I originally set this number to the number of processors available but removed it. That is part of my uncertainty. I should stress that the model does not come to any sort of error ending, except that the solution fails to converge (the code does multiple time steps and iterates within them) after upwards of 1000 steps, and that number varies. When I take out OpenMP from the compilation, it goes the full 2880 steps reliably.

Re: Help with inconsistent results

Postby MarkB » Tue Nov 26, 2013 2:21 am

Ah, sorry, I see that the MKL version and the independent release of PARDISO are now incompatible. I'm not sure what to suggest: could it be a memory/stack size problem? Are you able to test with a small example?

Re: Help with inconsistent results

Postby ikingrma » Wed Nov 27, 2013 3:34 am

When smaller cases are used there is no problem. It appears that when larger problems are solved the run time is longer, and eventually some kind of event occurs that causes a thread issue, but only on certain computers. We are talking in terms of 3000 to 5000 solutions of 50,000 equations. The question then becomes: is there anything in the operating system settings that would differ from computer to computer, and would that influence OpenMP operations? We have compared virtual memory settings and they are similar, and the models are using the default heap and stack. Would these vary from machine to machine, and if so, would they cause a memory error without stopping the model?

Re: Help with inconsistent results

Postby MarkB » Wed Nov 27, 2013 3:57 am

Default stack size could indeed vary from machine to machine, and stack overflows can corrupt memory without throwing errors.
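As a rough illustration of where the stack pressure can come from, here is a minimal sketch of a routine with an automatic work array called from inside a parallel loop. Where a compiler places automatic arrays is implementation-dependent, but they typically go on the stack of whichever thread makes the call, so every OpenMP thread needs enough stack of its own. All names and sizes here are made up, and the array is deliberately kept small so the sketch cannot overflow anything.

```fortran
program thread_stack_demo
  implicit none
  integer, parameter :: ntasks = 8
  integer, parameter :: n = 1000       ! small on purpose; illustrative only
  real(8) :: results(ntasks)
  integer :: i

  ! Each iteration may run on a different thread, and the call below
  ! places its work array on that thread's own stack.
  !$omp parallel do
  do i = 1, ntasks
     results(i) = build_row(n, i)
  end do
  !$omp end parallel do

  print *, results
contains
  function build_row(n, k) result(s)
    integer, intent(in) :: n, k
    real(8) :: s
    real(8) :: work(n)                 ! automatic array, sized at run time
    integer :: j

    do j = 1, n
       work(j) = real(j * k, 8)
    end do
    s = sum(work)
  end function build_row
end program thread_stack_demo
```

The stack of the worker threads is controlled separately from the main program's stack: the standard OMP_STACKSIZE environment variable (OpenMP 3.0 onwards) sets it, and the Intel runtime also has KMP_STACKSIZE. If those are unset, the default is up to the implementation.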

Re: Help with inconsistent results

Postby ikingrma » Thu Nov 28, 2013 3:24 pm

Mark,

Thanks for your help. I may have resolved the issue: I switched to using the /heap-arrays option in Intel Fortran, and the code has successfully completed twice on one of the laptops that failed. We will run more tests on the other computers that failed to confirm.
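For anyone who finds this later, the same idea can be expressed in source code rather than with a compiler switch. The sketch below is not taken from my model; the routine names and the loop are invented for illustration. It uses an allocatable work array, which gets its storage from the heap, and checks the allocation status so that running out of memory is reported instead of silently corrupting something.

```fortran
program heap_work_array
  implicit none
  integer, parameter :: ntasks = 8
  integer, parameter :: n = 50000      ! illustrative size only
  real(8) :: total
  integer :: i, failures

  total = 0.0d0
  failures = 0

  !$omp parallel do reduction(+:total, failures)
  do i = 1, ntasks
     call add_row(n, i, total, failures)
  end do
  !$omp end parallel do

  if (failures > 0) then
     print '(a,i0)', 'work array allocations that failed: ', failures
  else
     print *, 'total = ', total
  end if
contains
  subroutine add_row(n, k, total, failures)
    integer, intent(in) :: n, k
    real(8), intent(inout) :: total
    integer, intent(inout) :: failures
    real(8), allocatable :: work(:)    ! allocatable: storage comes from the heap
    integer :: j, ierr

    allocate(work(n), stat=ierr)
    if (ierr /= 0) then
       failures = failures + 1         ! a failed heap allocation is visible to the caller
       return
    end if

    do j = 1, n
       work(j) = real(k, 8)
    end do
    total = total + sum(work)
    deallocate(work)
  end subroutine add_row
end program heap_work_array
```

As I understand it, the compiler option has a similar effect for automatic and temporary arrays without any source changes, which is why it was the quicker thing to try in a large existing code.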

Re: Help with inconsistent results

Postby MarkB » Fri Nov 29, 2013 5:18 am

Great, hope that proves to be the solution!
