Discussion on the OpenMP specification run by the OpenMP ARB.
I am running a large Intel Fortran code that uses OpenMP parallel regions and also the parallel version of the PARDISO equation solver. The two operate in different parts of the code: OpenMP is used during equation building. I have been running the model on multiple computers this way for more than 2 years with no problems. Since OpenMP was a retrofit of an older version, I was very careful to ensure that local variables were treated consistently, and I tested this using a debugger.
Now I have a problem. When the model is run on two Toshiba laptops and a Dell Latitude (all multicore i7), it gets erroneous solutions, apparently at random. All computers are running Windows 7. Simulations involve multiple steps of these solutions, and the code stops at totally different points from run to run. The model does OK (no failures despite multiple runs) on my Dell Inspiron laptop. When OpenMP is disabled at compile time the code runs with no problems. Similarly, when I disable parallel PARDISO the model performs OK. So everything points to some inconsistency between the two parallelizations. One point is that at no time do I set a limit on, call for information about, or check the number of threads.
I am out of ideas for diagnosing the problem. Has anybody experienced this issue, or can anybody suggest a way to work out what is going wrong? Do I need to do something with the threads?
ikingrma wrote: One point is that at no time do I set a limit on, call for information about, or check the number of threads.
I'm not sure exactly what you mean here, but the PARDISO manual suggests that you must set the OMP_NUM_THREADS environment variable and pass this value to the library in IPARM(3). Is that what you are doing?
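Since the thread count is never set or checked, one low-cost diagnostic is to print what the OpenMP runtime reports on each machine and compare the laptops that fail with the one that works. The following is only a minimal sketch, not part of the original model; the program name `report_threads` is hypothetical, and it uses only the standard `omp_lib` routines `omp_get_max_threads`, `omp_get_num_procs` and `omp_get_num_threads`.

```fortran
program report_threads
  ! Hypothetical stand-alone diagnostic; build with OpenMP enabled
  ! (for example /Qopenmp with Intel Fortran on Windows).
  use omp_lib
  implicit none
  integer :: nthreads

  ! Upper bound on the team size a new parallel region would get
  print '(a,i0)', 'omp_get_max_threads = ', omp_get_max_threads()
  ! Number of processors the runtime sees on this machine
  print '(a,i0)', 'omp_get_num_procs   = ', omp_get_num_procs()

  nthreads = 0
  !$omp parallel
  !$omp single
  ! Team size actually granted inside a parallel region
  nthreads = omp_get_num_threads()
  !$omp end single
  !$omp end parallel
  print '(a,i0)', 'threads in region   = ', nthreads
end program report_threads
```

Running this with and without OMP_NUM_THREADS set in the environment shows whether the machines differ in the default thread count.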
Thanks for the thought, Mark, but the Intel MKL documentation now specifically says IPARM(3) is reserved and should be set to 0. I originally set this number to the number of processors available but removed it. That is part of my uncertainty. I should stress that the model does not come to any sort of error ending; rather, the solution fails to converge (the code does multiple time steps and iterates within them) after upwards of 1000 steps, and that number varies. When I take OpenMP out of the compilation, it goes the full 2880 steps reliably.
Ah, sorry, I see that the MKL version and the independent release of PARDISO are now incompatible. I'm not sure what to suggest: could it be a memory/stack size problem? Are you able to test with a small example?
When smaller cases are used there is no problem. It appears that when larger problems are solved the run time is longer, and eventually some kind of event occurs that causes a thread issue, but only on certain computers. We are talking in terms of 3000 to 5000 solutions of 50,000 equations. The question then becomes: is there anything in the operating system settings that would differ from computer to computer, and would that influence OpenMP operations? We have compared virtual memory settings and they are similar, and the models are using the default heap and stack. Would these vary from machine to machine, and if so, could they cause a memory error without stopping the model?
Thanks for your help. I may have resolved the issue: I switched to using the /heap-arrays option in Intel Fortran, and the code has successfully completed twice on one of the laptops that failed. We will run more tests on the other computers that failed to confirm.
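For readers hitting a similar symptom: the /heap-arrays option tells Intel Fortran to place automatic arrays and compiler-generated temporary arrays on the heap rather than on the stack. A source-level way to get a similar effect for a specific array is to declare it allocatable. The sketch below is an assumption about what such a change could look like, not the poster's code; `assemble_block`, `work`, and `nblocks` are hypothetical names, and only the system size of 50,000 comes from the thread.

```fortran
program heap_work_array_sketch
  implicit none
  integer, parameter :: neq = 50000   ! system size mentioned in the thread
  integer, parameter :: nblocks = 8   ! hypothetical number of work items
  integer :: ib
  double precision :: total

  total = 0.0d0
  ! Each thread calls assemble_block and gets its own work array
  !$omp parallel do reduction(+:total)
  do ib = 1, nblocks
     total = total + assemble_block(ib, neq)
  end do
  !$omp end parallel do
  print *, 'checksum = ', total

contains

  function assemble_block(iblock, n) result(s)
    integer, intent(in) :: iblock, n
    double precision :: s
    ! An automatic array "double precision :: work(n)" would normally be
    ! placed on the calling thread's stack. An allocatable array is
    ! allocated on the heap instead.
    double precision, allocatable :: work(:)

    allocate(work(n))
    work = dble(iblock)      ! stand-in for the real equation building
    s = sum(work)
    deallocate(work)
  end function assemble_block

end program heap_work_array_sketch
```

If changing the source is not practical, another setting that may be worth testing is the OMP_STACKSIZE environment variable, which requests a larger stack for the threads the OpenMP runtime creates. Whether that would have helped in this particular case is not established by the thread.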