I ran into a weird problem using OpenMP with nested parallelism. I ran the following code:
#pragma omp parallel for schedule(static)
for (ii = 0; ii < 5; ++ii) // repeated for more averaged time measurements
{
    int i, j, k;
    #pragma omp parallel for private(j, k, i) schedule(static)
    for (k = 0; k < loopSize; ++k)
    {
        #pragma omp parallel for private(j, i)
        for (i = 0; i < loopSize2; ++i)
        {
            // DO SOME WORK
        }
    }
}
In short, there are 3 levels of parallelism.
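To make this easier to reproduce, here is a minimal self-contained sketch of the same three-level structure. The function name count_nested_iterations and the shared counter are made up for illustration and only stand in for my real work, and I have not confirmed that this stripped-down version hangs the same way. Without OpenMP enabled the pragmas are ignored and the loops simply run serially.

```c
/* Hypothetical helper (not from my real program): runs the same
   three-level loop nest and returns how many times the innermost
   body executed. */
long count_nested_iterations(int loopSize, int loopSize2)
{
    long total = 0; /* shared by all threads at every level */
    int ii;

    #pragma omp parallel for schedule(static)
    for (ii = 0; ii < 5; ++ii)
    {
        int k; /* declared inside the region, so each thread has its own */
        #pragma omp parallel for schedule(static)
        for (k = 0; k < loopSize; ++k)
        {
            int i;
            #pragma omp parallel for
            for (i = 0; i < loopSize2; ++i)
            {
                /* stand-in for "DO SOME WORK" */
                #pragma omp atomic
                ++total;
            }
        }
    }
    return total;
}
```

If the run completes, a call such as count_nested_iterations(3, 4) should return 5 * 3 * 4 iterations in total, whether or not the inner regions actually get their own thread teams.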
When I run it on a 2-core CPU (in VS2005), the program never finishes and the CPU stays at 100%. It looks like some kind of deadlock!
When I break into the run with the debugger and look at the call stack, the topmost frame of each thread is:
vcompd.dll!_vcomp::PartialBarrier1_Poll::Block() + 0x3f bytes
What is the problem?
If I use only 2 levels of parallelism, everything works OK. Is there a limit on the number of nested OpenMP directives?