Can someone please tell me what I am doing wrong with the following?
Please take a look at the snippet below, which I'm using in my code. The single-threaded version runs fine, but I want to distribute the 16 function calls across 4 cores, each executing 4 calls of the function, in parallel of course. After adding the pragmas, it takes even longer to execute than the single-threaded version without any pragmas. I don't want to distribute the loop iterations (I've done that before); I want to distribute the computeBlack76() function calls to different threads. I have to do 16 calculations per loop pass, so it would be optimal to have 4 threads running 4 calculations each.
This is the version that is not working (I found a two-threaded version of QuickSort implemented in the very same way):
for(int i=0; i < numPasses; i++)
{
    #pragma omp parallel sections num_threads(4)
    {
        #pragma omp section
        computeBlack76('C', 318, 72, 0.676712328767123, 0.05, 0.7);
        #pragma omp section
        computeBlack76('C', 317, 208, 0.446575342465753, 0.05, 0.7);
        #pragma omp section
        computeBlack76('C', 78, 125, 0.972602739726027, 0.05, 0.7);
        #pragma omp section
        computeBlack76('C', 276, 398, 0.975342465753425, 0.05, 0.7);
        #pragma omp section
        computeBlack76('C', 312, 81, 0.517808219178082, 0.05, 0.7);
        #pragma omp section
        computeBlack76('C', 165, 390, 0.167123287671233, 0.05, 0.7);
        #pragma omp section
        computeBlack76('C', 93, 337, 0.254794520547945, 0.05, 0.7);
        #pragma omp section
        computeBlack76('C', 307, 256, 0.986301369863014, 0.05, 0.7);
        #pragma omp section
        computeBlack76('C', 286, 168, 0.619178082191781, 0.05, 0.7);
        #pragma omp section
        computeBlack76('C', 434, 92, 0.542465753424658, 0.05, 0.7);
        #pragma omp section
        computeBlack76('C', 361, 199, 0.994520547945206, 0.05, 0.7);
        #pragma omp section
        computeBlack76('C', 233, 393, 0.268493150684932, 0.05, 0.7);
        #pragma omp section
        computeBlack76('C', 415, 103, 0.408219178082192, 0.05, 0.7);
        #pragma omp section
        computeBlack76('C', 271, 175, 0.550684931506849, 0.05, 0.7);
        #pragma omp section
        computeBlack76('C', 353, 370, 0.73972602739726, 0.05, 0.7);
        #pragma omp section
        computeBlack76('C', 163, 449, 0.495890410958904, 0.05, 0.7);
    }
}
It seems to me that my CPU is executing the very same 16 calculations 4 times, on the four different threads.
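A minimal sketch of how this suspicion could be checked, rather than guessed at: count how often each section actually runs and print which thread ran it. The names tracedCall, calls and runSectionsOnce are hypothetical and only stand in for computeBlack76(); the example uses 4 sections instead of 16 to stay short, and it assumes the compiler's OpenMP option is enabled (for GCC that is -fopenmp).

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical counters: calls[k] is how many times section k ran. */
static int calls[4];

/* Hypothetical stand-in for computeBlack76(): records and reports the call. */
static void tracedCall(int id)
{
    int tid = 0;               /* stays 0 when built without OpenMP */
#ifdef _OPENMP
    tid = omp_get_thread_num();
#endif
    #pragma omp atomic
    calls[id]++;
    printf("call %d ran on thread %d\n", id, tid);
}

/* Runs the four sections once and returns the total number of calls made. */
int runSectionsOnce(void)
{
    int total = 0;
    for (int k = 0; k < 4; k++)
        calls[k] = 0;

    #pragma omp parallel sections num_threads(4)
    {
        #pragma omp section
        tracedCall(0);
        #pragma omp section
        tracedCall(1);
        #pragma omp section
        tracedCall(2);
        #pragma omp section
        tracedCall(3);
    }

    for (int k = 0; k < 4; k++)
        total += calls[k];
    return total;
}
```

If runSectionsOnce() is called from serial code and returns 4, each section ran exactly once and the work is not being duplicated. If every call shows up several times, one possibility (an assumption, not something I have confirmed in my code) is that the construct is being reached from inside another parallel region, so each outer thread runs all sections itself.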
Is there a way to fix this?
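One alternative I could try, sketched under assumptions: put the 16 argument sets in a table and let OpenMP hand out chunks of 4 consecutive calls per thread. This is not the sections approach above, it is a parallel loop over the calls (not over numPasses). Black76Args, computeStub and runAllCalls are hypothetical names, the field names are guesses, and computeStub only does placeholder arithmetic because the real signature and return type of computeBlack76() are not shown here.

```c
#include <stdio.h>

/* One argument set per call; field names are guesses at their meaning. */
typedef struct {
    char   type;
    double f, x, t, r, v;
} Black76Args;

/* Hypothetical stand-in for computeBlack76(); NOT the Black-76 formula. */
static double computeStub(char type, double f, double x,
                          double t, double r, double v)
{
    (void)type; (void)t; (void)r; (void)v;
    return f - x;              /* placeholder arithmetic only */
}

/* Runs n calls; with 4 threads, each chunk of 4 consecutive calls
   goes to one thread (thread 0 gets 0..3, thread 1 gets 4..7, ...). */
void runAllCalls(const Black76Args *args, double *out, int n)
{
    #pragma omp parallel for schedule(static, 4) num_threads(4)
    for (int j = 0; j < n; j++) {
        out[j] = computeStub(args[j].type, args[j].f, args[j].x,
                             args[j].t, args[j].r, args[j].v);
    }
}
```

Each thread writes only its own out[j], so no locking is needed in this sketch. Whether it is actually faster would depend on how long one call takes: if a single call is very short, the cost of starting the parallel region on every pass may still dominate.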
I have been working on this all day and have not found any explanation for it.
I'd appreciate any kind of help or suggestion.
