I am trying to parallelize my problem which takes 6 or 7 seconds in sequential mode.
The sequential code is three nested loops with trip counts a = 6, b = 8000, c = 150.
My OpenMP code looks like:
#pragma omp parallel
for (int i = 0; i < a; i++)
    // loop b
        // loop c
            call MKL to solve a 14x14 linear system Ax = B
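For concreteness, here is a minimal compilable sketch of that loop structure. It is an illustration under assumptions, not the actual code: `solve14` is a hypothetical hand-written Gaussian elimination standing in for the MKL call, `run_loops` and the matrix contents are made up, and the pragma is written as `parallel for`, which is what distributes the iterations of loop a across threads (a bare `parallel` in front of a `for` makes every thread execute the whole loop).

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <utility>

constexpr int N = 14;  // system size from the post
using Matrix = std::array<std::array<double, N>, N>;
using Vector = std::array<double, N>;

// Hypothetical stand-in for the MKL solve: Gaussian elimination with
// partial pivoting. A and b are taken by value, so each call works on
// private copies and shares no mutable state between threads.
Vector solve14(Matrix A, Vector b) {
    for (int k = 0; k < N; ++k) {
        int p = k;  // row with the largest pivot candidate in column k
        for (int r = k + 1; r < N; ++r)
            if (std::fabs(A[r][k]) > std::fabs(A[p][k])) p = r;
        assert(A[p][k] != 0.0 && "matrix is singular");
        std::swap(A[k], A[p]);
        std::swap(b[k], b[p]);
        for (int r = k + 1; r < N; ++r) {
            const double f = A[r][k] / A[k][k];
            for (int c = k; c < N; ++c) A[r][c] -= f * A[k][c];
            b[r] -= f * b[k];
        }
    }
    Vector x{};
    for (int r = N - 1; r >= 0; --r) {  // back substitution
        double s = b[r];
        for (int c = r + 1; c < N; ++c) s -= A[r][c] * x[c];
        x[r] = s / A[r][r];
    }
    return x;
}

// Runs the a x b x c loop nest and returns the sum of x[0] over all solves.
double run_loops(int a, int b, int c) {
    double total = 0.0;
    // Only loop a is split across threads, so at most `a` threads get work.
    #pragma omp parallel for reduction(+ : total)
    for (int i = 0; i < a; ++i) {
        for (int j = 0; j < b; ++j) {
            for (int k = 0; k < c; ++k) {
                Matrix A{};
                Vector rhs{};
                for (int d = 0; d < N; ++d) {
                    A[d][d] = 2.0 + (i + j + k) % 3;  // made-up diagonal system
                    rhs[d] = A[d][d];                 // so the solution is all ones
                }
                total += solve14(A, rhs)[0];
            }
        }
    }
    return total;
}
```

Calling `run_loops(6, 8000, 150)` reproduces the trip counts from the post. The pragma only takes effect when the file is compiled with OpenMP enabled (for example `-fopenmp` with GCC); otherwise the loop runs sequentially.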
I timed each iteration of loop c (one solve) at about 0.1 ms. The innermost body runs 6 x 8000 x 150 = 7.2 million times, though, so 0.1 ms per iteration would add up to 720 seconds; the 6 to 7 seconds I actually measure works out to about 1 microsecond per iteration, so my per-iteration figure is probably off by a factor of 100.

Increasing the number of threads gives no speedup: with 2 to 6 threads the run time is about 7 to 9 seconds. My machine has 16 cores and no Hyper-Threading. I also noticed that the time per iteration of loop c goes up compared with the sequential run once I add threads. I tried both the sequential and the parallel (threaded) MKL libraries, with the same results.

My guess is that memory access latency becomes significant as the number of threads goes up. Does anybody have any ideas? Thanks in advance.