I am using OpenMP to parallelize the following loop (shown in pseudo-code) on an Intel Core 2 under Linux:
// Initialization of dense matrix and vectors
for (unsigned int i = 0; i < NumLoops; i++)
{
    #pragma omp parallel sections default(shared)
    {
        #pragma omp section
        {
            // Product v1out = A*v1 with BLAS routines.
        }
        #pragma omp section
        {
            // Product v2out = A*v2 with BLAS routines.
        }
    }
}
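For anyone who wants to reproduce the measurement, here is a minimal self-contained sketch of the same loop structure. It is not the original code: the hand-written `matvec` only stands in for the BLAS call, and the names `matvec`, `timeProducts`, and the `parallel` flag are assumptions made for illustration. Double precision and row-major storage are also assumed.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Plain row-major product y = A*x for an n-by-n matrix.
// Hypothetical stand-in for the BLAS routine used in the real code.
void matvec(const std::vector<double>& A, const std::vector<double>& x,
            std::vector<double>& y, std::size_t n)
{
    assert(A.size() == n * n && x.size() == n && y.size() == n);
    for (std::size_t r = 0; r < n; ++r) {
        double sum = 0.0;
        for (std::size_t c = 0; c < n; ++c)
            sum += A[r * n + c] * x[c];
        y[r] = sum;
    }
}

// Runs both products numLoops times and returns the elapsed wall time in seconds.
// With parallel == false the if() clause makes the region run on one thread,
// which gives the serial baseline for computing the speedup.
double timeProducts(const std::vector<double>& A,
                    const std::vector<double>& v1, const std::vector<double>& v2,
                    std::vector<double>& v1out, std::vector<double>& v2out,
                    std::size_t n, unsigned int numLoops, bool parallel)
{
    const auto start = std::chrono::steady_clock::now();
    for (unsigned int i = 0; i < numLoops; ++i) {
        #pragma omp parallel sections default(shared) if(parallel)
        {
            #pragma omp section
            { matvec(A, v1, v1out, n); }
            #pragma omp section
            { matvec(A, v2, v2out, n); }
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Built with OpenMP enabled (for example `-fopenmp` with GCC), the speedup for a given `n` would be the time returned with `parallel = false` divided by the time returned with `parallel = true`. Without OpenMP the pragmas are ignored and both calls run serially.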
As expected, I get a speedup of about 1.9 as long as the size of matrix A is under 1000 variables. However, the speedup falls off as the matrix grows, dropping to about 1.5 when the matrix size reaches 5000.
Has anybody experienced a similar drop in performance with large matrices? What could be the reason for this behavior?
Thanks in advance.
Fran González
