Hello

I tried to use OpenMP to speed up my LDL factorization algorithm, but I gained only about 8% in speed (I have an Intel Core 2 Duo processor and I have only about 75%). I am solving the matrix equation [A]*{x}={f}, where the matrix [A] is positive definite, banded and symmetric (N is the matrix dimension, r is the band size). My code looks like this:

for (int i=0;i<N;i++)
{
    .......
    #pragma omp parallel for
    for (int j=max(0,i-r);j<i;j++)
    { ... }
    .....
    #pragma omp parallel for
    for (int j=i+1;j<min(N,i+1+r);j++)
    { ......
        for (int k=max(0,j-r);k<j;k++)
        { ..... }
    } // end for j
    ......
} // end for i

The main problem is that I cannot parallelize the outer loop, because iteration i+1 uses the results of iteration i. Furthermore, r<<N (N can be from 1e6 to 1e10, and N/r can be from 10000 to 100). I suspect that the "fork-join" overhead is the bottleneck here, because a parallel region is opened and closed twice in every iteration of the outer loop.

Can anybody help me with this problem?

Is there a general approach to such problems?
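
On the fork-join concern, one restructuring that is often suggested is to open a single parallel region around the whole i loop and use "#pragma omp for" inside it, so the thread team is created once instead of twice per iteration. The sketch below is only an illustration under several assumptions: it uses a right-looking (outer-product) LDL^T variant, which may not be the variant in the code above; it stores the matrix densely to stay short (real code would use band storage); and the names ldlt_banded, make_test_matrix and reconstruction_error are hypothetical. The barriers at the end of each "omp for" remain, so for small r the work per iteration may still be too small to pay off.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

// In-place LDL^T of a symmetric banded matrix (half-bandwidth r) held in
// dense row-major storage; only the lower triangle is read and written.
// On return a[i][i] = D[i] and a[j][i] = L[j][i] for j > i (L has unit diagonal).
void ldlt_banded(std::vector<double>& a, int N, int r)
{
    assert(N >= 0 && r >= 0 && a.size() == static_cast<std::size_t>(N) * N);
    auto at = [&](int row, int col) -> double& {
        return a[static_cast<std::size_t>(row) * N + col];
    };

    #pragma omp parallel              // thread team is created once, not once per i
    {
        for (int i = 0; i < N; ++i) { // every thread runs the same sequential i loop
            const int hi = std::min(N, i + 1 + r);
            const double d = at(i, i); // final: last updated before the previous barrier

            #pragma omp for schedule(static)
            for (int j = i + 1; j < hi; ++j)
                at(j, i) /= d;         // column i of L
            // implicit barrier: column i is complete before it is read below

            #pragma omp for schedule(static)
            for (int j = i + 1; j < hi; ++j) {
                const double lji_d = at(j, i) * d;
                for (int k = i + 1; k <= j; ++k)
                    at(j, k) -= lji_d * at(k, i); // each thread writes only its own rows j
            }
            // implicit barrier: trailing block is updated before iteration i+1
        }
    }
}

// Hypothetical test matrix: symmetric, banded, strictly diagonally dominant
// with a positive diagonal, hence positive definite.
std::vector<double> make_test_matrix(int N, int r)
{
    std::vector<double> a(static_cast<std::size_t>(N) * N, 0.0);
    for (int i = 0; i < N; ++i)
        for (int j = std::max(0, i - r); j < std::min(N, i + 1 + r); ++j)
            a[static_cast<std::size_t>(i) * N + j] =
                (i == j) ? 2.0 * r + 1.0 : -1.0 / (1.0 + std::abs(i - j));
    return a;
}

// Largest |(L D L^T - A0)[i][j]| over the lower triangle; used only for checking.
double reconstruction_error(const std::vector<double>& f,
                            const std::vector<double>& a0, int N)
{
    auto L = [&](int i, int k) {
        return i == k ? 1.0 : f[static_cast<std::size_t>(i) * N + k];
    };
    double err = 0.0;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j <= i; ++j) {
            double s = 0.0;
            for (int k = 0; k <= j; ++k)
                s += L(i, k) * f[static_cast<std::size_t>(k) * N + k] * L(j, k);
            err = std::max(err, std::fabs(s - a0[static_cast<std::size_t>(i) * N + j]));
        }
    return err;
}
```

To try it, build a small matrix with make_test_matrix, copy it, call ldlt_banded on the copy, and check that reconstruction_error is near machine precision. The code also compiles without OpenMP support, in which case the pragmas are ignored and it runs serially.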