## Questions on using OMP in LDL matrix factorization

### Questions on using OMP in LDL matrix factorization

Hello,
I tried to use OpenMP to speed up my LDL factorization algorithm, but I gained only about 8% in speed (I have an Intel Core2Duo processor and I have only about 75% ). I'm solving the matrix equation [A]*{x}={f}, where the matrix [A] is symmetric, positive definite and banded (N is the matrix dimension, r is the band size). My code looks like this:

```cpp
for (int i = 0; i < N; i++)
{
    // ...
    #pragma omp parallel for
    for (int j = max(0, i - r); j < i; j++)
    {
        // ...
    }
    // ...
    #pragma omp parallel for
    for (int j = i + 1; j < min(N, i + 1 + r); j++)
    {
        // ...
        for (int k = max(0, j - r); k < j; k++)
        {
            // ...
        }
    } // end for j
    // ...
} // end for i
```

The main problem is that I cannot parallelize the outer loop, because iteration i+1 uses the results of iteration i, and furthermore r << N (N can range from 1e6 to 1e10, and N/r from 100 to 10000). I suspect that the fork-join overhead is the bottleneck in this case, because the parallel regions are entered so often.
Can anybody help me with this problem?
And is there a general approach to problems like this?
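
One common way to reduce fork-join overhead in a loop nest like the one above is to open a single `parallel` region around the outer `i` loop and use only work-sharing constructs (`omp for`, `omp single`) inside it, so that the thread team is created once rather than once or twice per `i`. The following is a minimal sketch of that idea, not the poster's actual code: the function name `ldl_banded_one_region`, the dense row-major storage and the exact loop bounds are assumptions chosen to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: banded LDL^T of a symmetric positive definite matrix,
// stored here as a dense row-major N x N array purely for clarity.
// On return, L is unit lower triangular and d holds the diagonal of D.
void ldl_banded_one_region(const std::vector<double>& A,
                           std::vector<double>& L,
                           std::vector<double>& d,
                           int N, int r)
{
    assert(N > 0 && r >= 0);
    L.assign(static_cast<std::size_t>(N) * N, 0.0);
    d.assign(static_cast<std::size_t>(N), 0.0);

    // The thread team is created once here, not once per value of i.
    #pragma omp parallel
    {
        for (int i = 0; i < N; ++i) {   // every thread executes the same i loop
            // Serial part: a single thread computes the pivot d[i].
            #pragma omp single
            {
                double s = A[i * N + i];
                for (int k = std::max(0, i - r); k < i; ++k)
                    s -= L[i * N + k] * L[i * N + k] * d[k];
                d[i] = s;
                L[i * N + i] = 1.0;
            }   // implicit barrier: d[i] is now visible to all threads

            // Work-shared part: the rows below the pivot are independent.
            const int jend = std::min(N, i + 1 + r);
            #pragma omp for schedule(static)
            for (int j = i + 1; j < jend; ++j) {
                double s = A[j * N + i];
                for (int k = std::max(0, j - r); k < i; ++k)
                    s -= L[j * N + k] * L[i * N + k] * d[k];
                L[j * N + i] = s / d[i];
            }   // implicit barrier: column i is complete before i + 1 starts
        }
    }
}
```

The two implicit barriers per iteration still cost something, so whether this helps depends on r and on the OpenMP runtime; it only removes the repeated creation and teardown of parallel regions.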

### Re: Questions on using OMP in LDL matrix factorization

You may also be running into memory bandwidth problems. Normally you would use some kind of blocking algorithm to maximize cache reuse, which might also allow you to parallelize at a higher level. For something like this you could consider using prepackaged solvers like Intel MKL or maybe ATLAS. http://www.intel.com/cd/software/products/asmo-na/eng/266858.htm has some information.
-- Larry
lfm

Posts: 135
Joined: Sun Oct 21, 2007 4:58 pm
Location: OpenMP ARB

### Re: Questions on using OMP in LDL matrix factorization

Thank you, Larry,

but unfortunately I cannot use MKL or a similar package because of the matrix size. I'm limited to an ordinary computer (such as an Intel Core2Duo with about 2 GB of RAM), and [A] with dimensions N x r = 1e7 x 1e3 gives me 1e10 doubles, i.e. 8e10 bytes or roughly 75 GiB, which can only be stored on the HDD. So I'm forced to use my own algorithms, which work on matrix blocks loaded from the HDD into RAM. (I have already contacted Intel, and they said that they cannot offer me anything useful for this problem.)
Could you please explain what you meant by "some kind of blocking algorithm to maximize cache reuse, which might also allow you to parallelize at a higher level"?

### Re: Questions on using OMP in LDL matrix factorization

You might find this helpful; it illustrates something like what I was thinking about:
http://developers.sun.com/solaris/articles/FAST/lu_content.html
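
As a rough illustration of the blocking idea, here is a minimal sketch, assuming a dense in-memory matrix and no pivoting; the band structure and the out-of-core storage discussed above are deliberately left out. The name `ldl_blocked` and the block width `b` are hypothetical. Columns are factored in panels of width b, and the trailing submatrix is then updated once per panel, reusing the panel while it is in cache. That update is parallelized over rows, so there is one parallel region per panel rather than one per column.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: in-place blocked LDL^T of a dense, row-major,
// symmetric positive definite N x N matrix (only the lower triangle is used).
// On return the strict lower triangle holds L and the diagonal holds D.
// b is the panel width; no pivoting is performed.
void ldl_blocked(std::vector<double>& A, int N, int b)
{
    assert(N > 0 && b > 0);
    assert(A.size() == static_cast<std::size_t>(N) * N);

    for (int k0 = 0; k0 < N; k0 += b) {
        const int k1 = std::min(N, k0 + b);

        // 1. Panel: factor columns k0 .. k1-1 one at a time (serial, small).
        for (int k = k0; k < k1; ++k) {
            const double dk = A[k * N + k];
            for (int i = k + 1; i < N; ++i)
                A[i * N + k] /= dk;                  // column k of L
            for (int j = k + 1; j < k1; ++j)         // later columns of the panel
                for (int i = j; i < N; ++i)
                    A[i * N + j] -= A[i * N + k] * dk * A[j * N + k];
        }

        // 2. Trailing update: apply all panel columns at once to the rest of
        //    the lower triangle. Each thread writes only its own rows and
        //    reads only panel columns, so the rows are independent.
        #pragma omp parallel for schedule(dynamic)
        for (int i = k1; i < N; ++i) {
            for (int j = k1; j <= i; ++j) {
                double s = 0.0;
                for (int k = k0; k < k1; ++k)
                    s += A[i * N + k] * A[k * N + k] * A[j * N + k];
                A[i * N + j] -= s;
            }
        }
    }
}
```

With b = 1 this reduces to the usual column-by-column factorization; larger b trades more serial panel work for fewer, larger parallel updates. A banded or out-of-core variant would restrict the row ranges and load blocks from disk, which this sketch does not attempt.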

Last bumped by Anonymous on Sat Dec 29, 2007 10:20 am.
lfm

