[Omp] Newbie in deep end >> Matrix multiplier > Need to
add OpenMP directives
Yuan Lin
Yuan.Lin at Sun.COM
Thu Apr 7 15:44:41 PDT 2005
Mitch,
Grant is right. By default, 'i' and 'l' are shared among the threads and
you will run into a data race condition.
If you are compiling on the Sun Solaris platform, you can use the -xvpara
option to watch for such errors.
For example,
>cat -n t.c
     1  int main()
     2  {
     3    int i,j;
     4    int a[100][100];
     5
     6    #pragma omp parallel for
     7    for (i=1; i<100; i++)
     8      for (j=1; j<100; j++)
     9        a[i][j] = 0;
    10  }
>cc -xopenmp -xO3 -xvpara -c t.c
"t.c", line 6: Warning: inappropriate scoping
variable 'j' may be scoped inappropriately as 'shared'
. read at line 8 and write at line 8 may cause data race
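
One way to remove the warning is to list 'j' in a private clause so that
each thread gets its own copy. The sketch below assumes the loop nest is
moved into a helper function; the name clear_interior and the macro N are
made up for illustration and are not part of t.c.

```c
#define N 100

/* Hypothetical helper (not part of t.c): zeroes a[1..N-1][1..N-1]
 * using the same loop bounds as the example above. */
void clear_interior(int a[N][N])
{
    int i, j;

    /* 'i' is private automatically because it is the parallel loop index.
     * 'j' is declared outside the loop, so it must be listed explicitly;
     * otherwise it is shared and the threads race on it. */
#pragma omp parallel for private(j)
    for (i = 1; i < N; i++)
        for (j = 1; j < N; j++)
            a[i][j] = 0;
}
```

Declaring 'j' inside the 'i' loop body would have the same effect. If the
file is compiled without OpenMP support, the pragma is ignored and the loop
simply runs serially.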
Regards,
Yuan
Haab, Grant wrote:
> Mitch,
>
> You must declare i and l private for this to work:
>
>
> #pragma omp parallel for private(i,l)
> for (j = 0; j < *n_cols; ++j) //col
> {
> ...
>
>
> That should fix the problem of slower execution. As written below,
> all the threads are using the same iteration variables for the inner
> loops, causing incorrect results and bad performance. The j index is
> made private for you since it is the parallel loop index. The rest you
> must make private explicitly, OR declare inside the j-loop:
>
>
> #pragma omp parallel for
> for (j = 0; j < *n_cols; ++j) //col
> {
> int i, l; // this makes i & l private
>
> ...
>
>
>
> - Grant
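
A minimal sketch of the second variant, applied to the DGEMM-style column
loop quoted further down. The function name gemm_nn and its by-value
parameters are made up for illustration. Note that the quoted loop also
assigns the scalar temp in every iteration; if temp is declared outside the
parallel loop (as it appears to be), it is shared in the same way as i and
l, so this sketch declares it inside the loop as well.

```c
/* Hypothetical helper: C := beta*C + alpha*A*B for column-major matrices
 * (no transposes), following the loop structure of the quoted code.
 * A is m x k, B is k x n, C is m x n. */
void gemm_nn(int m, int n, int k, double alpha,
             const double *a, int lda,
             const double *b, int ldb,
             double beta, double *c, int ldc)
{
    int j;

#pragma omp parallel for
    for (j = 0; j < n; ++j) {          /* one column of C per iteration */
        int i, l;                      /* declared here, so private per thread */
        double temp;                   /* private for the same reason */

        if (beta == 0.0) {
            for (i = 0; i < m; ++i)
                c[i + j * ldc] = 0.0;
        } else if (beta != 1.0) {
            for (i = 0; i < m; ++i)
                c[i + j * ldc] = beta * c[i + j * ldc];
        }

        for (l = 0; l < k; ++l) {
            if (b[l + j * ldb] != 0.0) {
                temp = alpha * b[l + j * ldb];
                for (i = 0; i < m; ++i)
                    c[i + j * ldc] += temp * a[i + l * lda];
            }
        }
    }
}
```

Each thread writes only to its own columns of C, so no further
synchronization is needed. With the clause form, the equivalent would be
private(i, l, temp) on the directive.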
>
> -----Original Message-----
> From: Omp-bounces at openmp.org [mailto:Omp-bounces at openmp.org] On Behalf
> Of Bono Mitch
> Sent: Thursday, April 07, 2005 4:52 PM
> To: Meadows, Lawrence F; Omp at openmp.org
> Subject: RE: [Omp] Newbie in deep end >> Matrix multiplier > Need to
> add OpenMP directives
>
> Ok...
> Here's what I've done, but I don't think it's right, simply because it's
> slower when set to 2 threads than it is with one.
> My project is based on the standard O(n^3) algorithm so I'm not allowed to
> use the auto blocking or any of the other multiplication methods. Sorry I
> should have told you this already. Essentially I'm using the DGEMM Fortran
> routine > http://www.netlib.org/blas/dgemm.f
>
>
>
>
>
> #pragma omp parallel for
> for (j = 0; j < *n_cols; ++j) //col
> {
>     if (*beta == 0.)
>     {
>         for (i = 0; i < *m_rows; ++i) //row
>         {
>             matrixC[i + j * *ldc] = 0.;
>         }
>     }
>     else if (*beta != 1.)
>     {
>         for (i = 0; i < *m_rows; ++i) //row
>         {
>             matrixC[i + j * *ldc] = *beta * matrixC[i + j * *ldc];
>         }
>     }
>     for (l = 0; l < *k_common; ++l)
>     {
>         if (matrixB[l + j * *ldb] != 0.)
>         {
>             temp = *alpha * matrixB[l + j * *ldb];
>             for (i = 0; i < *m_rows; ++i)
>             {
>                 matrixC[i + j * *ldc] += temp * matrixA[i + l * *lda];
>             }
>         }
>     }
> }
> ...
> more similar loops depending on whether the matrices A or B are to be
> transposed.
>
> What do you think?
>
> Thanks,
>
> Colm
>
> >
> >Non-trivial.
> >
> >Here's matrix multiply in Fortran: B(n,l) * C(l,m) = A(n,m)
> >(actually, this adds the product of B and C to A, but
> >this is the important bit):
> >
> >do i = 1,n
> >  do j = 1,m
> >    do k = 1,l
> >      a(i,j) = a(i,j) + b(i,k) * c(k,j)
> >    enddo
> >  enddo
> >enddo
> >
> >You can parallelize any loop that doesn't carry a dependence.
> >So that would be either outer loop (extra credit: what OpenMP
> >directive would you use to parallelize the innermost loop?)
> >The outermost loop is the best because of granularity, considering
> >only parallelism. So you would do:
> >
> >!$omp parallel do
> >do i = 1,n
> >  do j = 1,m
> >    do k = 1,l
> >      a(i,j) = a(i,j) + b(i,k) * c(k,j)
> >    enddo
> >  enddo
> >enddo
> >!$omp end parallel do
> >
> >The last directive is not required.
> >
> >In C, you would say:
> >
> >#pragma omp parallel for
> >
> >and translate the fortran to C, and probably put curlys around
> >the whole mess for clarity.
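
A sketch of what that C translation might look like, assuming row-major
storage in flat arrays and 0-based indices. The function names matmul_add
and dot_row_col are made up for illustration. The second function shows one
common answer to the extra-credit question: the innermost loop accumulates
into a single value, so parallelizing it needs a reduction clause.

```c
/* Hypothetical helper: a(n x m) += b(n x l) * c(l x m), row-major.
 * The outermost loop is parallelized; j and k are declared inside it,
 * so each thread has its own copies. */
void matmul_add(int n, int m, int l,
                double *a, const double *b, const double *c)
{
    int i;

#pragma omp parallel for
    for (i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            for (int k = 0; k < l; k++) {
                a[i * m + j] += b[i * l + k] * c[k * m + j];
            }
        }
    }
}

/* Hypothetical helper: the innermost loop on its own, for one row of b
 * and column j of c. Every iteration adds into 'sum', so the loop carries
 * a dependence; reduction(+:sum) gives each thread a private partial sum
 * and combines the partial sums at the end. */
double dot_row_col(int l, int m,
                   const double *b_row, const double *c, int j)
{
    double sum = 0.0;
    int k;

#pragma omp parallel for reduction(+:sum)
    for (k = 0; k < l; k++)
        sum += b_row[k] * c[k * m + j];

    return sum;
}
```

In matmul_add each thread updates a disjoint set of rows of a, so no
synchronization is needed. With a floating-point reduction the order of
the additions can differ between runs, so results may vary in the last
bits.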
> >
> >So the homework question is why is this a really bad way to
> >do matrix multiply, even ignoring the OpenMP directives? Once
> >you've figured that out, which I presume that you have, and
> >presumably discovered tiling, then you have to decide which
> >loop or loops in the tiled loop nest to parallelize. I suspect
> >that the answer is the outermost block loop, but I have now
> >exhausted the limits of my knowledge of matrix multiply for
> >this semester.
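
A rough sketch of the tiled loop nest with the outermost block loop
parallelized, as suggested above. It assumes row-major flat arrays; the
function name matmul_add_tiled, the helper min_int and the tile size of 32
are all assumptions chosen for illustration, not tuned values.

```c
#define TILE 32   /* assumed tile size; the best value depends on the cache */

static int min_int(int x, int y)
{
    return x < y ? x : y;
}

/* Hypothetical helper: a(n x m) += b(n x l) * c(l x m), row-major,
 * processed in TILE x TILE blocks. Only the outermost block loop is
 * parallel, so each thread owns whole row blocks of a. */
void matmul_add_tiled(int n, int m, int l,
                      double *a, const double *b, const double *c)
{
    int ii;

#pragma omp parallel for
    for (ii = 0; ii < n; ii += TILE) {
        for (int kk = 0; kk < l; kk += TILE) {
            for (int jj = 0; jj < m; jj += TILE) {
                /* Clamp the block bounds at the matrix edges. */
                int i_end = min_int(ii + TILE, n);
                int k_end = min_int(kk + TILE, l);
                int j_end = min_int(jj + TILE, m);

                for (int i = ii; i < i_end; i++) {
                    for (int k = kk; k < k_end; k++) {
                        double bik = b[i * l + k];
                        /* Unit-stride access to both a and c. */
                        for (int j = jj; j < j_end; j++)
                            a[i * m + j] += bik * c[k * m + j];
                    }
                }
            }
        }
    }
}
```

Because different values of ii touch different rows of a, the threads never
write to the same element. With few row blocks relative to the number of
threads, load balance may suffer; that trade-off is not explored here.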
> >
> >Regards,
> >
> >Larry Meadows
> >
> > >-----Original Message-----
> > >From: Omp-bounces at openmp.org [mailto:Omp-bounces at openmp.org]
> > >On Behalf Of Bono Mitch
> > >Sent: Thursday, April 07, 2005 12:30 PM
> > >To: Omp at openmp.org
> > >Subject: [Omp] Newbie in deep end >> Matrix multiplier > Need
> > >to add OpenMP directives
> > >
> > >Hi,
> > >
> > >I'm working on a matrix-matrix multiplier in C for my dissertation.
> > >It is based on the DGEMM Fortran routine. The multiplier works great
> > >but I need to add OpenMP directives and I'm totally new to it so don't
> > >really have a clue where. I've looked at the FAQs and the
> > >specification examples but still don't know where to add them.
> > >
> > >Can any of you point me to some examples where OpenMP has been used
> > >in this manner?
> > >
> > >Thanks for your help,
> > >
> > >Colly Mitch.
> > >
> > >
> > >
> > >_______________________________________________
> > >Omp mailing list
> > >Omp at openmp.org
> > >http://openmp.org/mailman/listinfo/omp_openmp.org
> > >
>