OpenMP DGEMM?

General OpenMP discussion


Post by loveislonely » Wed Aug 20, 2008 12:16 pm

Hi, I have a problem with a parallel DGEMM calculation. I am using Fortran 77 with the pgf77 compiler. I am working on part of a program that requires me to parallelize a matrix multiplication routine (DGEMM). I am actually using a stub of DGMEE that is already written with OpenMP, but I am confused by the output. Please help me! Thank you very much!

The stub is so old that it does not use the current omp_lib (meaning something like omp_num_threads doesn't work). Instead, it calls a function, NProc(0), to get the number of processors and stores the result in NP. Here is the OpenMP part of the stub:

*******************************************************************************************************
C     ... some serial work ...
      MinCoW = 16
      NP = NProc(0)
      ColPW = Max((N+NP-1)/NP,MinCoW)
      NWork = (N+ColPW-1)/ColPW
C     N is the number of columns of matrix C(M,N)
C     ... some other serial work ...
C
C$OMP Parallel Do Default(Shared) Schedule(Static,1) Private(IP,XN)
       Do 100 IP = 0, (NWork-1)
         XN = Min(N-IP*ColPW,ColPW)
         Call DGEMM(XStr1,XStr2,XM,XN,XK,Alpha,A,XLDA,B(1+IP*IncB),
     $       XLDB,Beta,C(1+IP*IncC),XLDC)
  100       Continue
*******************************************************************************************************
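The column split in the stub is plain integer arithmetic, so it can be checked separately from DGEMM. The free-form Fortran sketch below reproduces the ColPW/NWork formulas. It then deals the work items round-robin to NT threads, which is how Schedule(Static,1) assigns loop iterations. The module partition_mod and the routines col_per_work, n_work, max_cols_per_thread and print_partition_table are hypothetical helpers, not part of the stub. The "bound" column assumes every column costs the same DGEMM time, so it ignores memory bandwidth and cache effects.

```fortran
! Hypothetical sketch: model of the stub's column partition.
! Assumes NT threads run the loop while the chunks were sized
! from NP (which may or may not equal NT).
module partition_mod
  implicit none
contains
  ! Columns per work item: ceiling(N/NP), but never fewer than MinCoW.
  pure integer function col_per_work(n, np, mincow)
    integer, intent(in) :: n, np, mincow
    col_per_work = max((n + np - 1) / np, mincow)
  end function col_per_work

  ! Number of work items (loop trips): ceiling(N/ColPW).
  pure integer function n_work(n, colpw)
    integer, intent(in) :: n, colpw
    n_work = (n + colpw - 1) / colpw
  end function n_work

  ! Largest number of columns any one thread multiplies when the
  ! NWork items are dealt round-robin (Schedule(Static,1)) to NT >= 1 threads.
  pure integer function max_cols_per_thread(n, np, mincow, nt)
    integer, intent(in) :: n, np, mincow, nt
    integer :: colpw, nwork, ip, t, xn
    integer :: load(0:nt-1)
    colpw = col_per_work(n, np, mincow)
    nwork = n_work(n, colpw)
    load = 0
    do ip = 0, nwork - 1
      t = mod(ip, nt)                   ! thread that receives item IP
      xn = min(n - ip*colpw, colpw)     ! same formula as XN in the stub
      load(t) = load(t) + xn
    end do
    max_cols_per_thread = maxval(load)
  end function max_cols_per_thread

  ! For NP = 1..maxnp, print ColPW, NWork and the bound N / (max columns
  ! per thread), assuming the loop runs on exactly NP threads.
  subroutine print_partition_table(n, mincow, maxnp)
    integer, intent(in) :: n, mincow, maxnp
    integer :: np, colpw
    do np = 1, maxnp
      colpw = col_per_work(n, np, mincow)
      print '(a,i3,a,i6,a,i4,a,f6.2)', 'NP=', np, '  ColPW=', colpw, &
           '  NWork=', n_work(n, colpw), '  bound=', &
           real(n) / real(max_cols_per_thread(n, np, mincow, np))
    end do
  end subroutine print_partition_table
end module partition_mod
```

For example, `call print_partition_table(3432, 16, 8)` uses N = 3432 as in the test program below and MinCoW = 16. It gives a bound equal to NP, to within about 0.01, for every NP from 1 to 8, so for that N the split itself is balanced. The result changes if NP does not match the number of threads that actually run the loop. For instance, `max_cols_per_thread(3432, 4, 16, 8)` returns 858, which means only four of the eight threads receive work.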

Here is the problem. With up to 4 processors the speedup is quite good: about 2 times faster on 2 processors, about 3 times on 3, and 3.6 times on 4. Once the number of processors goes above 4 (e.g. 8), however, the speedup becomes very strange. On 5, 6, 7 and 8 processors it is 2.6, 2.9, 3.4 and 1.7. I don't know why. I have run the same job many times, and the results are always about the same, differing only slightly.

So I thought the stub itself might be the cause. I took it out and wrote a small program to test it. Since I cannot find the source code of NProc(0), I used omp_lib in my test program instead. Here is the code of the test program:

******************************************************************************************************
      program dgemmtest
      use omp_lib
      implicit real*8 (A-H,O-Z)
      integer i,j,k,l
      integer n,num_pes
      parameter (n=3432)
      Real*8 XA(n,n),XCa(n,n),XMA(n,n)
      integer start, finish, count_rate
      real*8 time_taken
      real*8 Zero, One, Ten
      data Zero/0.0d0/, One/1.0d0/, Ten/10.0d0/
C
      do i=1,n
        do j=i,n
          l=((j*(j-1))/2)+i
          XA(j,i)=real(l)
          XA(i,j)=XA(j,i)
        end do
      end do
C
      do i=1,n
        do j=1,n
          XCa(j,i)=One/Ten
        end do
      end do
C
      do i=1,n
        do j=1,n
          XMA(j,i)=Zero
        end do
      end do
C
      call system_clock(start,count_rate)
C
      call XGEMM(1,'N','N',n,n,n,one,XA,n,XCa,n,one,XMA,n)
      call XGEMM(1,'N','N',n,n,n,one,XCa,n,XA,n,one,XMA,n)
C
C$OMP PARALLEL
C$OMP SINGLE
C$      num_pes=omp_get_num_threads()
C$OMP END SINGLE
C$OMP END PARALLEL
      call system_clock(finish,count_rate)
      time_taken = real(finish-start)/count_rate
C$      print*, time_taken, "s on ",num_pes," threads"
      stop
      end
*DGEMM Stub
C     ... some serial work ...
C
C$OMP PARALLEL
C$OMP SINGLE
C$      NP=omp_get_num_threads()
C$      MinCoW=16
C$OMP END SINGLE
C$OMP END PARALLEL
C
      ColPW = Max((N+NP-1)/NP,MinCoW)
      NWork = (N+ColPW-1)/ColPW
C
C     ... some other serial job ...
C
C$OMP Parallel Do Default(Shared) Schedule(Static,1) Private(IP,XN)
      Do 100 IP = 0, (NWork-1)
         XN = Min(N-IP*ColPW,ColPW)
         Call DGEMM(XStr1,XStr2,XM,XN,XK,Alpha,A,XLDA,B(1+IP*IncB),
     $       XLDB,Beta,C(1+IP*IncC),XLDC)
  100      Continue
******************************************************************************************************

I ran this small test job on 1 to 8 processors, and the timings were normal (the speedup increases with the number of processors). Now I am really confused. Thank you so much for helping me!

Re: OpenMP DGEMM?

Post by ejd » Wed Aug 20, 2008 2:15 pm

You haven't said which library you are using for the routines DGMEE and DGEMM. Since I have no idea where they come from, I don't know anything about what they are doing. From what you have shown, I have to wonder what exactly the function NProc(0) does. My guess is that it returns the number of processors, but a routine that does this (omp_get_num_procs) has been part of OpenMP since the first specification. So maybe it is doing something else. Some vendors didn't want to use all the processors a machine had, because it might be a shared machine that could easily be overloaded. It might be interesting to add a print statement to the code to see what value it returns.
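As a companion to printing NProc(0), it helps to print the counts that the OpenMP runtime itself reports, since those are what NP should agree with. The free-form sketch below uses the `!$` conditional-compilation sentinel, like the `C$` lines in the original post, so it also compiles without OpenMP (all counts then stay at 1). The names omp_counts_mod and report_openmp_counts are hypothetical. In the stub itself, a line such as `print *, 'NP =', NP` placed right after `NP = NProc(0)` would give the value to compare.

```fortran
! Hypothetical diagnostic: report what the OpenMP runtime sees.
! Compiled with OpenMP enabled (e.g. pgf90 -mp, gfortran -fopenmp) the
! !$ lines are active; without OpenMP they are comments and counts stay 1.
module omp_counts_mod
  !$ use omp_lib
  implicit none
contains
  subroutine report_openmp_counts(nprocs, maxthr, team)
    integer, intent(out) :: nprocs, maxthr, team
    nprocs = 1
    maxthr = 1
    team = 1
    !$ nprocs = omp_get_num_procs()     ! processors available to the program
    !$ maxthr = omp_get_max_threads()   ! upper bound on the next team size
    !$omp parallel
    !$omp single
    !$ team = omp_get_num_threads()     ! threads actually in this team
    !$omp end single
    !$omp end parallel
    print *, 'num_procs   =', nprocs
    print *, 'max_threads =', maxthr
    print *, 'team size   =', team
  end subroutine report_openmp_counts
end module omp_counts_mod
```

Suppose NP from NProc(0) differs from the team size printed here. Then the Schedule(Static,1) loop in the stub no longer hands each thread exactly one block of columns.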

The other thing I am a bit confused by is that your example calls XGEMM, but you haven't told me what that is. I am assuming it is your own version of DGMEE in which you have replaced NProc (but I am not sure). If it is, and it is the version you say scales well, then why not just use that code?

I am leaving for vacation shortly, so I am sorry I can't help more. Maybe someone else will have some ideas.

