[Omp] OpenMP Parallel Do Loops
Kay Diederichs
kay.diederichs at uni-konstanz.de
Fri Nov 18 04:35:52 PST 2005
If I use "schedule(static,n/(2*np))" or just "schedule(guided)" in the
"!$OMP do" directive, I get a (slightly better than) 2-fold speedup on a
four-processor Opteron for a 1000x1000 matrix, using the ifort 9 compiler
and the Intel MKL in 32-bit mode.
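
For illustration, here is a minimal sketch of where the schedule clause
goes. It is not the posted program: the rotation is written out by hand
instead of calling drotg/drot so that it builds without a BLAS library,
the chunk size is guarded with max(1, ...) because a chunk of zero is not
allowed, and omp_get_wtime replaces dtime on the assumption that dtime
reports CPU time, which can hide a speedup when summed over threads. The
names schedule_sketch, chunk, r, tmp, t0 and t1 are made up for this
example.

```fortran
program schedule_sketch
  use omp_lib
  implicit none
  integer, parameter :: m = 1000, n = 1000   ! matrix size quoted above
  double precision, allocatable :: W(:,:)
  double precision :: cc, ss, r, tmp, t0, t1
  integer :: i, j, k, x, np, chunk

  allocate(W(m,n))
  call random_number(W)

  ! Chunk size as in schedule(static,n/(2*np)); it must be at least 1.
  np    = omp_get_max_threads()
  chunk = max(1, n/(2*np))

  t0 = omp_get_wtime()               ! wall-clock time, not CPU time
  !$omp parallel private(i, k, x, cc, ss, r, tmp) shared(W, chunk)
  do i = 1, m+n-2
     ! Replace the clause below with schedule(guided) to compare.
     !$omp do schedule(static, chunk)
     do j = 1, n
        x = m + 2*j - i - 1
        ! Only act when W(x,j) lies below the diagonal and inside W.
        if (j < x .and. x <= m) then
           r = sqrt(W(x-1,j)**2 + W(x,j)**2)
           if (r > 0d0) then
              cc = W(x-1,j)/r
              ss = W(x,j)/r
              ! Rotate rows x-1 and x over columns j..n.
              do k = j, n
                 tmp      =  cc*W(x-1,k) + ss*W(x,k)
                 W(x,k)   = -ss*W(x-1,k) + cc*W(x,k)
                 W(x-1,k) = tmp
              end do
           end if
           W(x,j) = 0d0
        end if
     end do
     !$omp end do
  end do
  !$omp end parallel
  t1 = omp_get_wtime()

  write(*,*) 'threads =', np, ' chunk =', chunk, ' wall time =', t1 - t0
  deallocate(W)
end program schedule_sketch
```

Within one step i, neighbouring columns touch disjoint row pairs (x-1,x)
and (x+1,x+2), so the j iterations can run concurrently; the implicit
barrier at "!$omp end do" keeps the steps in order. Build with OpenMP
enabled (for example, gfortran -fopenmp) and vary OMP_NUM_THREADS to
compare timings.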
Kay
ta.cbra at maths.strath.ac.uk wrote:
> Neil,
>
> Thank you for your reply. I have attached a more detailed version of my
> code that actually applies the Givens rotations (this uses the BLAS routines
> drotg and drot).
>
> I have tried to implement your suggestions in this new code but I am still
> unable to get any kind of speed up when I increase the number of
> processors.
>
> Is there an obvious reason for this? Any help is greatly appreciated!
>
> program CDGR
> include 'omp_lib.h'
>
> c Declare variable types
> integer :: i, j, x, m, n
> double precision, dimension(:,:), allocatable :: W
> double precision :: cc,ss,time
> integer np, me
> real timearray(2)
>
> c Get the size of matrix to use
> write(*,*) 'What size of matrix do you wish to use?'
> write(*,*) 'Number of rows (m) ='
> read(*,*) m
> write(*,*) 'Number of columns (n) ='
> read(*,*) n
> write(*,*)'m= ',m,' and n = ',n,' thank you.'
>
> allocate(W(m,n))
>
> do i=1,m
> do j=1,n
> W(i,j)=1000*rand(i+j)
> end do
> end do
>
> ! do i=1,m
> ! write(*,*)(W(i,j),j=1,n)
> ! end do
>
> time=dtime(timearray)
>
> c Show time step (i) that each element would be annihilated during
> c to leave the matrix W upper triangular
>
> !$OMP parallel private(x,cc,ss,me,i) shared(W,np,m,n)
> np = omp_get_num_threads()
> me = omp_get_thread_num()
>
> do i=1,m+n-2
> c Every node uses the same value of i but the j values
> c are shared out and can be performed at the same time
>
> !$OMP do schedule(dynamic,1)
> do j=1,n
> x=m+2*j-i-1
> c make sure element W(x,j) is within the matrix W
> if (j .lt. x) then
> if (x .le. m) then
> call drotg(W(x-1,j), W(x,j), cc, ss)
> W(x,j)=0d0
> call drot(n-j,W(x-1,j+1:n),1,W(x,j+1:n),1,cc,ss)
> ! W(x,j)=i
> endif
> endif
> enddo
> !$OMP end do
> enddo
> !$OMP end parallel
>
> time=dtime(timearray)
> write(*,*)'CDGR with ',np
> write(*,*)'m=',m,'n=',n,'time=',time
>
> c Print W if you want to see how it was annihilated
> ! do i=1,m
> ! write(*,*)(W(i,j),j=1,n)
> ! end do
>
> deallocate(W)
>
> stop
> end
>
>
> On Mon, 14 Nov 2005, Neil Summers wrote:
>
>
>>Two things I noticed on a quick scan of your code.
>>
>>1) Define the parallel region outside the i loop. Creating a
>>parallel region inside a do loop makes the program fork and join
>>on every iteration, which causes excessive overhead. With the
>>region outside, use omp do to split the work between threads, i.e.
>>
>>!$OMP parallel private(x,me,np)
>> do i=1,m+n-2
>>!$OMP do
>> do j=1,n
>> ...
>> enddo
>> enddo
>>!$OMP end parallel
>>
>>2) I'm surprised you get the right results.
>>By declaring firstprivate(m,n), these become undefined
>>on exiting the parallel region, so I would guess the second
>>iteration of i would not happen correctly.
>>You don't need these private, so I'd leave them shared.
>>
>>Neil
>
>
--
Kay Diederichs http://strucbio.biologie.uni-konstanz.de/~kay
email: Kay.Diederichs at uni-konstanz.de Tel +49 7531 88 4049 Fax 3183
Fachbereich Biologie, Universität Konstanz, Box M647, D-78457 Konstanz