forall versus do

Post by colinjcotter » Tue Jan 15, 2008 8:27 am

Dear forum,
I have been trying to improve the threading performance of a nested loop. Two versions are given below: one using forall + OMP WORKSHARE and one using do + OMP DO. I get a speedup of only 1.7 going from 1 to 4 processors with the DO version (which is disappointing), and no speedup at all with the forall version. What am I doing wrong? The first dimension of the arrays has extent n/2+1.

all the best

--Colin

!$OMP PARALLEL SHARED(uk_out,vk_out,wk_out,ek1,ek2,ek3,uk_in,vk_in,wk_in,dt, &
!$OMP& ukt_in,vkt_in,wkt_in) PRIVATE(k1,k2)
!$OMP WORKSHARE
forall(k1=1:n,k2=1:n)
uk_out(:,k2,k1) = ek1(k1)*ek2(k2)*ek3*( &
uk_in(:,k2,k1) + dt*ukt_in(:,k2,k1))
vk_out(:,k2,k1) = ek1(k1)*ek2(k2)*ek3*( &
vk_in(:,k2,k1) + dt*vkt_in(:,k2,k1))
wk_out(:,k2,k1) = ek1(k1)*ek2(k2)*ek3*( &
wk_in(:,k2,k1) + dt*wkt_in(:,k2,k1))
end forall
!$OMP END WORKSHARE
!$OMP END PARALLEL

!$OMP PARALLEL SHARED(uk_out,vk_out,wk_out,ek1,ek2,ek3,uk_in,vk_in,wk_in,dt, &
!$OMP& ukt_in,vkt_in,wkt_in) PRIVATE(k1,k2)
!$OMP DO SCHEDULE(STATIC)
do k1 = 1,n
do k2 = 1,n
uk_out(:,k2,k1) = ek1(k1)*ek2(k2)*ek3*( &
uk_in(:,k2,k1) + dt*ukt_in(:,k2,k1))
vk_out(:,k2,k1) = ek1(k1)*ek2(k2)*ek3*( &
vk_in(:,k2,k1) + dt*vkt_in(:,k2,k1))
wk_out(:,k2,k1) = ek1(k1)*ek2(k2)*ek3*( &
wk_in(:,k2,k1) + dt*wkt_in(:,k2,k1))
end do
end do
!$OMP END DO
!$OMP END PARALLEL

Re: forall versus do

Post by colinjcotter » Tue Jan 15, 2008 8:28 am

Some more information that may be useful: n=128, and this is using Intel compiler version 9.1.

Re: forall versus do

Post by lfm » Tue Jan 15, 2008 8:59 am

Workshare is very loose about how work is assigned to threads; DO gives much more control. Further, not all compilers implement workshare, and those that do may not implement it well. For the particular compiler you are using, I would have to investigate further to find the exact strategy it uses. As for the disappointing speedup with the DO loops, that may be due to other factors such as memory bandwidth and parallel overhead, and would also require more investigation.
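One way to check the memory-bandwidth explanation is to time the DO version on its own and compare the data rate with what the machine can sustain. The program below is a minimal sketch, not the original code: it assumes the arrays are double-precision complex (16 bytes per element), that ek3 is an array over the first dimension, and it updates only one of the three fields. The initial values, the repeat count nrep and the program name are made up for illustration. Under those assumptions each array holds 65*128*128 = 1,064,960 elements, about 17 MB, so the full three-field update in the original post would touch roughly 153 MB per sweep while doing only a few multiplications per element.

```fortran
! Sketch only: the array type (double complex), the shape of ek3, the
! initial values and nrep are assumptions, not taken from the original post.
program time_update
  use omp_lib
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 128, m = n/2 + 1, nrep = 20
  complex(dp), allocatable :: uk_in(:,:,:), ukt_in(:,:,:), uk_out(:,:,:)
  complex(dp) :: ek1(n), ek2(n), ek3(m)
  real(dp) :: dt, t0, t1, mbytes
  integer :: k1, k2, rep

  allocate(uk_in(m,n,n), ukt_in(m,n,n), uk_out(m,n,n))
  dt  = 1.0e-3_dp
  ek1 = (1.0_dp, 0.0_dp)
  ek2 = (1.0_dp, 0.0_dp)
  ek3 = (1.0_dp, 0.0_dp)

  ! Initialise with the same static schedule as the compute loop so that,
  ! on first-touch NUMA systems, pages land near the thread that uses them.
  !$OMP PARALLEL DO SCHEDULE(STATIC) PRIVATE(k2)
  do k1 = 1, n
     do k2 = 1, n
        uk_in(:,k2,k1)  = (1.0_dp, 0.0_dp)
        ukt_in(:,k2,k1) = (0.5_dp, 0.0_dp)
        uk_out(:,k2,k1) = (0.0_dp, 0.0_dp)
     end do
  end do
  !$OMP END PARALLEL DO

  ! Time nrep sweeps of the update (one field only).
  t0 = omp_get_wtime()
  do rep = 1, nrep
     !$OMP PARALLEL DO SCHEDULE(STATIC) PRIVATE(k2)
     do k1 = 1, n
        do k2 = 1, n
           uk_out(:,k2,k1) = ek1(k1)*ek2(k2)*ek3*( &
                uk_in(:,k2,k1) + dt*ukt_in(:,k2,k1))
        end do
     end do
     !$OMP END PARALLEL DO
  end do
  t1 = omp_get_wtime()

  ! Data moved per sweep: two arrays read and one written, 16 bytes per
  ! element (any extra write-allocate traffic is ignored here).
  mbytes = 3.0_dp * 16.0_dp * real(m, dp) * real(n, dp)**2 / 1.0e6_dp
  print '(a,i0)',     'threads        : ', omp_get_max_threads()
  print '(a,f10.4)',  'time per sweep : ', (t1 - t0)/nrep
  print '(a,f10.1)',  'approx MB/s    : ', mbytes*nrep/(t1 - t0)
  print '(a,es12.4)', 'check value    : ', real(uk_out(1,1,1))
end program time_update
```

Built with OpenMP enabled (for example gfortran -fopenmp) and run with OMP_NUM_THREADS set to 1, 2 and 4, this shows whether the data rate stops growing as threads are added. If it levels off close to the memory bandwidth of the machine, the loop is probably limited by memory rather than by how the iterations are scheduled.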
lfm (OpenMP ARB)