Recently, I tried to compare the speed of a simple OpenMP program that uses a 'reduction' clause with one that uses 'workshare' and 'FORALL'. Three 3000*3000 matrices were assigned as B(i,j)=REAL(i), C(i,j)=REAL(i+j) and A(i,j)=REAL(i-j). Finally, all the products A(i,j)*B(i,j)*C(i,j) were summed up. Program 1 used the reduction, while Program 2 used the intrinsic function SUM().

However, the results were different! I don't know why. Could someone please explain? Thanks a lot.

Program 1

```fortran
!$OMP parallel
!$OMP DO
DO j=1,N
!$OMP parallel
!$OMP do reduction(+:sumA)
   DO i=1,N
      B(i,j)=real(i)
      C(i,j)=real(i+j)
      A(i,j)=real(i-j)
      sumA=sumA+A(i,j)*B(i,j)*C(i,j)
   end do
!$OMP end do
!$OMP end parallel
end do
!$OMP end do
!$OMP end parallel
sumA=sumA/real(N*N)
```

The summation was 2250749749.91690

Program 2

```fortran
!$OMP parallel
!$OMP workshare
forall(i=1:N, j=1:N) B(i,j)=real(i)
!$OMP end workshare nowait
!$OMP workshare
forall(i=1:N, j=1:N) C(i,j)=real(i+j)
!$OMP end workshare
!$OMP workshare
forall(i=1:N, j=1:N) A(i,j)=real(i-j)
!$OMP end workshare
!$OMP end parallel
sumA=Sum(A*B*C)/real(N*N)
```

The summation was 2250749750.08717
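
As a sanity check, the quantity both programs are meant to compute can be worked out by hand. This is only a sketch, assuming exact (infinite-precision) arithmetic and N = 3000; the symbol S is just a label introduced here for the scaled sum and does not appear in either program.

$$
\begin{aligned}
S &= \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} (i-j)\,i\,(i+j)
   = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} \left(i^3 - i\,j^2\right) \\
  &= \frac{1}{N^2}\left[\,N\left(\frac{N(N+1)}{2}\right)^{2} - \frac{N(N+1)}{2}\cdot\frac{N(N+1)(2N+1)}{6}\right] \\
  &= \frac{(N+1)^2\,(N-1)}{12}
\end{aligned}
$$

For N = 3000 this gives S = 3001^2 * 2999 / 12 = 27008996999 / 12, which is approximately 2250749749.91667. Program 1's value differs from this by about 0.0002 and Program 2's by about 0.17. The two printed values differ from each other by about 0.17, a relative difference of roughly 7.6e-11.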