General OpenMP discussion


Postby dacharle » Wed Jul 02, 2008 9:57 am

Hi, I am new to parallel programming and I am trying to get a simple example working to prove to myself that OpenMP works on my machine (an IBM with 8 Intel processors) before I start trying to use it in larger projects.

I code in Fortran 90/95 and I am using OpenMP v2.5. The code below estimates the value of pi, and I have added a few extra do loops to increase the workload of the main do loop being parallelized. Note that on my system monitor all 8 processors run at ~100% when NUM_THREADS=8, and when I print out idate(8) the different threads do seem to be working at the same time. The problem is that the program takes more than twice as long with NUM_THREADS=8 as with NUM_THREADS=1. Any advice would be greatly appreciated.

program pi
implicit none

integer, parameter :: itr=100
integer(kind=8), parameter :: n_points=80
integer :: circle_count, i, j, k, kk, isize
real :: pie, r1, r2, x, y, x_sqr, y_sqr, avg_pie
integer, allocatable :: iseed(:)
real, dimension(itr) :: u_temp
real, dimension(n_points):: circle_count_array1
integer :: TID, OMP_GET_THREAD_NUM, idate(8)


!!!Print out sim start time
call date_and_time(VALUES=idate)
print *, "Simulation start date/time = ", idate

do k=1,itr

!!!Generate a random seed
call random_seed(SIZE=isize) !to ensure a randomized seed, date_and_time intrinsic is used to obtain a seed from the system date
allocate( iseed(isize) )
call random_seed(GET=iseed)
iseed = iseed * (idate(8)-500) ! idate(8) contains millisecond
call random_seed(PUT=iseed)
deallocate( iseed ) ! must free here; iseed is reallocated on every iteration of k

!!!Parallelized loop
!$OMP PARALLEL NUM_THREADS(8) PRIVATE(x,x_sqr,y,y_sqr,r1,r2,j,TID)
!$OMP DO
do j=1,n_points

do i=1,n_points

do kk=1,n_points

call random_number(r1)
call random_number(r2)

x = (2.0 * r1) - 1.0
x_sqr = x * x

y = (2.0 * r2) - 1.0
y_sqr = y * y

if ((x_sqr + y_sqr) <= 1.0) then
circle_count_array1(j) = 1
else
circle_count_array1(j) = 0
end if


!!!Get thread number and time in milliseconds
TID = OMP_GET_THREAD_NUM()
call date_and_time(VALUES=idate)
!write(1,*), TID, idate(8)

end do
end do

end do
!$OMP END PARALLEL

circle_count = SUM(circle_count_array1)

!Calculate and print to screen value of pi
pie = 4.*circle_count/n_points
u_temp(k) = pie

end do

avg_pie = SUM(u_temp)/itr
print *, "Avg 'pi' = ", avg_pie

!Print out sim end time
call date_and_time(VALUES=idate)
print *, "Simulation end date/time = ", idate

end program pi


Re: Help!

Postby ejd » Fri Jul 04, 2008 3:18 pm

This is a common problem. I ran a performance tool on this and saw the following:
Code:
The original program run serially:
Duration (sec): 149.186
Total Thread Time: 149.186
User Lock: 0  (0%)
Most of the time spent on lines:
call random_number(r1)

The program run using OpenMP:
Duration (sec): 195.602
Total Thread Time (sec): 1563.630
User Lock: 676.475 (43.3%)
Most of the time spent on lines:
call DATE_AND_TIME  (OMP work 650.955 sec and OMP wait 674.052 sec)
call random_number(r1)  (OMP work 80.716 sec and OMP wait 5.054 sec)

Looking at the above numbers, I decided to remove the call to the DATE_AND_TIME routine, since it isn't being used and was showing as the major contributor to the OpenMP wait time. Running the program again with this change gave me:
Code:
Duration (sec): 259.493
Total Thread Time (sec): 2074.741
User Lock: 554.358 (26.7%)
Most of the time spent on lines:
call random_number(r1)
call random_number(r2)

So my "improvement" actually made things worse! Routines like date_and_time and random_number have locks built in so that they can safely be called from multiple threads (i.e., they are thread safe). The problem is that when you make a lot of calls to these routines, a lot of time is spent waiting on the locks. I could get rid of one of the locks (date_and_time), but that just put more pressure on the random_number lock. What is needed is a good random number generator that can be used in a parallel program. I have one around here somewhere, and when I get time I will try it and show the results.
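In the meantime, here is a rough sketch of the idea (a toy linear congruential generator with made-up constants, not the generator I mentioned): each thread keeps its own PRIVATE generator state and advances it without ever taking a lock.

```fortran
program private_rng_demo
  use omp_lib
  implicit none
  integer(kind=8) :: state
  integer :: i, tid
  real :: r
  !$OMP PARALLEL NUM_THREADS(8) PRIVATE(state, tid, i, r)
  tid = omp_get_thread_num()
  ! seed each thread differently so the streams do not start out identical
  state = 88172645463325252_8 + int(tid, kind=8)
  !$OMP DO
  do i = 1, 1000000
     ! one LCG step: only this thread's private state is touched, so no lock
     state = state * 6364136223846793005_8 + 1442695040888963407_8
     r = 0.5 + 0.5 * real(state) / real(huge(state)) ! map to roughly (0,1)
  end do
  !$OMP END DO
  !$OMP END PARALLEL
end program private_rng_demo
```

The point is not the quality of the generator (this one is weak); it is that there is no shared state between threads, so there is nothing to lock.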

Re: Help!

Postby ejd » Sun Jul 06, 2008 5:31 pm

I finally found a parallel random number generator. It isn't a great one, but it shows that you can get around the problem. It just takes some time and tools to find where your problem areas are so that you can work around them.

Code:
Your original program run serially:
% f90 -xO3 q15.f90
% time a.out
Simulation start date/time =  2008 7 6 -420 16 40 32 412
Avg 'pi' =  3.1454994
Simulation end date/time =  2008 7 6 -420 16 42 47 644
135.0u 0.0s 2:15 99% 0+0k 0+0io 0pf+0w

Your original program using OpenMP:
% f90 -xO3 -xopenmp q15.f90
% time a.out
Simulation start date/time =  2008 7 6 -420 16 47 55 623
Avg 'pi' =  3.1704998
Simulation end date/time =  2008 7 6 -420 16 51 11 125
1106.0u 15.0s 3:15 573% 0+0k 0+0io 0pf+0w

Modified program removing call to date_and_time:
% f90 -xO3 -xopenmp q15.f90
% time a.out
Simulation start date/time =  2008 7 6 -420 16 52 14 183
Avg 'pi' =  3.1274995
Simulation end date/time =  2008 7 6 -420 16 56 13 699
1380.0u 2.0s 3:59 575% 0+0k 0+0io 0pf+0w

Modified program removing call to date_and_time and using a parallel random number generator:
% f90 -xO3 -xopenmp q15.f90
% time a.out
Simulation start date/time =  2008 7 6 -420 17 5 34 145
Avg 'pi' =  3.1334993
Simulation end date/time =  2008 7 6 -420 17 5 52 84
30.0u 0.0s 0:18 165% 0+0k 0+0io 0pf+0w
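
For anyone who wants to experiment, a stripped-down version of the whole calculation along these lines would look roughly like the sketch below. The LCG constants are illustrative (not from the generator I actually used); each thread seeds its own private state from its thread number, and an OpenMP reduction collects the hit count.

```fortran
program pi_parallel
  use omp_lib
  implicit none
  integer(kind=8), parameter :: n_samples = 10000000_8
  integer(kind=8) :: state, i, hits
  real :: x, y
  hits = 0
  !$OMP PARALLEL NUM_THREADS(8) PRIVATE(state, i, x, y) REDUCTION(+:hits)
  ! thread-private LCG state, seeded per thread so the streams differ
  state = 1442695040888963407_8 * int(omp_get_thread_num() + 1, kind=8)
  !$OMP DO
  do i = 1, n_samples
     state = state * 6364136223846793005_8 + 1442695040888963407_8
     x = real(state) / real(huge(state)) ! roughly uniform in (-1,1)
     state = state * 6364136223846793005_8 + 1442695040888963407_8
     y = real(state) / real(huge(state))
     if (x*x + y*y <= 1.0) hits = hits + 1
  end do
  !$OMP END DO
  !$OMP END PARALLEL
  print *, "pi estimate = ", 4.0 * real(hits) / real(n_samples)
end program pi_parallel
```

No library calls inside the loop means no locks, which is where the big speedup above comes from.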
