Why parallel running does not speed up my job

General OpenMP discussion

Postby gonski » Wed Jan 30, 2008 5:25 am

Hi all,

I just finished parallelizing my code with OpenMP. The parallel and sequential runs give almost identical results,
but the parallel run takes about the same calculation time as the sequential one.
I am testing my jobs on an SMP Unix machine. Could you please give me some hints?

All the parallel regions in my code have roughly the same form:

c$OMP parallel do default(shared)
c$OMP* private(.....)
c$OMP* schedule(static)
      do i = 1, n
      enddo


The setup in my PBS script is:
#!/bin/ksh
#PBS -P Pitman
#PBS -N cpu8
#PBS -l walltime=7:00:00
#PBS -l mem=700MB
#PBS -l vmem=900MB
#PBS -l ncpus=8
#PBS -l jobfs=2GB
#PBS -wd
export OMP_NUM_THREADS=8

./run

exit 0
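
One quick sanity check worth doing (an editorial sketch, not from the original post): if the program is compiled without the compiler's OpenMP flag, the c$OMP directives are treated as ordinary comments and the job runs on a single thread no matter what OMP_NUM_THREADS is set to. Printing the thread count from inside the program confirms the setting is actually reaching the job; omp_get_max_threads and omp_get_num_threads are standard OpenMP library routines.

c     Sanity check: report how many threads OpenMP will actually use.
c     Compile with the OpenMP flag for your compiler (the flag name
c     varies by vendor, e.g. -openmp, -fopenmp, or -mp).
      program check_threads
      use omp_lib
      write(*,*) 'max threads =', omp_get_max_threads()
c$OMP PARALLEL
c$OMP MASTER
      write(*,*) 'threads in region =', omp_get_num_threads()
c$OMP END MASTER
c$OMP END PARALLEL
      end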

Re: Why parallel running does not speed up my job

Postby ejd » Wed Jan 30, 2008 6:43 am

Depending on how much work there is to be done by a program and the overhead of running in parallel, you may not see a speedup. If you look at Amdahl's Law, the speedup is limited by the sequential fraction of the program. For example, if half (.5) of your program is sequential, then the theoretical maximum speedup using parallel would be 2 - no matter how many processors you use.
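
For reference, Amdahl's Law can be written out (an editorial addition, not part of the original reply) as

      S(n) = 1 / ((1 - p) + p / n)

where p is the fraction of the run time that can be executed in parallel and n is the number of processors. With p = 0.5, S(n) can never exceed 1 / (1 - 0.5) = 2, no matter how large n gets.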

So the first question is, how much of your program is now parallel? This will give you an idea of the theoretical max. The second question is whether there is enough work in the parallel region to make it worth going parallel. If there is, then you have to start looking at things like false sharing or cache contention that make running in parallel less than optimal.
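
To make false sharing concrete, here is a minimal illustrative sketch (added for this write-up, not code from the thread). Each thread updates only its own element of "sums", so there is no race, but neighbouring elements sit on the same cache line, and that line bounces between the cores on every update, so the loop can run slower in parallel than in serial.

c     False-sharing sketch: no race, but adjacent elements of "sums"
c     share a cache line, so the line ping-pongs between the cores.
      program false_sharing
      integer, parameter :: nthreads = 8
      integer sums(nthreads)
      integer t, i
      sums = 0
c$OMP PARALLEL DO DEFAULT(SHARED),
c$OMP* PRIVATE(t,i),
c$OMP* SCHEDULE(static)
      do t = 1, nthreads
         do i = 1, 100000000
            sums(t) = sums(t) + 1
         enddo
      enddo
      write(*,*) 'total =', sum(sums)
      end

Padding the array so that each thread's element sits on its own cache line (for example, dimensioning sums(16, nthreads) and updating sums(1,t)) typically restores the expected speedup.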

Re: Why parallel running does not speed up my job

Postby gonski » Wed Jan 30, 2008 12:54 pm

ejd wrote: The second question is whether there is enough work in the parallel region to make it worth going parallel.

So if the calculation is not big enough, parallelism may cost more than it saves, right?

ejd wrote: If there is, then you have to start looking at things like false sharing or cache contention that make running in parallel less than optimal.

My current understanding of OpenMP is quite limited. Could you please give me an example of these situations, especially of false sharing? A simple piece of pseudo-code would help.

Re: Why parallel running does not speed up my job

Postby gonski » Wed Jan 30, 2008 4:47 pm

This seems really strange. Even with the simple code below, which I used to test efficiency with different numbers of processors, I found that efficiency decreases as the number of processors increases.

      INTEGER(KIND=8) :: i, j, k, ni, nj
      integer clock_start, clock_stop, clock_rate, clock_max
      real, allocatable :: f(:)

      open(1, file='jobm.txt')

      ni = 1e+6
      nj = 1e+3

      allocate(f(nj))
      f = 0

      call system_clock(clock_start, clock_rate, clock_max)

c$OMP PARALLEL DO DEFAULT(SHARED),
c$OMP* PRIVATE(i,j,k,n),
c$OMP* SCHEDULE(static)
      do i = 1, ni
         do j = 1, nj
            do k = 1, nj
               n = 1
c              f is shared, so every thread updates the same f(k)
               f(k) = f(k) + 1
            enddo
         enddo
      enddo

      call system_clock(clock_stop, clock_rate, clock_max)

      time = (real(clock_stop) - real(clock_start)) / real(clock_rate)

      write(1,*) 'time=,f', time, f(1)

      end

Re: Why parallel running does not speed up my job

Postby lfm » Wed Jan 30, 2008 11:56 pm

It probably doesn't help that every thread is writing to the same array locations, so there are serious race conditions. At least try making the array f 3-dimensional, so that each (i,j,k) iteration has its own element; you may need to reduce your problem dimensions a bit. Even then, you may run into memory-bandwidth issues. If all you are trying to do is measure the parallel overhead, then you might want to look at the EPCC microbenchmarks:
http://www2.epcc.ed.ac.uk/computing/research_activities/openmpbench/openmp_index.html
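
As a concrete alternative (an added sketch; it uses a REDUCTION clause rather than the 3-dimensional f suggested above, and the loop counts are reduced so the test finishes quickly): REDUCTION(+:f) gives each thread a private copy of the array, and the copies are summed once when the parallel loop ends, so no thread ever touches another thread's data. Fortran allows whole arrays to appear in REDUCTION clauses.

c     Race-free variant of the earlier timing test.
      program no_race
      integer i, j, k, ni, nj
      real f(1000)
      ni = 100
      nj = 1000
      f = 0.0
c$OMP PARALLEL DO DEFAULT(SHARED),
c$OMP* PRIVATE(i,j,k),
c$OMP* REDUCTION(+:f),
c$OMP* SCHEDULE(static)
      do i = 1, ni
         do j = 1, nj
            do k = 1, nj
               f(k) = f(k) + 1.0
            enddo
         enddo
      enddo
c     Each f(k) should equal ni*nj = 100000 exactly.
      write(*,*) 'f(1) =', f(1)
      end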

Re: Why parallel running does not speed up my job

Postby gonski » Thu Jan 31, 2008 5:08 am

Thanks. Now I know what is happening in my code.
      module aa
         real a(100000)
      end module


      program test
      use aa
      integer clock_start, clock_rate, clock_max, clock_stop

      open(1, file='jobm.txt')

      ni = 100000

      call system_clock(clock_start, clock_rate, clock_max)

c$OMP PARALLEL DO DEFAULT(SHARED),
c$OMP* PRIVATE(i),
c$OMP* SCHEDULE(static)
      do i = 1, ni
         call test2(i)
      enddo

      call system_clock(clock_stop, clock_rate, clock_max)

      time = (real(clock_stop) - real(clock_start)) / real(clock_rate)

      write(1,*) 'time=,f', time

      end


      subroutine test2(i)
c     use aa                  ! with this commented out, "a" below is local
      real a(100000)

      nj = 1000000
      nk = 1000000

      do j = 1, nj
         do k = 1, nk
            a(i) = a(i) + j + k   ! here the expression could be very complicated
         enddo
      enddo

      return
      end


My code has a situation just like the variable "a" in the code above. I want to calculate the sum of a(i) for further calculations. When "a" is a shared module variable, a race condition is generated. When "a" is defined locally inside subroutine test2, there is no race condition, but then I cannot pass "a" back out. How can I handle this problem?

Re: Why parallel running does not speed up my job

Postby ejd » Sat Feb 02, 2008 6:38 am

From the code you posted (Jan 31, 2008 5:08 am), there is no race condition: each iteration i writes only to its own element a(i), so no two threads ever update the same location. You can simply keep the variable "a" in the module and USE the module (aa) in both the main program and the subroutine.
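
A minimal sketch of that fix (an editorial reconstruction, not code from the thread; the loop counts are cut down so it runs in a reasonable time): test2 now uses the module array instead of declaring a local copy, so the accumulated values survive the call and the main program can sum them afterwards.

c     ejd's suggestion: keep "a" in the module and USE the module in
c     both program units. Each thread writes only a(i) for its own i,
c     so there is no race condition.
      module aa
         real a(100000)
      end module

      program test
      use aa
      integer i, ni
      ni = 1000
      a = 0.0
c$OMP PARALLEL DO DEFAULT(SHARED),
c$OMP* PRIVATE(i),
c$OMP* SCHEDULE(static)
      do i = 1, ni
         call test2(i)
      enddo
      write(*,*) 'sum of a =', sum(a(1:ni))
      end

      subroutine test2(i)
      use aa
      integer i, j, k, nj, nk
      nj = 100
      nk = 100
      do j = 1, nj
         do k = 1, nk
            a(i) = a(i) + j + k
         enddo
      enddo
      end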

