OMP Load Balancing in 'Dependent' Nested do-loops (FORTRAN)

Postby ruhollah » Sun Aug 04, 2013 7:37 pm

Hi there,

I understand that when a do-loop is parallelized, OMP automatically does the load balancing by (almost) evenly distributing total iterations over all threads.

My question, however, is about OMP load balancing in nested do-loops where the range of the inner loop depends on the outer loop's counter. This kind of nested loop is very common when simulating atomic systems via molecular/Brownian dynamics. As far as I can tell, OMP does a poor job in this situation. I really hope I am wrong, though (and that's why I am posting this topic)!

Here is the example code. If you run the following code:
-------------------------------------------------------
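program nested_loop_demo
  use omp_lib
  implicit none
  integer :: c, i, j, k, n, nall

  ! optional: request a fixed number of threads
  !$ call omp_set_num_threads(4)

  c = 0
  nall = 20
  !$omp parallel do default(shared) private(i, j, k, n) firstprivate(c)
  do i = 1, nall
     do j = i + 1, nall
        ! ... some work, e.g.:
        k = i + j

        ! record which thread does which part of the work
        n = omp_get_thread_num()
        write(*,'(a,i5)') ' im = ', n
        c = c + 1                  ! per-thread running count (c is firstprivate)
        write(n+7,*) n, c, i, j    ! one output unit per thread: 7, 8, 9, 10
     enddo
  enddo
  !$omp end parallel do
end program nested_loop_demo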
-------------------------------------------------------
You will see that the iterations are distributed very unevenly over the four threads, as follows (the numbers may differ on other machines, but they are badly uneven in any case):

Thread #    Number of iterations
1           85
2           60
3           35
4           10

This would be catastrophic for large loops with complicated calculations inside.

So, is there a way to get better load balancing in these situations?
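
The counts in the table above are exactly what a contiguous block split of the outer loop produces. The sketch below does the arithmetic without OpenMP. It assumes the runtime gives each of the four threads one block of five consecutive outer iterations (the usual behaviour when no schedule clause is given, although the default schedule is implementation-defined). The program name and the variables chunk and counts are illustrative only.

```fortran
program block_static_counts
  implicit none
  integer, parameter :: nall = 20, nthreads = 4
  integer :: i, t, chunk
  integer :: counts(0:nthreads-1)

  ! Assumption: one contiguous block of outer iterations per thread.
  chunk = (nall + nthreads - 1) / nthreads   ! 5 outer iterations per thread
  counts = 0
  do i = 1, nall
     t = (i - 1) / chunk                     ! thread that owns outer iteration i
     counts(t) = counts(t) + (nall - i)      ! inner loop j = i+1..nall runs nall-i times
  end do

  ! Print threads as 1..4 to match the table in the post.
  do t = 0, nthreads - 1
     write(*,'(a,i2,a,i4)') ' thread ', t + 1, ': ', counts(t)
  end do
  write(*,'(a,i4)') ' total:    ', sum(counts)
end program block_static_counts
```

This gives 85, 60, 35 and 10 inner iterations for threads 1 to 4 (190 in total): the thread that owns the first block of i values gets the longest inner loops, and the last thread gets almost nothing.
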
ruhollah
 
Posts: 2
Joined: Sat Aug 03, 2013 6:33 pm

Re: OMP Load Balancing in 'Dependent' Nested do-loops (FORTR

Postby ftinetti » Mon Aug 05, 2013 4:36 am

Hi,

I do not have much time to explain, but please take a look at (and experiment with) the schedule clause.
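
As a minimal sketch of what experimenting with the schedule clause could look like, the program below runs the same triangular loop with schedule(dynamic, 1) and counts inner iterations per thread in an array instead of writing to files. The program name, the counts array and the choice of dynamic with chunk size 1 are assumptions made for illustration; they are not taken from the thread. It needs to be compiled with OpenMP enabled (for example, gfortran -fopenmp).

```fortran
program schedule_demo
  use omp_lib
  implicit none
  integer, parameter :: nall = 20, nthreads = 4
  integer :: i, j, k, n
  integer :: counts(0:nthreads-1)

  counts = 0
  call omp_set_num_threads(nthreads)

  ! Outer iterations are handed out one at a time to whichever thread is free.
  ! Swap in schedule(static, 1) or schedule(guided) to compare.
  !$omp parallel do default(shared) private(i, j, k, n) schedule(dynamic, 1)
  do i = 1, nall
     n = omp_get_thread_num()
     do j = i + 1, nall
        k = i + j                      ! placeholder for the real work
        counts(n) = counts(n) + 1      ! each thread updates only its own slot
     end do
  end do
  !$omp end parallel do

  do n = 0, nthreads - 1
     write(*,'(a,i2,a,i5)') ' thread ', n, ': ', counts(n)
  end do
  write(*,'(a,i5)') ' total:    ', sum(counts)
end program schedule_demo
```

The per-thread counts vary from run to run, but they always add up to 190. With a loop body this small the distribution can still look lopsided, because one thread may grab several chunks before the others have started.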

HTH,

Fernando.
ftinetti
 
Posts: 582
Joined: Wed Feb 10, 2010 2:44 pm

Re: OMP Load Balancing in 'Dependent' Nested do-loops (FORTR

Postby ruhollah » Mon Aug 05, 2013 2:09 pm

Fernando,
Thank you for your suggestion. Actually, I have tried schedule(...) in all possible ways: STATIC, DYNAMIC, GUIDED, and RUNTIME, with different chunk sizes. The load balancing issue is still there: some threads do up to 500% more work than others. That figure is for GUIDED, which is still better than STATIC, where the load balancing is terrible!

Any ideas?

Re: OMP Load Balancing in 'Dependent' Nested do-loops (FORTR

Postby ftinetti » Mon Aug 05, 2013 3:47 pm

Hmmm... it's hard to find something useful to explain for small workloads such as

  !... some work,e.g:
  k = i + j


plus I/O, which is usually sequential... but I'll try a short explanation. Using schedule(static, 1), the imbalance is about 37%, but it shrinks as nall grows: nall = 50 gives about 12% imbalance, nall = 100 gives about 6%, and so on. For workloads that are not so small, a dynamic schedule with a "small enough" chunk should give good load balance, where "small enough" is a compromise between scheduling overhead and having relatively many chunks per thread in order to avoid imbalance... Now that I read what I wrote, I don't know whether I explained anything or just made things more confusing. Anyway,
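
The schedule(static, 1) percentages quoted above can be reproduced with a little arithmetic, since that schedule deals outer iterations to the threads round-robin. The sketch below assumes four threads and takes "imbalance" to mean how much more work the busiest thread does than the least busy one; that definition, the program name and the variable names are assumptions made for illustration.

```fortran
program cyclic_static_imbalance
  implicit none
  integer, parameter :: nthreads = 4
  integer, parameter :: sizes(3) = (/ 20, 50, 100 /)
  integer :: i, s, t, nall
  integer :: counts(0:nthreads-1)
  real :: imbalance

  do s = 1, size(sizes)
     nall = sizes(s)
     counts = 0
     do i = 1, nall
        t = mod(i - 1, nthreads)             ! static,1: iteration i goes to thread (i-1) mod 4
        counts(t) = counts(t) + (nall - i)   ! inner loop j = i+1..nall runs nall-i times
     end do
     ! Assumed definition: extra work of the busiest thread relative to the least busy.
     imbalance = 100.0 * (real(maxval(counts)) / real(minval(counts)) - 1.0)
     write(*,'(a,i4,a,f6.2,a)') ' nall = ', nall, ': imbalance = ', imbalance, ' %'
  end do
end program cyclic_static_imbalance
```

For nall = 20 the four threads get 55, 50, 45 and 40 inner iterations, an imbalance of 37.5%. For nall = 50 and nall = 100 the figure drops to roughly 12.8% and 6.25%, in line with the numbers quoted above.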

HTH,

Fernando.

