Avoid multiple data copying due to a parallel region inside a loop

Postby arantesb » Wed Apr 16, 2014 6:34 am

Hi guys

I'm struggling to reduce the time lost to memory reallocation when I have a parallel region inside a sequential loop, for instance:

Code: Select all
do ctd=1,n,1
    ! sequential operations

    ! stop criterion
    if (condition) exit

    !$omp parallel shared(...)

    !$omp do private(...)
    do ktd=1,m,1
        ! parallel code
    end do
    !$omp end do

    !$omp do private(...)
    do ktd=1,m,1
        ! more parallel code
    end do
    !$omp end do

    !$omp end parallel
end do


This code works fine, but I think that each time execution reaches "!$omp parallel" it copies data to all the threads it creates, and frees that data when it reaches "!$omp end parallel". These directives are executed so many times that a huge amount of time is spent on operations unrelated to the calculation. So I tried the following:

Code: Select all
!$omp parallel shared(...)
!$omp do private(...) ordered
do ctd=1,n,1

    !$omp ordered

    ! sequential operations

    ! stop criterion
    if (condition) exit

    !$omp end ordered

    !$omp do private(...)
    do ktd=1,m,1
        ! parallel code
    end do
    !$omp end do

    !$omp do private(...)
    do ktd=1,m,1
        ! more parallel code
    end do
    !$omp end do

end do
!$omp end do
!$omp end parallel


The purpose was to create the thread team only once. But it has some issues:
- I need to remove the stopping criterion, because an exit statement isn't allowed inside a parallel do (I can live with this restriction if the time saved is worth it);
- It isn't possible to nest !$omp do directives inside the same !$omp parallel region. THAT IS MY PROBLEM: I don't want a new !$omp parallel directive, since that will surely create a new thread team.

Can someone help me solve this issue?

Kindest Regards
arantesb
 
Posts: 4
Joined: Sun Sep 08, 2013 7:40 am

Re: Avoid multiple data copying due parallel reg inside a lo

Postby ftinetti » Wed Apr 16, 2014 8:34 am

Hi,

About
I think that each time the execution reaches "!$omp parallel" it copies data to all threads created and frees when reaches "!$omp end parallel".

I don't think that's the case in general, and in any case the spec doesn't define any such behavior (beyond what relates to handling private data, of course). Most reports I've seen say that threads are created once and deleted at the end. Sorry, I don't recall the sources, but maybe you should review where the overhead comes from, or even whether the processing is worth parallelizing at all.

HTH,

Fernando.
ftinetti
 
Posts: 582
Joined: Wed Feb 10, 2010 2:44 pm

Re: Avoid multiple data copying due parallel reg inside a lo

Postby arantesb » Wed Apr 16, 2014 10:41 am

Thanks Fernando

As my code is running, unless someone can give a straight answer I'll postpone the decision until I have time to run tests with perf. I'll need to learn how to get the right data from perf and run different scenarios to measure wall time and memory usage (I'll use this information to assess whether or not thread teams are being created).

Kind Regards
arantesb
 

Re: Avoid multiple data copying due parallel reg inside a lo

Postby MarkB » Wed Apr 16, 2014 1:44 pm

There is some overhead associated with starting/stopping parallel regions, but data allocation or copying is not a problem except in a few uncommon use cases, and there are a number of other possible reasons why your code might not scale well.

If you do want to move the parallel region outside the outer loop, then this is probably the best solution:

Code: Select all
   
!$omp parallel shared(...)
do ctd=1,n,1

    !$omp master

    ! sequential operations

    !$omp end master
    !$omp barrier

    !$omp do private(...)
    do ktd=1,m,1
        ! parallel code
    end do
    !$omp end do

    !$omp do private(...)
    do ktd=1,m,1
        ! more parallel code
    end do
    !$omp end do

end do
!$omp end parallel
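
For reference, here is a minimal runnable sketch of this pattern that also keeps the stop criterion from the original post. It is not taken from the poster's actual code: the arrays a and b, the shared flag done, the counter steps and the threshold 5000 are all hypothetical stand-ins. The master thread evaluates the criterion, the barrier makes the result visible to the team, and then every thread takes the same exit, so no thread is left waiting at a later work-sharing construct.

```fortran
program persistent_team
  implicit none
  integer, parameter :: n = 100, m = 1000
  integer :: ctd, ktd, steps
  double precision :: a(m), b(m), total
  logical :: done

  a = 1.0d0
  b = 0.0d0
  total = 0.0d0
  steps = 0
  done = .false.

  ! The thread team is created once, outside the sequential loop.
  ! ctd is not listed: as the iterator of a sequential loop inside
  ! the parallel region it is private to each thread.
  !$omp parallel shared(a, b, total, steps, done)
  do ctd = 1, n

     !$omp master
     ! "sequential operations" (hypothetical): reduce b, test the criterion
     total = sum(b)
     done = (total >= 5000.0d0)
     if (.not. done) steps = steps + 1
     !$omp end master
     !$omp barrier

     ! Every thread reads the same value of done, so all of them
     ! leave the loop together (the exit stays inside the region).
     if (done) exit

     !$omp do
     do ktd = 1, m
        a(ktd) = a(ktd) + 1.0d0            ! "parallel code"
     end do
     !$omp end do

     !$omp do
     do ktd = 1, m
        b(ktd) = b(ktd) + 0.5d0 * a(ktd)   ! "more parallel code"
     end do
     !$omp end do

  end do
  !$omp end parallel

  print '(a,i0)', 'steps = ', steps
  print '(a,i0)', 'sum(b) at stop = ', nint(total)
end program persistent_team
```

With gfortran this would be built with -fopenmp; without that flag the directives are plain comments and the program runs serially with the same result. The implicit barrier at the end of the last !$omp do is what keeps the master's next write to done from overlapping with other threads still reading it. An !$omp single block, which has an implied barrier at its end, could probably replace the master/barrier pair.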
MarkB
 
Posts: 450
Joined: Thu Jan 08, 2009 10:12 am
Location: EPCC, University of Edinburgh

Re: Avoid multiple data copying due parallel reg inside a lo

Postby arantesb » Thu Apr 17, 2014 1:11 pm

Dear MarkB

OK, I can understand your intention.

But won't this cause all the threads created at !$omp parallel to execute the same work, racing each other, until they reach the !$omp master directive?
All threads are supposed to execute "do ctd=1,n,1", but only the master thread will execute the sequential operations. I'm afraid of a race condition in which each iteration gets executed twice or more.

I'll run some tests and report the results. As tomorrow is a holiday in Brazil, it'll take me a bit longer to come back with the test results.

Thanks
arantesb
 

Re: Avoid multiple data copying due parallel reg inside a lo

Postby arantesb » Fri Apr 18, 2014 7:31 am

Dear MarkB

Your suggestion has gotten me the best result.

I no longer get the erratic load on the cores that I was seeing before (unfortunately, some configuration is preventing me from posting a screenshot of the system monitor's CPU load), the total execution time went down, and perf stat returned good statistics:
Performance counter stats for './a.out':

     357872,274904 task-clock                #    5,249 CPUs utilized
           390.308 context-switches          #    0,001 M/sec
               262 CPU-migrations            #    0,000 M/sec
             2.299 page-faults               #    0,000 M/sec
 1.016.424.307.633 cycles                    #    2,840 GHz                     [83,33%]
   658.588.032.362 stalled-cycles-frontend   #   64,79% frontend cycles idle    [83,33%]
   191.707.552.887 stalled-cycles-backend    #   18,86% backend cycles idle     [66,67%]
   845.818.731.401 instructions              #    0,83  insns per cycle
                                             #    0,78  stalled cycles per insn [83,34%]
   241.489.597.599 branches                  #  674,793 M/sec                   [83,33%]
       178.766.238 branch-misses             #    0,07% of all branches         [83,34%]

      68,177579727 seconds time elapsed


The large idle time is due to keyboard user interaction.

I don't know whether that issue (the race condition) is occurring, but the code is performing well and giving the expected result, which is fine for me.

Maybe the -march=corei7, -mtune=corei7, -inline-level=2 and -O3 options are also helping, by giving me better-optimized compiled code.

Thanks
MarkB and ftinetti
arantesb
 

Re: Avoid multiple data copying due parallel reg inside a lo

Postby MarkB » Mon Apr 28, 2014 4:20 am

Fortran loop iterators are private by default, so there is no race condition: each thread will (redundantly) execute the loop control for the ctd loop.
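
A small sketch that illustrates this behaviour, under assumptions: the array hits and the counters outer_trips and nthreads are hypothetical names introduced only for the check. Each thread steps through the whole ctd loop with its own private copy of ctd, while the work-shared inner loop hands each ktd index to exactly one thread per outer step, so no iteration of the real work is executed twice.

```fortran
program private_iterator
  !$ use omp_lib, only: omp_get_num_threads
  implicit none
  integer, parameter :: n = 5, m = 8
  integer :: ctd, ktd, nthreads, outer_trips
  integer :: hits(m)

  hits = 0
  outer_trips = 0
  nthreads = 1   ! stays 1 when compiled without OpenMP

  !$omp parallel shared(hits, nthreads) reduction(+:outer_trips)

  !$omp master
  !$ nthreads = omp_get_num_threads()
  !$omp end master

  ! ctd is private here: every thread runs the loop control itself.
  do ctd = 1, n
     outer_trips = outer_trips + 1     ! counted once per thread per step

     ! Work-sharing: each index goes to one thread only.
     !$omp do
     do ktd = 1, m
        hits(ktd) = hits(ktd) + 1
     end do
     !$omp end do
  end do

  !$omp end parallel

  print '(a,l1)', 'every inner index updated once per outer step: ', &
        all(hits == n)
  print '(a,l1)', 'every thread ran the whole outer loop: ', &
        outer_trips == n * nthreads
end program private_iterator
```

Both lines should report T for any team size. The lines starting with !$ are OpenMP conditional compilation, so the same source also builds and runs serially when OpenMP is not enabled.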
MarkB
 

