Increasing SpeedUp, reducing overheads, Intel Fortran, OpenMP

Post by Jure » Thu Nov 17, 2011 1:08 pm

Hi guys, thank you very much to everyone who runs and contributes to this forum.

I would be glad if anyone could give me specific (not general) advice on how to increase the speedup or reduce the overheads of the software I’m trying to parallelize.

For my dissertation I’m developing software to predict dam inflow. Because calibration is time-consuming, I would like to reduce its runtime by distributing the work across more cores. For this purpose I’ve chosen Intel Fortran and OpenMP.
The code consists of many nested loops, the outermost of which steps over months (DO Mesic = 1, 12). I have therefore used a data-decomposition approach and placed the PARALLEL DO in front of this outermost loop (to get coarse granularity). This resulted in a rather low SpeedUp of approximately 2.4. Using Intel Amplifier I found no wait time, no load imbalance and no idle processor time: all cores are fully utilized. Increasing the work inside the nested loops decreases the achieved SpeedUp. Moving the PARALLEL DO further inside the nested loops (finer granularity) has the same effect: the deeper it is, the lower the SpeedUp. So I believe the low SpeedUp is caused by large parallelization overheads.
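A minimal sketch of the outer-loop approach described above, assuming every month can be calibrated independently. The names month_work and calibrate_all_months are hypothetical stand-ins, not routines from the posted source. Each iteration writes only its own element results(mesic), so no locks or reductions are needed; schedule(dynamic, 1) is just one possible choice, and with 12 equal-cost months on 4 threads a static schedule would balance equally well.

```fortran
! Sketch: PARALLEL DO on the outer 12-month loop (hypothetical names).
module month_loop_sketch
  implicit none
contains
  ! Hypothetical per-month work standing in for the real nested loops.
  ! It is pure and uses only local variables, so any thread may call it.
  pure function month_work(mesic, n) result(s)
    integer, intent(in) :: mesic, n
    real(8) :: s
    integer :: i
    s = 0.0d0
    do i = 1, n
      s = s + sin(real(mesic * i, 8))
    end do
  end function month_work

  subroutine calibrate_all_months(n, results)
    integer, intent(in) :: n
    real(8), intent(out) :: results(12)
    integer :: mesic
    ! The loop index is private automatically; each iteration writes only
    ! results(mesic), so threads never touch the same element.
    !$omp parallel do default(none) shared(n, results) schedule(dynamic, 1)
    do mesic = 1, 12
      results(mesic) = month_work(mesic, n)
    end do
    !$omp end parallel do
  end subroutine calibrate_all_months
end module month_loop_sketch
```

Any scratch arrays used inside the real loop body would have to be PRIVATE (or local to a called routine) for this pattern to stay thread-safe.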

..... and what I want :-): to increase the SpeedUp of the parallelized software. I have the following questions:
• Is there any guide or manual on how to increase SpeedUp in cases similar to mine? I’ve read quite a lot of documentation from Intel, OpenMP and others, but all of it was quite general (there was no advice for my case). The list of what I’ve read is below.
• Would anyone be willing to guide me or advise me on whether it is possible to increase the SpeedUp, and how to do it?
The source code of the software is here:
http://www.brontosaurivhimalajich.cz/Sl ... raceMP.rar

Thank you very much in advance for your replies. I hope this discussion will benefit everyone involved.
With regards
Ing. Jiri Sazel

What I have already gone through:
https://computing.llnl.gov/tutorials/openMP/
http://software.intel.com/en-us/article ... lications/
http://software.intel.com/en-us/article ... e-systems/
http://software.intel.com/en-us/videos/ ... 7462316001
Jure
 
Posts: 5
Joined: Thu Nov 17, 2011 12:14 pm

Re: Increasing SpeedUp, reducing overheads, Intel Fortran,Op

Post by ftinetti » Thu Nov 17, 2011 4:09 pm

Hi,

I do not have many answers... mostly questions for now:
1) How are you measuring runtime? I suggest using omp_get_wtime().
2) I assume you did check the results, but just in case: did you check that the parallel version produces correct results? (The thread safety of the called subroutines is involved here.)
3) How many cores/processors are you using?
4) Compiler options?
5) Did you try with different values of OMP_NUM_THREADS?
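Questions 1 and 5 can be checked in one run. The sketch below times a hypothetical kernel with 1 to max_threads threads; omp_set_num_threads(p) has the same effect as setting OMP_NUM_THREADS=p before a run. The "!$" lines use OpenMP conditional compilation, so the module also builds without OpenMP (every run is then serial and wall_seconds falls back to system_clock). Because the kernel uses an integer reduction, its checksum must be identical for every thread count, which doubles as a quick check of point 2.

```fortran
! Sketch: time a kernel with 1..max_threads threads via omp_get_wtime().
! kernel() is a hypothetical stand-in for the code being measured.
module thread_sweep
  !$ use omp_lib
  implicit none
contains
  ! Integer reduction: the result is identical for every thread count.
  function kernel(n) result(s)
    integer, intent(in) :: n
    integer(8) :: s, acc
    integer :: i
    acc = 0
    !$omp parallel do reduction(+:acc)
    do i = 1, n
      acc = acc + mod(int(i, 8) * i, 7_8)
    end do
    !$omp end parallel do
    s = acc
  end function kernel

  ! Wall-clock seconds: omp_get_wtime() when compiled with OpenMP,
  ! system_clock otherwise. Only differences between two calls matter.
  function wall_seconds() result(t)
    real(8) :: t
    integer(8) :: ticks, rate
    call system_clock(ticks, rate)
    t = real(ticks, 8) / real(rate, 8)
    !$ t = omp_get_wtime()
  end function wall_seconds

  subroutine sweep(n, max_threads, checksums)
    integer, intent(in) :: n, max_threads
    integer(8), intent(out) :: checksums(max_threads)
    real(8) :: t0, times(max_threads)
    integer :: p
    do p = 1, max_threads
      !$ call omp_set_num_threads(p)   ! same effect as OMP_NUM_THREADS=p
      t0 = wall_seconds()
      checksums(p) = kernel(n)
      times(p) = wall_seconds() - t0
      ! Speedup relative to the 1-thread run; max() guards against a zero time.
      write(*, '(a, i3, a, f10.4, a, f8.2)') 'threads =', p, '   time [s] =', times(p), &
           '   speedup =', times(1) / max(times(p), 1.0d-9)
    end do
  end subroutine sweep
end module thread_sweep
```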

My first suggestion would be to "reduce" the Parallel DO to the inner Do TestRadek = ...
    Do TestRadek = PocRadekMesic, (PocetRadku - PredpovedDo), 12
        Do PocetNPHR = PocetHPROd, PocetHPRDo
            Do VerzeVah = 1, PocetVah
                RokPred = 1
                Do PredRadek = PocRadekMesic, (PocetRadku - PredpovedDo), 12
                    If (PredRadek == TestRadek) Then
                    Else
                        Call ModelOdchylekKal(Srazky, Teploty, VlivPrutoku, VlivSrazky, VlivTeploty, PredRadek, TestRadek, PocRadekMesic, PocetRadku, PocetRokuMesic, Mesic, RokPred, &
                                               PocetNPHR, Vahy, VerzeVah, KolikVstupuVahovat, PredpovedOd, PredpovedDo, VyslMrizkaBodPrut, PrutokyNormData, SrazkyNormData, TeplotyNormData)
                        RokPred = RokPred + 1
                    End If
                End Do

                ! Compute KD and KDa for the given grid point for all predicted rows
                Do KrokPred = PredpovedOd, PredpovedDo

                    ! .... parameters for the back-transformation
                    MesicX = Mesic + KrokPred
                    If (MesicX > 12) Then
                        MesicX = MesicX - 12
                    End If

                    ! Convert normalized data back to real values
                    Do RokPred = 1, (PocetRokuMesic - 1)     ! -1 because one year is always the tested row
                        RangeReal(RokPred) = (10**(VyslMrizkaBodPrut(RokPred, KrokPred, 1) * Sigma(MesicX) + Mi(MesicX)) * KoefB(MesicX) + KoefA(MesicX)) / (10**(VyslMrizkaBodPrut(RokPred, KrokPred, 1) * Sigma(MesicX) + Mi(MesicX)) + 1)
                        RangePred(RokPred) = (10**(VyslMrizkaBodPrut(RokPred, KrokPred, 2) * Sigma(MesicX) + Mi(MesicX)) * KoefB(MesicX) + KoefA(MesicX)) / (10**(VyslMrizkaBodPrut(RokPred, KrokPred, 2) * Sigma(MesicX) + Mi(MesicX)) + 1)
                    End do

                    ! Compute KD and KDa for one grid point
                    VyslMrizkaVseKD(PocetNPHR, VerzeVah, KrokPred, 1) = KD(RangeReal, RangePred)
                    VyslMrizkaVseKD(PocetNPHR, VerzeVah, KrokPred, 2) = KDa(RangeReal, RangePred)
                End Do             
            End Do   
        End Do

just to reduce the amount of code to analyze, at least at the beginning.
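In the two RangeReal/RangePred assignments, the same power of ten, 10**(VyslMrizkaBodPrut(...) * Sigma(MesicX) + Mi(MesicX)), appears in both the numerator and the denominator. A hedged refactor, assuming the data are double precision (the declarations are not shown in the post), is an elemental function that evaluates it once; back_transform and its argument names are hypothetical. An optimizing compiler may already remove the duplicate, so any gain should be measured. Being pure, the function is also safe to call inside a parallel region.

```fortran
! Sketch: the back-transformation from the RangeReal/RangePred lines as a
! pure elemental function (hypothetical name and argument names).
module back_transform_sketch
  implicit none
contains
  elemental function back_transform(v, sigma, mi, koef_a, koef_b) result(y)
    real(8), intent(in) :: v, sigma, mi, koef_a, koef_b
    real(8) :: y, q
    q = 10.0d0 ** (v * sigma + mi)          ! evaluated once instead of twice
    y = (q * koef_b + koef_a) / (q + 1.0d0)
  end function back_transform
end module back_transform_sketch
```

Inside the KrokPred loop this could read RangeReal(1:PocetRokuMesic-1) = back_transform(VyslMrizkaBodPrut(1:PocetRokuMesic-1, KrokPred, 1), Sigma(MesicX), Mi(MesicX), KoefA(MesicX), KoefB(MesicX)), and likewise for RangePred with last index 2.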

By the way, I see OpenMP directives in program KalibraceMP; are there directives in other parts of the code?

If possible, please tell me how to compile and run this code, just in case I find some time to play with it.
ftinetti
 
Posts: 558
Joined: Wed Feb 10, 2010 2:44 pm

Post by Jure » Thu Nov 17, 2011 5:59 pm

Hi ftinetti, thank you very much for your reply. I’m glad that you went through my post :-). My answers are as follows:

AD 1)
I measure the runtime with the Fortran intrinsic system_clock, called at the beginning and at the end of the code:
call system_clock (time_begin)
...
call system_clock (time_end, count_rate)
write(*,*) 'Time of operation was ', real(time_end - time_begin)/real(count_rate), ' seconds'
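system_clock measures wall-clock time, which is what a speedup calculation needs; cpu_time, by contrast, on many compilers accumulates CPU time over all threads and would hide any speedup. A small sketch assuming 64-bit arguments, which on most compilers give a finer count_rate and a counter that wraps far less often than with default integers; elapsed_seconds is a hypothetical helper.

```fortran
! Sketch: wall-clock timing with system_clock and 64-bit integers.
module wall_timer
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
contains
  ! Converts two system_clock counts into elapsed seconds.
  function elapsed_seconds(t_begin, t_end, rate) result(sec)
    integer(int64), intent(in) :: t_begin, t_end, rate
    real(real64) :: sec
    sec = real(t_end - t_begin, real64) / real(rate, real64)
  end function elapsed_seconds
end module wall_timer
```

Usage: call system_clock(t0, rate) before the calibration, call system_clock(t1) after it, then print elapsed_seconds(t0, t1, rate), with t0, t1 and rate declared integer(int64).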

AD 2) Yes, the results are the same.
AD 3) A four-core Intel processor.
AD 4) I’m using Intel Fortran Composer as an add-in to Microsoft Visual Studio, and the only option I’ve changed is Linker > System > Stack Reserve Size = 5 MB. All other options are default.
AD 5) No, I’ll try that tomorrow.

To your first suggestion: I’ve already done exactly what you suggest, and it leads to a lower SpeedUp. Here are all my results for the different positions of the PARALLEL DO, summarized in the following table:
* The first row gives the values of the variable “KolikVstupuVahovat”. The higher this variable, the more work has to be done inside the nested loops (specifically in subroutine “ModelOdchylekKal”, in the loop titled “Suma rozdílů mezi…” (“Sum of differences between…”), where most of the computation time is spent).
* The second row gives the SpeedUp when the PARALLEL DO is placed in front of “DO Mesic = 1, 12” (as in the posted source code).
* The third row gives the SpeedUp when the PARALLEL DO is placed in front of “Do TestRadek = PocRadekMesic, (PocetRadku - PredpovedDo), 12”.
* The last row gives the SpeedUp when the PARALLEL DO is placed in front of the loop titled “Suma rozdílů mezi…” in subroutine “ModelOdchylekKal”.

KolikVstupuVahovat                     2      3      4      5      6      8
PARALLEL DO before DO Mesic          2.7   2.54   2.37   2.30   1.42    2.0
PARALLEL DO before Do TestRadek     2.02   1.88   1.91   1.81   1.74      -
PARALLEL DO in ModelOdchylekKal        -   0.19      -      -      -      -
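Dividing each SpeedUp in the table by the number of cores gives the parallel efficiency (the "SpeedUp / number of cores" figure discussed later in the thread). Assuming the table was measured on the four-core machine (AD 3), the DO Mesic row gives 2.7/4 = 0.675, 2.54/4 = 0.635, 2.37/4 = 0.5925, 2.30/4 = 0.575, 1.42/4 = 0.355 and 2.0/4 = 0.5. A trivial helper (hypothetical name efficiency) applying the same formula to a whole row:

```fortran
! Sketch: parallel efficiency E = S / p for measured SpeedUps.
module efficiency_sketch
  implicit none
contains
  ! Elemental, so it accepts a whole row of SpeedUps with one core count.
  elemental function efficiency(speedup, cores) result(e)
    real(8), intent(in) :: speedup
    integer, intent(in) :: cores
    real(8) :: e
    e = speedup / real(cores, 8)
  end function efficiency
end module efficiency_sketch
```

For example, efficiency([2.7d0, 2.54d0, 2.37d0, 2.30d0, 1.42d0, 2.0d0], 4) returns the six efficiencies listed above.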

There are also OpenMP directives in the module “ZpracovaniDatModul.f90”, but they do not play a significant role in the overall SpeedUp of the software (calibration).
I compile this code in Microsoft Visual Studio with the Intel Fortran Composer add-in and run it directly from Visual Studio.

If you have any other questions, please don’t hesitate to ask (I really want to solve this issue :-)).

With regards
Jirka Sazel

Post by ftinetti » Fri Nov 18, 2011 3:40 am

Thank you for all the details. I have a guess that I need you to check with a quick experiment: please change
call system_clock (time_begin)
...
call system_clock (time_end, count_rate)
write(*,*) 'Time of operation was ', real(time_end - time_begin)/real(count_rate), ' seconds'

to
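! requires "use omp_lib"; time_begin and time_end must be double precision
time_begin = omp_get_wtime()
...
time_end = omp_get_wtime()
! omp_get_wtime() already returns seconds, so there is no division by count_rate
write(*,*) 'Time of operation was ', time_end - time_begin, ' seconds'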

and tell me the result. You can use these functions even in the sequential code (using the OpenMP module, of course), or just run the OpenMP version with 1 thread.

How do you enable OpenMP in VS (which Intel compiler options)?

Post by Jure » Mon Nov 21, 2011 4:51 pm

Dear ftinetti,

Sorry for not responding for the last few days. I was away over the weekend, and I hope you are still willing to help.

I did the quick experiment and got the same results. The only difference was that the function returned quite strange values, but the computed SpeedUp was the same:

KolikVstupuVahovat = 2, SpeedUp = 1.327e-5 / 4.784e-6 = 2.774
KolikVstupuVahovat = 3, SpeedUp = 4.186e-5 / 1.859e-5 = 2.251

I’m not sure I understand your second question correctly, but I tell the Intel compiler to process the OpenMP directives by adding /Qopenmp to the Command Line option of the compiler properties (Configuration Properties > Fortran > Command Line). The rest of the compiler options are set to default (except AD 4 in my previous post).

What else do you suggest?
Thank you in advance
With regards
Jirka

Post by ftinetti » Tue Nov 22, 2011 3:51 am

Hi again,

Yes, I was asking about the compiler option that makes the compiler process OpenMP directives.

Again, two minor details, but they may explain a few things:
1) You said you have a 4-core processor; would you mind sending its exact model (e.g. Intel Xeon <model>)?
2) Thanks for specifying the runtimes. I think runtimes of the order of e-5/e-6 seconds (a few to tens of microseconds) are too close to the various OS overheads, including thread management, so I would not expect much speedup with sequential runtimes like those.
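To put a number on the thread-management overhead mentioned in point 2, one option is a microbenchmark that times many empty PARALLEL regions; region_overhead is a hypothetical helper, and this is only a simplified version of what dedicated OpenMP microbenchmark suites do. If the per-region cost multiplied by the number of times a region is entered (for example a PARALLEL DO inside a frequently called subroutine such as ModelOdchylekKal) is comparable to the total runtime, overhead dominates. Without OpenMP the directives are comments and the function returns 0.

```fortran
! Sketch: average cost of entering and leaving one PARALLEL region.
module fork_join_overhead
  !$ use omp_lib
  implicit none
contains
  ! Average wall-clock seconds per empty PARALLEL region (0 without OpenMP).
  function region_overhead(repeats) result(per_region)
    integer, intent(in) :: repeats
    real(8) :: per_region, t0
    integer :: r
    per_region = 0.0d0
    t0 = 0.0d0
    ! Warm-up region, so creating the thread pool is not included.
    !$omp parallel
    !$omp end parallel
    !$ t0 = omp_get_wtime()
    do r = 1, repeats
      ! Empty region: only the fork/join (thread wake-up and barrier) cost.
      !$omp parallel
      !$omp end parallel
    end do
    !$ per_region = (omp_get_wtime() - t0) / real(repeats, 8)
  end function region_overhead
end module fork_join_overhead
```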

Post by Jure » Tue Nov 22, 2011 3:25 pm

Hi and thank you for your reply,

AD 1) At home I have a 4-core Intel Core i5 CPU 760 @ 2.80 GHz, and at work a 4-core AMD with “similar” performance (I’m not at work right now). Today, however, I ran an experiment on a colleague’s computer with an 8-core AMD FX-8150 processor (4153.44 MHz). I varied the number of cores and the amount of work in the nested loops (variable KolikVstupuVahovat). In brief, in all cases the utilization of the computing potential (I mean SpeedUp / number of cores) was between 55 and 75 %, which I consider the same result as I described before. For more details, please see this table:
www.lnkm.cz/Slozka/Results.xlsx
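With the PARALLEL DO on the 12-iteration month loop, the thread count itself also caps the speedup: if the months cost roughly the same, the busiest thread runs ceiling(12/p) iterations, so the best possible speedup is 12/ceiling(12/p). That is 4 on 4 cores (no extra limit) but only 6 on the 8-core FX-8150, i.e. at most 75 % efficiency, assuming equal-cost months and the month loop as the parallel loop. A sketch (max_speedup is a hypothetical name):

```fortran
! Sketch: best-case speedup when n_iter equal-cost iterations are shared
! by p threads; the busiest thread runs ceiling(n_iter / p) iterations.
module iteration_cap
  implicit none
contains
  elemental function max_speedup(n_iter, p) result(s)
    integer, intent(in) :: n_iter, p
    real(8) :: s
    ! (n_iter + p - 1) / p is integer ceiling division.
    s = real(n_iter, 8) / real((n_iter + p - 1) / p, 8)
  end function max_speedup
end module iteration_cap
```

For 12 months, max_speedup(12, 5) is also 4, so adding a fifth core would not help this loop at all.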

AD 2) Sorry that I didn’t properly explain the times I mentioned before. Computation times on the 4-core processor range from 6 to 274 seconds (for exact times, please see the file above). I used the term “strange values” because, although the real computation time was as usual (6 or 20 seconds), the time reported by the recommended method (omp_get_wtime) was in milliseconds. Astonishingly, the computed SpeedUp is still OK.
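One likely explanation for the tiny omp_get_wtime() figures: omp_get_wtime() already returns elapsed wall-clock seconds, so if the division by count_rate from the system_clock version was kept, every measured time is scaled by the same factor 1/c (c = count_rate). The factor cancels in the ratio, which would explain why the SpeedUp still came out right. With T_1 and T_p the omp_get_wtime() differences for 1 and p threads:

```latex
S = \frac{T_1 / c}{T_p / c} = \frac{T_1}{T_p},
\qquad \text{e.g.}\quad \frac{1.327\times 10^{-5}}{4.784\times 10^{-6}} \approx 2.774
```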

What are your next suggestions?

Thanks in advance
With regards
Jirka

Post by ftinetti » Wed Nov 23, 2011 4:47 am

Thanks for the detailed data.

In general, I think (and maybe I'm wrong) that an efficiency of about 0.6 (that's the term for "SpeedUp / number of cores") is good for a code like yours. Of course it could be better, but I would have to work more on this to find out how...

The most strange result I see is that efficiency usually drops as the sequential time grows, whereas in general it is the opposite... so evidently I need to work more on this to find some other important detail in order to optimize the results. In the meantime, maybe you can try parallelizing the inner loop as suggested and post the result... but I know this is time-consuming. On my side, I've run out of ideas for now, sorry.
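One way to put a number on this unusual trend is the Karp-Flatt metric, the experimentally determined serial fraction e = (1/S - 1/p) / (1 - 1/p). Roughly, a fixed serial part keeps e about constant as the problem grows, while an e that rises with the amount of work suggests overhead that grows along with the work rather than a fixed startup cost. Applied to the DO Mesic row of the earlier table, and assuming those figures are for 4 cores, e rises from about 0.16 at KolikVstupuVahovat = 2 to about 0.33 at 8. The function name karp_flatt is hypothetical.

```fortran
! Sketch: Karp-Flatt metric (experimentally determined serial fraction)
!   e = (1/S - 1/p) / (1 - 1/p)
! for a measured SpeedUp S on p cores.
module karp_flatt_sketch
  implicit none
contains
  elemental function karp_flatt(speedup, cores) result(e)
    real(8), intent(in) :: speedup
    integer, intent(in) :: cores
    real(8) :: e, inv_p
    ! Undefined for p = 1; this sketch returns 0 in that case.
    if (cores <= 1) then
      e = 0.0d0
      return
    end if
    inv_p = 1.0d0 / real(cores, 8)
    e = (1.0d0 / speedup - inv_p) / (1.0d0 - inv_p)
  end function karp_flatt
end module karp_flatt_sketch
```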

Post by Jure » Wed Nov 23, 2011 11:32 am

Hi and thank you for your reply!

Yes, I know that 0.6 is not so bad. But
a) I’m going to extend the application in the future, and the runtime will be at least 90 times longer.
b) I’ll be using KolikVstupuVahovat equal to or higher than 8, which means an efficiency of 0.5 or lower.
ftinetti: “The most strange result I see is …”
Yes I know, this is the reason why I began this discussion :-).

ftinetti: “In the meantime, maybe you can try …”
I have already done this. In brief: the more inner the parallelized loop, the lower the SpeedUp. The detailed results are in my second post in this discussion, which starts with: “To your first suggestion: I’ve alr …”.

Are you willing to continue the discussion? Or, if you are out of ideas, does that mean we have reached the end?

Thanks in advance.
With regards
Jirka

Post by ftinetti » Wed Nov 23, 2011 2:43 pm

I need a little more time to play around with the code before saying anything else. Maybe next week I'll have the time; right now I have to finish some things at work.

If you have a new version of the code, please post a link from which I can download and compile/run it too.

I hope to come back to this soon.

Sorry I have not helped... in anything... I see now...