I am currently parallelizing a fairly large Fortran 95 code with OpenMP. So far everything works; the only problem is that, while the program uses 100% of one CPU when run serially, on four CPUs it only reaches 340% CPU usage.
I am totally new to parallel programming, so I was hoping you could give me tips on how to improve the performance.
(Of course, I'd also be grateful for pointers on what to read about this.)
The general structure of the code is:
!$OMP PARALLEL DO FIRSTPRIVATE(bunch of data for iteration) LASTPRIVATE(more variables) &
!$OMP DEFAULT(SHARED) SCHEDULE(DYNAMIC) ORDERED NUM_THREADS(num_threads)
! iteration that takes up most of computing time
!$OMP CRITICAL
! write obtained data into files
!$OMP END CRITICAL
!$OMP END PARALLEL DO
There are sometimes only 1-2 values of a, b and c (the outer loops), but always many (~100) iterations
of the loop over d, so I chose to apply OpenMP to the innermost DO loop. Was this a bad idea?
Would parallelizing only the outer loops, or the outer loops as well, bring much of an improvement?
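To make the question concrete, here is a minimal sketch of what I mean by parallelizing the outer loops as well. It assumes an OpenMP 3.0 compiler (for COLLAPSE) and that the four loops are perfectly nested with independent iterations. The bounds na, nb, nc, nd, the results array and the arithmetic are hypothetical stand-ins, not my real code:

```fortran
program collapse_sketch
  implicit none
  ! Hypothetical loop bounds: few values of a, b, c and ~100 values of d.
  integer, parameter :: na = 2, nb = 1, nc = 2, nd = 100
  integer :: a, b, c, d
  double precision :: results(nd, nc, nb, na)

  ! COLLAPSE(4) merges the four loops into a single iteration space of
  ! na*nb*nc*nd iterations, so the parallel region is entered only once
  ! instead of once per (a, b, c) combination.
  ! The loop variables a, b, c, d are private automatically.
  !$OMP PARALLEL DO COLLAPSE(4) DEFAULT(SHARED) SCHEDULE(DYNAMIC)
  do a = 1, na
    do b = 1, nb
      do c = 1, nc
        do d = 1, nd
          ! Stand-in for the expensive iteration (assumption).
          results(d, c, b, a) = dble(a + b + c) * sqrt(dble(d))
        end do
      end do
    end do
  end do
  !$OMP END PARALLEL DO

  print *, 'sum of results =', sum(results)
end program collapse_sketch
```

With gfortran this should build with something like "gfortran -fopenmp collapse_sketch.f90". Without the flag the directives are treated as comments and the program runs serially.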
Could the critical section be the problem? I could move the file output out of the parallel loop and into the serial part of the code...
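A rough sketch of that alternative, assuming the data from each iteration fits into a shared array indexed by d. The array results, the file name results.dat and the arithmetic are made up for illustration:

```fortran
program buffered_write_sketch
  implicit none
  integer, parameter :: nd = 100   ! hypothetical number of d iterations
  integer :: d
  double precision :: results(nd)

  !$OMP PARALLEL DO DEFAULT(SHARED) SCHEDULE(DYNAMIC)
  do d = 1, nd
    ! Stand-in for the expensive iteration (assumption). Each iteration
    ! stores into its own slot, so no CRITICAL section is needed here.
    results(d) = sqrt(dble(d))
  end do
  !$OMP END PARALLEL DO

  ! Serial section: write everything in loop order once the threads are done.
  open(unit=10, file='results.dat', status='replace', action='write')
  do d = 1, nd
    write(10, '(I6, ES24.15)') d, results(d)
  end do
  close(10)
end program buffered_write_sketch
```

This way the threads never wait on each other for file access, and the output ends up in loop order without needing ORDERED. The cost is the extra memory for the buffer.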
Thanks for any help!