3d arrays and OpenMP

General OpenMP discussion

Postby johannes » Thu Apr 24, 2014 8:57 am

Hi all,
I am a beginner with OpenMP and quite desperate after several days without progress.

1. Bad loops?
I am solving 3D finite-volume temperature fields using Intel Fortran (ifort) on Windows 7. I have tried many variations of OMP directives, but never got a speedup of more than a factor of 2, even with 16 cores. What bothers me is that the CPU time measurements in the attached test program are so odd: the CPU time is more or less independent of NTHREADS. If you rename test3.piz to test3.zip, you will find my .exe inside. I compile simply with "ifort /c /Qopenmp test3.f90", link with "link test3.obj" and run with "test3".

Test3.f90 is written this way because the typical loop in my 'big' program looks like this:
Code:
do k=1,Nz
   do j=1,Ny
     do i=1,Nx
       c(i,j,k)=a(i,j,k)*b(i,j,k)+ other matrix elements - other matrix elements
     enddo
   enddo
enddo


- Does this type of loop structure impede the use of OpenMP, and how can I improve it? I also have loops like
Code:
     do i=1,N ; some arrays a(3,i) ; enddo 

which do not run any faster either.
- Are there special compiler directives to make it better (e.g. to avoid conflicts with hyperthreading)?
- What should I use for diagnostics? (I have to admit that I'm using the old-fashioned way of compiling with a .bat file; I do not use the graphical IDE.)
- What was the main pitfall you ran into when you started with OpenMP?
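A minimal sketch of how a loop nest like the one above is commonly parallelised, assuming double-precision arrays and keeping only the a*b term (the "other matrix elements" are omitted, and the module and routine names are hypothetical). The outermost k loop is shared out among threads, and the innermost loop runs over the first index, which is contiguous in memory because Fortran stores arrays in column-major order.

```fortran
module loops3d
  implicit none
contains
  ! Hypothetical kernel: only the a*b term of the original loop is kept;
  ! the "other matrix elements" are omitted.
  subroutine mul3d(nx, ny, nz, a, b, c)
    integer, intent(in) :: nx, ny, nz
    double precision, intent(in)  :: a(nx, ny, nz), b(nx, ny, nz)
    double precision, intent(out) :: c(nx, ny, nz)
    integer :: i, j, k

    ! Share the outermost k loop among threads (static: equal contiguous
    ! blocks of k-planes). private(i, j) is stated explicitly for clarity.
    !$omp parallel do default(none) shared(nx, ny, nz, a, b, c) private(i, j) schedule(static)
    do k = 1, nz
      do j = 1, ny
        ! Innermost loop over the first index: unit stride in memory,
        ! because Fortran arrays are stored in column-major order.
        do i = 1, nx
          c(i, j, k) = a(i, j, k) * b(i, j, k)
        end do
      end do
    end do
    !$omp end parallel do
  end subroutine mul3d
end module loops3d
```

Compiled with /Qopenmp (ifort on Windows) or -fopenmp (gfortran), each thread works on its own block of k-planes; without the OpenMP flag the directives are treated as comments and the routine runs serially.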

2. That OpenMP works on my PC at all is confirmed by this piece of code, which behaves as expected:
Code:
!$OMP PARALLEL PRIVATE(i,j,k)  reduction(+:prod)
!$omp do
do k=1,Nz
do j=1,Ny
  do i=1,Nx
   prod=prod+a(i,j,k)*b(i,j,k) 
  enddo
enddo
enddo
!$omp end do
!$OMP END PARALLEL


I hope somebody can give me a key idea.
Best regards,
Johannes
Attachments
test3.zip (258.91 KiB): contains test3.f90 and test3.exe
johannes
 
Posts: 4
Joined: Thu Apr 24, 2014 8:43 am

Re: 3d arrays and OpenMP

Postby MarkB » Mon Apr 28, 2014 4:18 am

What are you using to measure the execution time? You need to make sure that you are measuring wall-clock time and not the CPU time accumulated across all the threads.
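To see the difference described here, one can time the same parallel loop with both CPU_TIME and a wall-clock timer. This is only a sketch (module and routine names are hypothetical); it uses the standard SYSTEM_CLOCK intrinsic for wall-clock time so that it also compiles without OpenMP, although OMP_GET_WTIME() from omp_lib serves the same purpose. How CPU_TIME behaves with several threads is compiler-dependent.

```fortran
module timing
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
contains
  ! Wall-clock seconds since an arbitrary origin, from SYSTEM_CLOCK.
  function wall_seconds() result(t)
    real(real64) :: t
    integer(int64) :: ticks, ticks_per_s
    call system_clock(ticks, ticks_per_s)
    t = real(ticks, real64) / real(ticks_per_s, real64)
  end function wall_seconds

  ! Times one parallel loop both ways. With several threads, the CPU_TIME
  ! difference is often close to the sum over all threads (compiler-
  ! dependent), so only the wall-clock difference shows the speedup.
  subroutine time_loop(n, x, wall, cpu)
    integer, intent(in) :: n
    real(real64), intent(inout) :: x(n)
    real(real64), intent(out) :: wall, cpu
    real(real64) :: w0, c0, c1
    integer :: i
    w0 = wall_seconds()
    call cpu_time(c0)
    !$omp parallel do schedule(static)
    do i = 1, n
      x(i) = sqrt(x(i) + 1.0_real64)
    end do
    !$omp end parallel do
    call cpu_time(c1)
    wall = wall_seconds() - w0
    cpu = c1 - c0
  end subroutine time_loop
end module timing
```

If the compiler's CPU_TIME reports process time, the ratio cpu/wall roughly reflects how many threads were busy; any speedup should be judged from wall alone.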
MarkB
 
Posts: 422
Joined: Thu Jan 08, 2009 10:12 am

Re: 3d arrays and OpenMP

Postby johannes » Tue Apr 29, 2014 2:33 am

I was using
Code:
t=OMP_GET_WTIME()
which seems consistent with the CPU_TIME difference (t1-t0) divided by Nthreads.

Re: 3d arrays and OpenMP

Postby MarkB » Tue Apr 29, 2014 7:39 am

I think the parallel loop goes faster on one thread than the serial one because you are not initialising the c array beforehand. The serial loop most likely results in lots of page faults as the c array is mapped into physical memory. If you are lucky, the mappings will persist even though you deallocate and reallocate the array, and so the parallel loop is not affected.
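A minimal sketch of the fix suggested here, assuming double-precision data (module and routine names are hypothetical): allocate c and touch every element once, outside the timed region, using the same loop order and static schedule as the compute loop. The page faults then happen during initialisation rather than inside the measurement, and on multi-socket (NUMA) machines each page also tends to be placed in memory close to the thread that touched it first.

```fortran
module first_touch
  implicit none
contains
  ! Allocate c and initialise it with the same loop structure and static
  ! schedule as the compute loop, so every page is mapped into physical
  ! memory before timing starts, by the thread that will later use it.
  subroutine alloc_and_touch(nx, ny, nz, c)
    integer, intent(in) :: nx, ny, nz
    double precision, allocatable, intent(out) :: c(:, :, :)
    integer :: i, j, k
    allocate(c(nx, ny, nz))
    !$omp parallel do schedule(static) private(i, j)
    do k = 1, nz
      do j = 1, ny
        do i = 1, nx
          c(i, j, k) = 0.0d0
        end do
      end do
    end do
    !$omp end parallel do
  end subroutine alloc_and_touch
end module first_touch
```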

The loop you are measuring is a terrible memory bandwidth hog, so I expect the lack of speedup beyond two threads is simply due to the fact that two threads are enough to saturate the memory bandwidth on your hardware. You may need to think about restructuring your code to improve its temporal locality / cache reuse.
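One way to check whether the loop is bandwidth-bound is to turn a measured wall-clock time into an achieved bandwidth and compare it with what the hardware can sustain. A rough sketch, assuming real64 data and counting only the compulsory traffic of c = a*b (the function name is hypothetical):

```fortran
module bandwidth_estimate
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
contains
  ! Rough achieved memory bandwidth of c(i,j,k) = a(i,j,k)*b(i,j,k)
  ! over n real64 elements: two arrays are read and one is written,
  ! i.e. 3 * 8 = 24 bytes per element. Write-allocate traffic for c
  ! (often an extra 8 bytes per element) is ignored, so the real
  ! traffic may be somewhat higher than this estimate suggests.
  function achieved_gbytes_per_s(n, seconds) result(gbs)
    integer(int64), intent(in) :: n
    real(real64),   intent(in) :: seconds
    real(real64) :: gbs
    real(real64), parameter :: bytes_per_elem = 3.0_real64 * 8.0_real64
    gbs = real(n, real64) * bytes_per_elem / seconds / 1.0e9_real64
  end function achieved_gbytes_per_s
end module bandwidth_estimate
```

If this figure stops growing at about two threads, that is consistent with two threads already saturating the memory bandwidth.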

Re: 3d arrays and OpenMP

Postby johannes » Thu May 01, 2014 12:17 am

Hi Mark,
I tried SCHEDULE(....,some chunksize), but that didn't speed things up either.

- Is the ultimate approach to design some kind of 'domain decomposition' manually?
- Is there a way to find out the actual 'memory bandwidth' without running experiments?
Best regards,
Johannes

Re: 3d arrays and OpenMP

Postby MarkB » Thu May 01, 2014 2:07 am

johannes wrote: I tried SCHEDULE(....,some chunksize), but that didn't speed things up either.

- Is the ultimate approach to design some kind of 'domain decomposition' manually?
- Is there a way to find out the actual 'memory bandwidth' without running experiments?


There's no load imbalance in the loop, so I'm afraid there is no reason to expect any schedule other than STATIC to improve the performance. You could consider using transformations such as loop fusion and loop tiling in your main code to improve reuse of cached data and reduce the memory traffic. If you want to measure the memory bandwidth of your system, the STREAM benchmark might be useful: http://www.cs.virginia.edu/stream/
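Loop fusion merges consecutive loops over the same index range so that data is used while it is still in cache or registers, instead of being streamed through memory twice. A sketch with hypothetical 1D arrays (not the code from test3.f90): in the unfused version the temporary t is written to memory and read back, so roughly six array streams cross the memory bus; the fused version needs four.

```fortran
module fusion_demo
  implicit none
contains
  ! Unfused: two sweeps over memory; t is written out and read back in.
  subroutine two_sweeps(n, a, b, d, t, c)
    integer, intent(in) :: n
    double precision, intent(in)  :: a(n), b(n), d(n)
    double precision, intent(out) :: t(n), c(n)
    integer :: i
    !$omp parallel do schedule(static)
    do i = 1, n
      t(i) = a(i) * b(i)
    end do
    !$omp end parallel do
    !$omp parallel do schedule(static)
    do i = 1, n
      c(i) = t(i) + d(i)
    end do
    !$omp end parallel do
  end subroutine two_sweeps

  ! Fused: one sweep; the intermediate value stays in a register.
  subroutine one_sweep(n, a, b, d, c)
    integer, intent(in) :: n
    double precision, intent(in)  :: a(n), b(n), d(n)
    double precision, intent(out) :: c(n)
    double precision :: tmp
    integer :: i
    !$omp parallel do schedule(static) private(tmp)
    do i = 1, n
      tmp = a(i) * b(i)
      c(i) = tmp + d(i)
    end do
    !$omp end parallel do
  end subroutine one_sweep
end module fusion_demo
```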

Re: 3d arrays and OpenMP

Postby johannes » Thu May 01, 2014 6:35 am

Hi Mark,
STREAM is interesting; I hope I understand it correctly.
I guess 'domain decomposition' is what you call 'tiling'.
BR, Johannes

Re: 3d arrays and OpenMP

Postby MarkB » Thu May 01, 2014 6:45 am

johannes wrote: I guess 'domain decomposition' is what you call 'tiling'.


Tiling is different from domain decomposition: there's a brief description on Wikipedia http://en.wikipedia.org/wiki/Loop_tiling
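For illustration, here is loop tiling applied to the textbook case of a matrix transpose (a hypothetical example, not taken from the thread). The two outer loops step over ts-by-ts tiles, and the inner loops stay within one tile, so the strided accesses to b reuse cache lines before they are evicted. Tiling only pays off where data is reused; an element-wise loop such as c = a*b has no reuse, so tiling alone would not help it. Domain decomposition, by contrast, decides which thread or process owns which part of the grid.

```fortran
module tiling_demo
  implicit none
contains
  ! Tiled transpose b = transpose(a) with square tiles of size ts (ts > 0).
  subroutine transpose_tiled(n, m, ts, a, b)
    integer, intent(in) :: n, m, ts
    double precision, intent(in)  :: a(n, m)
    double precision, intent(out) :: b(m, n)
    integer :: ii, jj, i, j
    ! Each thread takes whole column-blocks jj of a (= row-blocks of b),
    ! so no two threads write the same element of b.
    !$omp parallel do schedule(static) private(ii, i, j)
    do jj = 1, m, ts
      do ii = 1, n, ts
        ! Work inside one ts-by-ts tile.
        do j = jj, min(jj + ts - 1, m)
          do i = ii, min(ii + ts - 1, n)
            b(j, i) = a(i, j)
          end do
        end do
      end do
    end do
    !$omp end parallel do
  end subroutine transpose_tiled
end module tiling_demo
```

The tile size ts is a tuning parameter; tiles that fit comfortably in the L1 or L2 cache are a typical starting point.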

