How to use OpenMP for complicated code?


Postby Toey_Hylton » Mon Sep 22, 2008 11:39 am

I want to parallelize a loop in the main program that calls a subroutine, and that subroutine may in turn call other complicated subroutines. In this case, how should I declare variables as shared or private, for example i and j inside subroutine Serial()?

With one thread it runs fine. With more than one thread there are always errors, such as the index of W going to 1001.

Thanks,

program smp
  use dflib
  implicit none
  integer, parameter :: sz = 1000
  real(8), dimension(sz) :: X, W
  integer :: i, IT, NT = 50000, iflag = 1, n = 1
  real(8) :: res, runtime, begtime, endtime, timef
  real(8), parameter :: mlt = 1.0D+02
  interface
    ! The interface has to match the actual subroutine, including IT
    subroutine Serial(X,W,sz,IT)
      integer,intent(in) :: sz, IT
      real(8),intent( in),dimension(sz) :: X
      real(8),intent(out),dimension(sz) :: W
    end subroutine Serial
  end interface
  do i = 1, sz
    res = ran(iflag); X(i) = mlt*res*n; n = -n
  enddo
  write(*,*)"\nProgram started......\n"C
  begtime = timef()
  ! With no data-sharing clauses, X and W are shared and IT is private
  !$OMP PARALLEL DO
  DO IT=1,NT
    call Serial(X,W,sz,IT)
  ENDDO
  !$OMP END PARALLEL DO
  endtime = timef(); runtime = endtime-begtime
  write(*,'(A20,F9.4)')"\nExecution time is :"C,runtime
  write(*,*)"\nProgram terminated..."C
end program smp
!
subroutine Serial(X,W,sz,IT)
  integer,intent(in) :: sz, IT
  integer :: i,j
  real(8),intent( in),dimension(sz) :: X
  real(8),intent(out),dimension(sz) :: W
  do i=1,sz
    W(i)=0.D0
    call work(W,sz)
    do j=1,sz
      W(i)=W(i)+DABS(x(i)+x(j))
    enddo
  enddo
end subroutine Serial

subroutine work(W,sz)
  ! sz is passed in so that the array bound is defined here
  integer sz
  integer i
  real(8) W(sz)
  do i=1,sz
    W(i)=works(i)
  enddo
end subroutine work

function works(i)
...
end

Re: How to use OpenMP for complicated code?

Postby ejd » Tue Sep 23, 2008 7:34 am

What compiler are you using? What do you mean when you say the "index of W goes to 1001"? Where in the code is this happening and why do you think it is wrong?

Re: How to use OpenMP for complicated code?

Postby Toey_Hylton » Tue Sep 23, 2008 12:47 pm

ejd,

Thanks for the reply.

Actually, this is not my real code. My real code is much more complicated; I wrote this one to simulate my case. In my real code I have lots of global variables in modules, many of them big arrays or matrices. Some are initialized before the parallel section and never changed afterwards (I suppose these should be shared). Others are initialized and then changed inside the parallel section (I suppose these should be threadprivate, with copyin at the start of the region). The loop variable (IT in this code) is private. But what about other variables, such as i and j in subroutine Serial()?

In my code, I did like this:

!$OMP PARALLEL DO default(shared) private(IT) copyin(threadprivate variables)

DO IT=1,NT
call Serial(X,W,sz,IT)

ENDDO
!$OMP END PARALLEL DO
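
A minimal sketch of the shared / threadprivate / copyin split described above. It is not the poster's real code: the module state_mod, the arrays coeff and scratch, the counter ncalls and the subroutine step are all hypothetical names chosen for illustration.

```fortran
module state_mod
  implicit none
  integer, parameter :: n = 4
  ! Set once before the parallel region and only read afterwards: shared.
  real(8) :: coeff(n)
  ! Modified inside the parallel region: one copy per thread.
  real(8), save :: scratch(n)
  integer, save :: ncalls            ! hypothetical per-thread call counter
  !$omp threadprivate(scratch, ncalls)
contains
  subroutine step(it, res)
    integer, intent(in)  :: it
    real(8), intent(out) :: res
    integer :: i                     ! local variable, private to the calling thread
    do i = 1, n
       scratch(i) = coeff(i) * real(it, 8)   ! writes this thread's copy only
    end do
    ncalls = ncalls + 1
    res = sum(scratch)
  end subroutine step
end module state_mod

program copyin_demo
  use state_mod
  implicit none
  integer, parameter :: nt = 8
  integer :: it
  real(8) :: total(nt)

  coeff   = 2.0d0
  scratch = 0.0d0
  ncalls  = 0      ! master thread's copy; copyin broadcasts it to every thread

  !$omp parallel do default(shared) private(it) copyin(scratch, ncalls)
  do it = 1, nt
     call step(it, total(it))
  end do
  !$omp end parallel do

  write(*,*) total
end program copyin_demo
```

The copyin clause takes the names of the threadprivate variables themselves. Each iteration writes only its own element of total, so the shared array is not a source of races here.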

Here I am trying to parallelize my code at the highest level. I don't know whether that is possible.

Here is someone's suggestion from the Intel forum:
Add "implicit none" to all of your subroutines and see what happens.

And my further question:
Will "IMPLICIT NONE" and "INTEGER I, J" automatically make I and J private inside each subroutine? My code has lots of subroutines with something like "IMPLICIT REAL*8 (a-h,o-z)", and some of the variables are used directly without declaration. (Sorry, the original code is not mine; I am trying to modify it.) Thanks.

I tried modifying some subroutines to use "implicit none", and it seems to solve some problems. But I still get an error in a subroutine that has "implicit none" and "integer i".
The code is like this:

("NL=3", "a=0.123" and "real*8, allocatable::sa2(:)" are stored in a module, and default(shared) is in effect. allocate(sa2(NL)) is executed somewhere in the code before the parallel section.)


do i=1,NL
sa=sin(a)
sa2(i)=sa*sa
enddo

The error is " # of index of sa2 is 4 which is greater than the largest 3" when I run it with two threads; everything is OK if I only use one thread.

I am using Intel Fortran 10.0.X with VS2005 on a Dell Precision dual quad-core machine running Windows XP.







Re: How to use OpenMP for complicated code?

Postby Toey_Hylton » Tue Sep 23, 2008 2:06 pm

I have a new problem.

In one of my subroutines I have to declare many arrays, and they are very large. If I define them this way

subroutine works(N)

implicit none

integer,intent(in)::N

real a(N),b(N),c(N)

a=0d0
...

end


the OpenMP run always gives me a stack overflow error, even after I changed both the heap reserve and stack reserve sizes to a large number, say 800,000,000.

Following someone's suggestion, I changed the declaration to

subroutine works(N)

implicit none

integer,intent(in)::N

real,save,allocatable,dimension(:) :: a,b,c

allocate(a(N),b(n),c(N))

a=0d0
...

end

I got the error "attempt to fetch from allocatable variable a when it is not allocated" at the line "a=0d0". (It possibly comes from the second thread; I have not figured it out yet.)

Re: How to use OpenMP for complicated code?

Postby ejd » Tue Sep 23, 2008 4:42 pm

There is no reason that adding "implicit none" and "integer i,j" will help unless the variables "i,j" are declared in the module or the subroutines are "contained" within the main program. If either of these is true, then the variables "i,j" would be shared. Otherwise, they should be private. In your example code, "i,j" are explicitly declared within serial, so they should be private there, and variable "i" is declared within subroutine work, so it should be private. I am not sure what is going on from your example and question.
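
A minimal sketch of the rule above, with hypothetical names (fill_mod, fill_safe, fill_racy, k) that do not come from the poster's code. A loop index declared inside the called subroutine is private to each thread, while an index that lives in a module is a single variable shared by all threads. In the shared case two threads increment the same index, which is one plausible way a loop over 1..3 ends up touching element 4.

```fortran
module fill_mod
  implicit none
  integer :: k          ! module variable: one copy shared by all threads
contains
  subroutine fill_safe(col, sz)
    integer, intent(in)  :: sz
    real(8), intent(out) :: col(sz)
    integer :: i        ! declared locally: each thread has its own i
    do i = 1, sz
       col(i) = real(i, 8)
    end do
  end subroutine fill_safe

  ! Not called below. Shown only to illustrate the failure mode.
  subroutine fill_racy(col, sz)
    integer, intent(in)  :: sz
    real(8), intent(out) :: col(sz)
    do k = 1, sz        ! k is shared: another thread can advance it past sz
       col(k) = real(k, 8)
    end do
  end subroutine fill_racy
end module fill_mod

program sharing_demo
  use fill_mod
  implicit none
  integer, parameter :: nt = 8, sz = 100
  integer :: it
  real(8) :: w(sz, nt)

  !$omp parallel do default(shared) private(it)
  do it = 1, nt
     call fill_safe(w(:, it), sz)   ! each iteration writes its own column
  end do
  !$omp end parallel do

  write(*,*) sum(w)
end program sharing_demo
```

If a module-level index cannot be replaced by a local one, declaring it threadprivate is another option.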

As to your last question, about the array causing a stack overflow, you need to check the documentation. There are two stacks you have to worry about: the stack of the master thread, which executes the sequential part of the OpenMP program, and the stack allocated for each OpenMP thread created by the master thread. For Intel Fortran for Windows, the default size of each thread's stack is 2MB on the IA-32 architecture and 4MB on the IA-64 architecture. Look up the KMP_STACKSIZE environment variable.

Re: How to use OpenMP for complicated code?

Postby Toey_Hylton » Wed Sep 24, 2008 11:02 am

Does "4MB for the IA-64 architecture" mean I can only use 4 MB of memory for each OpenMP thread?

In other words, how can I deal with large arrays in subroutines called from the parallel section?


Re: How to use OpenMP for complicated code?

Postby ejd » Wed Sep 24, 2008 11:10 am

Toey_Hylton wrote:Does "4MB for the IA-64 architecture" mean I can only use 4 MB of memory for each OpenMP thread?

In other words, how can I deal with large arrays in subroutines called from the parallel section?

For the IA-64 architecture, 4MB is the default stack size. You can increase it with KMP_STACKSIZE (for implementations that predate OpenMP 3.0) or OMP_STACKSIZE (for OpenMP 3.0 implementations). See the Intel documentation for the product you are using.
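
Besides raising the stack size, a different technique that is not part of the answer above is to take the large arrays from the heap with local allocatable arrays that are not SAVEd. The sketch below assumes this is acceptable for the real code; the names big_work, res and heap_demo are hypothetical. Dropping SAVE matters because a SAVEd array is one object shared by all threads, so one thread can find it unallocated (or already allocated) by another, which would be consistent with the "not allocated" error reported earlier in the thread.

```fortran
module big_work
  implicit none
contains
  subroutine works(n, res)
    integer, intent(in)  :: n
    real(8), intent(out) :: res
    ! No SAVE: every call, and therefore every thread, gets its own arrays.
    ! ALLOCATE takes the storage from the heap rather than the thread stack.
    real(8), allocatable :: a(:), b(:), c(:)
    integer :: ierr

    allocate(a(n), b(n), c(n), stat=ierr)
    if (ierr /= 0) stop 'works: allocation failed'
    a = 0.0d0
    b = 1.0d0
    c = a + b
    res = sum(c)
    deallocate(a, b, c)
  end subroutine works
end module big_work

program heap_demo
  use big_work
  implicit none
  integer, parameter :: nt = 8, n = 1000000
  integer :: it
  real(8) :: res(nt)

  !$omp parallel do default(shared) private(it)
  do it = 1, nt
     call works(n, res(it))   ! each thread allocates its own work arrays
  end do
  !$omp end parallel do

  write(*,*) res
end program heap_demo
```

The trade-off is an allocate/deallocate pair on every call. If that cost matters, the arrays could instead be threadprivate and allocated once per thread.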