Trouble parallelizing fortran code, continued SIGSEGV faults

Postby gillesc » Fri Jun 20, 2008 3:54 pm

Hello all, I am new to OpenMP; in fact, I started just 3 days ago. I have some Fortran code that works normally when run sequentially, but my application of OpenMP is somehow flawed. I basically have a do loop that runs up to 10^5 times; each iteration is independent of the previous one, and the order in which they run is not important. A modified version of the code is as follows:

read(*,*) numor

!$OMP PARALLEL NUM_THREADS(4) FIRSTPRIVATE(numor)
!$ PRIVATE(phi,eta,a,si,c,cphi,sphi,ctheta,stheta,calpha)
!$ PRIVATE(salpha,cbeta,sbeta,cgamma,sgamma,rot,trans)
!$ PRIVATE(dfopls,ropls,rcontact,ropls_min,dfopls_min,nopls,ifail)
!$ PRIVATE(felec,relec,counter,cntelec)
!$ PRIVATE(j,k,ior)
!$ PRIVATE(inn_ne,inn_tot,pmf,pmf_ne,num_points,dist)
!$ PRIVATE(opls_val,elec_val)

!$OMP DO SCHEDULE(DYNAMIC)   
       do ior = 1,numor ! number of orientations.
          read(3,100)phi,eta,a,si,c,junk1,junk2  ! Input orientations
C        Begin calculating energies
C
          dfopls = 0.d0
          ropls = 0.d0
          rcontact = 0.d0
          ropls_min = 0.d0
          dfopls_min=0.d0
          nopls =  0
          ifail = 0

          call opls_energy(rot,trans,dfopls,nopls,ropls,rcontact,
     1                     ropls_min,dfopls_min,ifail)

          if (ifail.lt.0)then
             goto 777
          end if

            felec = 0.d0
            relec = 0.d0
            counter = 0

             call elec_energy(rot,trans,felec,relec)

..... Extra lines of other calcs.

777    continue

      write(4,*)phi,eta,a,si,c,rcontact,dfopls_min
   
      end do                    ! main loop
     
!$OMP END DO NOWAIT     
!$OMP END PARALLEL


So first, have I defined the private variables correctly? That is, is the syntax correct, given that so many variables take different values in each iteration?

To give a little more info: in the two call statements, the first two arguments are inputs and the rest are the outputs I want, which is why they are set to 0.d0 beforehand. These two calls are where the meat of the program is. On some occasions the code does run with the OMP lines, but the outputs are obviously wrong.

In case it is needed: a number of variables used in either subroutine are defined through modules/workspaces, but those are internal to the subroutines.

Some system information: Mac Pro, Quad-Core Intel Xeon 51XX processors (2.66), OS X 10.5.3, Ifort 10.1.012, gfortran (GCC) 4.3.0, using either Intel MKL or the Mac Accelerate framework for BLAS and LAPACK.

Compilation: ifort -o testvirial.out -xT -ftz -openmp -parallel -O3 -i8 -threads initialize_opls.f initialize_elec.f opls_energy.f elec_energy.f virial.f -L$MKLPATH -I$INCLUDE -lmkl_intel_ilp64 -lmkl_lapack -lmkl_intel_thread -lmkl_core -lguide -lpthread

Thank you in advance

Chris
gillesc
 
Posts: 3
Joined: Fri Jun 20, 2008 11:06 am

Re: Trouble parallelizing fortran code, continued SIGSEGV faults

Postby ejd » Fri Jun 20, 2008 4:46 pm

You have not specified the directive continuation lines correctly. If you look at the OpenMP V3.0 (or V2.5) spec, section 2.1.1 Fixed Source Form Directives, you will find:
The following sentinels are recognized in fixed form source files:
!$omp
c$omp
*$omp

Sentinels must start in column 1 and appear as a single word with no intervening characters. Fortran fixed form line length, white space, continuation, and column rules apply to the directive line. Initial directive lines must have a space or zero in column 6, and continuation directive lines must have a character other than a space or a zero in column 6.
....
Note – in the following example, the three formats for specifying the directive are equivalent (the first line represents the position of the first 9 columns):
c23456789
!$omp parallel do shared(a,b,c)

c$omp parallel do
c$omp+shared(a,b,c)

So your code should look like:

!$OMP PARALLEL NUM_THREADS(4)
!$OMP+ PRIVATE(phi,eta,a,si,c,cphi,sphi,ctheta,stheta,calpha)
!$OMP+ PRIVATE(salpha,cbeta,sbeta,cgamma,sgamma,rot,trans)
!$OMP+ PRIVATE(dfopls,ropls,rcontact,ropls_min,dfopls_min,nopls,ifail)
!$OMP+ PRIVATE(felec,relec,counter,cntelec)
!$OMP+ PRIVATE(j,k,ior)
!$OMP+ PRIVATE(inn_ne,inn_tot,pmf,pmf_ne,num_points,dist)
!$OMP+ PRIVATE(opls_val,elec_val)

And yes, I did delete the firstprivate clause on purpose. If numor is only used as the loop bound, then it is fine for it to be shared.
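
For anyone who wants to try the continuation rule in isolation, here is a minimal fixed-form sketch. The program and its names (contdemo, n, i, x, total) are made up for illustration and are not taken from the original code. It assumes the file has a fixed-form extension such as .f and is built with OpenMP enabled (for example gfortran -fopenmp). Each continued directive line repeats the !$OMP sentinel in columns 1-5 and has a non-blank, non-zero character in column 6.

```fortran
c     Minimal fixed-form sketch of a continued OpenMP directive.
c     All names here (contdemo, n, i, x, total) are illustrative.
      program contdemo
      implicit none
      integer n, i
      double precision x, total
      n = 1000
      total = 0.d0
c     Initial directive line: sentinel in columns 1-5, blank in
c     column 6.  Continuation lines: same sentinel, '+' in column 6.
!$OMP PARALLEL DO
!$OMP+ PRIVATE(x)
!$OMP+ REDUCTION(+:total)
      do i = 1, n
         x = dble(i)
         total = total + x
      end do
!$OMP END PARALLEL DO
c     The sum of 1..1000 is 500500, whatever the thread count.
      write(*,*) 'total = ', total
      end
```

By contrast, a line starting with just !$ followed by blanks is the conditional compilation sentinel, so it is not read as a continuation of the preceding !$OMP directive.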
ejd
 
Posts: 1025
Joined: Wed Jan 16, 2008 7:21 am

Re: Trouble parallelizing fortran code, continued SIGSEGV faults

Postby gillesc » Fri Jun 20, 2008 7:02 pm

Thank you, that allowed me to get past the first 4 iterations without a terminal error being displayed, but the values determined by the calls are still incorrect. If there were some dependencies between the calls I would understand the errors, but since I am declaring all the variables as private I thought they would run independently. Even if they do, something is happening to give the incorrect result. Also, I am testing with a small number of iterations (10), and after it runs through the first 8 I still get the following errors:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred

Again, thank you. Does anyone have any more suggestions? The code is the same as before, now with the correct sentinels.

I added a critical section around the first call, and this gave correct results, but it runs slower than the sequential code. Adding this basically negates the point of parallelization.

Chris
gillesc
 

Re: Trouble parallelizing fortran code, continued SIGSEGV faults

Postby ejd » Fri Jun 20, 2008 8:31 pm

It is hard to say much else with the information given. You said that the first two variables are input to the called routines. These variables (rot and trans) you have declared private. I am assuming that you set them inside the loop before you make the calls, since the code you provided doesn't show that. Segmentation faults are quite often caused by indexing outside the bounds of an array. Are any of these variables arrays? If so, how large are they? You might have to increase your stack size.
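
One more thing that may be worth checking, although nothing posted so far confirms it: variables declared in a module are shared between threads by default, so a PRIVATE clause on the caller's variables does not protect scratch data that a subroutine keeps in a module. The sketch below is purely hypothetical (work_mod, scratch and fill_scratch are invented names, not from the original program) and shows how THREADPRIVATE gives each thread its own copy of such a variable.

```fortran
c     Hypothetical sketch: work_mod, scratch and fill_scratch are
c     invented names, not taken from the original program.
      module work_mod
      implicit none
      double precision, save :: scratch(3)
c     Without this directive, scratch would be shared by all
c     threads and concurrent calls could overwrite each other.
!$OMP THREADPRIVATE(scratch)
      end module work_mod

      subroutine fill_scratch(val, res)
      use work_mod
      implicit none
      double precision val, res
      integer k
      do k = 1, 3
         scratch(k) = val * dble(k)
      end do
      res = scratch(1) + scratch(2) + scratch(3)
      end

      program tpdemo
      implicit none
      integer i, nbad
      double precision res
      nbad = 0
!$OMP PARALLEL DO PRIVATE(res) REDUCTION(+:nbad)
      do i = 1, 100000
         call fill_scratch(dble(i), res)
c        Expected value is val*(1+2+3) = 6*val, exact here.
         if (res .ne. 6.d0*dble(i)) nbad = nbad + 1
      end do
!$OMP END PARALLEL DO
      write(*,*) 'mismatches: ', nbad
      end
```

With the THREADPRIVATE line in place this should report zero mismatches. If it is removed, the threads race on scratch and the count may become non-zero, which would be consistent with results that only come out right inside a critical section.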
ejd
 

Re: Trouble parallelizing fortran code, continued SIGSEGV faults

Postby gillesc » Sat Jun 21, 2008 6:44 am

Yes, rot is a 3x3 rotation matrix calculated within the loop before the calls, and trans is a 3x1 translation vector. However, there are arrays within the calls that are on the order of 20000x20000 or larger. Would these affect the stack size?
gillesc
 

Re: Trouble parallelizing fortran code, continued SIGSEGV faults

Postby ejd » Sat Jun 21, 2008 1:36 pm

Yes, it does have an effect. Each thread is going to call the subroutine and get an array of its own. That means you are going to use at least 1.6 GB (20000 x 20000 elements x 4 bytes) for one 4-byte integer array per thread. This is a fair amount of storage and you may need to increase the amount of stack space available. In Unix, that means both the stacksize under the limit command and the stack space allocated for a thread. The second is usually controlled by some environment variable specific to your compiler/runtime implementation (unless you are using an OpenMP Version 3 compliant implementation, in which case you can set OMP_STACKSIZE).
ejd
 

