Segmentation fault occurs as the number of threads increases

Post by keuntae » Mon Mar 24, 2014 10:59 pm

Dear all,

I am trying to use OpenMP to parallelize a Fortran 77 code that solves fluid dynamics problems. I parallelized a loop in the code and compared the results with and without parallelization. After this debugging step, the code runs fine on a Linux machine with 4 to 6 threads.
Recently, in order to simulate larger problems, I have been trying to run the code with more threads on my university's supercomputing resources. However, if the number of threads is higher than 8, the error “segmentation fault (core dumped)” occurs.

To solve this problem, I tried changing STACKSIZE, the optimization options, and so on.
However, the problem persists.
Now I'm more or less out of ideas. Does anyone have any advice?

The compile options and environment settings are as follows:
xlf_r -qrealsize=8 -qintsize=8 -O3 -qtune=pwr6 -qarch=pwr6 -q64 -qsmp=omp -qnosave -qstrict
export OMP_STACKSIZE=99999
export OMP_NUM_THREADS=32

Post by MarkB » Tue Mar 25, 2014 7:01 am

OMP_STACKSIZE only affects threads other than the master thread: you may need to use standard methods (e.g. ulimit on Linux) to increase the stack size for the master thread.

Of course, this may not be the problem: can you run with a debugger to find out where the seg fault occurred? Can you post the code for the parallel loop on here?
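
A minimal sketch of the stack-size suggestion above, assuming a bash shell on Linux. The value 512M is only an illustrative assumption, not a tuned recommendation. Note that a bare number in OMP_STACKSIZE is interpreted in kilobytes, so the setting 99999 from the first post asks for roughly 98 MB per thread.

```shell
#!/usr/bin/env bash
# Sketch only: the sizes below are illustrative assumptions, not tuned values.

# Stack of the initial (master) thread: governed by the shell limit,
# not by OMP_STACKSIZE. Try unlimited, else fall back to the hard limit.
ulimit -s unlimited 2>/dev/null || ulimit -s "$(ulimit -H -s)"

# Stack of each additional OpenMP thread. A bare number means kilobytes,
# so an explicit suffix (K, M, G) avoids ambiguity.
export OMP_STACKSIZE=512M
export OMP_NUM_THREADS=32

# Upper bound on what the original setting (99999 KB per thread) adds up to
# if each of 32 threads reserved that much, in MiB (integer division).
ORIGINAL_TOTAL_MIB=$(( 99999 * 32 / 1024 ))

echo "master stack limit : $(ulimit -s)"
echo "per-thread stack   : $OMP_STACKSIZE"
echo "original total     : ${ORIGINAL_TOTAL_MIB} MiB"
```

The limits have to be set in the same shell (or batch job script) that launches the program, since child processes inherit them from there.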

Post by keuntae » Tue Mar 25, 2014 8:18 pm

Thank you for your advice.

I am re-running the code under the dbx debugger. After checking the debugger output, I will post again.
In any case, the code consists of two major parts: a TDMA (tri-diagonal matrix algorithm) solver and a Poisson equation solver.

The TDMA part is as follows:
      INCLUDE 'common.h'
     
      COMMON/ANGLE/OMEGA,ATHETA,ATHETAO,AATHETA     
      REAL, DIMENSION (:,:), ALLOCATABLE :: AK,BK,CK,GK
      INTEGER AllocateStatus

!$OMP PARALLEL
!$OMP&private(CT1,CT3,,ST1,ST3,EST)
      ALLOCATE(AK(M1M,M3M),STAT=AllocateStatus)
      IF(AllocateStatus .NE. 0) STOP 'Allocation error'
      ALLOCATE(BK(M1M,M3M),STAT=AllocateStatus)
      IF(AllocateStatus .NE. 0) STOP 'Allocation error'
      ALLOCATE(CK(M1M,M3M),STAT=AllocateStatus)
      IF(AllocateStatus .NE. 0) STOP 'Allocation error'
      ALLOCATE(GK(M1M,M3M),STAT=AllocateStatus)
      IF(AllocateStatus .NE. 0) STOP 'Allocation error'
!$OMP DO
      DO 100 J=1,JIN

      DO K=1,N3M
      DO I=1,N1M

     ! CT1, CT3, ST1, ST3, EST are determined at each i, j, k.
     ! ACOEF is predetermined coefficient.
     ! RHST(i,j,k) is calculated before executing a TDMA subroutine.

      AK(I,K)=   -(ST1    -CT1                         )*ACOEF
      BK(I,K)=1.+(ST1+ST3+CT1+CT3+EST      )*ACOEF
      CK(I,K)=   -(ST3    -CT3                         )*ACOEF
      GK(I,K)=RHST(I,J,K)
      ENDDO
      ENDDO

      CALL CTDMA_T(AK,BK,CK,GK,1,N3M,1,N1M)   ! TDMA in one-direction

      DO K=1,N3M
      DO I=1,N1M
      UT(I,J,K)=GK(I,K)
      ENDDO
      ENDDO

  100 CONTINUE
!$OMP END DO
      DEALLOCATE(AK,BK,CK,GK)
!$OMP END PARALLEL 


The Poisson equation solver is as follows:
C=======================================================================
                        SUBROUTINE POISSON
C=======================================================================
      INCLUDE 'common.h'

      COMPLEX CCAP(M1,M2,M3MH)
      REAL DIVGSUM(M1,M2,M3)
     
      complex, dimension (:),   allocatable :: XXXX,XXXX_B     
      complex, dimension (:,:), allocatable :: XXX     
      complex, dimension (:,:), allocatable :: CP     
      complex, dimension (:,:), allocatable :: RESID,GI
     
      CALL DIVGS(DIVGSUM) ! Calculate DIVGSUM matrix
     
!$OMP PARALLEL
!$OMP&private(XXX,XXXX,XXXX_B)   
      allocate(XXX(M3M,M1M))
      allocate(XXXX(M3M),XXXX_B(M3M*2))           
      CALL ZFFT1D(XXXX,N3M, 0,XXXX_B) ! Initialize complex variables     
!$OMP DO
      DO 100 J=1,N2M

      DO K=1,N3M
      DO I=1,N1M
       DIVGSUM(I,J,K)=DIVGSUM(I,J,K)*DTCONSTI
       XXX(K,I)=DIVGSUM(I,J,K)
      ENDDO
      ENDDO

      DO I=1,N1M
      CALL ZFFT1D(XXX(1,I),N3M,-1,XXXX_B) ! Forward FFT
      END DO

      DO K=1,N3MH
      DO I=1,N1M
      CCAP(I,J,K)=XXX(K,I)
      ENDDO
      ENDDO

  100 CONTINUE
!$OMP END DO
      deallocate(XXX,XXXX,XXXX_B)
!$OMP END PARALLEL

!$OMP PARALLEL      
!$OMP&private(CP,RESID,GI)
      allocate(CP(0:M1,0:M2))
      allocate(RESID(M1MD,M2),GI(0:M1MD,0:M2))
!$OMP DO
      DO 50 KK=1,N3MH

      DO J=1,N2
      DO I=1,N1
      CP(I,J)=0.
      ENDDO
      ENDDO
      CALL MG2D(CP,CCAP(1,1,KK),KK,TEST,0.,RESID,GI) ! Multi-grid method

      DO I=1,N1M
      DO J=1,N2M
      CCAP(I,J,KK)=CP(I,J)
      ENDDO
      ENDDO

   50 CONTINUE

!$OMP END DO
      deallocate(CP)
      deallocate(RESID,GI)
!$OMP END PARALLEL

C=======================================================================
      SUBROUTINE MG2D(PC,RHS,KV,TEST,OLDV,RESID,GI)
C=======================================================================

      INCLUDE'common.h'
C     COMPLEX PC(M1,M2),RHS(M1,M2)
      COMPLEX PC(0:M1,0:M2),RHS(M1,M2)
      COMPLEX RESID(M1MD,M2),GI(0:M1MD,0:M2)
      integer KV,KH

      RETURN
      END

Post by MarkB » Wed Mar 26, 2014 3:31 am

I'm a bit confused by the TDMA code: shouldn't the allocatable arrays be private? And where do the values of CT1, CT3, ST1, ST3, EST come from?

Post by keuntae » Wed Mar 26, 2014 4:32 pm

Dear MarkB,

When I posted the code here, the private declaration for the allocatable arrays was omitted; in the actual code, !$OMP&private(AK,BK,CK,GK) is written. And CT1, CT3, ST1, ST3, EST are calculated in the do loop by using shared variables such as U(I,J,K), V(I,J,K), W(I,J,K), which denote the physical flow velocity in the flow field.
To show the overall structure of the code, I removed too much of the simple algebra, and that made it confusing. Does the code make sense now?

Post by MarkB » Thu Mar 27, 2014 3:07 am

OK, thanks, that makes sense. I can't see anything else obviously wrong in the code.

Post by keuntae » Thu Mar 27, 2014 3:53 am

I also can't find any bugs in my code. Maybe that's the problem.

Anyway, if I get any message from the debugger, I will post the details.
I would appreciate your advice then. Thank you for your help : )

Post by ftinetti » Thu Mar 27, 2014 7:31 am

Hi,

Please repost the code, because in
         INCLUDE 'common.h'
         
          COMMON/ANGLE/OMEGA,ATHETA,ATHETAO,AATHETA     
          REAL, DIMENSION (:,:), ALLOCATABLE :: AK,BK,CK,GK
          INTEGER AllocateStatus

    !$OMP PARALLEL
    !$OMP&private(CT1,CT3,,ST1,ST3,EST)
          ALLOCATE(AK(M1M,M3M),STAT=AllocateStatus)
          IF(AllocateStatus .NE. 0) STOP 'Allocation error'
          ALLOCATE(BK(M1M,M3M),STAT=AllocateStatus)
          IF(AllocateStatus .NE. 0) STOP 'Allocation error'
          ALLOCATE(CK(M1M,M3M),STAT=AllocateStatus)
          IF(AllocateStatus .NE. 0) STOP 'Allocation error'
          ALLOCATE(GK(M1M,M3M),STAT=AllocateStatus)
          IF(AllocateStatus .NE. 0) STOP 'Allocation error'
    !$OMP DO
          DO 100 J=1,JIN

          DO K=1,N3M
          DO I=1,N1M

         ! CT1, CT3, ST1, ST3, EST are determined at each i, j, k.
         ! ACOEF is predetermined coefficient.
         ! RHST(i,j,k) is calculated before executing a TDMA subroutine.

          AK(I,K)=   -(ST1    -CT1                         )*ACOEF
...


there are some issues not covered in your text:
!$OMP&private(AK,BK,CK,GK) is written. And CT1, CT3, ST1, ST3, EST are calculated in the do loop by using shared variables such as U(I,J,K), V(I,J,K), W(I,J,K)

or maybe they are covered, but maybe having the actual code would help: e.g. the syntax of $OMP&private(CT1,CT3,,ST1,ST3,EST), how CT1 is calculated before the CT1 value is used, where the private clause you mention is located, etc.

Fernando.

Post by keuntae » Thu Mar 27, 2014 6:48 pm

Hi Fernando

The actual code is as follows:
     
      SUBROUTINE MTM_CORE
      INCLUDE 'common.h'
     
      COMMON/ANGLE/OMEGA,ATHETA,ATHETAO,AATHETA     
      REAL, DIMENSION (:,:), ALLOCATABLE :: AK,BK,CK,GK
      INTEGER AllocateStatus

C-----AZIMUTHAL MOMENTUM ----------------------------------------------

!$OMP PARALLEL
!$OMP&private(KCM,UT_T,UT_B,CT1,CT3,ANU_P,ANU_T,ANU_B,ST1,ST3,EST)
!$OMP&private(AK,BK,CK,GK)   
      ALLOCATE(AK(M1M,M3M),STAT=AllocateStatus)
      IF(AllocateStatus .NE. 0) STOP 'Allocation error'
      ALLOCATE(BK(M1M,M3M),STAT=AllocateStatus)
      IF(AllocateStatus .NE. 0) STOP 'Allocation error'
      ALLOCATE(CK(M1M,M3M),STAT=AllocateStatus)
      IF(AllocateStatus .NE. 0) STOP 'Allocation error'
      ALLOCATE(GK(M1M,M3M),STAT=AllocateStatus)
      IF(AllocateStatus .NE. 0) STOP 'Allocation error'
!$OMP DO
      DO 100 J=1,JIN

      DO K=1,N3M
      KCM=KM(K)
      DO I=1,N1M
      UT_T=0.5*( UT(I,J,KP(K))+UT(I,J,K) )
      UT_B=0.5*( UT(I,J,K)+UT(I,J,KM(K)) )
      CT1 =-RPI(J)*UT_B*VVDT(K)
      CT3 = RPI(J)*UT_T*VVDT(K)

      ANU_P=0.5*(VIS(I,J,K)+VIS(I,J,KCM))
      ANU_T=VIS(I,J,K)
      ANU_B=VIS(I,J,KCM)
      ST1  =2*ANU_B*RP2I(J)*VVDT(K)*SSDT(KCM)
      ST3  =2*ANU_T*RP2I(J)*VVDT(K)*SSDT(K)
      EST  =  ANU_P*RP2I(J)

      AK(I,K)=  -(ST1    -CT1              )*ACOEF
      BK(I,K)=1.+(ST1+ST3+CT1+CT3+EST      )*ACOEF
      CK(I,K)=  -(ST3    -CT3              )*ACOEF
      GK(I,K)=RHST(I,J,K)
      ENDDO
      ENDDO

      CALL CTDMA_T(AK,BK,CK,GK,1,N3M,1,N1M)   ! CTDMA IN AZIMUTHAL-DIR.

      DO K=1,N3M
      DO I=1,N1M
      UT(I,J,K)=GK(I,K)
      ENDDO
      ENDDO

  100 CONTINUE
!$OMP END DO
      DEALLOCATE(AK,BK,CK,GK)
!$OMP END PARALLEL 


KM(K), UT(I,J,K), RPI(J), VVDT(K), VIS(I,J,K), RP2I(J) and RHST(I,J,K) are common variables defined in 'common.h'.
The subroutine MTM_CORE and the POISSON solver posted earlier are the main parts of my code.

Yesterday I re-ran the program under the debugger.
When debugging which option is preferred? Could you tell some advice?

Post by MarkB » Fri Mar 28, 2014 2:38 am

keuntae wrote:When debugging which option is preferred? Could you tell some advice?


I'm not sure exactly what you are asking here. As a start you should compile your code with the debug flag (usually -g), let it run under the debugger until the seg fault occurs, then get a stack trace to tell you where the crash happened.
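
The steps above might look like the following on an IBM XL Fortran system such as the one used in this thread. This is a sketch under assumptions: the source file pattern `*.f` and the executable name `solver_dbg` are hypothetical, and the flags (`-g` for debug symbols, `-C` for array bounds checking, `-qsmp=omp:noopt` to keep OpenMP without optimization) should be checked against the compiler manual for your version.

```shell
#!/usr/bin/env bash
# Sketch only: file names are hypothetical; verify the flags for your compiler.

# Allow a core file to be written when the program crashes.
ulimit -c unlimited 2>/dev/null || echo "warning: could not raise core file limit"

# Debug build: symbols, bounds checking, OpenMP without optimization.
# The remaining flags are carried over from the original post.
DEBUG_FLAGS="-g -C -qsmp=omp:noopt -q64 -qrealsize=8 -qintsize=8 -qnosave"
EXE=solver_dbg

if command -v xlf_r >/dev/null 2>&1; then
    xlf_r $DEBUG_FLAGS -o "$EXE" ./*.f
    export OMP_NUM_THREADS=32
    ./"$EXE"    # run until the seg fault produces a core file
    # Then inspect the core file interactively:
    #   dbx ./solver_dbg core
    #   (dbx) where        # print the stack trace at the point of the crash
else
    echo "xlf_r not found; would run: xlf_r $DEBUG_FLAGS -o $EXE ./*.f"
fi
```

Building without optimization can change where, or whether, the crash appears, so it may be worth repeating the run with the original -O3 flags plus -g if the debug build does not fail.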