different results vs # of threads or proc's'

General OpenMP discussion

Re: different results vs # of threads or proc's'

Postby dva2tlse » Fri Nov 01, 2013 1:53 am

Hello Fernando,
- About the comments preceding the OMP directives: I know that I can compile without honoring the OMP directives; with gfortran, the compiler that I use, I just need NOT to specify the "-fopenmp" option. The only problem then is that some functions or subroutines are not resolved by the linker, such as OMP_GET_NUM_PROCS, OMP_GET_NUM_THREADS, OMP_GET_THREAD_NUM and probably OMP_SET_NESTED and OMP_SET_NUM_THREADS, although I have not tried those last two yet. For the first three, I simply created dummy functions or subroutines in my code that do nothing, and the linker no longer complains since it finds them.
But one reason for the commented-out directives, for example in the following code, is that they let me choose easily between different attempts during the development phase of my work, since I am not always sure yet of the best way to do things.

Code:
C$OMP PARALLEL DO SCHEDULE(DYNAMIC)
c C$OMP+DEFAULT(NONE)
C$OMP+DEFAULT(PRIVATE)
c C$OMP+DEFAULT(SHARED)
c C$OMP+PRIVATE(cont, Diml, EID, elem, elemout, nocc, NTHREADS, stot, stotfinal,
c C$OMP+TID, valinter, voltot)
C$OMP+SHARED(Ccvt1, Ccvt2, elem, elemout, FSCVT2, lina)
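
A note on the linker errors mentioned above when building without -fopenmp: OpenMP defines a conditional compilation sentinel (`!$` in free-form source; `!$`, `C$` or `*$` in columns 1-2 of fixed-form source), so that lines calling the runtime library are compiled only when OpenMP is enabled. The following is a minimal sketch, written in free-form Fortran for brevity, with a hypothetical program name and variables that are not taken from rfomp1. It is meant to build both with and without -fopenmp, without hand-written stub routines.

```fortran
program sentinel_demo
  ! Hypothetical example: the names here are not taken from rfomp1.
  !$ use omp_lib            ! compiled only when OpenMP is enabled
  implicit none
  integer :: tid, nthreads

  ! Serial defaults, used when the program is built without OpenMP.
  tid = 0
  nthreads = 1

  !$omp parallel private(tid)
  tid = 0
  !$ tid = omp_get_thread_num()
  !$omp master
  !$ nthreads = omp_get_num_threads()
  !$omp end master
  !$omp end parallel

  print '(A, I0)', 'threads used: ', nthreads
end program sentinel_demo
```

Without -fopenmp, every line starting with `!$` is an ordinary comment, so the program runs serially and no OMP_* symbol is referenced at link time.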



- Of course it is useless to request a MASTER construct before any parallel region, since execution begins sequentially.

- Excuse me for having posted such a big bunch of code. Yesterday (Thursday) I took a look at the forum from home, and I noticed that you had happily pointed out that I would "select what is important". I intended to do so when I wrote that, but later I feared missing something critical, such as an instruction that has been lying there for so long, and that I have read tens or hundreds of times, that I no longer think it could matter so much; so I posted everything.
Have a good weekend if you read this before Monday, or a good start to the week if it is already Monday morning,
David
dva2tlse
 
Posts: 18
Joined: Sun Sep 08, 2013 2:52 am
Location: Toulouse, France

Postby ftinetti » Fri Nov 01, 2013 3:17 am

Hi David,

Excuse me for having posted such a big bunch of code. Yesterday (Thursday) I took a look at the forum from home, and I noticed that you had happily pointed out that I would "select what is important". I intended to do so when I wrote that, but later I feared missing something critical

Do not worry about the length of the code you posted; it's fine. The point is that I sometimes get too busy and need extra time to analyze code.

Have a good weekend if you read this before Monday, or a good start to the week if it is already Monday morning,


Thanks, I'm still on Friday, btw,

Fernando.
ftinetti
 
Posts: 582
Joined: Wed Feb 10, 2010 2:44 pm

Postby MarkB » Fri Nov 01, 2013 3:24 am

Hi David,

I see you are using nested parallel regions (a PARALLEL DO inside another PARALLEL DO): is that really what you intended?

Mark.
MarkB
 
Posts: 456
Joined: Thu Jan 08, 2009 10:12 am
Location: EPCC, University of Edinburgh

Postby ftinetti » Fri Nov 01, 2013 3:42 am

Hi again,

From your code, and because some OpenMP directives are commented out, I can't figure out which lines you are actually using, e.g.:
Code:
    C$OMP PARALLEL DO SCHEDULE(DYNAMIC)
    c C$OMP+DEFAULT(NONE)
    C$OMP+DEFAULT(PRIVATE)
    c C$OMP+DEFAULT(SHARED)
    c C$OMP+PRIVATE(cont, Diml, EID, elem, elemout, nocc, NTHREADS, stot, stotfinal,
    c C$OMP+TID, valinter, voltot)
    C$OMP+SHARED(Ccvt1, Ccvt2, elem, elemout, FSCVT2, lina)


Which DEFAULT clause are you using: NONE/PRIVATE/SHARED?

Edit: I'm a little bit "extreme" on this, but... I usually avoid default clauses in "the debug stage" because I always forget the default rules (out of my own ignorance; the default rules themselves are fine, I think), so I list every variable used in the parallel region in either the PRIVATE or the SHARED clause.

I think you should choose elem and elemout to be only PRIVATE or only SHARED (they appear in both clauses)... I'll keep reading/analyzing to suggest which one...
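
To illustrate listing every variable explicitly, here is a minimal sketch, in free-form Fortran with hypothetical names (weights, scaled, tmp, n), of a PARALLEL DO using DEFAULT(NONE) where each variable appears in exactly one data-sharing clause. The OpenMP specification does not allow the same variable in both PRIVATE and SHARED on one directive, which is what would happen to elem and elemout above if the commented PRIVATE clause were enabled.

```fortran
program default_none_demo
  ! Hypothetical example: names and sizes are not taken from rfomp1.
  implicit none
  integer, parameter :: n = 8
  integer :: i
  real :: tmp
  real :: weights(n), scaled(n)

  weights = 2.0

  ! Each variable is named in exactly one clause:
  !   i, tmp           -> one copy per thread (undefined on entry)
  !   weights, scaled  -> one copy shared by all threads
  !$omp parallel do default(none) private(i, tmp) shared(weights, scaled)
  do i = 1, n
     tmp = weights(i) * real(i)
     scaled(i) = tmp          ! each iteration writes a distinct element
  end do
  !$omp end parallel do

  print '(8F6.1)', scaled
end program default_none_demo
```

With DEFAULT(NONE), forgetting a variable becomes a compile-time error instead of a silent sharing decision, which is the point of the habit described above.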

Fernando.

Postby ftinetti » Fri Nov 01, 2013 4:35 am

Hi,

Taking into account Mark's post, and in order to reduce the number of lines in the parallel region to be analyzed, I would suggest looking first at the innermost region, i.e. considering the code as

Code:
          program rfomp1
          implicit none
    c
          character*8 elem(40) ! array of element numbers to process, indexed by EID.
          character*40 lina(2003067)
          character*80 file1
          character*80 linatmp
          character*80 elemout(40) ! array of result values for the elements to process, indexed by EID.
          character*80 line, lint
          character*3000 ligne ! line read from the csv-1
    c
          integer A
          integer B
          integer C
          integer Diml(326) ! row count for each flight
          integer EID ! index into the per-element arrays elem(EID) and elemout(EID).
          integer EIDMAX ! number of elements to process listed in elements.input-4
          integer FScvt2(99999, 0:8, 0:2) ! ID of a load case (in 2nd position on a cvt-2 line)
          integer i
          integer j
          integer lblocana ! length of one block of the ana-1
          integer LNGANA ! length of the ana-1 file
          integer M(99999) ! Magnitude of a perturbation read from the FSTMD of the cvt-2 ZZZ
          integer Max
          integer NLana ! line number in the ana-1
          integer nocc(326) ! number of occurrences of a flight type within the total
          integer ntf
          INTEGER NTHREADS
          INTEGER omp_get_num_procs
          INTEGER OMP_GET_NUM_THREADS
          INTEGER OMP_GET_THREAD_NUM
          INTEGER PROCNB
          integer T(99999) ! type of perturbation read from the FSTMD of the cvt-2
          INTEGER TID
          integer toto
          integer Voltot
    c
          real Arr(31640)
          real Ccvt1(99999, 0:5, 0:8) ! perturbation coefficients in D-1 read from the cvt-2
          real Ccvt2(99999, 0:5, 0:8) ! perturbation coefficients in D-2 read from the cvt-2
          real cont(31640, 326) !
          real FAT
          real p ! used here and passed to rainflow as an argument
          real q ! unused here but defined and passed to rainflow as an argument
          real stot
          real stotfinal
          real valinter(31640, 326)
    c
          print '(A)'
          print '(A)', 'rfomp1:00 Début du programme'
          print '(A)'
          Max=1
          p=4.7 ! used here and passed to rainflow as an argument
          q=0.6 ! unused here and passed to rainflow as an argument
    c
    cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
    c input files:
          file1='PY/A350XWB900-526PYL-MR3.ana'
    c     file2='PY/A350XWB900-526PYL-MR3.cvt' ! redefined at the start of leccvt
    c
    cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
    c
          open(1, file=file1, err=799) ! access='READONLY',
          print '(A)', 'rfomp1:00 Ouverture unité 1 Ok, fichier ana-1='//
         +file1
    c
    cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc TMP lMaxBana
    cc Search for a block of stresses in the ana-1 file:
    cc
    c      read(1, '(A)', err=810, end=897)linatmp
    c  88  do while(linatmp(1:3).ne.'TF_') ! loop searching for "TF_" in the ana-1
    c        read(1, '(A)', err=811, end=89) linatmp
    c      enddo
    cc     print '(A)', 'rfomp1:00 linatmp='//linatmp
    cc
    c      read(1, '(A)', err=811, end=1009)linatmp ! line following the "TF_"
    cc     print '(A)', 'rfomp1:00  linatmp(1:9)= :'//linatmp(1:9)//
    cc    +': jst apr TF_' !                                       this appears to be lblocana
    c      goto 1010 ! normal read of lblocana from linatmp.
    cc
    c 1009 print '(A)', 'rfomp1:00 err=1009'
    c      print '(A)'
    c      stop
    cc
    c 1010 read(linatmp, '(I9)', err=1007)lblocana
    cc
    c      goto 1008 ! test lblocana.ge.lMaxbana to keep lMaxbana
    cc
    c 1007 print '(A)', 'rfomp1:00 err=1007'
    c      print '(A)'
    c      stop
    cc
    c 1008 if(lblocana.gt.lMaxbana)then
    c        print '(A, I9)', 'rfomp1:00 lblocana=', lblocana
    c        lMaxbana=lblocana ! the longest lblocana so far
    c      endif
    cc
    c      goto 88 ! Search for another block of stresses in the ana-1 file
    cc
    c  89  print '(A, I6, A)', 'rfomp1:00 Le plus long bloc fait', lMaxbana,
    c     +' lignes.'
    c      print '(A)'
    c      rewind(1, err=801) ! rewind the ana-1.
    cc
    ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
    c
          print '(A)',
         +'rfomp1:00 Le plus long bloc du ana-1 fait 31640 lignes.' !
    c
          NLana=0
          print '(A)', 'rfomp1:00 call lecana(lina, NLana)'
          call lecana(lina, NLana)
    c subroutine lecana(lina, NLana)
          LNGANA=NLana
          close(1)
          print '(A, I8)', 'rfomp1:00 Fermeture Ok du ana-1, LNGANA=',
         +LNGANA
          print '(A)'
    c
    cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
    c
          call leccvt(Ccvt1, Ccvt2, FScvt2)
    c subroutine leccvt(Ccvt1, Ccvt2, FScvt2)
    c
    cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
    c the elements.input file:
    c
    c An elements.input file holds one element number to process per line.
    c  Thread zero must read this file at the start, so that each instance of
    c fabmat is given the element number found on one line, from which it can
    c build the file name to open.
    c
          EID=0
          open(3, file='1D/elements.input', err=700)
          print '(A, I6)', 'rfomp1:00 Ouverture de 1D/elements.input-3 Ok'
       70 EID=EID+1
       71 READ(3, '(A)', err=702, end=77)LINE ! exit the read loop at end of file
          if(line(1:1).eq.'#')goto 71
          if(line(1:4).eq.'Fin.')then
            EIDMAX=EID-1
            goto 77
          endif
          READ(LINE, '(I8)', err=703)toto
          ELEM(EID)=line(1:8)
          PRINT '(A,I8,  A,I8,  A)',
         +'rfomp1:00 EID=', EID,
         +', toto=', toto,
         +' (int), ELEM(EID)(1:8)= :'//ELEM(EID)(1:8)//
         +': (ascii), LINE= :'//LINE//':'
          EIDMAX=EID
          GOTO 70 ! back to reading the next line of elements.input
       77 print '(A, I6)', 'rfomp1:00 end=77 fin de elements.input-3 Ok'//
         +', EIDMAX=', EIDMAX
          close(3) ! close the elements.input file
          print '(A)'
          goto 73
    c
      700 print '(A)', 'rfomp1:00 err=700 "OPEN"' ! opening elements.input
      701 print '(A)', 'rfomp1:00 err=701' ! reading from elements.input
      702 print '(A)', 'rfomp1:00 err=702 read line= :'//line//':'
      703 print '(a, I2.2, a)', 'rfomp1:00 err=703 "READ" eid=', eid,
         +', LINE= :'//LINE//':'
      799 print '(A)', 'rfomp1:00 err=799 "OPEN"' ! opening the ana-1 file
          print '(A)', '  seul le premier code d''erreur est significatif'
          print '(A)'
          stop
    c
       73 continue
    c
          CALL OMP_SET_NESTED(.TRUE.)
          CALL OMP_SET_NUM_THREADS(1) ! 2) ! 3) ! 4) !8) ! 16) ! 40) !
    c     print '(A)', 'rfomp1:00 haut de la boucle des fabmat-etc'
    c
    ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
          do EID=1, EIDMAX ! loop over fabmat etc. for each elem(EID)
    ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
    c
    c Obtain and print thread id
            TID=OMP_GET_THREAD_NUM()
    c       if(ntf.eq.1)then
              print '(A, I2.2, A, I2.2, A, I2.2, A)', 'rfomp1:', TID,
         +' Hello from thread TID=', TID, ', EID=', EID,
         +', elem(EID)='//elem(EID)
    c
    C Only master thread does this
             IF(TID.EQ.0)then
    c           NTHREADS=OMP_GET_NUM_THREADS()
    c           print '(A, I2)', 'rfomp1:00 Number of threads = ', NTHREADS
    C
    C Obtain and print processor Nb
    c            PROCNB = OMP_GET_NUM_PROCS()
    c            print '(A, I2)',
    c     +'rfomp1:00 Number of processors available= ', PROCNB
               ENDIF ! end of if(TID.EQ.0)then
    c        endif ! end of if(ntf.eq.1)then
    c
    c lina + EID -> cont + Diml + nocc
    c
             call fabmat(Ccvt1, Ccvt2, cont, Diml, EID, elem, FScvt2, lina,
         +nocc, TID)
    c  subroutine fabmat(Ccvt1, Ccvt2, cont, Diml, EID, elem, FScvt2, lina,
    c    +nocc, TID)
    c
            print '(A, I2.2, A)',
         +'rfomp1:', TID,
         +' ici après le call fabmat pour '//elem(EID)
            Stot=0
            Voltot=0
    c
    ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
    C$OMP PARALLEL DO SCHEDULE(DYNAMIC)
    C$OMP+DEFAULT(NONE)
    C$OMP+PRIVATE(Arr, FAT, ntf, tmp, valinter, voltot, p,
    C$OMP+q)
    C$OMP+SHARED(cont, elem, elemout, Diml, EID, nocc, stot, stotfinal, TID)
    ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
            do ntf=1, 326 ! loop over the flights, = number of columns of the built matrix.
    ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
    c
              call doublons(cont, Diml, ntf, valinter)
    c   subroutine doublons(cont, Diml, ntf, valinter) ! cont -> valinter
    c
              call ValInt(Arr, Diml, ntf, valinter)
    c   subroutine ValInt(Arr, Diml, ntf, valinter) ! valinter -> Arr
    c
              call rainflow(Arr, ntf, Diml, EID, FAT, p, q) ! ZZZ the if(Dim.gt.2) depends on doublons and valinter.
    c   subroutine rainflow(Arr, ntf, Diml, EID, FAT, p, q) ! Arr -> FAT
    c
              Stot=Stot+nocc(ntf)*(FAT)**p ! cont**p
              Voltot=Voltot+nocc(ntf)
              Stotfinal=(Stot/Voltot)**(1/p) !DVA: ZZZ a similar equation already exists in rainflow.
    c
            print
         +'(A, I3, A, I3, A, I6, A, A, A, E9.3, A, E13.6, A, I5, A, E9.3)',
         +'rfomp1:00 vol=', ntf,
         +', nocc=', nocc(ntf),
         +', EID=', EID,
         +', elem(EID)=', elem(EID),
         +', Sfat=', FAT,
         +' Stot=', Stot,
         +' Voltot=', Voltot,
         +' Sequi=', Stotfinal
    c
    ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
    c C$OMP ATOMIC
    ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
    c                           1234567     12345  123456789
    c                        123       12345     12
            write(elemout(EID), '(I3, 7X, A, F9.3)')
         +                     EID, elem(EID), Stotfinal
    ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
    c
    c         if(ntf/32.0.eq.int(ntf/32.0))
    c         call subrtn(' rfomp1   ')
    c
    ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
            enddo ! end of the loop over the 326 columns of the built matrix
    ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
    c All threads join master thread and disband
    C$OMP END PARALLEL DO
    ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
    c
            print '(A, I2.2, A, I6, A)',
         +'rfomp1:', TID,
         +' boucle Ok pour EID=', EID,
         +', elem(EID)='//elem(EID)//', elemout(EID)='//elemout(EID)//'.'
            print '(A)'
    c
    ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
          enddo ! end of the loop that runs fabmat etc. for each elem(EID)
    ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
    c Opening the output file:
    c
          open(4, file='rfomp1.out', status='replace')
          write(4, '(A)') ' EID    elem(EID) Sfat(elem(EID))'
    c                       123       12345     12
    c                          1234567     12345  123456789
          do EID=1, EIDMAX
            write(4, '(A)') elemout(EID)
          enddo ! end of the loop writing elemout.
          close(4) ! close the output file rfomp1.out
    c
          print '(A)'
          print '(A)', 'rfomp1:00 Fin.'
          print '(A)'
          print '(A)', ' echo "$(cat rfomp1.out|sort -n)"'
          print '(A)', ' more rfomp1.out'
          stop
          end
    c
    c end of rfomp1
    cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc

          subroutine lecana(lina, NLana)
          implicit none
          character*40 lina(2003067) ! line of the ana-1
          integer i
          integer NLana

          subroutine leccvt(Ccvt1, Ccvt2, FScvt2)
          implicit none
          character*80 lint
          character*80 file2
          integer D(99999) ! direction of a perturbation read from the FSTMD of the cvt-2
          integer FSTcvt ! index for direct indexed read
          integer FScvt2(99999, 0:8, 0:2) ! ID of a load case (in 2nd position on a cvt-2 line)
          integer FSTMDcvt(99999) ! FSTMDcvt code read from the cvt-2
          integer i
          integer M(99999) ! Magnitude of a perturbation read from the FSTMD of the cvt-2 ZZZ
          integer T(99999) ! type of perturbation read from the FSTMD of the cvt-2
          integer tmp
          integer TID
          real Ccvt1(99999, 0:5, 0:8) ! perturbation coefficients in D-1 read from the cvt-2
          real Ccvt2(99999, 0:5, 0:8) ! perturbation coefficients in D-2 read from the cvt-2


          subroutine fabmat(Ccvt1, Ccvt2, cont, Diml, EID, elem, FScvt2,
         +lina, nocc, TID)
          implicit none
          character*1 RTN
          character*2 MDana(5) ! five perturbations read as two characters each from the ana-1
          character*8 elem(40) ! array of element numbers to process, indexed by EID.
          character*40 lina(2003067) ! line of the ana-1
          character*80 file2 ! name of the cvt-2 file
          character*80 file3(40) ! 16) ! file3(TID+1)=name of an stf-TID+11 input file
          character*80 linf ! line of the stf-TID+11
          character*82 lint ! line of the cvt-2
          character*2608 linp ! line of the cmp-4 91*8=2608
    c
          integer D(99999) ! direction of a perturbation read from the FSTMD of the cvt-2
          integer Dana ! Direction of a perturbation read from the ana-1
          integer Diml(326) ! row count for each flight
          integer EID
          integer FSana ! ID of a load case read from a line of the ana-1
          integer FSana2 ! ID of a load case (in 2nd position on an ana-1 line)
          integer FScvt2(99999, 0:8, 0:2) ! ID of a load case (in 2nd position on a cvt-2 line)
          integer FSstf ! index into Sstf for direct indexed read
          integer FSTcvt ! index for direct indexed read
          integer FSTMDcvt(99999) ! FSTMDcvt code read from the cvt-2
          integer i ! counter for the position of a character within a line of the ana-1
          integer imat ! row index within a block of the ana-1 and in the output matrix
          integer itmp
          integer j ! counter for the position on a line of the ana-1
          integer jtmp
          integer k ! counter for the position on a line of the ana-1
          integer Lana ! line number in the ana-1
          integer lblocana
    c     integer ltrbana ! number of lines translated for this block of the ana-1
          integer M(99999) ! Magnitude of a perturbation read from the FSTMD of the cvt-2 ZZZ
          integer Mana ! Magnitude of a perturbation read from the ana-1
          integer NBana ! number of blocks for this flight in the ana-1
          integer NLstf ! number of lines in the stf-TID+11
          integer nocc(326) ! number of occurrences of a flight type within the total
          integer ntf ! typical flight number! row index within a block of the ana-1 and in the output matrix
          integer OMP_GET_THREAD_NUM
          integer T(99999) ! type of perturbation read from the FSTMD of the cvt-2
          integer Tana ! perturbation type from the ana-1 (loop index for processing them)
          integer TID ! Thread ID
    c
          real sigm ! value of the 1G stress plus any successive perturbation(s)
          real Ccvt1(99999, 0:5, 0:8) ! perturbation coefficients in D-1 read from the cvt-2
          real Ccvt2(99999, 0:5, 0:8) ! perturbation coefficients in D-2 read from the cvt-2
          real cont(31640, 326) !
          real Sstf(9999) ! stresses read from the stf-TID+11
         
          subroutine doublons(cont, Diml, ntf, valinter)
          implicit none
          integer Diml(326) ! row count of data for this flight
          integer i
          integer j
          integer k ! k is the number of duplicates
          integer ntf ! column number of the input matrix cont; it is the flight number.
          integer OMP_GET_THREAD_NUM
          integer TID
          real cont(31640, 326) ! input matrix, the stress history for each flight.
          real epsi
          real valinter(31640) ! output column vector; history of one flight without duplicates.

          subroutine ValInt(Arr, Diml, ntf, valinter)
          implicit none
          integer Diml(326) ! row count for each flight
          integer i ! index into Arr, the output.
          integer j ! index into valinter, the input.
          integer ntf
          integer OMP_GET_THREAD_NUM
          integer r ! number of intermediate values found and therefore removed
          integer TID
          real Arr(31640) ! column of output values (before rainflow)
          real p
          real q
          real valinter(31640) ! column of input values (after doublons)

          subroutine rainflow(Arr, ntf, Diml, EID, FAT, p, q)
          implicit none
          integer A
          integer af(31640)
          integer av(31640)
          integer B
          integer C
          integer col
          integer Diml(326) ! row count for each flight
          integer EID
          integer i
          integer j
          integer k
          integer m
          integer Max
          integer ntf
          integer OMP_GET_THREAD_NUM
          integer TID
          integer tmp2 !DVA: used to track progress
          real AA
          real Arr(31640)
          real BB
          real CC
          real FAT
          real p ! unused in rfomp1 but passed here as an argument
          real q ! defined in rfomp1 and passed here as an argument
          real R
          real Smax
          real Smin
          real X
          real Y



Please note that:
1) I've deleted the MASTER ... END MASTER
2) I've deleted the outermost PARALLEL DO ... END PARALLEL DO (the one on EID, i.e. the loop over each elem(EID))
3) I've deleted the OMP FLUSH(elemout) outside the innermost DO
4) The PARALLEL DO directive and its clauses are now uncommented

Please tell me if all of this makes sense to you, and I'll try to keep reading/analyzing. I'm now hesitating about the output inside the DO... but I do not have the complete picture yet, so maybe I'm missing something.
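
One thing worth checking in the listing above, offered as a hedged sketch rather than a diagnosis: Stot is SHARED and is updated by every iteration (Stot=Stot+...), while Voltot, p and q are PRIVATE and therefore undefined on entry to the parallel region. Unsynchronized updates of a shared accumulator are a data race, a classic source of results that change with the thread count. The sketch below, in free-form Fortran with made-up data standing in for nocc and FAT, shows the usual alternative: REDUCTION(+: ...) for the two sums, p kept SHARED because it is only read, and the final value computed once after the loop. It assumes the quantity wanted is the value after all 326 flights.

```fortran
program reduction_demo
  ! Hypothetical data: nocc and fat are filled with made-up values here;
  ! in rfomp1 they come from fabmat and rainflow.
  implicit none
  integer, parameter :: nflights = 326
  integer :: ntf, voltot
  integer :: nocc(nflights)
  real :: fat(nflights)
  real :: p, stot, stotfinal

  p = 4.7
  do ntf = 1, nflights
     nocc(ntf) = 1 + mod(ntf, 5)
     fat(ntf) = 1.0 + 0.01 * real(ntf)
  end do

  stot = 0.0
  voltot = 0

  ! stot and voltot are reduction variables: each thread accumulates into
  ! its own zero-initialized copy, and the copies are summed at the end.
  ! p is only read inside the loop, so it can safely be shared.
  !$omp parallel do default(none) private(ntf) shared(nocc, fat, p) &
  !$omp& reduction(+: stot, voltot)
  do ntf = 1, nflights
     stot = stot + nocc(ntf) * fat(ntf)**p
     voltot = voltot + nocc(ntf)
  end do
  !$omp end parallel do

  ! Computed once, after all partial sums have been combined.
  stotfinal = (stot / voltot)**(1.0 / p)

  print '(A, I0)', 'voltot    = ', voltot
  print '(A, ES13.6)', 'stotfinal = ', stotfinal
end program reduction_demo
```

Even with a reduction, a floating-point sum can differ in its last digits from one thread count to another, because the order of the additions changes; the integer sum voltot is exact.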

HTH,

Fernando.

Postby dva2tlse » Fri Nov 01, 2013 6:35 am

Hi Mark and Fernando,
I've just read all your posts. (By the way, today is a day off here, so I did not expect to see you working.)
-1°) To Mark: nested parallel looping is really what I want. The outer loop runs over the different entities, whose number may rise to several hundred thousand once everything works, and the inner loop runs over the 326 columns of the matrix created for each entity of the outer loop.
-2°) To Fernando: I also avoid default clauses as much as possible, so all the clauses should be DEFAULT(NONE). The elem array, which is a copy of the input numbers of the entities to process, must be SHARED among all threads; the elemout array contains the results and should be written to the output file, maybe under an ATOMIC statement, but I'm not sure of that yet.
-3°) To Fernando again: of course the innermost DO loop may be checked first. I myself intended to begin my checks from a very early stage, and since I'm home today, I copied back from the forum what I posted yesterday, to run it on my own machine.
I removed the first, useless MASTER construct, and I noticed that subroutine valint declared an array
real valinter(31640, 326)
which there is a column of the matrix already cleaned by doublons, but not yet by the second cleaning operation; as a column of a matrix, it should have only one subscript.
And on my machine at home, it no longer segfaults once this correction is made:
real valinter(31640, 326) to real valinter(31640) in subroutine valint.
So I will check on Monday whether it works with the complete subroutines at work; my ones here are empty and just return after the declarations.
And it may be okay.
Thanks a lot, again and again,
David
Several years ago, I was not yet in the habit of writing an IMPLICIT NONE statement at the beginning of Fortran declarations, and one single extra character made a line slip from this:
Code:
c2345 789 123456789 123456789 123456789 123456789 123456789 123456789 12
      a  =   (   complicated_expression_using_b_and_c   +    d  ) * efgh
to that :
      a  =   (   complicated_expression_using_b_and_c   +    d  ) *  efgh
c2345 789 123456789 123456789 123456789 123456789 123456789 123456789 12
and then the variable "a" became something completely unpredictable, depending on a new "efg" variable that had nothing to do with the "efgh" variable which should have been used (the final "h" had slipped past column 72, which fixed-form Fortran ignores). I then came to suspect the compiler of the SUN machine that I was using and called the hotline about it; several months later, somebody from there called me back, saying that an IMPLICIT NONE statement would have avoided it.

Postby ftinetti » Fri Nov 01, 2013 7:46 am

Hi David,

I see. Just some "general" advice, since I won't have any more time today and will continue next week: please take a look at the way arguments are used in each called subroutine (just as you did when you discovered the bug in valinter). I think it is a good idea to declare INTENT IN/OUT/INOUT explicitly, because arguments with intent OUT or INOUT can then be examined in more detail, in the context of computing with threads, to discover race conditions.
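
As an illustration of that advice, here is a minimal sketch of declaring INTENT on dummy arguments. The subroutine name and body are hypothetical and only borrow the argument shapes of ValInt as listed earlier; it is written in free-form Fortran inside a module so that the compiler can check calls against the interface.

```fortran
module intent_demo_mod
  implicit none
contains
  ! Hypothetical stand-in for a routine shaped like
  ! ValInt(Arr, Diml, ntf, valinter); the body is made up.
  subroutine copy_column(arr, diml, ntf, valinter)
    real,    intent(out) :: arr(31640)      ! written here: a candidate for PRIVATE
    integer, intent(in)  :: diml(326)       ! read only
    integer, intent(in)  :: ntf             ! read only
    real,    intent(in)  :: valinter(31640) ! read only
    integer :: i

    arr = 0.0
    do i = 1, diml(ntf)
       arr(i) = valinter(i)
    end do
  end subroutine copy_column
end module intent_demo_mod

program intent_demo
  use intent_demo_mod
  implicit none
  real :: arr(31640), valinter(31640)
  integer :: diml(326)

  diml = 3
  valinter = 1.5
  call copy_column(arr, diml, 1, valinter)
  print '(A, F5.1)', 'sum of arr = ', sum(arr)
end program intent_demo
```

Once every argument carries an INTENT, the ones marked OUT or INOUT form a short list of variables that each thread writes, which is where to look first when a shared variable might be involved.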

HTH,

Fernando.

Postby dva2tlse » Mon Nov 04, 2013 5:51 am

Hi Mark and Fernando,
I wish you a nice beginning of the week.
Well, the mistake that I found at home on Friday, once corrected here at work, was not immediately enough to make the program work, because at home I had also reduced almost all the arrays, at least the bigger ones, to single reals or integers with no subscripts, so that the program could run on my own machine, which has less memory than the one I use here at work. (It was while doing that that I noticed that the valinter array was seen once as a matrix and another time as a vector.)
Once all that was corrected, it didn't work immediately, but it finally seems to be okay now on four threads. I'll see what happens when increasing that...
Bye,
David
PS: Again about memory: when a loop is declared to be run by N threads, the machine MUST have enough memory for N times the basic process. (It is no longer a question, since I answered it myself while writing it, of course.)

Postby ftinetti » Mon Nov 04, 2013 10:38 am

Hi David,

It's good to know you are on the way to fixing everything.

About:
Again about memory: when a loop is declared to be run by N threads, the machine MUST have enough memory for N times the basic process.


I'm not sure what you mean, but: shared and "global" (e.g. COMMON and SAVE) data is stored once, regardless of the number of threads.

Edit: SAVE is actually static rather than global, but the example still holds as far as data storage is concerned...
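
To put rough numbers on this: only variables with a private data-sharing attribute are replicated per thread. Below is a minimal sketch, assuming the default REAL kind and using the shapes of Arr(31640) and valinter(31640, 326) as declared in rfomp1 and listed as PRIVATE on the inner PARALLEL DO; the program and its variable names are hypothetical.

```fortran
program private_memory_demo
  ! Estimates the storage replicated per thread for the PRIVATE arrays
  ! Arr(31640) and valinter(31640, 326); all other names are hypothetical.
  use iso_fortran_env, only: int64
  implicit none
  real :: one_real
  integer(int64) :: bytes_per_real, n_arr, n_valinter, per_thread
  integer :: nthreads

  bytes_per_real = storage_size(one_real) / 8   ! bits -> bytes
  n_arr = 31640_int64
  n_valinter = 31640_int64 * 326_int64
  per_thread = (n_arr + n_valinter) * bytes_per_real

  do nthreads = 1, 4
     print '(I2, A, F8.1, A)', nthreads, ' thread(s): ', &
          real(per_thread * nthreads) / 1.0e6, ' MB of private arrays'
  end do
end program private_memory_demo
```

With 4-byte reals this comes to roughly 41 MB per thread, whereas SHARED arrays such as cont and lina are stored once. Private arrays are typically placed on each thread's stack, so a stack that is too small can crash the program; the OMP_STACKSIZE environment variable sets the stack size of the threads created by the OpenMP runtime.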

Fernando.
Last edited by ftinetti on Tue Nov 05, 2013 3:33 am, edited 1 time in total.

Postby dva2tlse » Tue Nov 05, 2013 3:07 am

Hello Fernando,
in fact,
ftinetti wrote:fixing everything
is not the case at all.

The error that I found about valinter was of no importance at all, since the argument provided by the calling program was a complete matrix, and the argument of the receiving subroutine was just the first column of this matrix; hence the reference passed was the position of the first element, and all the extra columns were just completely useless to the subroutine, but they did not cause any harm.
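
What is described here is Fortran's sequence association: an explicit-shape dummy array simply maps onto the actual argument's storage, starting at the element that was passed. A minimal sketch with hypothetical names and tiny sizes follows; it shows a routine that expects a vector writing into column 1 when given element (1, 1), and into column 2 when given element (1, 2).

```fortran
program sequence_association_demo
  ! Hypothetical example: a 3x2 matrix stands in for valinter(31640, 326).
  implicit none
  integer, parameter :: n = 3
  real :: a(n, 2)

  a = 0.0
  call fill_vector(a(1, 1), n, 1.0)   ! maps onto column 1
  call fill_vector(a(1, 2), n, 2.0)   ! maps onto column 2

  print '(A, 3F5.1)', 'column 1: ', a(:, 1)
  print '(A, 3F5.1)', 'column 2: ', a(:, 2)

contains

  subroutine fill_vector(v, m, val)
    integer, intent(in) :: m
    real, intent(out) :: v(m)    ! explicit-shape dummy: m contiguous reals
    real, intent(in) :: val
    v = val
  end subroutine fill_vector

end program sequence_association_demo
```

Because Fortran stores arrays column by column, passing the whole matrix to such a routine is equivalent, as far as storage goes, to passing its first element, which is why the extra columns did no harm.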

So I am still at the exact point where I was when I first asked the question about obtaining different results depending on the number of threads or processors that I requested, and I still don't really know what to do or what to try.
Regards,
David