different results vs # of threads or proc's'

General OpenMP discussion


Post by dva2tlse » Wed Oct 30, 2013 1:54 am

Hello everybody on the forum,
At work, I have been asked to develop a rather big Fortran program (big for me), for which execution speed matters: once completed and checked, it should run for about a whole week to be useful.
BUT I have run into a BIG problem: it gives different results depending on the number of threads (or processors) used.

As far as I can estimate, I am about a quarter of the way through development, so I have already run several different parts of the program, and they behave correctly.
In particular, the part I expected to be the most resource-"hungry" has already worked correctly on one single processor, then two, eight... and finally on all forty processors of the machine.

I am currently working on a critical part of the program, which counts a specific kind of event occurring within a recording. (This recording is one input file among many, about which I already posted a message here to sort out a "mess" between them.)

The structure of the program is as follows:

First, three files are read under an "OMP MASTER" directive, to acquire (shared) data needed by all the following work:
-The first file, called the "ana", gives data needed by all the following work.
(This data is an array of slightly more than 2 million ASCII lines of 40 characters.)
-The second file, the "cvt", gives auxiliary data used together with the previous one.
(This data is a pair of real arrays with several subscripts: Ccvt1(99999, 0:5, 0:8) and Ccvt2(99999, 0:5, 0:8).)
-The third file is just the list of the numbers of the entities that will be processed later on.

Then it builds a matrix with all that data (this is the most "hungry" part, subroutine fabmat).
Then each column is cleaned by subroutines doublons and ValInt.
Then some events in each column are counted by subroutine rainflow, giving the scalar output FAT.

Finally, the FAT output from rainflow must be accumulated over the 326 columns of the matrix, and this has to be done for each entity EID, up to EIDMAX (several hundred thousand entities, each giving its own matrix).
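For reference, in the full listing posted later in this thread this accumulation is done by the Stot, Voltot and Stotfinal statements inside the ntf loop, with p = 4.7. Written out for one entity, with FAT_n the rainflow output for column n, it reads as below. Both sums run over all 326 columns, so if the column loop is split across threads the partial sums have to be combined (a reduction), and S_equi can only be computed once both sums are complete.

```latex
S_{\mathrm{tot}} = \sum_{n=1}^{326} \mathrm{nocc}(n)\,\mathrm{FAT}_n^{\,p},
\qquad
V_{\mathrm{tot}} = \sum_{n=1}^{326} \mathrm{nocc}(n),
\qquad
S_{\mathrm{equi}} = \left(\frac{S_{\mathrm{tot}}}{V_{\mathrm{tot}}}\right)^{1/p}
```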

Pseudocode of the program:
--------------------------

Code:
      program rfomp1
C$OMP MASTER
      open(1, file=file1, err=799) ! access='READONLY',
      call lecana(lina, LNGANA) ! LECture (reading) of the "ana" file.
      close(1)
c
      call leccvt(Ccvt1, Ccvt2, FScvt2) ! LECture (reading) of the "cvt" file (opens and closes Fortran unit 2).
c
      open(3, file='1D/elements.input', err=700) ! file containing the numbers of the entities
   70 EID=EID+1
   71 READ(3, '(A)', err=702, end=77)LINE ! exit the read loop at end of file
      if(line(1:1).eq.'#')goto 71
   77 print '(A, I6)', 'rfomp1:00 end=77 end of elements.input-3 Ok'
      close(3) ! close the elements.input file
c
      open(4, file='rfomp1.out', status='replace') ! output file
c
C$OMP END MASTER
c
       CALL OMP_SET_NUM_THREADS(40) ! 1) ! 2) ! 3) ! 4) ! 16) ! 8) !
c
c
C$OMP PARALLEL PRIVATE(cont, Diml, EID, Sstf, TID, tmp)
C$OMP+ SHARED(Ccvt1, Ccvt2, elem, elemout, lina)
C$OMP DO SCHEDULE(DYNAMIC)
c
      do EID=1, EIDMAX ! loop on the entities which numbers were read in the third file
c
        call fabmat(cont, Diml, elem(EID), lina, LNGANA, nocc, TID) ! FABricate MATrix
c
        do col=1, 326 ! loop on the 326 columns of the previous matrix
          call doublons(cont, Diml, ntf, valinter) ! first part of the cleaning
          call ValInt(Arr, Diml, ntf, valinter) ! second part of the cleaning
          call rainflow(Arr, ntf, Diml, EID, FAT, p, q) ! counting of events
        enddo ! end of the loop on the columns of the matrix
c
      enddo ! end of the loop on the entities processed
c
c All threads join master thread and disband
C$OMP END DO
C$OMP END PARALLEL
c
c

As it stands, the program gives different results depending on the number of threads allowed.
For reference, the first column shows the single-thread results; the other columns show runs with more threads.

    Entity_ID 1_thread  2_T.   3_T.   2_T.   4_T.   3_T.
    110005     0.700    0.700  0.602  0.532  2.912  2.446
    110006     2.407    0.719  2.446  2.912  2.912  2.602
    110007     2.851           2.602         2.912  2.602
    110026     2.330                         2.534
    110027     2.238


It looks as if the values obtained were somehow mixed up, since many of them "appear" several times, in different places.
I made several other runs, and even those with a given number of threads (e.g. 2 above) do not produce the same results. Could this be explained by the workload of other users of the machine, which changes the order in which the calculation is carried out by changing the time slices allotted by the O.S., so that there is again a "mess" between the inputs or between intermediate results?
I really do not know which directive I should check; the skeleton of my program given above may help take a step or two towards an explanation and a solution.

Later on, I will face another issue when it is time to write the results to an output file.
I may need to make the rainflow subroutine store them in an "elemout" array, flush it before "END PARALLEL", and then write the array to the output file afterwards.
Or might an ATOMIC construct be needed, just for the "write" to the output file?
Do I need to enable nested parallelism, since the loop over the 326 columns of each matrix is enclosed within the loop over the EIDMAX entities to be processed?
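On the output questions: an OpenMP ATOMIC construct applies only to a single assignment statement on a scalar variable (such as x = x + expr), so it cannot protect a WRITE to a file. One common pattern, sketched here under the assumption that each entity's result fits in one array element, is to let each iteration store its result only in its own slot (as with elemout(EID)) and to write the file serially after the parallel region. Only the entity loop is parallel, so no nested parallelism is involved, and the implicit barrier at the end of the parallel loop makes every slot visible without an explicit FLUSH. The module and routine names, and compute_entity in particular, are hypothetical placeholders for the fabmat/doublons/ValInt/rainflow chain.

```fortran
module per_entity_results
  implicit none
contains
  ! Hypothetical stand-in for fabmat + doublons + ValInt + rainflow.
  pure function compute_entity(eid) result(v)
    integer, intent(in) :: eid
    real :: v
    v = real(eid) * 0.5
  end function compute_entity

  ! Parallel phase: iteration eid writes only results(eid), so no two
  ! threads ever touch the same element; no ATOMIC/CRITICAL is needed.
  subroutine run_all(eidmax, results)
    integer, intent(in) :: eidmax
    real, intent(out)   :: results(eidmax)
    integer :: eid
!$omp parallel do default(none) shared(eidmax, results) private(eid) &
!$omp schedule(dynamic)
    do eid = 1, eidmax
      results(eid) = compute_entity(eid)
    end do
!$omp end parallel do
  end subroutine run_all

  ! Serial phase, after the parallel region: one thread writes the file
  ! in EID order, whatever the number of threads used above.
  subroutine write_results(iunit, results)
    integer, intent(in) :: iunit
    real, intent(in)    :: results(:)
    integer :: eid
    do eid = 1, size(results)
      write(iunit, '(I6, F9.3)') eid, results(eid)
    end do
  end subroutine write_results
end module per_entity_results
```

Because the file is written after the parallel loop, its contents and order no longer depend on how the iterations were scheduled.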

Thank you for helping me see this work more clearly,
David
Last edited by dva2tlse on Wed Oct 30, 2013 6:59 am, edited 2 times in total.
dva2tlse
 
Posts: 18
Joined: Sun Sep 08, 2013 2:52 am
Location: Toulouse, France

Re: different results vs # of threads or proc's'

Post by ftinetti » Wed Oct 30, 2013 3:54 am

Hi David,

I can't identify how the workload is distributed in each call to

doublons(cont, Diml, ntf, valinter) ! first part of the cleaning
ValInt(Arr, Diml, ntf, valinter) ! second part of the cleaning
rainflow(Arr, ntf, Diml, EID, FAT, p, q) ! counting of events

(called in the innermost DO loop)... and I'm guessing that's the problem... but maybe I'm missing something.

Fernando.
ftinetti
 
Posts: 582
Joined: Wed Feb 10, 2010 2:44 pm

Re: different results vs # of threads or proc's'

Post by MarkB » Wed Oct 30, 2013 4:06 am

It is likely you have a race condition in your code, where multiple threads are writing the same shared variable without any synchronisation.
I see that there are a number of variables which do not appear in either the SHARED or PRIVATE clause: in most cases such variables are shared by default. I would recommend using DEFAULT(NONE) and explicitly declaring all the variables as shared, private, reduction etc.
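A minimal sketch of this suggestion, applied to the per-column accumulation and assuming the per-column FAT values are already stored in an array (the module and function names are hypothetical): DEFAULT(NONE) makes the compiler reject any variable used in the region whose data-sharing attribute is not listed, and REDUCTION(+:...) gives each thread a private partial sum that OpenMP combines at the end of the loop.

```fortran
module fatigue_sum
  implicit none
contains
  ! Equivalent value over all columns of one entity:
  !   stot   = sum(nocc(i) * fat(i)**p)
  !   voltot = sum(nocc(i))
  !   sequi  = (stot / voltot)**(1/p)
  ! Names mirror rfomp1 (FAT, nocc, p) but the routine is hypothetical.
  function equivalent_stress(fat, nocc, p) result(sequi)
    real, intent(in)    :: fat(:)
    integer, intent(in) :: nocc(:)
    real, intent(in)    :: p
    real    :: sequi
    real    :: stot
    integer :: voltot, ntf

    stot = 0.0
    voltot = 0
    ! DEFAULT(NONE): every variable used inside must be listed.
    ! REDUCTION: each thread sums into its own private stot/voltot,
    ! and OpenMP adds the partial sums together at the end of the loop.
!$omp parallel do default(none) shared(fat, nocc, p) private(ntf) &
!$omp reduction(+:stot, voltot)
    do ntf = 1, size(fat)
      stot = stot + nocc(ntf) * fat(ntf)**p
      voltot = voltot + nocc(ntf)
    end do
!$omp end parallel do

    ! Computed once, after all partial sums have been combined.
    sequi = (stot / voltot)**(1.0 / p)
  end function equivalent_stress
end module fatigue_sum
```

Because the partial sums are combined in an order that depends on the number of threads, a floating-point reduction can still differ in the last digits from one thread count to another; differences as large as those in the table above (0.700 vs 2.912) point to a genuine race instead.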
MarkB
 
Posts: 486
Joined: Thu Jan 08, 2009 10:12 am
Location: EPCC, University of Edinburgh

Re: different results vs # of threads or proc's'

Post by dva2tlse » Wed Oct 30, 2013 7:33 am

Hi Fernando,
the issue you noticed is important, but that part of the code has already changed.
The index of the innermost DO loop is col, the number of the column of the big matrix built just before. In the version of the program I am working on now, it is called ntf (Number of Typical Flight): the values in the matrix are stress levels in a part, and each column holds the stress levels on that part for one typical flight. I don't know whether it was a typo, an error during the copy-paste, or a difference between what I posted and what was running just before that modification, but the names are now consistently ntf in my current program, and the problem still occurs.
To MarkB,
I am glad to learn that such a clause exists: like Fortran's IMPLICIT NONE, DEFAULT(NONE) is much safer and may help me avoid race conditions, if any.

I will soon post a new version of the program with nested parallelism enabled, if this change to the declaration of the shared variables is not enough to make it work.
Thanks a lot for the clues you have already given me,
David

Re: different results vs # of threads or proc's'

Post by dva2tlse » Thu Oct 31, 2013 3:38 am

Hi Fernando and Mark, and the other readers of the forum,
I am still trying to run my program, and I have declared every variable as either SHARED or PRIVATE. But when running, the program segfaults. I am using gdb to try to understand why, and it tells me:

Cannot access memory at address 0x7ffffd8a2618
0x0000000000407c61 in MAIN__.omp_fn.0 (.omp_data_i=) at /L/DATA_PROJECTS9/DAVID/F90/BN/rfomp1.f:1183
warning: Source file is more recent than executable. [which is not the case, but it always says that]
1183 C$OMP+SHARED(Ccvt1, Ccvt2, elem, elemout, FSCVT2, lina)
Cannot access memory at address 0x7ffffd8a2618

I do not know gdb (except "bt"), but I tried typing "info symbol 0x7ffffd8a2618" within gdb to get the name of the offending variable; it answers "No symbol matches 0x7ffffd8a2618.".
So how can I find out which variables are wrongly set as PRIVATE or SHARED?
In gdb, "info variables" gives me a long list of addresses, but not the right one.
I tried to work it out with my brain, knowing which variable depends on which other and which ones should be PRIVATE or SHARED according to my understanding of the process, but it isn't enough. [=> change brain?]
Thank you for your help.
David
Last edited by dva2tlse on Thu Oct 31, 2013 3:51 am, edited 1 time in total.
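One plausible cause of a segfault reported on the directive line itself is stack exhaustion: with DEFAULT(PRIVATE), each thread gets its own copy of cont and valinter (31640 x 326 default REALs, about 41 MB each at 4 bytes per REAL), and private copies are usually placed on the thread's stack, whose size is implementation-defined and typically much smaller. A sketch of one workaround, assuming a per-thread work matrix is really needed: declare it ALLOCATABLE and allocate it inside the parallel region, so each copy lives on the heap. The module, function and fill formula below are hypothetical.

```fortran
module heap_workspace
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Each thread needs its own nrow x ncol work matrix (like cont in
  ! rfomp1). Declaring it ALLOCATABLE and allocating it inside the
  ! parallel region puts each private copy on the heap instead of on
  ! the thread's stack. Names and the fill formula are hypothetical.
  function total_over_entities(nent, nrow, ncol) result(total)
    integer, intent(in) :: nent, nrow, ncol
    real(dp) :: total
    real, allocatable :: work(:, :)
    real(dp) :: acc
    integer :: eid

    acc = 0.0_dp
!$omp parallel default(none) shared(nent, nrow, ncol) private(work, eid) &
!$omp reduction(+:acc)
    allocate(work(nrow, ncol))          ! one heap copy per thread
!$omp do schedule(dynamic)
    do eid = 1, nent
      work = real(eid)                  ! stands in for fabmat filling cont
      acc = acc + sum(real(work, dp))
    end do
!$omp end do
    deallocate(work)
!$omp end parallel
    total = acc
  end function total_over_entities
end module heap_workspace
```

Alternatively, the stack of each OpenMP thread can be enlarged with the standard OMP_STACKSIZE environment variable (for example OMP_STACKSIZE=200M), and that of the initial thread with ulimit -s in the shell; which option fits depends on how many threads and private copies are involved.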

Re: different results vs # of threads or proc's'

Post by ftinetti » Thu Oct 31, 2013 3:48 am

Hi David,

Please send the current version of the code, the one you are actually running. Also, please send the data declarations of the subroutines (subroutine by subroutine) that run in the parallel region.

Fernando.

Re: different results vs # of threads or proc's'

Post by dva2tlse » Thu Oct 31, 2013 4:05 am

Hello Fernando,
I will do that after lunch (it is noon here in France), or rather instead of eating, since I never eat at lunch. It will be a rather long post, though; is it possible to send a file as an attachment? Anyway, I will try to select what is important. Thanks a lot in advance.
David

Re: different results vs # of threads or proc's'

Post by ftinetti » Thu Oct 31, 2013 5:09 am

Hi David

"it will be a rather long post, or it is possible to send a file as an attachment ?"


Ok, and yes, it is possible to send attachments (there is an "Upload attachment" button below the text pane where you write posts)

"Anyway I try to select what is important, and thank's a lot in advance."


Thank you very much,

Fernando.

Re: different results vs # of threads or proc's'

Post by dva2tlse » Thu Oct 31, 2013 6:03 am

Hi Fernando,
here is the current version of the code, and the data declarations of the subroutines that run in the parallel region. (It took much longer than I expected, because I wanted to check that what I was posting worked perfectly on a single thread, and there was a little mess between tmp, tmp2 and other temporary variables with similar names, one of which was shared at one point but no longer is.)
First, to give you a clearer picture, here is a chronological summary of how the work is done:
The program is called rfomp1.
It calls lecana and leccvt within a MASTER construct (not interesting for you) to read input data that will later be SHARED by every thread.
Then an outer PARALLEL DO loop runs fabmat, which builds a matrix;
then the inner PARALLEL DO loop runs doublons, ValInt and rainflow in sequence on each of the 326 columns of that matrix.
Thanks again,
David


Code:
      program rfomp1
      implicit none
c
      character*8 elem(40) ! array of the numbers of the elements to process, indexed by EID.
      character*40 lina(2003067)
      character*80 file1
      character*80 linatmp
      character*80 elemout(40) ! array of the values of the elements to process, indexed by EID.
      character*80 line, lint
      character*3000 ligne ! line read from the csv-1
c
      integer A
      integer B
      integer C
      integer Diml(326) ! dimension in rows for each flight
      integer EID ! index into the arrays of elements to process, elem(EID) and elemout(EID).
      integer EIDMAX ! number of elements to process listed in elements.input-4
      integer FScvt2(99999, 0:8, 0:2) ! ID of a load case (2nd position in a line of the cvt-2)
      integer i
      integer j
      integer lblocana ! length of a block of the ana-1
      integer LNGANA ! length of the ana-1 file
      integer M(99999) ! magnitude of a perturbation read in the FSTMD of the cvt-2 ZZZ
      integer Max
      integer NLana ! line number in the ana-1
      integer nocc(326) ! number of occurrences of a typical flight among the total
      integer ntf
      INTEGER NTHREADS
      INTEGER omp_get_num_procs
      INTEGER OMP_GET_NUM_THREADS
      INTEGER OMP_GET_THREAD_NUM
      INTEGER PROCNB
      integer T(99999) ! type of perturbation read in the FSTMD of the cvt-2
      INTEGER TID
      integer toto
      integer Voltot
c
      real Arr(31640)
      real Ccvt1(99999, 0:5, 0:8) ! coefficients of the perturbations in D-1 read from the cvt-2
      real Ccvt2(99999, 0:5, 0:8) ! coefficients of the perturbations in D-2 read from the cvt-2
      real cont(31640, 326) !
      real FAT
      real p ! used here and passed to rainflow as an argument
      real q ! unused here but defined and passed to rainflow as an argument
      real stot
      real stotfinal
      real valinter(31640, 326)
c
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
C$OMP MASTER
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
c
      print '(A)'
      print '(A)', 'rfomp1:00 Start of program'
      print '(A)'
      Max=1
      p=4.7 ! used here and passed to rainflow as an argument
      q=0.6 ! unused here but passed to rainflow as an argument
c
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
c input files:
      file1='PY/A350XWB900-526PYL-MR3.ana'
c     file2='PY/A350XWB900-526PYL-MR3.cvt' ! redefined at the start of leccvt
c
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
c
      open(1, file=file1, err=799) ! access='READONLY',
      print '(A)', 'rfomp1:00 Unit 1 opened Ok, ana-1 file='//
     +file1
c
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc TMP lMaxBana
cc Search for a stress block in the ana-1 file:
cc
c      read(1, '(A)', err=810, end=897)linatmp
c  88  do while(linatmp(1:3).ne.'TF_') ! loop searching for "TF_" in the ana-1
c        read(1, '(A)', err=811, end=89) linatmp
c      enddo
cc     print '(A)', 'rfomp1:00 linatmp='//linatmp
cc
c      read(1, '(A)', err=811, end=1009)linatmp ! line following the "TF_"
cc     print '(A)', 'rfomp1:00  linatmp(1:9)= :'//linatmp(1:9)//
cc    +': just after TF_' ! this seems to be the lblocana
c      goto 1010 ! normal read of lblocana from linatmp.
cc
c 1009 print '(A)', 'rfomp1:00 err=1009'
c      print '(A)'
c      stop
cc
c 1010 read(linatmp, '(I9)', err=1007)lblocana
cc
c      goto 1008 ! test lblocana.ge.lMaxbana to keep lMaxbana
cc
c 1007 print '(A)', 'rfomp1:00 err=1007'
c      print '(A)'
c      stop
cc
c 1008 if(lblocana.gt.lMaxbana)then
c        print '(A, I9)', 'rfomp1:00 lblocana=', lblocana
c        lMaxbana=lblocana ! longest lblocana so far
c      endif
cc
c      goto 88 ! search for another stress block in the ana-1 file
cc
c  89  print '(A, I6, A)', 'rfomp1:00 The longest block has', lMaxbana,
c     +' lines.'
c      print '(A)'
c      rewind(1, err=801) ! rewind the ana-1.
cc
ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
c
      print '(A)',
     +'rfomp1:00 The longest block of the ana-1 has 31640 lines.' !
c
      NLana=0
      print '(A)', 'rfomp1:00 call lecana(lina, NLana)'
      call lecana(lina, NLana)
c subroutine lecana(lina, NLana)
      LNGANA=NLana
      close(1)
      print '(A, I8)', 'rfomp1:00 ana-1 closed Ok, LNGANA=',
     +LNGANA
      print '(A)'
c
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
c
      call leccvt(Ccvt1, Ccvt2, FScvt2)
c subroutine leccvt(Ccvt1, Ccvt2, FScvt2)
c
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
c the elements.input file:
c
c An elements.input file contains, on each line, the number of an element to process.
c  It is thread zero, at the start, that must read this file in order to give each
c instance of fabmat an element number found on one of its lines, which will let it
c build the corresponding file name and open it.
c
      EID=0
      open(3, file='1D/elements.input', err=700)
      print '(A, I6)', 'rfomp1:00 1D/elements.input-3 opened Ok'
   70 EID=EID+1
   71 READ(3, '(A)', err=702, end=77)LINE ! exit the read loop at end of file
      if(line(1:1).eq.'#')goto 71
      if(line(1:4).eq.'Fin.')then
        EIDMAX=EID-1
        goto 77
      endif
      READ(LINE, '(I8)', err=703)toto
      ELEM(EID)=line(1:8)
      PRINT '(A,I8,  A,I8,  A)',
     +'rfomp1:00 EID=', EID,
     +', toto=', toto,
     +' (int), ELEM(EID)(1:8)= :'//ELEM(EID)(1:8)//
     +': (ascii), LINE= :'//LINE//':'
      EIDMAX=EID
      GOTO 70 ! back to reading the next line of elements.input
   77 print '(A, I6)', 'rfomp1:00 end=77 end of elements.input-3 Ok'//
     +', EIDMAX=', EIDMAX
      close(3) ! close the elements.input file
      print '(A)'
      goto 73
c
  700 print '(A)', 'rfomp1:00 err=700 "OPEN"' ! opening elements.input
  701 print '(A)', 'rfomp1:00 err=701' ! reading elements.input
  702 print '(A)', 'rfomp1:00 err=702 read line= :'//line//':'
  703 print '(a, I2.2, a)', 'rfomp1:00 err=703 "READ" eid=', eid,
     +', LINE= :'//LINE//':'
  799 print '(A)', 'rfomp1:00 err=799 "OPEN"' ! opening the ana-1 file
      print '(A)', '  only the first error code is significant'
      print '(A)'
      stop
c
   73 continue
c
ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
C$OMP END MASTER
ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
c
      CALL OMP_SET_NESTED(.TRUE.)
      CALL OMP_SET_NUM_THREADS(1) ! 2) ! 3) ! 4) !8) ! 16) ! 40) !
c     print '(A)', 'rfomp1:00 top of the fabmat-etc loop'
c
ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
C$OMP PARALLEL DO SCHEDULE(DYNAMIC)
c C$OMP+DEFAULT(NONE)
C$OMP+DEFAULT(PRIVATE)
c C$OMP+DEFAULT(SHARED)
c C$OMP+PRIVATE(cont, Diml, EID, elem, elemout, nocc, NTHREADS, stot, stotfinal,
c C$OMP+TID, valinter, voltot)
C$OMP+SHARED(Ccvt1, Ccvt2, elem, elemout, FSCVT2, lina)
ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
      do EID=1, EIDMAX ! loop running fabmat-etc for each elem(EID)
ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
c
c Obtain and print thread id
        TID=OMP_GET_THREAD_NUM()
c       if(ntf.eq.1)then
          print '(A, I2.2, A, I2.2, A, I2.2, A)', 'rfomp1:', TID,
     +' Hello from thread TID=', TID, ', EID=', EID,
     +', elem(EID)='//elem(EID)
c
C Only master thread does this
         IF(TID.EQ.0)then
c           NTHREADS=OMP_GET_NUM_THREADS()
c           print '(A, I2)', 'rfomp1:00 Number of threads = ', NTHREADS
C
C Obtain and print processor Nb
c            PROCNB = OMP_GET_NUM_PROCS()
c            print '(A, I2)',
c     +'rfomp1:00 Number of processors available= ', PROCNB
           ENDIF ! end of if(TID.EQ.0)then
c        endif ! end of if(ntf.eq.1)then
c
c lina + EID -> cont + Diml + nocc
c
         call fabmat(Ccvt1, Ccvt2, cont, Diml, EID, elem, FScvt2, lina,
     +nocc, TID)
c  subroutine fabmat(Ccvt1, Ccvt2, cont, Diml, EID, elem, FScvt2, lina,
c    +nocc, TID)
c
        print '(A, I2.2, A)',
     +'rfomp1:', TID,
     +' here after the call to fabmat for '//elem(EID)
        Stot=0
        Voltot=0
c
ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
C$OMP PARALLEL DO SCHEDULE(DYNAMIC)
C$OMP+DEFAULT(NONE)
C$OMP+PRIVATE(Arr, FAT, ntf, tmp, valinter, voltot, p,
C$OMP+q)
C$OMP+SHARED(cont, elem, elemout, Diml, EID, nocc, stot, stotfinal, TID)
ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
        do ntf=1, 326 ! loop over the flights (= number of columns of the built matrix).
ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
c
          call doublons(cont, Diml, ntf, valinter)
c   subroutine doublons(cont, Diml, ntf, valinter) ! cont -> valinter
c
          call ValInt(Arr, Diml, ntf, valinter)
c   subroutine ValInt(Arr, Diml, ntf, valinter) ! valinter -> Arr
c
          call rainflow(Arr, ntf, Diml, EID, FAT, p, q) ! ZZZ the if(Dim.gt.2) depends on doublons and valinter.
c   subroutine rainflow(Arr, ntf, Diml, EID, FAT, p, q) ! Arr -> FAT
c
          Stot=Stot+nocc(ntf)*(FAT)**p ! cont**p
          Voltot=Voltot+nocc(ntf)
          Stotfinal=(Stot/Voltot)**(1/p) !DVA: ZZZ a similar equation already exists in rainflow.
c
        print
     +'(A, I3, A, I3, A, I6, A, A, A, E9.3, A, E13.6, A, I5, A, E9.3)',
     +'rfomp1:00 vol=', ntf,
     +', nocc=', nocc(ntf),
     +', EID=', EID,
     +', elem(EID)=', elem(EID),
     +', Sfat=', FAT,
     +' Stot=', Stot,
     +' Voltot=', Voltot,
     +' Sequi=', Stotfinal
c
ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
c C$OMP ATOMIC
ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
c                           1234567     12345  123456789
c                        123       12345     12
        write(elemout(EID), '(I3, 7X, A, F9.3)')
     +                     EID, elem(EID), Stotfinal
ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
c
c         if(ntf/32.0.eq.int(ntf/32.0))
c         call subrtn(' rfomp1   ')
c
ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
        enddo ! end of the loop over the 326 columns of the built matrix
ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
c All threads join master thread and disband
C$OMP END PARALLEL DO
ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
c C$OMP FLUSH(elemout)
ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
c
        print '(A, I2.2, A, I6, A)',
     +'rfomp1:', TID,
     +' loop Ok for EID=', EID,
     +', elem(EID)='//elem(EID)//', elemout(EID)='//elemout(EID)//'.'
        print '(A)'
c
ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
      enddo ! end of the loop running fabmat-etc for each elem(EID)
ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
c All threads join master thread and disband
C$OMP END PARALLEL DO
ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
c Opening the output file:
c
      open(4, file='rfomp1.out', status='replace')
      write(4, '(A)') ' EID    elem(EID) Sfat(elem(EID))'
c                       123       12345     12
c                          1234567     12345  123456789
      do EID=1, EIDMAX
        write(4, '(A)') elemout(EID)
      enddo ! end of the elemout write loop.
      close(4) ! close the output file rfomp1.out
c
      print '(A)'
      print '(A)', 'rfomp1:00 End.'
      print '(A)'
      print '(A)', ' echo "$(cat rfomp1.out|sort -n)"'
      print '(A)', ' more rfomp1.out'
      stop
      end
c
c end of rfomp1
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc

      subroutine lecana(lina, NLana)
      implicit none
      character*40 lina(2003067) ! line of the ana-1
      integer i
      integer NLana

      subroutine leccvt(Ccvt1, Ccvt2, FScvt2)
      implicit none
      character*80 lint
      character*80 file2
      integer D(99999) ! direction of a perturbation read in the FSTMD of the cvt-2
      integer FSTcvt ! index for direct indexed reading
      integer FScvt2(99999, 0:8, 0:2) ! ID of a load case (2nd position in a line of the cvt-2)
      integer FSTMDcvt(99999) ! FSTMDcvt code read in the cvt-2
      integer i
      integer M(99999) ! magnitude of a perturbation read in the FSTMD of the cvt-2 ZZZ
      integer T(99999) ! type of perturbation read in the FSTMD of the cvt-2
      integer tmp
      integer TID
      real Ccvt1(99999, 0:5, 0:8) ! coefficients of the perturbations in D-1 read from the cvt-2
      real Ccvt2(99999, 0:5, 0:8) ! coefficients of the perturbations in D-2 read from the cvt-2


      subroutine fabmat(Ccvt1, Ccvt2, cont, Diml, EID, elem, FScvt2,
     +lina, nocc, TID)
      implicit none
      character*1 RTN
      character*2 MDana(5) ! five perturbations read as two characters each in the ana-1
      character*8 elem(40) ! array of the numbers of the elements to process, indexed by EID.
      character*40 lina(2003067) ! line of the ana-1
      character*80 file2 ! name of the cvt-2 file
      character*80 file3(40) ! 16) ! file3(TID+1)=name of an stf-TID+11 input file
      character*80 linf ! line of the stf-TID+11
      character*82 lint ! line of the cvt-2
      character*2608 linp ! line of the cmp-4, 91*8=2608
c
      integer D(99999) ! direction of a perturbation read in the FSTMD of the cvt-2
      integer Dana ! direction of a perturbation read in the ana-1
      integer Diml(326) ! dimension in rows for each flight
      integer EID
      integer FSana ! ID of a load case read in a line of the ana-1
      integer FSana2 ! ID of a load case (2nd position in a line of the ana-1)
      integer FScvt2(99999, 0:8, 0:2) ! ID of a load case (2nd position in a line of the cvt-2)
      integer FSstf ! index of Sstf for direct indexed reading
      integer FSTcvt ! index for direct indexed reading
      integer FSTMDcvt(99999) ! FSTMDcvt code read in the cvt-2
      integer i ! counter for the position of a character in a line of the ana-1
      integer imat ! row index in a block of the ana-1 and in the output matrix
      integer itmp
      integer j ! counter for the position on a line of the ana-1
      integer jtmp
      integer k ! counter for the position on a line of the ana-1
      integer Lana ! line number in the ana-1
      integer lblocana
c     integer ltrbana ! number of Lines TRanslated for this Block of the ana-1
      integer M(99999) ! magnitude of a perturbation read in the FSTMD of the cvt-2 ZZZ
      integer Mana ! magnitude of a perturbation read in the ana-1
      integer NBana ! number of Blocks for this flight in the ana-1
      integer NLstf ! number of lines in the stf-TID+11
      integer nocc(326) ! number of occurrences of a typical flight among the total
      integer ntf ! number of the typical flight! row index in a block of the ana-1 and in the output matrix
      integer OMP_GET_THREAD_NUM
      integer T(99999) ! type of perturbation read in the FSTMD of the cvt-2
      integer Tana ! perturbation type in the ana-1 (loop index for processing them)
      integer TID ! Thread ID
c
      real sigm ! stress value at 1G plus any successive perturbation(s)
      real Ccvt1(99999, 0:5, 0:8) ! coefficients of the perturbations in D-1 read from the cvt-2
      real Ccvt2(99999, 0:5, 0:8) ! coefficients of the perturbations in D-2 read from the cvt-2
      real cont(31640, 326) !
      real Sstf(9999) ! stresses read from the stf-TID+11
     
      subroutine doublons(cont, Diml, ntf, valinter)
      implicit none
      integer Diml(326) ! dimension in data rows for this flight
      integer i
      integer j
      integer k ! k is the number of duplicates
      integer ntf ! column number of the input matrix cont; this is the flight number.
      integer OMP_GET_THREAD_NUM
      integer TID
      real cont(31640, 326) ! input matrix, the stress history for each flight.
      real epsi
      real valinter(31640) ! output column vector; history of one flight without duplicates.

      subroutine ValInt(Arr, Diml, ntf, valinter)
      implicit none
      integer Diml(326) ! dimension in rows for each flight
      integer i ! index into Arr, the output.
      integer j ! index into valinter, the input.
      integer ntf
      integer OMP_GET_THREAD_NUM
      integer r ! number of intermediate values found, and therefore removed
      integer TID
      real Arr(31640) ! column of output values (before rainflow)
      real p
      real q
      real valinter(31640) ! column of input values (after doublons)

      subroutine rainflow(Arr, ntf, Diml, EID, FAT, p, q)
      implicit none
      integer A
      integer af(31640)
      integer av(31640)
      integer B
      integer C
      integer col
      integer Diml(326) ! dimension in rows for each flight
      integer EID
      integer i
      integer j
      integer k
      integer m
      integer Max
      integer ntf
      integer OMP_GET_THREAD_NUM
      integer TID
      integer tmp2 !DVA: used to count progress
      real AA
      real Arr(31640)
      real BB
      real CC
      real FAT
      real p ! unused in rfomp1 but passed here as an argument
      real q ! defined in rfomp1 and passed here as an argument
      real R
      real Smax
      real Smin
      real X
      real Y
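As far as the listing above shows, the inner C$OMP PARALLEL DO declares Stot and Stotfinal SHARED, so several threads update them at the same time, while Voltot, p and q are PRIVATE. A PRIVATE copy starts out undefined (FIRSTPRIVATE would copy the value from outside the region), so the value p=4.7 set earlier is not seen inside the loop; the outer DEFAULT(PRIVATE) region does the same to p and q. A simpler structure, sketched here under the assumption that the several hundred thousand entities provide enough parallel work on their own, keeps the column loop serial inside a routine called once per entity. The module and routine names, the fat(ntf, eid) layout and the use of one nocc array for all entities are hypothetical simplifications.

```fortran
module entity_loop
  implicit none
contains
  ! Serial loop over the columns (typical flights) of one entity.
  ! stot and voltot are ordinary locals, so every call, and thus every
  ! thread, has its own copy. They are assigned at run time rather than
  ! initialised in the declaration: a declaration initialiser implies
  ! SAVE, which would make them shared between threads.
  function entity_sequi(fat, nocc, p) result(sequi)
    real, intent(in)    :: fat(:), p
    integer, intent(in) :: nocc(:)
    real    :: sequi, stot
    integer :: voltot, ntf
    stot = 0.0
    voltot = 0
    do ntf = 1, size(fat)
      stot = stot + nocc(ntf) * fat(ntf)**p
      voltot = voltot + nocc(ntf)
    end do
    sequi = (stot / voltot)**(1.0 / p)
  end function entity_sequi

  ! Only the entity loop is parallel; fat(ntf, eid) is a hypothetical
  ! layout holding the rainflow output of every column of every entity.
  subroutine all_entities(fat, nocc, p, sequi)
    real, intent(in)    :: fat(:, :), p
    integer, intent(in) :: nocc(:)
    real, intent(out)   :: sequi(:)
    integer :: eid
!$omp parallel do default(none) shared(fat, nocc, p, sequi) private(eid) &
!$omp schedule(dynamic)
    do eid = 1, size(fat, 2)
      sequi(eid) = entity_sequi(fat(:, eid), nocc, p)
    end do
!$omp end parallel do
  end subroutine all_entities
end module entity_loop
```

Here p is SHARED, so every thread reads the value set before the region, and each entity's result goes to its own sequi(eid) slot.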

Re: different results vs # of threads or proc's'

Post by ftinetti » Thu Oct 31, 2013 1:12 pm

Hi David,

I'll post details as soon as I find them, since the code is a little long for me to analyze... Just two points right now:
1) I would not comment out the OMP directives: if you do not want to run the code with OpenMP threads, just compile without the OpenMP flag... or use just one thread, which you have already verified works fine. In the first case (i.e. if you compile without the OpenMP flag) you'll need to comment out the OpenMP function calls.
2) I would not use $OMP MASTER ... $OMP END MASTER before the $OMP PARALLEL DO... I don't think it makes any sense there.
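To illustrate point 2: MASTER only restricts execution inside a parallel region, where the team has several threads. Outside any parallel region the code is already executed by the single initial thread, so wrapping the serial input phase in MASTER changes nothing. The hypothetical routine below counts how many threads run a plain statement and how many run a MASTER block inside one parallel region.

```fortran
module master_demo
  implicit none
contains
  ! Hypothetical demo: counts how many threads execute a plain statement
  ! and how many execute a MASTER block inside the same parallel region.
  subroutine count_executions(nplain, nmaster)
    integer, intent(out) :: nplain, nmaster
    nplain = 0
    nmaster = 0
!$omp parallel default(none) shared(nplain, nmaster)
!$omp atomic
    nplain = nplain + 1                 ! run by every thread of the team
!$omp master
    nmaster = nmaster + 1               ! run by the master thread only
!$omp end master
!$omp end parallel
  end subroutine count_executions
end module master_demo
```

With OpenMP enabled and N threads, nplain ends up equal to N and nmaster to 1; compiled without OpenMP, both are 1.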

HTH,

Fernando.
