[Omp] Ask the barrier again

eduncan Eric.Duncan at Sun.COM
Fri Mar 9 19:10:21 PST 2007


Interesting interpretation of "may not".  In the spec it means that the 
construct is not permitted; I guess we need to change the wording to 
"can not".  In any case, just because a program compiles does not mean 
that it is valid.  The spec does not require a compiler to flag 
non-compliant programs.  From a usability standpoint, however, doing so 
is desirable, and most compilers that support OpenMP try to catch 
non-compliant programs (with varying degrees of success).
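
For what it is worth, here is a minimal sketch of one place where a 
barrier is allowed: inside the parallel region, but between two 
work-sharing loops rather than inside one.  It is free-form Fortran, not 
your code; the arrays a and b and the size n are made up for 
illustration, and you would build it with your compiler's OpenMP option 
(for example -fopenmp with gfortran).

```fortran
program barrier_placement
  implicit none
  integer, parameter :: n = 1000      ! hypothetical problem size
  real :: a(n), b(n)                  ! hypothetical arrays
  integer :: i

!$omp parallel default(shared) private(i)

  ! First work-sharing loop. The "nowait" on its end directive drops the
  ! barrier that is otherwise implied at the end of the loop.
!$omp do
  do i = 1, n
     a(i) = real(i)
  end do
!$omp end do nowait

  ! Allowed here: the barrier is inside the parallel region but outside
  ! any work-sharing, critical, ordered, or master region.
!$omp barrier

  ! Second loop reads elements of a that other threads may have written,
  ! so every thread must have finished the first loop before this point.
!$omp do
  do i = 1, n
     b(i) = a(n + 1 - i)
  end do
!$omp end do

!$omp end parallel

  print *, b(1), b(n)
end program barrier_placement
```

Note that a work-sharing loop without nowait already ends with an implied 
barrier, so the explicit one is only needed here because of the nowait.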

As for your question, I am not sure I know exactly what you are asking.  
The idea of a work-sharing construct is to divide the work among the 
threads.  However, settings such as OMP_DYNAMIC and the schedule for the 
work-sharing loop default to implementation-defined values when you do 
not specify them, and these might affect the timing.  Cache effects 
might also be causing differences, since I have no idea how these arrays 
are being accessed or how large they are.  You also say that the 
processors have different frequencies.  When the work is divided up, I 
don't believe most implementations look at the frequencies in order to 
give more work to the faster processors.  As for the time spent in the 
barrier, implementations may either put threads to sleep or have them 
spin while waiting for all threads to arrive.
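
If you want to take the implementation-defined defaults out of the 
picture, something along these lines might help.  It is only a sketch in 
free-form Fortran under my own assumptions: the arrays, the size n, the 
factor 2.0, and the busy array are invented for illustration, and it uses 
omp_get_wtime rather than your simulator breakpoints.  It turns off 
dynamic adjustment of the thread count, asks for a static schedule 
explicitly, and records how long each thread spends on its share of the 
loop before any barrier.

```fortran
program pin_schedule
  use omp_lib
  implicit none
  integer, parameter :: n = 4000000          ! hypothetical problem size
  real, allocatable :: u0(:), u1(:)          ! hypothetical arrays
  double precision, allocatable :: busy(:)   ! per-thread loop time
  double precision :: t0
  integer :: i, tid

  allocate(u0(n), u1(n), busy(0:omp_get_max_threads() - 1))
  u0 = 1.0
  busy = 0.0d0

  ! Do not let the runtime adjust the number of threads on its own.
  call omp_set_dynamic(.false.)

!$omp parallel default(shared) private(i, tid, t0)
  tid = omp_get_thread_num()
  t0 = omp_get_wtime()

  ! Explicit schedule instead of the implementation-defined default.
!$omp do schedule(static)
  do i = 1, n
     u1(i) = u0(i) * 2.0
  end do
!$omp end do nowait

  ! With nowait, this is the time in the loop only, not in a barrier.
  busy(tid) = omp_get_wtime() - t0
!$omp end parallel

  do tid = 0, size(busy) - 1
     print *, 'thread', tid, 'seconds in loop', busy(tid)
  end do
  print *, u1(n)
end program pin_schedule
```

Changing schedule(static) to schedule(dynamic) with a chunk size is one 
way to let faster processors pick up more iterations, at the cost of 
some scheduling overhead; whether that helps in your case I cannot say.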

So there are a number of factors that may come into play, depending on 
the OpenMP implementation, the hardware you are running on, the layout 
of the data, etc.  I know this still doesn't answer your question, but a 
more complete answer requires more information.

Shengyan Hong wrote:

>Every OpenMP member,
>       I checked the 2.5 spec. It says that 
>"A barrier region may not be closely nested inside a work-sharing, 
>critical, ordered, or master region." But it does not say "can not". 
>My code compiles on Unix. So can you give me further advice?
>       I use 8 processors with different frequencies. The frequencies are 
>between 1 GHz and 1.1 GHz.
>       I measured the idle time in the barrier again and found that 
>each processor idles for 6 cycles. The execution times differ, but not 
>by much: for example, 1.7821*10^5 and 1.78345*10^5. I also deleted the 
>barrier from the code and kept the breakpoints. The idle time is still 
>6 cycles. In addition, the sum of the execution time and the idle time 
>is not the same for each processor. I do not understand these 
>results.
>       I guess that I have not used the barrier correctly. How should I 
>use it? Another explanation is that the work is divided quite well.
>       The code is as follows:
>!$omp parallel do default(shared) private(i,j,k)
>       do k = 1, d3
>C       TID = OMP_GET_THREAD_NUM()
>C       PRINT *, 'thread = ', TID
>C       print *, "March 9"
>        CALL MAGIC_BRK_SIM_START()
>          do j = 1, d2
>             do i = 1, d1
>                u1(i,j,k) = u0(i,j,k)*ex(t*indexmap(i,j,k))
>             end do
>          end do
>C       print *, "Before barrier"
>        CALL MAGIC_BRK_SIM_MIDDLE()
>C       !$OMP BARRIER
>C       print *, "After barrier"
>        CALL MAGIC_BRK_SIM_STOP()
>        end do
>
>
>                                              Shengyan Hong
>



