[Omp] Ask the barrier again
eduncan
Eric.Duncan at Sun.COM
Fri Mar 9 19:10:21 PST 2007
Interesting interpretation of "may not". I guess we need to change the
wording to "can not". In any case, just because the code compiles does
not mean that it is valid. The spec does not require a compiler to flag
non-compliant programs. From a usability standpoint, however, flagging
them is desirable, and most compilers that support OpenMP try to catch
non-compliant programs (with varying degrees of success).
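For illustration, here is a minimal sketch of a barrier placement that
the nesting rule does allow: inside the parallel region, but outside
any work-sharing construct. The program name, the arrays a and b, and
the size n are made up for this example and are not taken from the
original code.

```fortran
program barrier_placement
  implicit none
  integer, parameter :: n = 8      ! illustrative problem size
  integer :: i
  real :: a(n), b(n)

  !$omp parallel default(shared) private(i)

  ! First work-sharing loop. NOWAIT drops the barrier that is
  ! otherwise implied at the end of the loop.
  !$omp do
  do i = 1, n
     a(i) = real(i)
  end do
  !$omp end do nowait

  ! Valid placement: the barrier is inside the parallel region but
  ! not closely nested inside a work-sharing construct. It is needed
  ! here because the next loop reads elements of a that another
  ! thread may have written.
  !$omp barrier

  !$omp do
  do i = 1, n
     b(i) = 2.0 * a(n + 1 - i)
  end do
  !$omp end do

  !$omp end parallel

  print *, b
end program barrier_placement
```

Moving the barrier directive between the first "do" and its "end do"
would put it inside the work-sharing region, which is the
non-compliant case, whether or not a particular compiler complains.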
As for your question, I am not sure exactly what you are asking.
The idea of a work-sharing construct is to divide the work among the
threads. However, settings such as OMP_DYNAMIC and the schedule for the
work-sharing loop are implementation defined when not specified. These
might have an effect on the timing. Cache effects might also be causing
differences; I cannot tell, since I have no idea how these arrays are
being accessed or how large they are. You also say that the processors
have different frequencies. When the work is divided up, I don't believe
most implementations look at the frequencies in order to give more work
to faster processors. As for the time spent in the barrier,
implementations may either put threads to sleep or have them spin while
waiting for all threads to arrive.
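One way to take some of these defaults out of the picture is to set
them explicitly and time each thread's share of the loop. The following
is only a sketch under assumptions: the array x, the size n, and the
choice of schedule(static) are made up for illustration, and the
program has to be built with the compiler's OpenMP option (for example
-fopenmp with gfortran).

```fortran
program schedule_timing
  use omp_lib
  implicit none
  integer, parameter :: n = 1000000   ! illustrative problem size
  integer :: i, tid
  double precision :: t0, t1
  double precision, allocatable :: x(:)

  allocate(x(n))

  ! Do not let the runtime adjust the number of threads on its own.
  call omp_set_dynamic(.false.)

  !$omp parallel default(shared) private(i, tid, t0, t1)
  tid = omp_get_thread_num()
  t0 = omp_get_wtime()

  ! Explicit schedule instead of the implementation-defined default.
  ! NOWAIT removes the implied barrier, so t1 - t0 measures only this
  ! thread's share of the work and not its wait for the other threads.
  !$omp do schedule(static)
  do i = 1, n
     x(i) = sqrt(dble(i))
  end do
  !$omp end do nowait

  t1 = omp_get_wtime()

  ! Serialize the output so the lines do not interleave.
  !$omp critical
  print *, 'thread', tid, 'loop time (s):', t1 - t0
  !$omp end critical
  !$omp end parallel

  deallocate(x)
end program schedule_timing
```

Comparing the per-thread times for different schedule kinds, such as
static and dynamic, gives a rough idea of how much of the imbalance
comes from the way the iterations are divided.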
So there are a number of factors that may come into play, depending on
the OpenMP implementation, the hardware you are running on, the layout
of the data, etc. I know I still haven't answered the question, but a
more complete answer requires more information.
Shengyan Hong wrote:
>Every openmp member,
> I checked the 2.5 spec. It says that
>"A barrier region may not be closely nested inside a work-sharing,
>critical, ordered, or master region." But it does not say "can not".
>My code passes the compiler on Unix. So can you give me further advice?
> I use 8 processors with different frequencies. The frequencies are
>between 1 GHz and 1.1 GHz.
> I measured the idle time in the barrier again and found that
>each processor idles for 6 cycles. The execution times differ, but not
>by much: for example, 1.7821*10^5 and 1.78345*10^5. I also deleted the
>barrier from the code and kept the break points. The idle time is
>still 6 cycles. Besides, the sum of the execution time and the idle
>time is not the same for each processor. I do not know the reason for
>these results.
> I guess that I have not used the barrier correctly. How should I use
>it? Another explanation is that the work is divided quite well.
> The code is as follows:
>!$omp parallel do default(shared) private(i,j,k)
>      do k = 1, d3
>C        TID = OMP_GET_THREAD_NUM()
>C        PRINT *, 'thread = ', TID
>C        print *, "March 9"
>         CALL MAGIC_BRK_SIM_START()
>         do j = 1, d2
>            do i = 1, d1
>               u1(i,j,k) = u0(i,j,k)*ex(t*indexmap(i,j,k))
>            end do
>         end do
>C        print *, "Before barrier"
>         CALL MAGIC_BRK_SIM_MIDDLE()
>C !$OMP BARRIER
>C        print *, "After barrier"
>         CALL MAGIC_BRK_SIM_STOP()
>      end do
>
>
> Shengyan Hong
>