[Omp] OpenMP spec 2.5 seems to have incorrect flush example on page 12

Greg Bronevetsky greg at bronevetsky.com
Sat May 5 14:45:10 PDT 2007


My understanding of what Marcel has been saying is that it applies equally
to flush with and without a list. Marcel suggests that flush in general
is a bad idea since it prevents certain sequential optimizations. Although
Marcel suggests that programmers be limited to barriers and the like, I
believe that barriers are the only synchronization construct that his
optimization is compatible with. In particular, locks, critical regions
and ordered regions don't seem to be compatible. For example, consider the
following program, which uses locks: (the loop is just like Marcel's in
that X=1 is never executed but the compiler can't know that)
(initially X=21)
Thread 0           Thread 1
--------           --------
while(?)           omp_set_lock(&l)
  if(?) X=1        X=42
                   omp_unset_lock(&l)
omp_set_lock(&l)
print X
omp_unset_lock(&l)

I think everyone can agree that the print on Thread 0 should output
42. However, Marcel's optimization can transform the above execution into
the one below:
(initially X=21)
Thread 0           Thread 1
--------           --------
r=X                omp_set_lock(&l)
while(?)           X=42
  if(?) r=1        omp_unset_lock(&l)
X=r
omp_set_lock(&l)
print X
omp_unset_lock(&l)

The result is that Thread 0 prints 21.

As such, we have to choose: either we keep this sequential optimization,
or we keep all synchronization constructs other than barrier.

There is one other option. Instead of the above transformation, do the
following:
(initially X=21)
Thread 0           Thread 1
--------           --------
r=X                omp_set_lock(&l)
X2=X               X=42
while(?)           omp_unset_lock(&l)
  if(?) r=1        
if(r!=X2) X=r
omp_set_lock(&l)
print X
omp_unset_lock(&l)

This would preserve the desired semantics at a pretty small reduction in
sequential performance.

                             Greg Bronevetsky

On Sat, 5 May 2007, Marcel Beemster wrote:

> Larry wrote:
> > It really does seem specific to this particular optimization; I've been
> > trying to think of other cases. Is a flush required even if the
> > assignment is never executed?
> 
> While Jakub wrote:
> > There are no OpenMP directives in the
> > for( i = 0 ; i < 100 ; i++ )
> > loop, it can very well be moved to a new routine in some other compilation
> > unit, perhaps not built with OpenMP flags at all.  Are you saying that
> > because of OpenMP existence all similar loop transformations are illegal?
> 
> I side with Jakub on this. This is really not specific for
> this particular optimization. If an OMP compiler does not
> have this freedom >between flushes<, then any optimization
> involving globally visible objects (including malloced arrays
> residing in memory), also in library code that is free of
> OMP directives, has to be looked at very carefully.
> 
> As a compiler writer, I was happy to read the intentions of
> the OMP memory model at the start of 1.4, page 10. It says:
> 
>     "OpenMP provides a relaxed-consistency, shared-memory model. All
>     OpenMP threads have access to a place to store and retrieve
>     variables, called the memory. In addition, each thread
>     is allowed to have its own temporary view of the memory.
>     The temporary view of memory for each thread is not a required
>     part of the OpenMP memory model, but can represent any kind
>     of intervening structure, such as machine registers, cache,
>     or other local storage, between the thread and the memory.
>     The temporary view of memory allows the thread to cache
>     variables and thereby avoid going to memory for every reference
>     to a variable."
> 
> I interpret this as saying that an OMP compiler can be as
> aggressive in its optimizations as a compiler for sequential
> C or Fortran, between points where the application programmer
> explicitly places OMP directives. I also believe that this
> is what we all should want, because we should not start off
> our parallelization effort with a built-in disadvantage over
> sequential code.
> 
> The implication of allowing compilers to do such optimizations
> is that the page 12 communication of shared variables example
> should be removed from the OMP specification. I argue that this
> is not a great loss: we lose the ability to communicate a shared
> variable between essentially >unsynchronized< parallel threads. It
> is actually non-trivial to construct a program that performs such
> communication, see my example code. If you want to do this, use
> volatile.
> 
> In that case, the explicit and unsynchronized "#pragma omp
> flush" also becomes meaningless and must be removed from
> OMP. The flush only has use when it occurs synchronized between
> two or more threads, for example implied at a barrier. I am
> not a fan of suddenly removing well-recognized features from
> language specifications, but it really is the only logical
> outcome of wanting OMP compilers to do optimizations (in an
> equally aggressive way as their sequential counterparts).
> 
> Marcel
> 
> 
> -- 
> Dr. Marcel Beemster, Senior Software Engineer, marcel at ace.nl,www.ace.nl
> Associated Compiler Experts bv. Amsterdam, Netherlands. +31 20 6646416.
> -----------------------------------------------------------------------
> This e-mail and any  files transmitted  with it are  confidential.  Any
> technical information contained herein is supplied as-is, and no rights
> can be  derived therefrom.  If you have received this message in error,
> please notify  the sender by reply  e-mail immediately,  and delete the
> message and all copies thereof.
> 
> 


