[Omp] OpenMP spec 2.5 seems to have incorrect flush example on page12

James Beyer beyerj at cray.com
Fri May 4 06:57:19 PDT 2007


Marcel,

Your test case works fine using the Cray compiler.  The flushes cause "sharedVar" to be flushed to memory where expected and reloaded from memory into temps where expected.  It would appear that gcc is not properly honoring the flush directive.

Just to reiterate, there is nothing wrong with the spec.  Proper implementations of flush will always flush the register value to memory and reload the memory value into a register.
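
As a rough illustration of that description, here is a single-threaded model of what a flush is expected to do to one variable. It is not OpenMP code and not taken from any compiler; the function and parameter names are made up for this sketch, with "cached" standing in for a register copy.

```c
/* Hypothetical single-threaded model of a flush on one variable.
 * 'shared' is the memory location, 'cached' a register copy of it,
 * and 'cached_is_dirty' says whether the register holds a store
 * that has not reached memory yet. */
static int read_after_flush(int *shared, int cached, int cached_is_dirty)
{
    /* Step 1: a pending store is written back to memory. */
    if (cached_is_dirty) {
        *shared = cached;
    }
    /* Step 2: the register copy is discarded and reloaded from memory. */
    cached = *shared;
    return cached;
}
```

In the test case, T2 holds a clean copy of 21 while memory already contains 42, so under this model the read after the flush yields 42.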

james

-----Original Message-----
From: omp-bounces at openmp.org [mailto:omp-bounces at openmp.org] On Behalf Of Marcel Beemster
Sent: Friday, May 04, 2007 7:21 AM
To: omp at openmp.org
Subject: [Omp] OpenMP spec 2.5 seems to have incorrect flush example on page12

Dear OpenMP specification gurus,

Thanks to everyone who responded to my questions
yesterday. It is now clear to me that there is something
wrong with the OMP specification. Please see and run the
example program below.

The consensus seems to be:
	1) The compiler can optimize as much as possible
	   between flushes.

	   	- This is what I would like to see too

	2) My program can be "fixed" by adding more flushes
	   and synchronization to prevent compiler optimizations.

	   	- I know how to fix my program, but the OMP 2.5
		  specification says on page 12 that the exact
		  sequence specified there is sufficient to
		  communicate a shared variable between two
		  threads. But that is not sufficient, given the
		  possible compiler optimizations that we all
		  want.
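
For point 2, one more defensive version might look like the sketch below. It is an assumption about what "more flushes and synchronization" could mean, not the form anyone on the list proposed. The names payload, ready and handoff are hypothetical, and the atomic read/write constructs come from OpenMP 3.1, which is newer than the 2.5 specification under discussion.

```c
static int payload = 21;   /* data handed from producer to consumer */
static int ready   = 0;    /* hypothetical flag announcing the data */

/* Runs one producer/consumer handoff and returns the value
 * the consumer observed. */
static int handoff(void)
{
    int seen = -1;

    #pragma omp parallel sections num_threads(2) shared(seen)
    {
        #pragma omp section
        {   /* producer */
            payload = 42;
            #pragma omp flush          /* publish payload before the flag */
            #pragma omp atomic write
            ready = 1;
            #pragma omp flush          /* make the flag visible */
        }

        #pragma omp section
        {   /* consumer */
            int r = 0;
            while (!r) {
                #pragma omp flush      /* re-read the flag on every pass */
                #pragma omp atomic read
                r = ready;
            }
            #pragma omp flush          /* order the payload read after the flag */
            seen = payload;
        }
    }
    return seen;
}
```

The flushes here have no variable list, so they apply to all thread-visible variables, and the flag is re-read on every loop iteration instead of being read once before a loop. Calling handoff() is expected to return 42.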

My conclusion is that the description of communication of a
shared variable using the event sequence on page 12 of the
OMP specification is incorrect.

To clarify further, below is a self-checking, stand-alone
program that implements exactly the sequence of events
described on page 12.  Additionally, the program is constructed
in such a way that the compiler may cache the shared
value across the flush by T1. Note that T2 never writes to
sharedVar, though the compiler cannot see that.

Compiling and running the program without optimization gives:

	$ export OMP_NUM_THREADS=2
	$ gcc -fopenmp exampleP12.c && ./a.out
	exampleP12.c: In function 'main':
	exampleP12.c:65: warning: division by zero
	Start of Program
	0: T2 sees value 21 in variable sharedVar
	0: T2 has flushed variable sharedVar
	1: T1 writes 42 in variable sharedVar
	2: T1 has flushed variable sharedVar
	3: T2 flushes variable sharedVar
	4: T2 reads from variable sharedVar
	   T2 has value 42 from sharedVar
	   That is the right value
	End of Program

Compiling and running the program with optimization gives:

	$ export OMP_NUM_THREADS=2
	$ gcc -fopenmp -O3 exampleP12.c && ./a.out
	[...]
	4: T2 reads from variable sharedVar
	   T2 has value 21 from sharedVar
	   The value 21 is incorrect
	End of Program

These results are for gcc 4.2 on i686/linux:
	$ gcc -v
	Using built-in specs.
	Target: i686-pc-linux-gnu
	Configured with: ../configure
	--prefix=/home/marcel/tmp/gcc-4.2-bin
	Thread model: posix
	gcc version 4.2.0 20070214 (prerelease)

My questions:

	1) Please run this program with your non-gcc compiler
	   and report the results, compiler and platform. This
	   will give an idea of how other implementations
	   view optimization. Make sure to run with at least
	   two threads: export OMP_NUM_THREADS=2, otherwise
	   the program will hang.

	2) Do you agree that the program is a correct
	   implementation of the events described on page 12 of
	   the OMP specification?

	3) Do you agree that this sequence of events is therefore
	   not sufficient to communicate a value between
	   threads, given that we do want the compiler to optimize,
	   and that the OMP specification is incorrect on this point?

Thanks a lot,
	Marcel

===================================================================


#include <stdio.h>

/*
  * Lightweight but inefficient sequencing mechanism between two running
  * threads. Does not require OS or omp-library calls.
  * Make sure that at least two threads exist by setting the environment
  * variable OMP_NUM_THREADS=2.
  */
volatile int nextStep = 0 ;
#define DONEXTSTEP(  x )    nextStep = x ;
#define WAITFORSTEP( x )    while( nextStep != x ) { /*nothing */ }

/* The shared variable written by T1 and read by T2 */
int sharedVar = 21 ;

/* Dynamic variable so compiler cannot derive its value */
int cutOff = 500 ;

int main( void ) {

     printf( "Start of Program\n" ) ;

#pragma omp parallel sections
     {

#pragma omp section
         {   /* Start of T1 */

             WAITFORSTEP( 10 ) ; /* Wait until T2 says to continue */
             sharedVar = 42 ;
             printf( "1: T1 writes 42 in variable sharedVar\n" ) ;

             #pragma omp flush( sharedVar )
             printf( "2: T1 has flushed variable sharedVar\n" ) ;
             DONEXTSTEP( 30 ) ;  /* Tell T2 that write&flush is done */

         }   /* End of thread T1 */

#pragma omp section
         {   /* Start of T2 */
             int i, locVar ;

             printf( "0: T2 sees value %d in variable sharedVar\n", sharedVar ) ;
             #pragma omp flush( sharedVar )
             printf( "0: T2 has flushed variable sharedVar\n" ) ;

                         /* If the compiler decides to optimize and cache
                          * the value of sharedVar in a register, this is the
                          * place, before the loop, where sharedVar is read */
             for( i = 0 ; i < 100 ; i++ ) {
                 if( i == 0 ) {
                         /* After potentially caching sharedVar, tell T1
                          * to continue with its write&flush */
                     DONEXTSTEP( 10 ) ;
                 }
                 if( i == 99 ) {
                         /* Before exiting the loop and writing back the
                          * cached but not written value, wait until T1
                          * finished its flush of sharedVar */
                     WAITFORSTEP( 30 ) ;
                 }
                 if( i > cutOff ) {
                         /* This code is never executed but the
                          * compiler cannot know that */
                     sharedVar = i / 0 ;
                 }
                         /* After the loop, the potentially cached value
                          * of sharedVar is written to memory */
             }

             #pragma omp flush( sharedVar )
             printf( "3: T2 flushes variable sharedVar\n" ) ;

             locVar = sharedVar ;
             printf( "4: T2 reads from variable sharedVar\n" ) ;
             printf( "   T2 has value %d from sharedVar\n", locVar ) ;
             if( locVar == 42 ) {
                 printf( "   That is the right value\n" ) ;
             } else {
                 printf( "   The value %d is incorrect\n", locVar ) ;
             }

         } /* End of thread T2 */

     }   /* End of parallel construct */
     printf( "End of Program\n" ) ;
     return 0 ;
}

-- 
Dr. Marcel Beemster, Senior Software Engineer, marcel at ace.nl,www.ace.nl
Associated Compiler Experts bv. Amsterdam, Netherlands. +31 20 6646416.
-----------------------------------------------------------------------
This e-mail and any  files transmitted  with it are  confidential.  Any
technical information contained herein is supplied as-is, and no rights
can be  derived therefrom.  If you have received this message in error,
please notify  the sender by reply  e-mail immediately,  and delete the
message and all copies thereof.

