[Omp] Two simpler examples (Re: OpenMP spec 2.5 seems to have incorrect flush example on page 12)

Marcel Beemster marcel at ace.nl
Sun May 6 10:48:46 PDT 2007


Greg wrote:
>    Actually, that's a good way to summarize this optimization: the
> compiler added a race. If the if(x)... code did execute, then the
> compiler-generated race wouldn't matter since there would be an
> application race to hide it. However, if if(x)... never executes, the
> compiler-generated race just sits there exposed, visibly screwing things
> up. The question that's not clear to me is whether possible races that end
> up not actually happening at runtime should cause the result to be
> undefined. If yes, then all is simple for implementors and the spec
> (though worse for users). If not, we should be able to make a fairly
> precise description of the kinds of optimizations that are not allowed.
> 
>    So which is it? I'm leaning toward no, simply because the yes
> option will really confuse users.

I think Greg accurately describes both the situation and the
choice to be made. Let me argue in favour of YES.
Summarizing more:

YES: a potential race, even if not executed, counts as a race
	and renders the program result unspecified.
NO: compilers are limited in their optimizations of shared
	variables, also between synchronization points. 

In the Specification, "NO" could be described as:
	RULE A: An implementation [compiler] of OpenMP is not
	allowed to introduce a write to a shared variable,
	for example as a result of ending a temporary view
	of that variable, if there exists a control flow path
	that includes the new write but that does not contain
	a write to that variable in the original program code.

Maybe additional text is needed to address this between
synchronization points/flushes.
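
To make the kind of write that RULE A forbids concrete, here is a
minimal sketch in C. The variable S, the value 42 and all function
names are assumptions chosen for illustration; they are not taken from
the Spec or from the earlier examples in this thread. The two threads
are replayed sequentially in one unlucky order so that the effect is
deterministic.

```c
/* Hypothetical shared variable. */
int S = 0;

/* Original code: S is written only on the path where x is nonzero. */
void original(int x)
{
    if (x)
        S = 1;
}

/* What a compiler might generate after keeping S in a temporary:
   the store back to S is now unconditional. It is split into a load
   half and a store half so another thread's write can be replayed
   in between. */
int transformed_load(void)
{
    return S;               /* r = S */
}

void transformed_store(int r, int x)
{
    if (x)
        r = 1;
    S = r;                  /* introduced write, executed even if x == 0 */
}

/* T1 runs the transformed code with x == 0; T2 writes 42 in between. */
int replay_transformed(void)
{
    int r;
    S = 0;
    r = transformed_load();     /* T1: r = S      */
    S = 42;                     /* T2: S = 42     */
    transformed_store(r, 0);    /* T1: S = r      */
    return S;                   /* T2's write is lost */
}

/* Same order of events with the original code: T1 never touches S. */
int replay_original(void)
{
    S = 0;
    S = 42;                     /* T2 */
    original(0);                /* T1 */
    return S;
}
```

In this replay the original code leaves T2's value in S, while the
transformed code overwrites it with the stale copy, although the
source program for T1 never wrote S on that path.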

RULE A makes it dangerous to link OpenMP compiled code
to libraries that are not OpenMP compiled. That is a nuisance to
start with.

More importantly, the consequence of RULE A is that compilers
for OpenMP cannot do the optimizations on global variables that
sequential compilers can do. The specification is also not
very strong on describing how heap allocated/pointer-based
shared memory is handled. I'm afraid RULE A needs to be
extended also to deal with heap/pointer-based shared memory,
in which case you are going to hit a further penalty over
sequentially compiled code.


To deal with the YES case, we need to add at least two rules
to the Spec. The first replaces the current description of
race conditions being unspecified. The current text is:

	"If multiple threads write to the same shared variable
	without synchronization, the resulting value of the
	variable in memory is unspecified."

It needs to be extended as follows:

	RULE X: If in each of at least two threads there exists
	a control flow path that writes to the same shared
	variable without synchronization, the resulting value of
	the variable in memory is unspecified.

This would make my sample program non-conforming because it
contains a control flow path that writes to X in both T1 and T2.
I may not agree with you that RULE A is easier for our users to
understand than RULE X. Following Lawrence's thinking, the user is
pretty much on his own anyway when communicating with shared
variables.
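
As an illustration of what RULE X would classify as unspecified, here
is a hypothetical program in the same spirit. It is not the sample
program from earlier in the thread; the names X, flag1, flag2 and
run_sections are assumptions. Each section contains a control flow
path that writes X, even though at most one of those paths may execute
in a given run.

```c
/* Hypothetical shared variable. */
int X = 0;

/* Each section has a path that writes X without synchronization.
   Under RULE X the value of X is unspecified for every choice of
   flags, including the choices where no race actually occurs. */
int run_sections(int flag1, int flag2)
{
    X = 0;
    #pragma omp parallel sections shared(X)
    {
        #pragma omp section
        {
            if (flag1)
                X = 1;      /* T1: path that writes X */
        }
        #pragma omp section
        {
            if (flag2)
                X = 2;      /* T2: path that writes X */
        }
    }
    return X;
}
```

Built with OpenMP enabled (for example -fopenmp with GCC), the call
run_sections(1, 1) is a race under either rule. The calls
run_sections(0, 0) and run_sections(1, 0) are the interesting ones:
no race happens at runtime, yet RULE X still leaves the result
unspecified, whereas RULE A would require X to be 0 and 1
respectively.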

RULE X allows the OpenMP compiler to create any temporary view of
variables over any flow of control between synchronizations,
just like sequential compilers.

It also allows me to retract my statement that asynchronous
flushes in OMP should go.

We also need to address Yaun's "scary" Example 2, in which a
compiler introduces an arbitrary "r=S;S=r" in code that had
no use of S at all, which is a legal transformation in
sequential programs.

	RULE Y: An implementation [compiler] is not allowed to
	introduce a write between synchronization points if
	there does not exist at least one control flow path
	between those synchronization points that also does a
	write in the original program.

In general, this restriction is not too hard on compilers. I know
of no optimizations that need to read or write variables that
the original program did not read or write.

However, it may be a problem for the hardware, and it does
touch on the earlier discussion about the granularity of reads
and writes (that you will all fondly remember). This one:
	http://openmp.org/pipermail/omp/2007/000681.html

If two shared variables S1 and S2 are byte sized, and allocated
next to each other in the same 32-bit word and the architecture
cannot read/write less than a full 32-bit word at a time,
things go wrong again, because there is no way to avoid
writing/updating S2 when just S1 needs to be written.
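
A small C sketch of this effect, under the assumption of a machine
that can only load and store whole 32-bit words. The byte positions of
S1 and S2, the values 0xAA and 0xBB, and the helper names are all
hypothetical. The two threads are replayed sequentially in one fixed
order.

```c
#include <stdint.h>
#include <string.h>

/* Return a copy of word w with the byte at position index set to v. */
uint32_t set_byte(uint32_t w, unsigned index, uint8_t v)
{
    uint8_t b[4];
    memcpy(b, &w, sizeof b);
    b[index] = v;
    memcpy(&w, b, sizeof w);
    return w;
}

/* Return the byte at position index of word w. */
uint8_t get_byte(uint32_t w, unsigned index)
{
    uint8_t b[4];
    memcpy(b, &w, sizeof b);
    return b[index];
}

/* S1 is byte 0 and S2 is byte 1 of the same word.
   T1 updates only S1, T2 updates only S2, but each must
   read-modify-write the whole word. Returns the final S2. */
uint8_t replay_word_granularity(void)
{
    uint32_t mem = 0;               /* the shared word            */
    uint32_t t1 = mem;              /* T1: load word              */
    uint32_t t2 = mem;              /* T2: load word              */
    t1 = set_byte(t1, 0, 0xAA);     /* T1: S1 = 0xAA in its copy  */
    t2 = set_byte(t2, 1, 0xBB);     /* T2: S2 = 0xBB in its copy  */
    mem = t2;                       /* T2: store word             */
    mem = t1;                       /* T1: store word, stale S2   */
    return get_byte(mem, 1);        /* S2 is back to 0            */
}
```

Neither thread's source code writes the other's variable, yet T2's
update of S2 is lost because T1's word-sized store writes back the old
S2 along with the new S1.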

To handle this, the Spec should allow an implementation to
define a minimum size of atomic memory access, below which a
program becomes non-conforming if it relies on shared memory
communication of a variable less than that size. (Talking about
scary: I have seen cache architectures in which the minimum
write size was a full cache line, far more than 32 bits.)

I will check Hans Boehm's paper later today, following Ulrich's
recommendation.

Marcel

-- 
Dr. Marcel Beemster, Senior Software Engineer, marcel at ace.nl,www.ace.nl
Associated Compiler Experts bv. Amsterdam, Netherlands. +31 20 6646416.



