Memory model

General OpenMP discussion

Memory model

Postby hans.boehm@hp.com » Wed Dec 05, 2007 6:43 pm

Based on an incomplete reading, it appears to me that the OpenMP memory model is completely inconsistent with the proposed C++ one (and pthreads) in several ways. This worries me since I expect that in many cases similar compielr back-ends will be used. Issues: (C++ refers to the C++ working paper N2461, not an approved standard.)

- Semantics of data races. C++ carefully defines what constitutes a data race (nothing here is implementation defined anymore). Programs with data races have undefined semantics. A rationale for this is given in http://www.open-std.org/jtc1/sc22/wg21/ ... n2176.html . Under some easily detectable restrictions, other programs have sequentially consistent semantics. This is consistent with pthreads. OpenMP gives some rules in 1.4.1 which are very unclear to me. It talks about an implementation-defined minimum size for atomicity (page 14, line 4). There is presumably also a maximum size, since some architectures cannot store or load say, 64-bit integers, atomically. In any case, it does not disallow data races, or provide a way to identify them to the compiler. See N2176 for a reason to do so. (An unspecified value is very different from undefined behavior here. Volatile won't work to identify races. See separate posting. C++ uses atomics to identify races. Simultaneous access to atomics is allowed, but is not defined as a data race.)

- 1.4.1 appears to allow updates of small variables (page 14, line 4) to rewrite adjacent memory in implementation-defined ways? (In what other sense might SMALL updates not be atomic?) Given that compilers are often allowed to reorder variables, and linkers often do, I think this is unusable in portable code, and needs to be pinned down at a minimum. (C++ limits it to contiguous bit-fields. Java disallows it. It doesn't have bit-fields.)

- I think that as it stands the set_lock and unset_lock primitives require a full fence since they're specified that way, and since races do not result in undefined behavior, this is detectable. On some architectures that would require an appreciably slower implementation than is customary, e.g. for pthreads.

- There are other issues with volatiles, discussed in a separate message.

Hans
hans.boehm@hp.com
 

Re: Memory model

Postby BronisDeSupinski » Thu Dec 06, 2007 11:08 am

Hans:

Regarding your posting about the OpenMP memory model, the intent of the model is to make the effect of all data races unspecified, which is the OpenMP equivalent of undefined. Specifically, on page 14, lines 8 to 14, it is stated that write/write and read/write races result have unspecified results.

I understand the concern about not stating a portable level at which atomicity is guaranteed. This issue has been very contentious within the OpenMP language committee. You are correct that the failure to specify it has an impact on the portability of OpenMP code. The problem is that implementers do not want to be required to guarantee atomicity for bit-level or other small accesses below the hardware atomicity guarantees of the machine. Specifically, consider an OpenMP parallel for loop in which a char array is updated; the accesses to the indivudal elements would be executed in parallel; must this work correctly? Not under the current wording. Since, this would often be implemented as word reads, byte updates and word writes, a minimum chunk size because multiple chars, which must be properly aligned. Or, the implementation must insert locking to ensure atomicity.

What would you suggest? I think we could get the implementers to agree that the minimum must not be larger than that required by the base language. My experience in discussing this issue is that more than that will be very difficult...

Bronis
BronisDeSupinski
 

Re: Memory model

Postby Guest » Sat Dec 29, 2007 10:20 am

[ I noticed after the fact that I probably posted this to the wrong forum. My apologies.]

Regarding undefined semantics for data races:

I may well be misreading the spec (which would be good). To be concrete, if I write:

{
int i = x; // x global, shared, i local

if (i > 2) foo();
... // doesn't touch i
if (i > 2) bar();
}

In the event of a race, is it possible that exactly one of foo() and bar() will be called, i.e. that the results of "i > 2" will be inconsistent?

The C++ WP says "yes". Posix implicitly says "yes". I have been told by a gcc implementor that it could conceivably result in answer of "yes", though that's very hard to get in practice. (The issue is that if i gets spilled, the compiler may legitinately reload it from x, rather than storing it to the stack, since the compiler "knows" that i and x are copies.)

I read the OpenMP spec as saying this must not happen. I'd be happy to have been wrong on this one.

As far as atomicity guarantees: I know of no current machines that do not provide atomic byte loads and stores that do not touch surrounding memory. The first generation of Alphas were the canonical exception. They're no longer very interesting. If there is such a multiprocessor machine, it can't run Java very well. I expect it will not be able to run the next C++. For uniprocessors there are workarounds in any case. I suspect that most multiprocessor vendors are much more concerned with programmability than backwards compatibility with ancient history.

Yes, char arrays must work correctly. I don't think any parallel programming language is viable without that guarantee. The Java memory model effort made that decision early on with a minimum of dissent. I still haven't heard of problems in that area.

As you can tell, I'm all infavor of reopening this issue. In my mind, this is a showstopper for usability. And, incontrast to widely held beliefs, I don't think there is a serious implementation issue any more.

Data objects smaller than a byte, or not byte-aligned, are a different issue. The C++ working paper allows a store to a bit-field to rewrite any bit-fields in a contiguous sequence of (nonzero-length) bit-fields. I think that's the right definition. It is more stringent than what current compiler back-ends implement. But I think there are strong argument to fix that. And current measurements (on SPEC :-( ) indicate the cost is minor.

This is also different from the issue with larger misaligned (but byte-aligned) data, or data larger than what's atomically updatable. I think that is real on some platforms, and is yet another reason that data races should result in completely undefined semantics. You cannot always be guaranteed to see either the new or the old value.

Hans


Last bumped by Anonymous on Sat Dec 29, 2007 10:20 am.
Guest
 


Return to Using OpenMP

Who is online

Users browsing this forum: Alexa [Bot], Exabot [Bot] and 13 guests