The key to these examples is the following text from Section 1.4.2:
The flush operation provides a guarantee of consistency between a thread’s temporary
view and memory. Therefore, the flush operation can be used to guarantee that a value
written to a variable by one thread may be read by a second thread. To accomplish this,
the programmer must ensure that the second thread has not written to the variable since
its last flush of the variable, and that the following sequence of events happens in the
specified order:
1. The value is written to the variable by the first thread.
2. The variable is flushed by the first thread.
3. The variable is flushed by the second thread.
4. The value is read from the variable by the second thread.
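As a rough sketch of those four steps (my own illustration, not one of the spec's examples), the simplest way I know to get them in the right order is to let a barrier do the flushing, since a barrier implies a flush on every thread. The function name `publish_then_read` and the mismatch counter are made up for this post.

```c
/* Hypothetical helper: one thread writes `value`, every thread then reads it.
 * Returns the number of threads that read something other than `value`. */
int publish_then_read(int value)
{
    int data = 0;        /* the shared variable being handed over */
    int mismatches = 0;  /* counts reads that did not see `value` */

    #pragma omp parallel shared(data, mismatches)
    {
        #pragma omp single
        {
            data = value;            /* step 1: the write */
        }
        /* The implicit barrier at the end of `single` implies a flush on
         * every thread and orders the writer's flush (step 2) before each
         * reader's flush (step 3). */

        int seen = data;             /* step 4: the read */
        if (seen != value) {
            #pragma omp atomic
            mismatches++;
        }
    }
    return mismatches;
}
```

With the barrier in place, `publish_then_read(42)` should return 0. The examples under discussion get into trouble precisely because they try to replace the barrier with hand-written flushes and a spin loop.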
What you have to remember is that the above sequence is the only way to ensure that OpenMP returns something reasonable in response to a read. More relaxed sequences may return literally any value. This is where examples 2.2 and 2.3 run into trouble.
The problem with this example can be seen in the following execution. Threads 1 and 2 execute their initial flushes before the atomic update on thread 0. Threads 1 and 2 then both execute the read of flag in the while loop. Since these reads are not ordered relative to thread 0's write by a write-flush-flush-read sequence, they can return any value. In this case they return large values that allow both threads to exit their loops immediately. At this point the printf calls on the two threads may execute in either order. The underlying problem is that the spin-locks used in this algorithm rely on being able to read the value of flag accurately. In OpenMP, spin-locks cannot reliably do that. The only thing that is possible is a binary test of whether the value of flag has changed. Anything else will cause unexpected results.
Thread 0 Thread 1 Thread 2
-------- -------- --------
The problem is highlighted by the execution below. Thread 0 executes the write to data, the first flush and the write to flag, but stalls for some reason before executing its final flush operation. Meanwhile, thread 1 executes its first flush and begins the while loop. In the first iteration it reads flag as >=1 because of the race with thread 0's write to flag, exits the loop and tries to read flag and data. Unfortunately, the execution has not established the proper write-flush-flush-read sequence for either variable: data is missing a flush on thread 1, and flag is missing both flushes. So although this example overcame the spin-lock problem in example 2.3, it did not place a flush between the end of the spin-lock and the subsequent code.
Thread 1 then executes another flush and tries to read data and flag again. This time, there is a guaranteed write-flush-flush-read ordering between thread 0's write to data and thread 1's read, meaning that the read is guaranteed to return 42. However, thread 1's read of flag is still undefined because of the missing flush of flag on thread 0. The fact that thread 1's spin-lock loop exited successfully doesn't mean that thread 0 has actually finished flushing flag. As such, the value of flag remains undefined until additional synchronization is performed, using other flag variables, to prove that thread 0 has in fact executed the flush(flag) that follows its write to flag.
Thread 0 Thread 1
flag=1 <----| Race
|-------> while(flag < 1) [flag read as >=1 because of the race]
printf("...") reads of flag and data undefined because
read of data is defined because proper
write-flush-flush-read order has been established
read of flag still undefined because thread 0's
post-write flush has not yet executed
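For comparison, here is a minimal sketch of the same handoff with the missing flush added after the spin loop. It is my own variant, not the code from the examples document: it swaps in `atomic write` / `atomic read` on flag (which assumes OpenMP 3.1 or later) so that the spin loop does not race on a plain variable, and it only ever tests whether flag has changed from 0. The name `handoff` is hypothetical. It also assumes the runtime either provides two threads or runs the sections in program order; otherwise the wait loop would never finish.

```c
/* Hypothetical helper: the first section publishes `value` through `data`,
 * the second waits on `flag` and then reads `data`. */
int handoff(int value)
{
    int data = 0;       /* payload */
    int flag = 0;       /* becomes 1 once data has been published */
    int received = -1;  /* what the reader saw */

    #pragma omp parallel sections num_threads(2) shared(data, flag, received)
    {
        #pragma omp section
        {
            data = value;              /* write the payload */
            #pragma omp flush          /* writer's flush of data */
            #pragma omp atomic write
            flag = 1;                  /* signal the reader */
            #pragma omp flush          /* push the flag update out */
        }
        #pragma omp section
        {
            int seen = 0;
            while (!seen) {            /* binary test: has flag changed? */
                #pragma omp flush
                #pragma omp atomic read
                seen = flag;
            }
            #pragma omp flush          /* reader's flush AFTER the loop */
            received = data;           /* read the payload */
        }
    }
    return received;
}
```

Under those assumptions, `handoff(42)` should return 42 when built with OpenMP enabled (for example `-fopenmp` with GCC). The flush after the loop is the one the trace above shows to be missing; dropping it puts the read of data back in the undefined case.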