[Omp] Shared Variable Problem
Greg Bronevetsky
greg at bronevetsky.com
Tue Aug 1 11:52:04 PDT 2006
There is a complex trade-off to make here. From the OpenMP implementors'
point of view, the larger the unit of atomicity that is guaranteed, the
more constrained their designs must be in order to be compliant. While
something like 32-bit atomicity should be practical to implement in
most current and recent designs, it is unknown where designs will go in
the future, making such commitments dangerous.
From the users' point of view, the best outcome would be to make the =
operator atomic. This is simple and easy to understand at the
application level. However, it is also impractical since the variables
being assigned can be gigantic (structs) or misaligned. As such, some
annoying low-level restrictions such as x-bit atomicity limits and
alignment constraints need to be imposed in order to make things practical.
The problem with this is that OpenMP is defined in terms of high-level
C/C++ and Fortran, not at the machine level. (Granted, the languages
themselves can include machine-level details, but OpenMP doesn't go this
low.) Given that OpenMP is supposed to be simple, specifying atomicity at
the level of bits and alignment violates OpenMP's intent and makes it
difficult for the user to determine whether a given assignment is
atomic or not.
Here's a possible compromise solution. The main thing to note is
something realized by most memory models: most memory accesses don't
care about flushing or atomicity or anything like that. As such, we can
add a new directive to OpenMP that identifies certain assignments as
atomic and leaves the rest alone. OpenMP can provide atomicity in two
different ways.
Option 1: x-bit atomicity. In this case the compiler must guarantee
that any variable involved in an atomic assignment is properly aligned
and isn't larger than x bits. If any larger variable is used in the
atomic assignment, the compiler would issue a warning, informing the
user of how much atomicity they would actually get in that write. If the
underlying hardware provides y-bit atomicity, where y<x, then the OpenMP
implementation must do something fancy to implement correct semantics.
Option 2: full atomicity. Assume that the underlying hardware
provides y-bit atomicity. All atomic assignments that will work with
y-bit atomicity use it directly. All writes that need larger atomicity
use a slower protocol that is at least correct, and the compiler emits a
notification telling the user that they're doing something that is
sub-optimal on this platform.
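
To make Option 2 a bit more concrete, here is a rough sketch of what it
could look like from the user's side. It is not the proposed directive
itself. It assumes an OpenMP 3.1 or later compiler (where "atomic write"
and "atomic read" exist) for the small case, and uses a named critical
section as the slower-but-correct fallback for the large case. The type
and function names (struct big, store_small, store_big, ...) are made up
for illustration.

```c
#include <assert.h>

/* Hypothetical "gigantic" type: too large for a single hardware write. */
struct big { double a[8]; };

static int small_shared;        /* fits in one machine word */
static struct big big_shared;   /* does not */

/* Small case: the assignment itself is marked atomic. */
void store_small(int v) {
    #pragma omp atomic write
    small_shared = v;
}

int load_small(void) {
    int v;
    #pragma omp atomic read
    v = small_shared;
    return v;
}

/* Large case: slower protocol, here a named critical section
   (an assumption; an implementation could choose something else). */
void store_big(const struct big *src) {
    assert(src);
    #pragma omp critical(big_shared_lock)
    { big_shared = *src; }
}

void load_big(struct big *dst) {
    assert(dst);
    #pragma omp critical(big_shared_lock)
    { *dst = big_shared; }
}
```

The point of the sketch is only that the user marks the assignments that
matter and the rest of the program is left alone; which mechanism sits
behind each marked assignment would be up to the implementation.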
--
Greg Bronevetsky
Meadows, Lawrence F wrote:
> I suppose that is a point.
>
> What I'm getting at is that it would be nice to have some sort
> of least-common-denominator hardware atomicity so that OpenMP
> programs on most platforms could roll their own synchronization.
>
> Or you could take my latest stance which is that you shouldn't
> do that anyway and flush is an abomination.
>
> But at least Greg could augment his memory model with assertions
> that would hold in the presence of certain hardware atomicity
> guarantees.
>
> Larry
>
>
>> -----Original Message-----
>> From: Pieper, John
>> Sent: Tuesday, August 01, 2006 11:12 AM
>> To: Meadows, Lawrence F; Greg Bronevetsky; Bronis R. de Supinski
>> Cc: omp at openmp.org
>> Subject: RE: [Omp] Shared Variable Problem
>>
>> Can we assume you mean aligned addresses?
>>
>>
>>> -----Original Message-----
>>> From: omp-bounces at openmp.org [mailto:omp-bounces at openmp.org]
>>> On Behalf Of Meadows, Lawrence F
>>> Sent: Tuesday, August 01, 2006 11:21 AM
>>> To: Greg Bronevetsky; Bronis R. de Supinski
>>> Cc: omp at openmp.org
>>> Subject: RE: [Omp] Shared Variable Problem
>>>
>>> Is anyone aware of any implementation of OpenMP on a platform
>>> that does not guarantee atomicity on at least 4-byte reads/writes?
>>>
>>>
>>>> -----Original Message-----
>>>> From: omp-bounces at openmp.org [mailto:omp-bounces at openmp.org]
>>>> On Behalf Of Greg Bronevetsky
>>>> Sent: Sunday, July 30, 2006 6:38 PM
>>>> To: Bronis R. de Supinski
>>>> Cc: omp at openmp.org
>>>> Subject: Re: [Omp] Shared Variable Problem
>>>>
>>>> One simple way of thinking about why this code is wrong is that
>>>> OpenMP's memory model guarantees only one thing. The only ways for a
>>>> read of a variable on thread B to be guaranteed to return the value
>>>> written by thread A are if 1. A=B and the write precedes the read or
>>>> 2. the operations are executed in the following sequence:
>>>>   Thread A              Thread B
>>>>   --------              --------
>>>>   write var, 1
>>>>   flush(var)
>>>>        ... some time later ...
>>>>                         flush(var)
>>>>                         read var
>>>> The value returned by the read on thread B is undefined in any
>>>> other case.
>>>> Note the flushes and the time gap between the events on thread A and
>>>> the events on thread B.
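
A minimal sketch of the guaranteed sequence quoted above, assuming a
barrier is what supplies the "some time later" separation. The function
name publish_then_read and the variable seen are invented for the
example. The barrier already implies a flush, so the explicit flushes
are there only to mirror the diagram.

```c
#include <assert.h>

/* Hypothetical helper: one thread writes var, a later read observes it. */
int publish_then_read(void) {
    int var = 0;    /* shared: written by the master thread */
    int seen = -1;  /* shared: value observed by the reader */

    #pragma omp parallel shared(var, seen)
    {
        #pragma omp master
        {
            var = 1;                 /* write var, 1 */
            #pragma omp flush(var)   /* writer-side flush */
        }

        /* "... some time later ...": every thread waits here, so the
           write above is complete before any thread goes on to read. */
        #pragma omp barrier

        #pragma omp single
        {
            #pragma omp flush(var)   /* reader-side flush */
            seen = var;              /* read var */
        }
    }
    assert(seen == 1);
    return seen;
}
```

Without the barrier (or some other construct that orders the two
threads), the read could overlap the write and nothing would be
guaranteed about the value it returns.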
>>>>
>>>> The problem with the code example below is that it can have the
>>>> following interleaving:
>>>>   Thread A              Thread B
>>>>   --------              --------
>>>>   i++                   flush(i)
>>>>   flush(i)              read i
>>>> Because the read on thread B is not temporally separated from and
>>>> flushed relative to the write on thread A, OpenMP provides no
>>>> guarantees about the value that it returns. This applies to any kind
>>>> of synchronization implemented using reads and writes. If there is
>>>> any possibility of the read and the write racing each other (as is
>>>> the case in almost any synchronization algorithm), then the read can
>>>> produce undefined values.
>>>>
>>>> In general, if you're trying to write a synchronization algorithm in
>>>> OpenMP and you want to test it for correctness, here's a basic rule
>>>> of thumb:
>>>>   If your synchronization code cares about the outcomes of your
>>>>   synchronization reads then it's probably wrong. If it only reads
>>>>   values to test whether a given variable has been written to, then
>>>>   (with appropriate flushes) it may be correct.
>>>>
>>>> This is so because OpenMP provides no atomicity guarantees for
>>>> writes. Thus, the write in "i++" can be implemented as a single
>>>> 64-bit write, as 64 separate 1-bit writes or something more complex.
>>>> As such, the read on thread B can read a value mid-update, causing
>>>> strange behavior. The reason OpenMP works this way (as I've gathered
>>>> from talking to people) is that its design is heavily influenced by
>>>> OpenMP implementors who don't want to be too constrained about how
>>>> much atomicity to provide. After all, OpenMP may end up running on
>>>> embedded platforms that may only provide 4-bit wide buses.
>>>> Furthermore, if people do agree that x-bit writes are atomic, what
>>>> about code like:
>>>>   struct foo var1, var2;
>>>>   var1 = var2;
>>>> var1 and var2 can be arbitrarily large, meaning that some writes in
>>>> the application are atomic while others are not, complicating things
>>>> for the user. For the time being, OpenMP provides no atomicity
>>>> guarantees, which makes most synchronization algorithms illegal.
>>>>
>>>> Greg Bronevetsky
>>>>
>>>> On Fri, 28 Jul 2006, Bronis R. de Supinski wrote:
>>>>
>>>>
>>>>> Yuan:
>>>>>
>>>>> Unfortunately, your fix is still broken. The accesses
>>>>> to i within the parallel region are unsynchronized.
>>>>> Since the increment operator implies a write to i,
>>>>> there is a race between the increments in tid != 0
>>>>> threads and the reads of i in the printf statements.
>>>>> There is also a race between the different increments.
>>>>> As a result the value of i will generally be undefined.
>>>>>
>>>>> At the least, you would want the increments to be
>>>>> enclosed in an atomic pragma. However, that will
>>>>> still not fix the race with the reads.
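
For reference, a small sketch of what enclosing the increments in an
atomic pragma looks like. The helper name count_with_atomic is invented
for the example, and the loop replaces the original while(1) so that the
function terminates. Note that the only read of i here happens after the
implicit barrier at the end of the parallel loop, so this sidesteps the
race with concurrent reads rather than fixing it.

```c
#include <assert.h>

/* Hypothetical helper: n iterations each increment a shared counter. */
int count_with_atomic(int n) {
    int i = 0;  /* shared counter, like i in the example under discussion */
    int k;

    assert(n >= 0);

    #pragma omp parallel for shared(i)
    for (k = 0; k < n; k++) {
        /* The atomic pragma removes the race between increments. */
        #pragma omp atomic
        i++;
    }

    /* The end of the parallel loop has an implicit barrier and flush,
       so reading i here does not race with any increment. */
    return i;
}
```

A printf of i placed inside the loop body, next to the increment, would
still be an unsynchronized read and would still race with the other
threads' increments.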
>>>>>
>>>>> A more detailed discussion of how to fix this example
>>>>> is beyond the time I have to spend on it since I am
>>>>> on vacation. Perhaps someone else can chime in...
>>>>>
>>>>> Bronis
>>>>>
>>>>> On Fri, 28 Jul 2006, Yuan Lin wrote:
>>>>>
>>>>>
>>>>>> The value of a shared variable updated by one thread is not
>>>>>> guaranteed to be seen by another thread unless flush (implicit or
>>>>>> explicit) is used. The memory model is described in Spec 2.5. In
>>>>>> your sample, the value of i may always be read from a register.
>>>>>>
>>>>>> Try the following code and see if it works as you wanted.
>>>>>>
>>>>>> i = 0;
>>>>>>
>>>>>> #pragma omp parallel private(tid)
>>>>>> {
>>>>>>     tid = omp_get_thread_num();
>>>>>>     while(1)
>>>>>>     {
>>>>>>         if (!tid)
>>>>>>         {
>>>>>>             #pragma omp flush(i)
>>>>>>             printf("master: i = %d\n", i);
>>>>>>             fflush(stdout);
>>>>>>         }
>>>>>>         else
>>>>>>         {
>>>>>>             printf("worker: i = %d\n", i);
>>>>>>             fflush(stdout);
>>>>>>             i ++;
>>>>>>             #pragma omp flush(i)
>>>>>>         }
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>>
>>>>>> Also depending on how the threads are scheduled, you may see
>>>>>> thousands of master: i=0 before you see worker: i=1, 2, 3, ....,
>>>>>> then you may see master: i= some big number. Do not expect the
>>>>>> values of i printed from the master to be consecutive. There is no
>>>>>> synchronization between the master thread and the worker thread in
>>>>>> your code.
>>>>>>
>>>>>> Hope it helps.
>>>>>>
>>>>>> -- Yuan
>>>>>>
>>>>>> ----------------------------------------
>>>>>> http://blogs.sun.com/roller/page/yuanlin
>>>>>> ----------------------------------------
>>>>>>
>>>>>>
>>>>>> KaveH Aasaraai wrote:
>>>>>>
>>>>>>> Hey guys,
>>>>>>>
>>>>>>> I'm new to this OpenMP thing. I'm facing this problem
>>>>>>> of using a shared value, and trying to synchronize two
>>>>>>> threads with this shared variable. Any help would
>>>>>>> really be appreciated. Here is the problem:
>>>>>>>
>>>>>>> i = 0;
>>>>>>>
>>>>>>> #pragma omp parallel private(tid)
>>>>>>> tid = omp_get_thread_num();
>>>>>>> while(1)
>>>>>>> {
>>>>>>>     if (!tid)
>>>>>>>     {
>>>>>>>         printf("master: i = %d\n", i);
>>>>>>>         fflush(stdout);
>>>>>>>     }
>>>>>>>     else
>>>>>>>     {
>>>>>>>         printf("worker: i = %d\n", i);
>>>>>>>         fflush(stdout);
>>>>>>>         i ++;
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> The "0" thread doesn't see the changes to the shared
>>>>>>> variable, i. If I remove the while(1) thing, then the
>>>>>>> value printed for "i" is correct. Here is a sample
>>>>>>> (partial) output:
>>>>>>>
>>>>>>>
>>>>>>> master: i = 0
>>>>>>> master: i = 0
>>>>>>> master: i = 0
>>>>>>> worker: i = 2445
>>>>>>> worker: i = 2446
>>>>>>> worker: i = 2447
>>>>>>> worker: i = 2448
>>>>>>> worker: i = 2449
>>>>>>> worker: i = 2450
>>>>>>> worker: i = 2451
>>>>>>> worker: i = 2452
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks to you guys in advance,
>>>>>>>
>>>>>>> Kaveh
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Omp mailing list
>>>>>>> Omp at openmp.org
>>>>>>> http://openmp.org/mailman/listinfo/omp
>>>>>>>