[Omp] A question about OpenMP 2.5
Dieter an Mey
anmey at rz.rwth-aachen.de
Thu Mar 22 10:06:36 PDT 2007
My feeling is that the (OpenMP) compilers we are looking at don't really
have any problem as long as they follow the language specifications.
I never used an Alpha processor.
But imagine you want to access 4-byte data that are not aligned to a 4-byte
boundary: then you will probably run into such a problem if the
processor is only able to load and store 4-byte-aligned data.
Thinking about Fortran, I would guess that an OpenMP compiler for such a
processor will not support 2-byte datatypes at all, because otherwise
you could, through bad programming style (COMMON, EQUIVALENCE), force
4-byte data onto 2-byte boundaries and run into that problem.
regards,
Dieter
Haab, Grant wrote:
> Dieter,
>
> The problem Greg is describing is not data alignment at all, but rather
> the minimum data size for which loads and stores are performed
> atomically by the processor and memory system hardware. Most
> processors support byte-sized atomicity for regular loads and stores, but
> several people have pointed out that the Alpha processors supported a
> minimum of 4-byte atomicity.
>
> I know of no general-purpose processor that supports less than
> byte-granularity loads and stores, because a byte is the minimum
> addressable unit for most processors. (I'm sure somebody will find a
> counterexample though ;-)
>
> I don't believe the compiler can easily fix this problem, because C and
> Fortran don't allow you to pad array elements to the minimum atomic
> load/store size. That would break unions, EQUIVALENCE and the like, not
> to mention make users very irate that their character array now takes
> four times as much space!
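To put Grant's space argument in numbers, here is a hypothetical C11 sketch of what padding every `char` element to an assumed 4-byte atomic unit would look like. The type `padded_char`, the constant `N`, and the two helper functions are made up for illustration; no compiler is claimed to do this.

```c
#include <stddef.h>

/* Hypothetical: a char forced to occupy its own 4-byte unit, as a compiler
 * would need if the hardware's minimum atomic store were 4 bytes. */
typedef struct {
    _Alignas(4) char c;   /* 1 useful byte; struct size rounds up to 4 */
} padded_char;

enum { N = 1000 };        /* hypothetical array length */

/* Bytes used by an ordinary character array of N elements. */
size_t plain_bytes(void)  { return sizeof(char[N]); }

/* Bytes used by the same array with every element padded to 4 bytes. */
size_t padded_bytes(void) { return sizeof(padded_char[N]); }
```

With `N = 1000` the padded array needs 4000 bytes instead of 1000, and because consecutive characters are no longer adjacent bytes, any code relying on that layout (unions, EQUIVALENCE) would break, as Grant notes.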
>
> - Grant
>
>
>
> -----Original Message-----
> From: omp-bounces at openmp.org [mailto:omp-bounces at openmp.org] On Behalf
> Of Dieter an Mey
> Sent: Thursday, March 22, 2007 4:51 AM
> To: Greg Bronevetsky
> Cc: omp at openmp.org
> Subject: Re: [Omp] A question about OpenMP 2.5
>
> I see what you mean.
> As a user I would expect the compiler to take care of proper
> alignment etc. to avoid these "false sharing" effects, which could lead
> to a data race.
>
> I wonder to what extent this can really cause any problems on current
> hardware, and how this has been handled by the current OpenMP-compliant
> compilers.
>
> I assume that the compiler has to guarantee, and in many or all cases can
> guarantee, that elements are aligned such that any two elements of a
> structure, class, etc. can be loaded and stored independently.
>
> In Fortran you can try to force bad alignment with COMMON blocks or
> EQUIVALENCE, which would be bad programming practice anyway.
> I tried to create such a bad case, but I have not been "successful" yet.
>
> I don't know to what extent C/C++ programmers can do this (unions, perhaps?).
>
> The question is: can primitive datatypes be forced to be so badly
> aligned that the compiler cannot generate single load/store instructions
> for those data elements?
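On the C/C++ side of Dieter's question: in ISO C a union is aligned for its most strictly aligned member, so a union alone does not put an `int` on a bad boundary. Compiler extensions can, though. The sketch below assumes GCC or Clang and uses their non-standard `packed` attribute to place a 4-byte `int32_t` at byte offset 1; `struct misaligned`, `value_offset` and `read_value` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension, not ISO C: drop padding so 'value' starts at byte
 * offset 1, i.e. a 4-byte integer on an odd address. */
struct __attribute__((packed)) misaligned {
    char    tag;
    int32_t value;
};

/* Offset of the 4-byte member inside the packed struct. */
size_t value_offset(void) { return offsetof(struct misaligned, value); }

/* Direct member access is fine: the compiler knows the struct is packed and,
 * on strict-alignment targets, may split this into several narrower loads,
 * which is exactly the "no single load/store instruction" case. */
int32_t read_value(const struct misaligned *m) { return m->value; }
```

A usage example: `struct misaligned m = { 'x', 42 };` gives `read_value(&m) == 42` while `value_offset()` reports 1, so the integer really does straddle a 4-byte boundary.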
>
> regards
> Dieter
>
> Greg Bronevetsky wrote:
>> The difference is illustrated by the following example. Suppose that all
>> memory operations operate at 4-byte granularity. The code in question is:
>> char buf[BUF_SIZE];
>> #pragma omp for
>> for(i=0; i<BUF_SIZE; i++)
>> buf[i] = ?;
>> Suppose that buf[] is 4-byte-aligned, thread t gets iteration i=0 and
>> thread r gets iteration i=1. t writes to address buf (&buf[0]), bringing
>> the memory range [buf, buf+4) into its cache. r writes to buf+1, also
>> bringing the memory range [buf, buf+4) into its cache. When these cache
>> lines are finally evicted, each contains data that the other does not.
>> As such, regardless of which cache line we pick, we will lose data.
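Greg's scenario can be replayed deterministically. The sketch below is a sequential model, not real threads: it assumes that a byte store is carried out as a 4-byte read-modify-write, and it runs the two threads' steps in the interleaving he describes. `word4`, `load_word`, `store_word` and `interleaved_byte_stores` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Simulated hardware that can only move whole 4-byte words. */
typedef struct { uint8_t bytes[4]; } word4;

/* A thread fetches the whole word containing its byte (e.g. into its cache). */
static word4 load_word(const uint8_t *mem)
{
    word4 w;
    memcpy(w.bytes, mem, 4);
    return w;
}

/* Writing back stores all 4 bytes, including ones this thread never changed. */
static void store_word(uint8_t *mem, const word4 *w)
{
    memcpy(mem, w->bytes, 4);
}

/* Sequential replay of the race: t writes buf[0], r writes buf[1], and both
 * fetch the word before either one writes it back. */
void interleaved_byte_stores(uint8_t buf[4])
{
    word4 t = load_word(buf);   /* thread t fetches buf[0..3]           */
    word4 r = load_word(buf);   /* thread r fetches buf[0..3]           */
    t.bytes[0] = 'T';           /* t updates its private copy of buf[0] */
    r.bytes[1] = 'R';           /* r updates its private copy of buf[1] */
    store_word(buf, &t);        /* t's word is written back first ...   */
    store_word(buf, &r);        /* ... then r's stale buf[0] overwrites it */
}
```

Starting from an all-zero buffer, the call leaves `buf[1] == 'R'` but `buf[0] == 0`: t's write is lost even though the two threads never wrote the same byte.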
>>
>> In short, when the system moves data at 4-byte granularity, writes by
>> multiple threads to the same 4-byte region are data races. Note that
>> the above is the reverse of Dieter's example. We're worrying about code
>> that operates on memory locations of size x, while the hardware supports
>> memory transfers of size y. If x>=y (Dieter's example), we have no
>> problem. The problem is the case x<y (the above example).
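Greg's x/y rule can be stated as a small predicate: two accesses can race on such hardware whenever they touch a common aligned unit of y bytes, even if their bytes are disjoint. `share_granule` is a hypothetical helper, not part of any OpenMP API; unlike the plain x>=y test it also handles accesses that are not aligned to their own size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if the byte ranges [a, a+a_len) and [b, b+b_len) touch at least one
 * common aligned unit of 'granule' bytes, i.e. hardware that moves data in
 * 'granule'-sized units could make the two accesses race. */
bool share_granule(size_t a, size_t a_len, size_t b, size_t b_len,
                   size_t granule)
{
    assert(granule > 0 && a_len > 0 && b_len > 0);
    size_t a_first = a / granule, a_last = (a + a_len - 1) / granule;
    size_t b_first = b / granule, b_last = (b + b_len - 1) / granule;
    return a_first <= b_last && b_first <= a_last;  /* unit ranges overlap */
}
```

With a granule of 4, one-byte accesses at offsets 0 and 1 conflict (x<y), while two aligned 4-byte accesses at offsets 0 and 4 do not (x>=y). A misaligned 4-byte access at offset 2 straddles two units and conflicts with one at offset 6, which ties Greg's point back to Dieter's alignment question. A granule of 4096 models the page-granularity DSM case that Bronis raises.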
>>
>> Greg Bronevetsky
>>
>> On Wed, 21 Mar 2007, Dieter an Mey wrote:
>>
>>> Well, Bronis and Greg, I still don't see why it should make any
>>> difference to a potential data race whether the "memory location"
>>> that is spoiling my fun is written with bit or page atomicity by the
>>> memory system of the hardware I am using.
>>> In both cases the results are unspecified or broken: they may be correct
>>> a thousand times but wrong the 1001st time.
>>>
>>> I completely agree that there may be situations where it would be
>>> highly desirable to know which atomicity I have to deal with.
>>>
>>> For example, on a SPARC system, 64-bit floating-point numbers may be
>>> loaded or stored with two 4-byte memory operations.
>>>
>>> And I would be happy to have an atomic directive for load and store
>>> operations and not only for updates.
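Dieter's SPARC remark can be modelled the same way. This sketch assumes a 64-bit store is carried out as two 32-bit stores and shows what a reader that runs between them observes; it uses `uint64_t` rather than `double` so the torn value is easy to check. `split64`, `read64` and `torn_read` are hypothetical names, and the "threads" are again replayed sequentially.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, so a reader can see it
 * halfway through an update. */
typedef struct { uint32_t hi, lo; } split64;

static uint64_t read64(const split64 *m)
{
    return ((uint64_t)m->hi << 32) | m->lo;
}

/* The writer replaces old_value with new_value using two 4-byte stores;
 * returns what a racing reader sees between the two stores. */
uint64_t torn_read(uint64_t old_value, uint64_t new_value)
{
    split64 m = { (uint32_t)(old_value >> 32), (uint32_t)old_value };
    m.hi = (uint32_t)(new_value >> 32);   /* first 4-byte store          */
    uint64_t seen = read64(&m);           /* reader runs at this moment  */
    m.lo = (uint32_t)new_value;           /* second 4-byte store         */
    return seen;
}
```

The reader gets the new high half and the old low half, a value that no thread ever stored. For reference, later OpenMP versions (3.1 onward) added `atomic read` and `atomic write` forms of the `atomic` construct, which cover the kind of plain load/store atomicity Dieter asks for.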
>>>
>>> best regards,
>>> Dieter
>>>
>>>
>>> Bronis R. de Supinski wrote:
>>>> Dieter and all:
>>>>
>>>> Re:
>>>>> > If multiple threads write to the same ** memory location **
>>>> What is a memory location? It is a central question for
>>>> the memory model, which is why Greg has said this has
>>>> implications for the memory model.
>>>>
>>>>> > without synchronization, the resulting ** memory content **
>>>>> > is unspecified. If at least one thread reads from
>>>> Anything that says some memory location becomes "unspecified"
>>>> is an issue for the memory model. The memory model must define
>>>> what the state of memory is after any action (legal or not).
>>>> In the case of a location becoming unspecified, it is equivalent
>>>> to writing a random value lambda to that location. The memory
>>>> model needs to state that this occurs.
>>>>
>>>>> > a shared ** memory location ** and at least one thread writes to
>>>>> > it without synchronization, the value seen by any reading thread is
>>>>> > unspecified.
>>>> Currently, we have no precise definition of a memory
>>>> location because stating that a memory location is more
>>>> than one bit could imply that an implementation must
>>>> write that much data atomically. In this case, we are
>>>> not talking about the OpenMP "atomic" construct but
>>>> hardware atomicity.
>>>>
>>>> Simply saying b is a pointer does not solve the problem.
>>>> Consider a simple variant of Brad's example that uses bit
>>>> operations to write individual bits in a single byte. By
>>>> the suggested "variable" definitions the code would still
>>>> be correct. However, I know of no current hardware that
>>>> provides atomic writes to individual bits. The reality
>>>> is that writes to the same byte are a data race, even if
>>>> the code describes them as array operations on distinct
>>>> bits. I am certain our vendors would (rightly) oppose being
>>>> required to make that code work.
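Bronis's bit variant can be made concrete. In this sketch each iteration sets its own bit, but eight iterations share a byte, so a plain `|=` would be a byte-sized read-modify-write and would race. One way to make it correct under OpenMP 2.5 is the `atomic` construct, whose `x binop= expr` form allows `|`. `set_all_bits` and `NBITS` are hypothetical; compiled without OpenMP support, the pragmas are ignored and the loop simply runs serially.

```c
#include <stdint.h>

#define NBITS 64   /* hypothetical number of flag bits */

/* Sets every bit of a NBITS-bit flag array. Iterations touch distinct bits,
 * but bits i and j with i/8 == j/8 live in the same byte, so the update must
 * be atomic to avoid losing another thread's bit. */
void set_all_bits(uint8_t flags[NBITS / 8])
{
    int i;
#pragma omp parallel for
    for (i = 0; i < NBITS; i++) {
#pragma omp atomic
        flags[i / 8] |= (uint8_t)(1u << (i % 8));
    }
}
```

Without the `atomic` directive, two threads could each read the same byte, set different bits, and write it back, with one bit lost, exactly as in the 4-byte `char` case above but at byte granularity.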
>>>>
>>>> Note that it is not clear where to define the hardware
>>>> atomicity level, which is why the specification has tried
>>>> to avoid doing so. I could easily argue that the right
>>>> level of write atomicity for a DSM implementation is at
>>>> the page granularity. While I don't think anyone would
>>>> accept that, it is very unclear where we stop. If Brad's
>>>> example used a char array, does it work? I would hope so...
>>>>
>>>>> This text just describes the circumstances of a data race.
>>>> Defining data races and what happens under them are the
>>>> primary role of the memory model. The example demonstrates
>>>> that we probably need to make some statement about the
>>>> minimum level at which the programmer can assume write
>>>> atomicity (in the hardware sense). This is a much bigger
>>>> issue than what I had intended to cover in the memory
>>>> model revisions, which was really just intended to be
>>>> clarifications and consolidations.
>>>>
>>>> Bronis
>>>>
>>>>
>>>>
>>>>> regards
>>>>> Dieter
>>>>> >
>>>>>
>>>>> Brad Bell wrote:
>>>>>> I have a question about the OpenMP 2.5 standard
>>>>>> http://www.openmp.org/drupal/mp-documents/spec25.pdf
>>>>>>
>>>>>> In Section 1.2.3 Data Terminology of spec25.pdf,
>>>>>> the following text appears:
>>>>>>
>>>>>> variable
>>>>>> A named data object, whose value can be defined and
>>>>>> redefined during the execution of a program.
>>>>>>
>>>>>> Only an object that is not part of another object is
>>>>>> considered a variable. For example, array elements,
>>>>>> structure components, array sections and substrings
>>>>>> are not considered variables.
>>>>>>
>>>>>>
>>>>>> In Section 1.4.1 Structure of the OpenMP Memory Model of spec25.pdf,
>>>>>> the following text appears:
>>>>>>
>>>>>> If multiple threads write to the same shared variable
>>>>>> without synchronization, the resulting value of the variable
>>>>>> in memory is unspecified. If at least one thread reads from
>>>>>> a shared variable and at least one thread writes to it without
>>>>>> synchronization, the value seen by any reading thread is unspecified.
>>>>>>
>>>>>> It appears to me, given the text above, that Example A.1.1.c in
>>>>>> the OpenMP 2.5 standard is not correct (or at least misleading).
>>>>>> Here is the code for that example:
>>>>>>
>>>>>> void a1(int n, float *a, float *b)
>>>>>> {
>>>>>>     int i;
>>>>>> #pragma omp parallel for
>>>>>>     for (i=1; i<n; i++) /* i is private by default */
>>>>>>         b[i] = (a[i] + a[i-1]) / 2.0;
>>>>>> }
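A self-contained driver for the example, assuming the caller Brad describes in his point 3 (a named array `float b[SIZE]`). The function `a1` is repeated from the specification so the sketch compiles on its own; `SIZE` and `run_example` are hypothetical. Each iteration writes only `b[i]`, a distinct 4-byte element, which is presumably why the specification presents this as a correct parallel loop even though the 2.5 wording speaks of whole variables.

```c
#define SIZE 100   /* hypothetical array size; any SIZE >= n works */

/* Example A.1.1.c from the OpenMP 2.5 specification: iteration i writes
 * only b[i] and reads a[i] and a[i-1]. */
void a1(int n, float *a, float *b)
{
    int i;
#pragma omp parallel for
    for (i = 1; i < n; i++) /* i is private by default */
        b[i] = (a[i] + a[i-1]) / 2.0;
}

/* Hypothetical caller: b is a named array, so under the 2.5 definitions
 * it is one "variable" whose elements different threads write. */
void run_example(float a[SIZE], float b[SIZE])
{
    for (int i = 0; i < SIZE; i++) {
        a[i] = (float)i;
        b[i] = 0.0f;
    }
    a1(SIZE, a, b);
}
```

After `run_example`, `b[i]` holds `(2i - 1) / 2` for every `i >= 1` (for instance `b[1] == 0.5f`) and `b[0]` is untouched, whether or not the loop actually ran in parallel.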
>>>>>>
>>>>>> 1. As I understand the parallel directive above, different threads
>>>>>> may execute the loop for different values of i.
>>>>>>
>>>>>> 2. As I understand it, the variable b is a shared variable because
>>>>>> it is declared before the loop.
>>>>>>
>>>>>> 3. The argument b to the routine a1 may be an array; for example,
>>>>>> it may be declared in the calling program by
>>>>>> float b[SIZE];
>>>>>> where SIZE is any positive integer constant greater than or equal to n.
>>>>>>
>>>>>> 4. In the case of 3 above, b is a variable, and b[i] is not a variable;
>>>>>> hence multiple threads may be writing to the same variable, namely b.
>>>>>>
>>>>>> 5. Thus, in the case described above, the result of the loop is undefined.
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Omp mailing list
>>>>>> Omp at openmp.org
>>>>>> http://openmp.org/mailman/listinfo/omp
>>>>>>
>>>>> --
>>>>>
>>>>> --------------------------------------------------------------------
>>>>> Dieter an Mey
>>>>> High Performance Computing Hochleistungsrechnen
>>>>> RWTH Aachen University Rechen- und Kommunikations-
>>>>> Center for Computing and Communication zentrum der RWTH Aachen
>>>>> phone: ++49-(0)241-80-24377 Seffenter Weg 23
>>>>> fax: ++49-(0)241-80-22134 52074 Aachen, Germany
>>>>> email: anmey at rz.rwth-aachen.de
>>>>> --------------------------------------------------------------------
>>>
>>
>>
>>
>