[Omp] First Touch initialization

Ruud van der Pas Ruud.Vanderpas at Sun.COM
Wed Mar 8 04:09:06 PST 2006


Hi Francisco,

> Initializing shared arrays in parallel at the very beginning of the program
> will distribute the contents of each array according to the access pattern
> hence, in NUMA machines access will be much faster since it's local-node.
> 
> We have tried it and it works indeed (Intel Fortran compiler v9 on 4-way 
> Opteron),
> but I don't understand why.

First touch (or data placement in general) is not something
typically handled by a compiler. It is controlled by the Operating
System. Solaris has cc-NUMA support for example, but I believe
Linux supports it too these days.

The general rule is that the thread first touching (a chunk of)
the data gets it in its local memory. Typically such first touch
happens when initializing or reading the data for the first time.

If, for example, you use "malloc" to allocate a chunk of memory,
nothing has happened yet. All the OS does is to reserve that
chunk for you.

The minute a thread then _accesses_ a portion (or all) of it,
first touch places that portion in the memory local to that thread.

This is why one can speed up an OpenMP program running on a
cc-NUMA system by parallelizing the data initialization phase.
Even adding a redundant initialization upfront could work out
well (in case the first touch is through sequential I/O for
example).
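
As a rough illustration of the redundant upfront initialization,
here is a minimal sketch in C. The names alloc_first_touch and
fill_sequential are made up for this example, and fill_sequential
only stands in for a sequential first touch such as reading from a
file. It assumes the OS uses a first-touch placement policy and
that the code is compiled with OpenMP enabled; without OpenMP the
pragma is ignored and the loop simply runs sequentially.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate n doubles and touch them in parallel,
 * so that (under a first-touch policy) each thread's chunk is placed
 * in memory local to that thread. */
double *alloc_first_touch(size_t n)
{
    double *a = malloc(n * sizeof *a);  /* memory is only reserved here */
    if (a == NULL)
        return NULL;

    /* Redundant initialization: the values do not matter, only which
     * thread writes to which part of the array first. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++)
        a[i] = 0.0;

    return a;
}

/* Stand-in for a sequential phase (e.g. file input) that would
 * otherwise have been the first touch of the whole array. */
void fill_sequential(double *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = (double)i;
}
```

Calling alloc_first_touch before fill_sequential means the
sequential phase only overwrites data that has already been placed.
Placement typically happens per page, so very small arrays will not
benefit.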

How you want to initialize the data in parallel depends on
how you access the data later on.
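
To make that concrete, here is a small sketch in C where the
initialization loop uses the same loop bounds and the same static
schedule as the compute loop that follows. The function names
init_vectors and axpy are chosen for this example only. The sketch
assumes both loops run with the same number of threads and that
threads stay on the same node between the two loops (for example
through thread binding); if either assumption fails, the data may
not be local any more.

```c
/* Initialization: with a static schedule each thread touches one
 * contiguous chunk of x and y first. */
void init_vectors(double *x, double *y, long n)
{
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; i++) {
        x[i] = 1.0;
        y[i] = 2.0;
    }
}

/* Compute phase: y = y + alpha * x. Same bounds and same schedule as
 * the initialization, so each thread works on the chunk it touched. */
void axpy(double alpha, const double *x, double *y, long n)
{
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; i++)
        y[i] += alpha * x[i];
}
```

If the compute loop accessed the data differently (a different
schedule, or by columns instead of rows), the initialization loop
would have to be changed to mirror that pattern.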

Kind regards,
Ruud

PS I did mention "malloc" for a good reason. With "calloc" the
    data gets pre-initialized to zero and may therefore end up on
    the wrong node.
----------------------------------------------------------------
Senior Staff Engineer             Email: ruud.vanderpas at sun.com
Scalable Systems Group            Phone: +31-33-4515000 (x15920)
Sun Microsystems                  Fax  : +31-33-4515001
----------------------------------------------------------------



