by Brian bliss » Sat Dec 29, 2007 10:20 am
Just set your KMP_AFFINITY env var to "compact" or "scatter". If you include the "verbose" prefix, some debug info will be printed,
so you can make certain that you are getting what you expect:
% setenv KMP_AFFINITY verbose,compact
% a.out
KMP_AFFINITY: Affinity capable, using global cpuid instr info
KMP_AFFINITY: Initial OS proc set respected:
{0,1,2,3}
KMP_AFFINITY: 4 available OS procs - Uniform topology of
KMP_AFFINITY: 2 packages x 2 cores/pkg x 1 threads/core (4 total cores)
KMP_AFFINITY: OS proc to physical thread map ([] => level not in map):
KMP_AFFINITY: OS proc 0 maps to package 0 core 0 [thread 0]
KMP_AFFINITY: OS proc 2 maps to package 0 core 1 [thread 0]
KMP_AFFINITY: OS proc 1 maps to package 3 core 0 [thread 0]
KMP_AFFINITY: OS proc 3 maps to package 3 core 1 [thread 0]
KMP_AFFINITY: Internal thread 0 bound to OS proc set {0}
KMP_AFFINITY: Internal thread 1 bound to OS proc set {2}
KMP_AFFINITY: Internal thread 2 bound to OS proc set {1}
KMP_AFFINITY: Internal thread 3 bound to OS proc set {3}
This is for a 2-chip dual-core machine with no Hyperthreading.
"compact" means to place consecutive threads as close together as possible, so OMP thread 1 is placed on the same package as OMP thread 0, i.e. package 0 (it is bound to OS proc 2 in the output above). If you use "scatter", the threads are placed as far apart as possible. "scatter" is useful for bandwidth-intensive programs when OMP_NUM_THREADS < num_procs. Otherwise, you're probably better off with "compact".
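For sh/bash users, the rule of thumb above could be scripted roughly like this. It is only a sketch: choose_affinity is a made-up helper (not part of the Intel runtime), it assumes the program is bandwidth-bound, that OMP_NUM_THREADS is a single integer, and that getconf _NPROCESSORS_ONLN reports the number of OS procs on your system.

```shell
#!/bin/sh
# Hypothetical helper: pick an affinity type from the thread count and the
# number of OS procs, following the rule of thumb in the post.
choose_affinity() {
    nthreads=$1
    nprocs=$2
    if [ "$nthreads" -lt "$nprocs" ]; then
        echo scatter   # fewer threads than procs: spread them far apart
    else
        echo compact   # otherwise keep consecutive threads close together
    fi
}

# Assumption: getconf _NPROCESSORS_ONLN works here (Linux, macOS); fall back to 1.
nprocs=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
# Assumption: with OMP_NUM_THREADS unset, one thread per OS proc is used.
nthreads=${OMP_NUM_THREADS:-$nprocs}

KMP_AFFINITY="verbose,$(choose_affinity "$nthreads" "$nprocs")"
export KMP_AFFINITY
echo "KMP_AFFINITY=$KMP_AFFINITY"
# Then run the program as before, e.g. ./a.out
```

The export line is the sh/bash counterpart of the setenv commands used in this post.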
On a machine with Hyperthreading, the threads are by default allowed to float between the thread contexts on the same core, so you might see something like this:
% setenv KMP_AFFINITY verbose,compact
% a.out
KMP_AFFINITY: Affinity capable, using global cpuid instr info
KMP_AFFINITY: Initial OS proc set respected:
{0,1,2,3,4,5,6,7}
KMP_AFFINITY: 8 available OS procs - Uniform topology of
KMP_AFFINITY: 2 packages x 2 cores/pkg x 2 threads/core (4 total cores)
KMP_AFFINITY: OS proc to physical thread map ([] => level not in map):
KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0
KMP_AFFINITY: OS proc 4 maps to package 0 core 0 thread 1
KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0
KMP_AFFINITY: OS proc 6 maps to package 0 core 1 thread 1
KMP_AFFINITY: OS proc 1 maps to package 1 core 0 thread 0
KMP_AFFINITY: OS proc 5 maps to package 1 core 0 thread 1
KMP_AFFINITY: OS proc 3 maps to package 1 core 1 thread 0
KMP_AFFINITY: OS proc 7 maps to package 1 core 1 thread 1
KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,4}
KMP_AFFINITY: Internal thread 5 bound to OS proc set {1,5}
KMP_AFFINITY: Internal thread 4 bound to OS proc set {1,5}
KMP_AFFINITY: Internal thread 3 bound to OS proc set {2,6}
KMP_AFFINITY: Internal thread 2 bound to OS proc set {2,6}
KMP_AFFINITY: Internal thread 1 bound to OS proc set {0,4}
KMP_AFFINITY: Internal thread 7 bound to OS proc set {3,7}
KMP_AFFINITY: Internal thread 6 bound to OS proc set {3,7}
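The binding lines come out in whatever order the threads start, so it can help to filter and sort them. A minimal sketch, where show_bindings is a made-up helper name and the assumption is that the runtime may write the verbose lines to stderr (hence the 2>&1 in the usage comment):

```shell
#!/bin/sh
# Hypothetical helper: read KMP_AFFINITY verbose output on stdin and print
# only the "Internal thread N bound to ..." lines, ordered by N.
# N is the 4th whitespace-separated field of each such line.
show_bindings() {
    grep 'Internal thread' | sort -n -k4,4
}

# Usage (not run here):
#   ./a.out 2>&1 | show_bindings
```

With the output above, threads 0 and 1 would then appear next to each other with the same proc set {0,4}, which makes the "compact" placement easy to check by eye.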
That should be enough to get you started.
If you don't see debug output similar to what is printed here, the first thing to do is check the OMP RTL version number:
% setenv KMP_VERSION 1
% a.out
Intel(R) OMP performance library (dynamic) ver. 20070803 (C) Copyright 1997-2007 by Intel Corporation
Intel(R) OMP library built: Aug 3 2007, 16:20:12 using Intel C++ Compiler 10.0
If the version # printed is prior to 20060612, then the affinity support is out-of-date.
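If you want to script that check, something along these lines might do. Both helper names are made up, and the cutoff is assumed to be the eight-digit date 20060612, in the same YYYYMMDD format as the 20070803 shown in the banner above.

```shell
#!/bin/sh
# Hypothetical helper: pull the 8-digit build date out of a
# "... ver. YYYYMMDD ..." banner line read from stdin.
rtl_version() {
    sed -n 's/.*ver\. \([0-9]\{8\}\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed if the given version is not prior to the cutoff.
# Assumption: the cutoff is 20060612 (YYYYMMDD).
affinity_up_to_date() {
    [ "$1" -ge 20060612 ]
}

# Usage (not run here; 2>&1 in case the banner goes to stderr):
#   ver=$(KMP_VERSION=1 ./a.out 2>&1 | rtl_version)
#   affinity_up_to_date "$ver" || echo "OMP RTL $ver: affinity support is out-of-date"
```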
-bb