OpenMP and caches

Post by anrrmrty » Sat Apr 13, 2013 11:58 am


I am new to OpenMP and am trying to parallelise a library written in C++. It works as follows:

1. First I read an input file and create a std::map (from pointers to ints). This is done serially.

2. Then I scatter this std::map among multiple threads (2, 4, 8, ... up to 32): basically, I filter out what belongs to a certain partition and assign it to a thread. This is also done serially.

3. Now I spawn multiple threads, each of which uses its filtered data and performs pointer manipulations (note that the pointers are the same as in the original std::map, but the data structures are much smaller, as only a small subset is assigned to each thread). These operations run on my 32-core machine.

4. Then I merge the data from various threads and do some final computations.
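The steps above can be sketched roughly as follows. This is only a minimal illustration, not the actual library code: the Node type, the process function, the round-robin partition rule and the "add the int to the node" operation are all hypothetical placeholders, and step 1 (reading the input file) is assumed to have filled the map already.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct Node { int value = 0; };  // hypothetical payload type

// Steps 2-4; `all` is assumed to come from step 1 (reading the input file).
long process(const std::map<Node*, int>& all, int nparts) {
    assert(nparts > 0);

    // Step 2 (serial): filter the entries into one bucket per thread.
    std::vector<std::vector<std::pair<Node*, int>>> parts(nparts);
    std::size_t i = 0;
    for (const auto& kv : all)
        parts[i++ % nparts].push_back(kv);  // placeholder partition rule

    // Step 3 (parallel): each thread touches only its own bucket.
    std::vector<long> partial(nparts, 0);
    #pragma omp parallel for num_threads(nparts) schedule(static, 1)
    for (int t = 0; t < nparts; ++t) {
        long sum = 0;
        for (auto& kv : parts[t]) {
            kv.first->value += kv.second;  // placeholder pointer manipulation
            sum += kv.first->value;
        }
        partial[t] = sum;
    }

    // Step 4 (serial): merge the per-thread results.
    long total = 0;
    for (long p : partial) total += p;
    return total;
}
```

Even though each bucket is small, the Node objects it points to still live wherever step 1 allocated them, so neighbouring nodes can end up in different threads' buckets.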

Step 3 is the parallel region, and it scales well up to about 8 threads. Beyond that, the cache misses increase so much that there is no appreciable further speedup.
I assume this is because the data in each thread still consists of pointers to objects scattered randomly across the same memory region.
Can anyone suggest a workaround?

Thank you!
Posts: 1
Joined: Sat Apr 13, 2013 3:08 am

Re: openMP and caches

Post by MarkB » Tue Apr 23, 2013 4:24 am

Hi there,

A couple of possibilities come to mind:

1. False sharing, where different threads modify data that lie on the same cache line. Each write invalidates the copies of that line in the other cores' caches, so the number of cache misses goes up (are you measuring cache misses with some tool?). It may be possible to fix this by changing the data partition to avoid such conflicting accesses.

2. NUMA effects, where all the data is allocated in the memory of a single socket (I presume you are running on a multi-socket system?). This won't increase the number of cache misses, but it will increase their cost. Since you are initialising the data in the map serially, all of it will most likely be allocated on the socket where the OpenMP master thread is running. On a Linux system you can try using numactl -i all to change the allocation policy to round-robin (interleaved across all NUMA nodes). Alternatively, you could try using multiple threads to initialise the data. You may need to enforce synchronisation so that this happens one thread at a time, as concurrent modification of a std::map is not thread safe.
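To make both points concrete, here is a minimal sketch. It is not based on the library in question, so several things are assumptions: a 64-byte cache line, a plain int array instead of std::map nodes, and the hypothetical names PaddedCounter, chunk_begin, make_data and count_even. Each per-thread counter is padded to its own cache line (to avoid false sharing), and the data is first written by the same thread that later reads it, which on Linux's default first-touch policy tends to place the pages on that thread's socket. The placement only helps if threads stay pinned, for example by setting OMP_PROC_BIND=true.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

constexpr std::size_t kCacheLine = 64;  // assumed cache-line size in bytes

// One slot per thread, padded so that no two slots share a cache line.
struct alignas(kCacheLine) PaddedCounter {
    long value = 0;
};

// Start of the contiguous chunk owned by thread t (hypothetical helper).
std::size_t chunk_begin(std::size_t n, int t, int nthreads) {
    return n * static_cast<std::size_t>(t) / static_cast<std::size_t>(nthreads);
}

// Allocate without writing, then let each thread write its own chunk first.
std::unique_ptr<int[]> make_data(std::size_t n, int nthreads) {
    assert(nthreads > 0);
    std::unique_ptr<int[]> data(new int[n]);  // elements not written yet
    #pragma omp parallel for num_threads(nthreads) schedule(static, 1)
    for (int t = 0; t < nthreads; ++t) {
        const std::size_t end = chunk_begin(n, t + 1, nthreads);
        for (std::size_t i = chunk_begin(n, t, nthreads); i < end; ++i)
            data[i] = static_cast<int>(i);
    }
    return data;
}

// Same chunking as make_data, so each thread reads what it initialised.
long count_even(const int* data, std::size_t n, int nthreads) {
    assert(nthreads > 0);
    std::vector<PaddedCounter> counts(nthreads);
    #pragma omp parallel for num_threads(nthreads) schedule(static, 1)
    for (int t = 0; t < nthreads; ++t) {
        const std::size_t end = chunk_begin(n, t + 1, nthreads);
        for (std::size_t i = chunk_begin(n, t, nthreads); i < end; ++i)
            if (data[i] % 2 == 0) ++counts[t].value;
    }
    long total = 0;
    for (const PaddedCounter& c : counts) total += c.value;  // serial merge
    return total;
}
```

With a std::map the nodes are allocated one at a time, so getting the same effect would probably mean building a separate container per thread inside the parallel region rather than filtering one serially built map.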

Hope that helps,
Posts: 578
Joined: Thu Jan 08, 2009 10:12 am
Location: EPCC, University of Edinburgh
