I am new to OpenMP and am trying to parallelise a library written in C++. The workflow is as follows:
1. First I read an input file and create a std::map (from pointers to ints). This is done serially.
2. Then I scatter this std::map among multiple threads (2, 4, 8, ... up to 32): I filter out the entries that belong to a given partition and assign them to a thread. This is also done serially.
3. Now I spawn multiple threads, each of which uses its filtered data and performs pointer manipulations (note that the pointers are the same as in the original std::map, but the data structures are much smaller, as only a small subset is assigned to each thread). These operations are done on my 32-core machine.
4. Then I merge the data from various threads and do some final computations.
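A minimal sketch of the four steps above, to make the structure concrete. This is not the library's actual code: `Node`, `scatter`, `process`, `run_sketch`, the round-robin partitioning rule and the use of `#pragma omp parallel for` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical payload type; the real pointed-to objects are not shown here.
struct Node {
    long long value;
};

using Entry = std::pair<Node*, int>;
using Partitions = std::vector<std::vector<Entry>>;

// Step 2 (serial): filter the map into one bucket per thread.
// Round-robin assignment is a placeholder for the real partitioning rule.
Partitions scatter(const std::map<Node*, int>& all, int num_parts) {
    assert(num_parts > 0);
    Partitions parts(static_cast<std::size_t>(num_parts));
    std::size_t i = 0;
    for (const auto& kv : all) {
        parts[i++ % parts.size()].emplace_back(kv.first, kv.second);
    }
    return parts;
}

// Steps 3 and 4: each thread works on its own bucket, then results are merged.
long long process(Partitions& parts) {
    std::vector<long long> partial(parts.size(), 0);

    #pragma omp parallel for schedule(static)
    for (int p = 0; p < static_cast<int>(parts.size()); ++p) {
        long long local = 0;
        for (Entry& e : parts[p]) {
            // The bucket is small, but e.first still points into the
            // original heap allocations, wherever they happen to live.
            e.first->value += e.second;
            local += e.first->value;
        }
        partial[p] = local;
    }

    // Step 4 (serial): merge the per-thread results.
    long long total = 0;
    for (long long s : partial) total += s;
    return total;
}

// Step 1 (serial) plus the rest: build n separately allocated nodes,
// map each pointer to the int 1, scatter, process and merge.
long long run_sketch(int n, int num_parts) {
    std::vector<std::unique_ptr<Node>> storage;
    std::map<Node*, int> all;
    for (int i = 0; i < n; ++i) {
        storage.push_back(std::make_unique<Node>(Node{i}));
        all[storage.back().get()] = 1;
    }
    Partitions parts = scatter(all, num_parts);
    return process(parts);
}
```

Calling `run_sketch(100, 4)` runs all four steps with four partitions. The point of the sketch is that each bucket holds only pointers, so the objects a thread touches in step 3 are wherever the serial step 1 allocated them.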
Step 3 is the parallel region and scales well up to about 8 threads. Beyond that, cache misses increase so much that there is no appreciable further speedup.
I assume this is because the data in each thread still consists of pointers to objects that are scattered randomly across the same memory region.
Can a workaround be suggested?