## Exploiting OpenMP


### Exploiting OpenMP

Hello everybody, I am new here and this is my first serious attempt at OpenMP. My code has several functions that contain nested loops traversing an ny*nx matrix. I wanted to parallelise them, so I used something like this in every function:

```c
#pragma omp parallel for private(jj,x_e,x_w,y_n,y_s)
for(ii=0;ii<ny;ii++) {
  for(jj=0;jj<nx;jj++) {
    /* determine indices of axis-direction neighbours
    ** respecting periodic boundary conditions (wrap around) */
    y_n = (ii + 1) % ny;
    x_e = (jj + 1) % nx;
    y_s = (ii == 0) ? (ii + ny - 1) : (ii - 1);
    x_w = (jj == 0) ? (jj + nx - 1) : (jj - 1);
    // propagate densities to neighbouring cells, following
    tmp[ii *nx + jj].speeds[0]  = cells[ii*nx + jj].speeds[0]; /* central cell, no movement */
    tmp[ii *nx + x_e].speeds[1] = cells[ii*nx + jj].speeds[1]; /* east */
    tmp[y_n*nx + jj].speeds[2]  = cells[ii*nx + jj].speeds[2]; /* north */
    tmp[ii *nx + x_w].speeds[3] = cells[ii*nx + jj].speeds[3]; /* west */
    tmp[y_s*nx + jj].speeds[4]  = cells[ii*nx + jj].speeds[4]; /* south */
    tmp[y_n*nx + x_e].speeds[5] = cells[ii*nx + jj].speeds[5]; /* north-east */
    tmp[y_n*nx + x_w].speeds[6] = cells[ii*nx + jj].speeds[6]; /* north-west */
    tmp[y_s*nx + x_w].speeds[7] = cells[ii*nx + jj].speeds[7]; /* south-west */
    tmp[y_s*nx + x_e].speeds[8] = cells[ii*nx + jj].speeds[8]; /* south-east */
  }
}
```

This piece of code (and the others as well) is, however, very slow. Is there any way that I can correct my #pragma statement and rewrite the data structure or the loop to make it cache friendly and avoid false sharing? Thank you very much in advance.

PS: The code is already compiled with -O3, so none of my attempts at minor optimisation achieved any speedup.
8mike

Posts: 2
Joined: Fri Oct 18, 2013 3:27 pm

### Re: Exploiting OpenMP

Hi,

Some ideas/suggestions:
• Please send details of your computer and OS.
• Please send timing details, in particular those measured with omp_get_wtime().

8mike wrote: Is there any way that I can correct my #pragma statement and rewrite the data structure or the loop to make it cache friendly and avoid false sharing?

Remember that
• OpenMP will not necessarily make your code "cache friendly"; it is not designed to do so.
• I don't think there is any basis (right now) for assuming there is false sharing.

HTH,

Fernando.
ftinetti

Posts: 603
Joined: Wed Feb 10, 2010 2:44 pm

### Re: Exploiting OpenMP

Thank you for your response. I am testing this on both Windows 7 and Linux. For the timing I was using (as required by my assignment):
```c
gettimeofday(&timstr,NULL);
tic=timstr.tv_sec+(timstr.tv_usec/1000000.0);
...
gettimeofday(&timstr,NULL);
toc=timstr.tv_sec+(timstr.tv_usec/1000000.0);
```
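For comparison, the same tic/toc pattern can be written with omp_get_wtime(), which was suggested earlier in the thread. The following is only a minimal sketch: `wall_time`, `timed_work` and `time_section` are hypothetical names introduced for illustration, and `timed_work` is a stand-in loop, not the code from the attachment. The helper falls back to gettimeofday() when the file is compiled without OpenMP support.

```c
#include <stddef.h>
#include <sys/time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: wall-clock time in seconds. */
static double wall_time(void)
{
#ifdef _OPENMP
    return omp_get_wtime();          /* OpenMP wall-clock timer */
#else
    struct timeval timstr;           /* fallback: same approach as in the post */
    gettimeofday(&timstr, NULL);
    return timstr.tv_sec + timstr.tv_usec / 1000000.0;
#endif
}

/* Hypothetical stand-in for the section that is timed in main. */
static double timed_work(int n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += 0.5 * i;
    }
    return sum;
}

/* Times one call of timed_work() and returns the elapsed seconds. */
static double time_section(int n, double *result)
{
    double tic = wall_time();
    *result = timed_work(n);
    double toc = wall_time();
    return toc - tic;
}
```

Calling `time_section` once with the thread count set to 1 and once with several threads (for example via the OMP_NUM_THREADS environment variable) gives the two runtimes needed to judge whether the parallel version actually helps.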

To avoid any misunderstanding, I am attaching my code and the input files needed to run it, for whoever wants to help me. The function in the example is called propagate, but I need a general speedup in the section that is timed in main. Thank you very much in advance.

PS: To run the code, type `name input.params obstacles.dat`.
I compiled it with `-O3 -fopenmp`.

On my machine I get a speedup to 3.7s and on the Linux machine (which will be used for the test) I get 6.2s on 1000. Further optimisation led to higher speed on my machine (Intel i7, 6 GB RAM) but not on the Linux machine, which has dual-core Opteron processors and 8 GB RAM.
Attachments
Code and inputs.zip
8mike


### Re: Exploiting OpenMP

Hi,

I don't have time to look at the code right now, but I expect I will later. In the meantime:
1) About the processors: which Intel i7 model is it (2 or 4 cores)? How many dual-core Opteron processors are there in the Linux machine?
2) What do you mean by "On my machine I get a speedup to 3.7s and on the Linux machine (which will be used for the test) I get 6.2s on 1000."? Is it just runtime measured in seconds? Remember that speedup is not measured in seconds.
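For reference, the usual definitions can be written as below. The symbols are chosen here for illustration: $T_1$ is the runtime with one thread and $T_p$ the runtime with $p$ threads, both measured on the same machine with the same input.

$$
S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p}
$$

Speedup $S(p)$ is a dimensionless ratio, and the efficiency $E(p)$ shows how close the run is to ideal scaling. Runtimes from two different machines cannot be combined into a speedup figure.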

Fernando.
ftinetti


### Re: Exploiting OpenMP

8mike wrote: This piece of code (and the others as well) is, however, very slow. Is there any way that I can correct my #pragma statement and rewrite the data structure or the loop to make it cache friendly and avoid false sharing?

It looks like there is some potential for false sharing, as different threads may be writing to (different subfields of) the same element of tmp.
Your implementation essentially loops over the sources and works out where the destinations are. It might be better to rewrite the code to loop over the destinations and work out where the sources are: having different threads read from the same element of cells would not matter so much.
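One way to read this suggestion is sketched below, as an illustration rather than a tested replacement for the attached code. The struct `t_speed`, the constant `NSPEEDS` and the function name `propagate_gather` are assumptions made for the example; only the field name `speeds` and the index arithmetic come from the original post. Each destination cell (ii, jj) pulls each value from the neighbour it arrives from, so every element of tmp is written by exactly one thread.

```c
#define NSPEEDS 9

/* Hypothetical cell type; the field name follows cells[...].speeds in the post. */
typedef struct {
    float speeds[NSPEEDS];
} t_speed;

/* Pull ("gather") form of the propagation step: loop over destinations
** and read from the source cells. cells and tmp must be distinct arrays. */
void propagate_gather(const t_speed *cells, t_speed *tmp, int nx, int ny)
{
    #pragma omp parallel for
    for (int ii = 0; ii < ny; ii++) {
        /* periodic (wrap-around) neighbour rows */
        int y_n = (ii + 1) % ny;
        int y_s = (ii == 0) ? (ny - 1) : (ii - 1);
        for (int jj = 0; jj < nx; jj++) {
            /* periodic (wrap-around) neighbour columns */
            int x_e = (jj + 1) % nx;
            int x_w = (jj == 0) ? (nx - 1) : (jj - 1);
            t_speed *dst = &tmp[ii * nx + jj];  /* only this thread writes here */

            dst->speeds[0] = cells[ii  * nx + jj ].speeds[0]; /* central cell */
            dst->speeds[1] = cells[ii  * nx + x_w].speeds[1]; /* east-moving, from west */
            dst->speeds[2] = cells[y_s * nx + jj ].speeds[2]; /* north-moving, from south */
            dst->speeds[3] = cells[ii  * nx + x_e].speeds[3]; /* west-moving, from east */
            dst->speeds[4] = cells[y_n * nx + jj ].speeds[4]; /* south-moving, from north */
            dst->speeds[5] = cells[y_s * nx + x_w].speeds[5]; /* north-east, from south-west */
            dst->speeds[6] = cells[y_s * nx + x_e].speeds[6]; /* north-west, from south-east */
            dst->speeds[7] = cells[y_n * nx + x_e].speeds[7]; /* south-west, from north-east */
            dst->speeds[8] = cells[y_n * nx + x_w].speeds[8]; /* south-east, from north-west */
        }
    }
}
```

Declaring the index variables inside the loops makes them private to each thread without a private clause. With the default static schedule each thread writes a contiguous block of rows of tmp, so any remaining false sharing should be limited to cache lines that straddle the boundary between two threads' blocks. Whether this is actually faster on a given machine still has to be measured.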
MarkB

Posts: 670
Joined: Thu Jan 08, 2009 10:12 am
Location: EPCC, University of Edinburgh