Exploiting OpenMP

Postby 8mike » Sat Oct 19, 2013 3:20 am

Hello everybody, I am new here and this is my first serious attempt at OpenMP. In my code there are several methods that contain nested loops traversing an ny*nx matrix. I wanted to parallelise them, so I used something like this in every method:

Code:
#pragma omp parallel for private(jj,x_e,x_w,y_n,y_s)
  for(ii=0;ii<ny;ii++) {
    for(jj=0;jj<nx;jj++) {
      /* determine indices of axis-direction neighbours
      ** respecting periodic boundary conditions (wrap around) */
      y_n = (ii + 1) % ny;
      x_e = (jj + 1) % nx;
      y_s = (ii == 0) ? (ii + ny - 1) : (ii - 1);
      x_w = (jj == 0) ? (jj + nx - 1) : (jj - 1);
      //propagate densities to neighbouring cells, following
      tmp[ii *nx + jj].speeds[0]  = cells[ii*nx + jj].speeds[0]; /* central cell, no movement */
      tmp[ii *nx + x_e].speeds[1] = cells[ii*nx + jj].speeds[1]; /* east */
      tmp[y_n*nx + jj].speeds[2]  = cells[ii*nx + jj].speeds[2]; /* north */
      tmp[ii *nx + x_w].speeds[3] = cells[ii*nx + jj].speeds[3]; /* west */
      tmp[y_s*nx + jj].speeds[4]  = cells[ii*nx + jj].speeds[4]; /* south */
      tmp[y_n*nx + x_e].speeds[5] = cells[ii*nx + jj].speeds[5]; /* north-east */
      tmp[y_n*nx + x_w].speeds[6] = cells[ii*nx + jj].speeds[6]; /* north-west */
      tmp[y_s*nx + x_w].speeds[7] = cells[ii*nx + jj].speeds[7]; /* south-west */
      tmp[y_s*nx + x_e].speeds[8] = cells[ii*nx + jj].speeds[8]; /* south-east */
    }
  }


This piece of code (and the others as well) is, however, very slow. Is there any way I can correct my #pragma statement, or rewrite the data structure or the loop, to make it cache friendly and avoid false sharing? Thank you very much in advance.

PS: The code is compiled with -O3, so none of my attempts at minor optimisation achieved any speedup.
8mike
 
Posts: 2
Joined: Fri Oct 18, 2013 3:27 pm

Re: Exploiting OpenMP

Postby ftinetti » Sat Oct 19, 2013 8:05 am

Hi,

Some ideas/suggestions:
  • Please send details of the computer and OS.
  • Send details of the timing, in particular timings measured with omp_get_wtime().
  • About
    Is there any way that i can correct my #pragma statement and rewrite the data structure or the loop to make it cache friendly and avoid false sharing?

    Remember that
    • OpenMP will not necessarily make your code "cache friendly"; it is not designed to do so.
    • I think there is no basis (right now) for assuming there is false sharing.
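
A minimal sketch of what timing with omp_get_wtime() could look like in C. The names wall_seconds, sum_of_squares and time_sum_of_squares are hypothetical, and the summation loop is only a stand-in for the section of the real program that is timed. omp_get_wtime() is the standard OpenMP wall-clock routine; the gettimeofday() branch is an assumed fallback so that the sketch still builds without OpenMP.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock time in seconds: omp_get_wtime() when built with OpenMP,
 * otherwise a gettimeofday() fallback (assumption, for non-OpenMP builds). */
static double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1000000.0;
#endif
}

/* Hypothetical workload standing in for the timed section:
 * sums i*i for i in [0, n) using a parallel reduction. */
static double sum_of_squares(long n)
{
    double total = 0.0;
    long i;
#pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; i++) {
        total += (double)i * (double)i;
    }
    return total;
}

/* Runs the workload once, stores its result in *result and
 * returns the elapsed wall-clock time in seconds. */
static double time_sum_of_squares(long n, double *result)
{
    double tic, toc;
    assert(result != NULL);
    tic = wall_seconds();
    *result = sum_of_squares(n);
    toc = wall_seconds();
    return toc - tic;
}
```

Built with -fopenmp the loop runs in parallel and the OpenMP timer is used; without it the pragma is ignored and the same code gives a serial baseline to compare against.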

HTH,

Fernando.
ftinetti
 
Posts: 567
Joined: Wed Feb 10, 2010 2:44 pm

Re: Exploiting OpenMP

Postby 8mike » Sat Oct 19, 2013 8:25 am

Thank you for your response. I am testing this on both Windows 7 and Linux. For the timing I was using (as stated in my assignment):
Code:
gettimeofday(&timstr,NULL);
tic=timstr.tv_sec+(timstr.tv_usec/1000000.0);
...
gettimeofday(&timstr,NULL);
toc=timstr.tv_sec+(timstr.tv_usec/1000000.0);


To avoid any misunderstanding I will post here, for whoever wants to help me, my code and the input files needed to run it. The method in the example is called propagate, but I need a general speedup in the section that is timed in main. Thank you very much in advance.

PS: to run the code, type name input.params obstacles.dat
I compiled it with -O3 -fopenmp

On my machine I get a speedup to 3.7s and on the Linux machine (which will be used for the test) I get 6.2s on 1000. Further optimisation led to higher speed on my machine (Intel I7, 6 GB RAM) but not on the Linux machine. The Linux machine has dual-core Opteron processors and 8 GB of RAM.
Attachments
Code and inputs.zip
(7.25 KiB) Downloaded 141 times
8mike
 
Posts: 2
Joined: Fri Oct 18, 2013 3:27 pm

Re: Exploiting OpenMP

Postby ftinetti » Mon Oct 21, 2013 3:35 am

Hi,

I don't have any time to look at the code right now, but I think I will. Anyway,
1) About the processors: which Intel I7 model (2 or 4 cores)? How many dual-core Opteron processors are there in the Linux machine?
2) What do you mean by "On my machine i get a speedup to 3.7s and on the Linux machine (which will be used for the test) I get 6.2s on 1000."? Is it just the runtime measured in seconds? (Remember that speedup is a ratio of runtimes, so it is not measured in seconds.)
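
To make the distinction between runtime and speedup concrete, here is a small sketch in C. The function names speedup and efficiency are hypothetical, and any numbers passed to them would have to come from your own measurements of a serial and a parallel run of the same problem.

```c
#include <assert.h>

/* Speedup is dimensionless: serial runtime divided by parallel runtime,
 * both measured in the same unit for the same problem size. */
static double speedup(double t_serial, double t_parallel)
{
    assert(t_serial > 0.0 && t_parallel > 0.0);
    return t_serial / t_parallel;
}

/* Parallel efficiency: speedup divided by the number of threads used.
 * A value of 1.0 would mean perfect scaling. */
static double efficiency(double t_serial, double t_parallel, int num_threads)
{
    assert(num_threads > 0);
    return speedup(t_serial, t_parallel) / num_threads;
}
```

For instance, an illustrative (made-up) serial time of 8 s and parallel time of 2 s on 4 threads gives speedup(8.0, 2.0) equal to 4 and efficiency(8.0, 2.0, 4) equal to 1.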

Fernando.
ftinetti
 
Posts: 567
Joined: Wed Feb 10, 2010 2:44 pm

Re: Exploiting OpenMP

Postby MarkB » Mon Oct 28, 2013 7:31 am

8mike wrote:This piece of code (and the others aswell) is, however, very slow. Is there any way that i can correct my #pragma statement and rewrite the data structure or the loop to make it cache friendly and avoid false sharing?


It looks like there is some potential for false sharing, as different threads may be writing to (different subfields of) the same element of tmp.
Your implementation essentially loops over the sources and works out where the destinations are. It might be better to rewrite the code to loop over the destinations and work out where the sources are: having different threads read from the same element of cells would not matter so much.
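
A minimal sketch of that destination-oriented ("pull") loop, under some assumptions: the struct t_speed with a nine-element speeds array and the function name propagate_pull are hypothetical, chosen to match the field used in the first assignment of the original loop. Each iteration writes only to tmp[ii*nx + jj] and reads the value that the original loop would have sent there from the opposite neighbour.

```c
#include <assert.h>

#define NSPEEDS 9

/* Assumed cell layout: nine densities per cell (hypothetical type name) */
typedef struct {
    float speeds[NSPEEDS];
} t_speed;

/* Pull form of the propagation step: loop over destination cells and
 * gather each density from the neighbour it travels from. */
static void propagate_pull(const t_speed *cells, t_speed *tmp, int nx, int ny)
{
    int ii, jj;
    assert(nx > 0 && ny > 0);
#pragma omp parallel for private(jj)
    for (ii = 0; ii < ny; ii++) {
        for (jj = 0; jj < nx; jj++) {
            /* neighbour indices with periodic wrap-around; declared inside
             * the loop body, so each thread has its own copies */
            const int y_n = (ii + 1) % ny;
            const int x_e = (jj + 1) % nx;
            const int y_s = (ii == 0) ? (ny - 1) : (ii - 1);
            const int x_w = (jj == 0) ? (nx - 1) : (jj - 1);
            t_speed *dst = &tmp[ii * nx + jj];

            dst->speeds[0] = cells[ii  * nx + jj ].speeds[0]; /* rest       */
            dst->speeds[1] = cells[ii  * nx + x_w].speeds[1]; /* from west  */
            dst->speeds[2] = cells[y_s * nx + jj ].speeds[2]; /* from south */
            dst->speeds[3] = cells[ii  * nx + x_e].speeds[3]; /* from east  */
            dst->speeds[4] = cells[y_n * nx + jj ].speeds[4]; /* from north */
            dst->speeds[5] = cells[y_s * nx + x_w].speeds[5]; /* from south-west */
            dst->speeds[6] = cells[y_s * nx + x_e].speeds[6]; /* from south-east */
            dst->speeds[7] = cells[y_n * nx + x_e].speeds[7]; /* from north-east */
            dst->speeds[8] = cells[y_n * nx + x_w].speeds[8]; /* from north-west */
        }
    }
}
```

With the default static schedule each thread then writes a contiguous block of rows of tmp, so writes from different threads can only meet on a cache line at the block boundaries. Whether this actually improves the runtime would need to be measured on the target machine.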
MarkB
 
Posts: 422
Joined: Thu Jan 08, 2009 10:12 am

