Openmp performance on different hardware/OS

General OpenMP discussion

Re: Openmp performance on different hardware/OS

Anyone?

ilmar
ilmarw

Posts: 5
Joined: Tue Jan 08, 2008 3:47 am

Re: Openmp performance on different hardware/OS

The first pragma and the loop that follows it look like this:

#pragma omp parallel shared(A, col, row)
for (k = 0; k < SIZE-1; k++) {

There is no worksharing directive, so every thread in the team executes the entire for loop redundantly. And in any case, you can't execute that for loop in parallel because of its data dependences.
lfm

Posts: 135
Joined: Sun Oct 21, 2007 4:58 pm
Location: OpenMP ARB

Re: Openmp performance on different hardware/OS

OK, I see it now. What dependencies do you mean, the fact that the variable n depends on k?

ilmar
ilmarw

Posts: 5
Joined: Tue Jan 08, 2008 3:47 am

Re: Openmp performance on different hardware/OS

Here is the code:
Code: Select all
for (k = 0; k<SIZE-1; k++) {
    /* set col values to column k of A */
    for (n = k; n<SIZE; n++) {
      col[n] = A[n][k];
    }
    /* scale values of A by multiplier */
    for (n = k+1; n<SIZE; n++) {
      A[k][n] /= col[k];
    }
    /* set row values to row k of A */
    for (n = k+1; n<SIZE; n++) {
      row[n] = A[k][n];
    }
    /* Here we update A by subtracting the appropriate values from row
       and column.  Note that these adjustments to A can be done in
       any order */
#pragma omp parallel for shared(A, row, col)
    for (i = k+1; i<SIZE; i++) {
      for (j = k+1; j<SIZE; j++) {
        A[i][j] = A[i][j] - row[i] * col[j];
      }
    }
  }

Let's look at iteration m of the outer loop. It uses A[m:SIZE-1][m], A[m][m+1:SIZE-1], and A[m+1:SIZE-1][m+1:SIZE-1]. It modifies A[m][m+1:SIZE-1] and A[m+1:SIZE-1][m+1:SIZE-1]. So on iteration m+1, the value of A[m+1][m+2:SIZE-1] (for example) is used, which was computed on the previous iteration. Thus there is a loop-carried true dependence from iteration m to iteration m+1 and the loop cannot be parallelized as written.

One way to get more parallelism here is to block the computation. This URL seems to have a good explanation:
http://www.cs.berkeley.edu/~demmel/cs267/lecture12/lecture12.html#link_5

-- Larry
lfm

Posts: 135
Joined: Sun Oct 21, 2007 4:58 pm
Location: OpenMP ARB

Re: Openmp performance on different hardware/OS

For all the examples at http://kallipolis.com/openmp/, the parallel version runs slower than the serial version. Here's my hardware setup:

Intel(R) Core(TM)2 CPU T5300 @ 1.73GHz
Ubuntu Linux 7.10
GCC 4.2.1
Intel C Compiler 10.1

When the parallel version is running, top shows only one process. I can't prove it, but I'm almost sure the threads are not being divided equally among the cores. Has anyone gotten a good speedup using Linux?
jmhal

Posts: 3
Joined: Tue Feb 12, 2008 6:29 pm

Re: Openmp performance on different hardware/OS

I downloaded the two files combined.c and combined_mp.c from the site you gave (http://kallipolis.com/openmp/) and ran them on an IBM box. It was running Linux 2.6.9-11.ELsmp and had 2 Intel Pentium 4 processors running at 3.6 GHz. The compiler I had access to was the Intel C Compiler 9.1.037. Here is what I saw:
Code: Select all
% icc combined.c
% time a.out
e started at 0
e done at 5970000
pi started at 5970000
pi done at 11060000
integration started at 11060000
integration done at 19930000
Values: e*pi = 8.539734,  integral = 9.666667
Total elapsed time: 19930.000 seconds
19.901u 0.035s 0:20.02 99.5%    0+0k 0+0io 0pf+0w
% setenv OMP_DYNAMIC FALSE
% setenv OMP_NUM_THREADS 2
% icc -openmp combined_mp.c
combined_mp.c(33) : (col. 1) remark: OpenMP DEFINED SECTION WAS PARALLELIZED.
combined_mp.c(65) : (col. 1) remark: OpenMP DEFINED LOOP WAS PARALLELIZED.
combined_mp.c(31) : (col. 1) remark: OpenMP DEFINED REGION WAS PARALLELIZED.
% time a.out
e started at 0
pi started at 0
e done at 14600000
integration started at 14600000
pi done at 15190000
integration started at 15190000
integration done at 30840000
Values: e*pi = 8.539734,  integral = 9.666667
Total elapsed time: 30980.000 seconds
30.963u 0.028s 0:15.64 198.0%   0+0k 0+0io 0pf+0w

The first thing to note is that the value returned from clock() in the program is not accurate when running in parallel: clock() reports CPU time summed over all threads, not wall-clock time. The programmer also didn't do the conversion to seconds correctly (they divided by 1000 instead of CLOCKS_PER_SEC). lfm had made note of this in a previous post. Looking at the elapsed times reported by the shell's time command, I am seeing about a 21.8% decrease in elapsed time.

Running this on an older Sparc system running Solaris 10 and using the Sun Studio 12 compiler, I am seeing about a 29.4% decrease in elapsed time.
Code: Select all
% cc -xO3 combined.c
% time a.out
e started at 0
e done at 120000
pi started at 120000
pi done at 7680000
integration started at 7680000
integration done at 17620000
Values: e*pi = 8.539734,  integral = 9.666667
Total elapsed time: 17620.000 seconds
17.0u 0.0s 0:17 96% 0+0k 0+0io 0pf+0w
% setenv OMP_DYNAMIC FALSE
% setenv OMP_NUM_THREADS 2
% cc -xO3 -xopenmp combined_mp.c
% time a.out
e started at 0
pi started at 0
e done at 230000
integration started at 230000
integration done at 9670000
pi done at 12650000
integration started at 12650000
Values: e*pi = 8.539734,  integral = 9.666667
Total elapsed time: 17570.000 seconds
17.0u 0.0s 0:12 131% 0+0k 0+0io 0pf+0w

So with this program, as written, I don't think you are going to see large decreases in elapsed time using 2 processors. Unfortunately, I don't have a Linux system available right now that is close to yours. I did try it on the same old Sparc system using 4 threads and got a 47% reduction in elapsed time.
Code: Select all
% setenv OMP_NUM_THREADS 4
% time a.out
integration started at 0
integration started at 0
e started at 0
pi started at 0
e done at 390000
integration started at 390000
integration done at 9340000
pi done at 14920000
integration started at 14920000
Values: e*pi = 8.539734,  integral = 9.666667
Total elapsed time: 17310.000 seconds
17.0u 0.0s 0:09 170% 0+0k 0+0io 0pf+0w

What sort of speedup are you seeing?
ejd

Posts: 1025
Joined: Wed Jan 16, 2008 7:21 am

