OpenMP updating sharing array very slow

Re: OpenMP updating sharing array very slow

Post by ftinetti » Wed May 22, 2013 4:21 am

Hi,

Timings on a dual Xeon 5405 machine (4 cores per processor, 8 in total) running Linux, compiled with gfortran 4.6 at -O2:

No OpenMP
-------------
layer: 1 0.0000000000000000
layer: 2 35.359375000000000
layer: 3 70.638671875000000
layer: 4 105.97265625000000
layer: 5 141.64257812500000
layer: 6 176.92382812500000
layer: 7 212.03515625000000
layer: 8 247.12890625000000
layer: 9 282.55273437500000
layer: 10 317.78906250000000
Total calculation time is (T_t): 352.76757812500000 seconds.


2 threads
-----------
layer: 1 0.0000000000000000
layer: 6 0.0000000000000000
layer: 7 38.431640625000000
layer: 2 39.865234375000000
layer: 8 76.916015625000000
layer: 3 79.521484375000000
layer: 9 115.32617187500000
layer: 4 119.36718750000000
layer: 10 153.55859375000000
layer: 5 159.23632812500000
Total calculation time is (T_t): 198.59570312500000 seconds.

5 threads
-----------
layer: 1 0.0000000000000000
layer: 7 0.0000000000000000
layer: 3 0.0000000000000000
layer: 9 0.0000000000000000
layer: 5 0.0000000000000000
layer: 8 45.062500000000000
layer: 4 45.396484375000000
layer: 6 45.556640625000000
layer: 2 49.099609375000000
layer: 10 49.105468750000000
Total calculation time is (T_t): 96.830078125000000 seconds.
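As a rough check on the scaling, the totals above correspond to a speedup of roughly 1.78 with 2 threads and 3.64 with 5 threads, i.e. parallel efficiencies of about 0.89 and 0.73. The small program below is only a sketch of that arithmetic; the program and variable names are made up for illustration, and the times are copied from the listings above.

```fortran
program speedup_check
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Total times (seconds) copied from the runs above.
  real(dp), parameter :: t_serial = 352.767578125_dp
  integer,  parameter :: nthreads(2) = (/ 2, 5 /)
  real(dp), parameter :: t_par(2) = (/ 198.595703125_dp, 96.830078125_dp /)
  real(dp) :: s
  integer  :: k

  do k = 1, 2
     s = t_serial / t_par(k)          ! speedup relative to the run without OpenMP
     print '(a,i2,a,f6.3,a,f6.3)', 'threads:', nthreads(k), &
           '  speedup:', s, '  efficiency:', s / nthreads(k)
  end do
end program speedup_check
```

An efficiency that drops as threads are added is consistent with, but does not by itself prove, the memory bandwidth explanation discussed in this thread.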

And I agree with Mark that
"removing the assignment to big_array is causing the compiler to optimise away most of the code".

I also agree with
"I strongly suspect the lack of scaling of the code is due to memory bandwidth contention: the code is basically just repeatedly trawling through the coord array with no re-use."

However, I don't have enough cores to verify the "extreme" case (which would be 10, and my maximum is 8). In your case, please check which i5 model you have: for example, the i5-2540M has only 2 cores with HT (http://ark.intel.com/products/50072), and I didn't find an i5 with 4 cores (though there are many i5 models running at 2.6 GHz and I didn't look at all of them).

HTH,

Fernando.

Re: OpenMP updating sharing array very slow

Post by jiwa » Sun May 26, 2013 3:31 am

Thanks Mark and Fernando. I found the main problem in my code, which was this line: dx=minval((/x2,xx2/))-maxval((/x1,xx1/)). If I replace the Fortran intrinsic functions with simple 'if-else' statements, the problem is gone. The 32-processor parallel runtime is now only 6 seconds (after also moving the nn loop to be the outermost loop).

I guess the OpenMP compiler does not like those intrinsic functions; it seems to prefer simple basic statements when it comes to performance optimization.
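The replacement described above could look something like the sketch below. The original declarations are not shown in the thread, so the double precision kind, the sample values, and the helper names lo and hi are assumptions made purely for illustration.

```fortran
program dx_ifelse_sketch
  implicit none
  ! Assumed kind and sample values; the thread does not show the real declarations.
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: x1, x2, xx1, xx2
  real(dp) :: lo, hi, dx_intrinsic, dx_ifelse

  x1  = 0.0_dp
  x2  = 2.0_dp
  xx1 = 1.0_dp
  xx2 = 3.0_dp

  ! Original form from the post: each call is given an array constructor.
  dx_intrinsic = minval((/x2, xx2/)) - maxval((/x1, xx1/))

  ! if-else form: works on scalars only.
  if (x2 < xx2) then
     hi = x2
  else
     hi = xx2
  end if
  if (x1 > xx1) then
     lo = x1
  else
     lo = xx1
  end if
  dx_ifelse = hi - lo

  ! Both forms should give the same value.
  if (dx_intrinsic /= dx_ifelse) error stop "the two forms disagree"
  print *, "dx (minval/maxval):", dx_intrinsic
  print *, "dx (if-else):      ", dx_ifelse
end program dx_ifelse_sketch
```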

Cheers

Ji

Re: OpenMP updating sharing array very slow

Post by MarkB » Mon May 27, 2013 4:05 am

You're very welcome!

The problem with using minval/maxval may not be the intrinsics as such: it might be that the compiler is repeatedly allocating and deallocating temporary storage for array constructors such as (/x1,xx1/), which could cause contention for a lock somewhere low down.
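If temporaries for the array constructors are indeed the cause, one alternative to hand-written if-else blocks is to use the scalar min and max intrinsics, which take their arguments directly and need no array constructor. The sketch below is not the original code: the array size n, the test data, and the reduction over positive dx values are all hypothetical, chosen only so that the two forms can be compared.

```fortran
program dx_minmax_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000            ! hypothetical problem size
  real(dp) :: x1(n), x2(n), xx1, xx2, dx
  real(dp) :: total_ref, total_par
  integer  :: i

  ! Hypothetical test data.
  do i = 1, n
     x1(i) = real(i, dp)
     x2(i) = x1(i) + 2.0_dp
  end do
  xx1 = 0.0_dp
  xx2 = 500.0_dp

  ! Serial reference using the array-constructor form from the thread.
  total_ref = 0.0_dp
  do i = 1, n
     dx = minval((/x2(i), xx2/)) - maxval((/x1(i), xx1/))
     if (dx > 0.0_dp) total_ref = total_ref + dx
  end do

  ! Parallel loop using scalar min/max: no array constructor is involved.
  total_par = 0.0_dp
  !$omp parallel do private(dx) reduction(+:total_par)
  do i = 1, n
     dx = min(x2(i), xx2) - max(x1(i), xx1)
     if (dx > 0.0_dp) total_par = total_par + dx
  end do
  !$omp end parallel do

  if (abs(total_par - total_ref) > 1.0e-9_dp) error stop "results differ"
  print *, "sum of positive dx:", total_par
end program dx_minmax_sketch
```

With gfortran the directives take effect when compiling with -fopenmp; without that flag they are treated as comments and the loop runs serially. Whether this form is actually faster than minval/maxval would need to be measured on the real code.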
