[Omp] A speed test in small memory

Brad Bell bradbell at seanet.com
Wed Apr 4 06:06:17 PDT 2007


In a previous OpenMP mailing list thread
    http://openmp.org/pipermail/omp/2007/000714.html
there is a discussion about memory considerations and OpenMP execution 
speed.

To determine how much speedup is possible with OpenMP, I created a 
test case that uses very little memory. The improvement from adding 
processors appears to drop off beyond four processors (see the results 
below). Is this to be expected from OpenMP in general, or is there a way 
to get continued improvement with more than four processors (perhaps 
using a different system or a different algorithm)?

This speed test computes the summation
   1 + 1/2 + 1/3 + ... + 1/n
The total summation is split into pieces, each with the same number of 
terms (plus or minus one). The summation for each piece is computed by a 
separate thread and in parallel with the other threads. Once all the 
threads are done, the master thread sums the result for each thread. The 
sums computed in parallel have millions of terms. The number of terms 
summed by the master at the end is bounded by the number of processors. 
Thus, the sum at the end should not take any significant amount of time.
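
The splitting described above can be sketched as follows. This is only 
a minimal illustration, not the contents of the attached script: the 
names sum_piece, sum_i_inv, base, and extra are hypothetical, and the 
way the remainder terms are assigned to pieces is an assumption. Each 
piece is accumulated in a local variable and written to the shared 
array once, so the master's final loop has only n_thread terms.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns 1/first + ... + 1/last
// (returns zero when last < first, i.e. the piece is empty).
double sum_piece(std::size_t first, std::size_t last)
{
    double sum = 0.0;  // local accumulator, private to the calling thread
    for (std::size_t i = first; i <= last; ++i)
        sum += 1.0 / static_cast<double>(i);
    return sum;
}

// Hypothetical driver: splits 1 + 1/2 + ... + 1/n into n_thread pieces
// whose lengths differ by at most one, sums the pieces in parallel,
// then adds the partial results serially.
double sum_i_inv(std::size_t n, int n_thread)
{
    assert(n_thread > 0);
    std::size_t n_piece = static_cast<std::size_t>(n_thread);
    std::size_t base    = n / n_piece;  // terms every piece gets
    std::size_t extra   = n % n_piece;  // first 'extra' pieces get one more
    std::vector<double> partial(n_piece, 0.0);

    // Without -fopenmp this pragma is ignored and the loop runs serially.
    #pragma omp parallel for num_threads(n_thread)
    for (int k = 0; k < n_thread; ++k)
    {
        std::size_t ku    = static_cast<std::size_t>(k);
        std::size_t first = ku * base + (ku < extra ? ku : extra) + 1;
        std::size_t len   = base + (ku < extra ? 1 : 0);
        partial[ku] = sum_piece(first, first + len - 1);
    }

    // Master thread: at most n_thread additions.
    double total = 0.0;
    for (std::size_t k = 0; k < n_piece; ++k)
        total += partial[k];
    return total;
}
```

For example, sum_i_inv(20000000, 4) would correspond to mega_n_sum=20 
with four threads.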

I am attaching a bash script that creates and runs this test (see 
comments at the top of the script before running it).

Below are the results for running this script with 8 Intel Xeon model 
5320 processors running at 1.86 GHz each. The first few lines are the 
output for the g++ version command. The next few lines are the commands 
used to compile the programs. The results for not using OpenMP are 
listed under
    ./sum_i_inv_no_openmp
The results for dynamic threading are listed under
    ./sum_i_inv_yes_openmp + dynamic thread adjust
The results where the program specifies the number of threads are under
    ./sum_i_inv_yes_openmp
In these results n_thread is the number of threads, mega_n_sum is the 
number of millions of terms in the summation, and seconds is the number 
of wall clock seconds to repeat the summation n_repeat times.
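
One way to measure wall clock seconds for n_repeat repetitions is 
sketched here. It is an assumption about how such a timer could look, 
not necessarily what the attached script does: the names 
serial_sum_i_inv and time_repeats are hypothetical, and 
std::chrono::steady_clock requires a C++11 compiler. The volatile 
variable only keeps the optimizer from discarding the summation.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical workload: serial version of 1 + 1/2 + ... + 1/n.
double serial_sum_i_inv(std::size_t n)
{
    double sum = 0.0;
    for (std::size_t i = 1; i <= n; ++i)
        sum += 1.0 / static_cast<double>(i);
    return sum;
}

// Returns the wall clock seconds needed to repeat the summation
// n_repeat times (elapsed time, not CPU time summed over threads).
double time_repeats(std::size_t n, int n_repeat)
{
    assert(n_repeat > 0);
    volatile double sink = 0.0;  // keeps the result from being optimized away
    std::chrono::steady_clock::time_point start =
        std::chrono::steady_clock::now();
    for (int r = 0; r < n_repeat; ++r)
        sink = serial_sum_i_inv(n);
    std::chrono::steady_clock::time_point stop =
        std::chrono::steady_clock::now();
    (void) sink;
    return std::chrono::duration<double>(stop - start).count();
}
```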


g++ --version
g++ (GCC) 4.1.1 20070105 (Red Hat 4.1.1-53)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

g++ sum_i_inv.cpp -o sum_i_inv_no_openmp -DNDEBUG -O2
g++ sum_i_inv.cpp -o sum_i_inv_yes_openmp -fopenmp -DNDEBUG -O2
./sum_i_inv_no_openmp
n_thread=1, mega_n_sum=20, n_repeat=10, seconds=3.33223
./sum_i_inv_yes_openmp + dynamic thread adjust
n_thread=8, mega_n_sum=20, n_repeat=10, seconds=1.02021
./sum_i_inv_yes_openmp
n_thread=1, mega_n_sum=20, n_repeat=10, seconds=3.34029
n_thread=2, mega_n_sum=20, n_repeat=10, seconds=1.68343
n_thread=3, mega_n_sum=20, n_repeat=10, seconds=1.19978
n_thread=4, mega_n_sum=20, n_repeat=10, seconds=1.6894
n_thread=5, mega_n_sum=20, n_repeat=10, seconds=1.33842
n_thread=6, mega_n_sum=20, n_repeat=10, seconds=1.18616
n_thread=7, mega_n_sum=20, n_repeat=10, seconds=1.43184
n_thread=8, mega_n_sum=20, n_repeat=10, seconds=0.907609
./sum_i_inv_no_openmp
n_thread=1, mega_n_sum=40, n_repeat=10, seconds=6.66339
./sum_i_inv_yes_openmp + dynamic thread adjust
n_thread=8, mega_n_sum=40, n_repeat=10, seconds=1.74432
./sum_i_inv_yes_openmp
n_thread=1, mega_n_sum=40, n_repeat=10, seconds=6.676
n_thread=2, mega_n_sum=40, n_repeat=10, seconds=3.38764
n_thread=3, mega_n_sum=40, n_repeat=10, seconds=2.24893
n_thread=4, mega_n_sum=40, n_repeat=10, seconds=1.78959
n_thread=5, mega_n_sum=40, n_repeat=10, seconds=2.55187
n_thread=6, mega_n_sum=40, n_repeat=10, seconds=2.1632
n_thread=7, mega_n_sum=40, n_repeat=10, seconds=1.92769
n_thread=8, mega_n_sum=40, n_repeat=10, seconds=1.76401
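
To make the drop-off easier to see, the timings can be converted to 
speedup (one-thread seconds divided by n_thread seconds) and 
efficiency (speedup divided by n_thread). The sketch below does this 
for the mega_n_sum=40 rows listed under ./sum_i_inv_yes_openmp; the 
function names are hypothetical. For these rows the speedup is roughly 
3.7 at both four and eight threads, so the efficiency at eight threads 
is a little under one half.

```cpp
#include <cassert>
#include <cstdio>

// Speedup relative to the one-thread run.
double speedup(double seconds_one, double seconds_p)
{
    assert(seconds_p > 0.0);
    return seconds_one / seconds_p;
}

// Fraction of ideal (linear) speedup achieved with n_thread threads.
double efficiency(double seconds_one, double seconds_p, int n_thread)
{
    assert(n_thread > 0);
    return speedup(seconds_one, seconds_p) / n_thread;
}

// Prints speedup and efficiency for the mega_n_sum=40 timings above.
void print_speedup_table()
{
    const double seconds[8] = {
        6.676, 3.38764, 2.24893, 1.78959,
        2.55187, 2.1632, 1.92769, 1.76401
    };
    for (int k = 0; k < 8; ++k)
    {
        int n_thread = k + 1;
        std::printf("n_thread=%d, speedup=%.2f, efficiency=%.2f\n",
                    n_thread,
                    speedup(seconds[0], seconds[k]),
                    efficiency(seconds[0], seconds[k], n_thread));
    }
}
```

Calling print_speedup_table() from main prints one line per thread 
count.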
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sum_i_inv.sh
Url: http://openmp.org/pipermail/omp/attachments/20070404/a03285c6/attachment.pl 

