[Omp] A speed test in small memory
Brad Bell
bradbell at seanet.com
Wed Apr 4 06:06:17 PDT 2007
In a previous OpenMP mailing list thread
http://openmp.org/pipermail/omp/2007/000714.html
there is a discussion about memory considerations and OpenMP execution
speed.
In order to determine how much speedup is possible with OpenMP, I
created a test case that uses very little memory. It appears that the
improvement from adding processors drops off when there are more than
four processors (see the results below). Is this to be expected from
OpenMP in general, or is there a way to get continued improvement with
more than four processors (perhaps using a different system or a
different algorithm)?
This speed test computes the summation
1 + 1/2 + 1/3 + ... + 1/n
The total summation is split into pieces, each with the same number of
terms (plus or minus one). The summation for each piece is computed by a
separate thread and in parallel with the other threads. Once all the
threads are done, the master thread sums the result for each thread. The
sums computed in parallel have millions of terms. The number of terms
summed by the master at the end is bounded by the number of processors.
Thus, the sum at the end should not take any significant amount of time.
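For readers who do not have the attachment handy, the splitting scheme described above might look roughly like the following. This is only a sketch and not the code in the attached script: the function names sum_piece and sum_i_inv, the way the piece boundaries are computed, and the use of a vector to hold the per-thread results are all my assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum 1/i for i = start, ..., stop - 1 (one piece of the total summation).
// The accumulator is local, so threads do not share it while summing.
double sum_piece(std::size_t start, std::size_t stop)
{
    double sum = 0.0;
    for (std::size_t i = start; i < stop; ++i)
        sum += 1.0 / static_cast<double>(i);
    return sum;
}

// Hypothetical driver: split 1 + 1/2 + ... + 1/n into n_thread pieces
// whose sizes differ by at most one term, sum the pieces in parallel,
// then let the master thread add up the n_thread partial results.
double sum_i_inv(std::size_t n, int n_thread)
{
    assert(n_thread > 0);
    const std::size_t n_piece = static_cast<std::size_t>(n_thread);
    std::vector<double> piece(n_piece, 0.0);

    // Without -fopenmp this pragma is ignored and the loop runs serially.
    #pragma omp parallel for num_threads(n_thread)
    for (int k = 0; k < n_thread; ++k) {
        const std::size_t j = static_cast<std::size_t>(k);
        const std::size_t start = 1 + (n * j) / n_piece;
        const std::size_t stop  = 1 + (n * (j + 1)) / n_piece;
        piece[j] = sum_piece(start, stop);
    }

    // Final sum by the master thread: only n_thread terms.
    double total = 0.0;
    for (std::size_t j = 0; j < n_piece; ++j)
        total += piece[j];
    return total;
}
```

Calling sum_i_inv(20000000, 8) would correspond to the mega_n_sum=20, n_thread=8 case. Each thread writes its slot of the result vector only once, so I would not expect the shared vector to matter for timing in this sketch.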
I am attaching a bash script that creates and runs this test (see
comments at the top of the script before running it).
Below are the results for running this script with 8 Intel Xeon model
5320 processors running at 1.86 GHz each. The first few lines are the
output for the g++ version command. The next few lines are the commands
used to compile the programs. The results for not using OpenMP are
listed under
./sum_i_inv_no_openmp
The results for dynamic threading are listed under
./sum_i_inv_yes_openmp + dynamic thread adjust
The results where the program specifies the number of threads are listed under
./sum_i_inv_yes_openmp
In these results n_thread is the number of threads, mega_n_sum is the
number of millions of terms in the summation, and seconds is the number
of wall clock seconds to repeat the summation n_repeat times.
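To make the meaning of the seconds value concrete, here is a minimal sketch of one way to measure the wall clock time for n_repeat repetitions. It is not taken from the attached script: the helper name seconds_for_repeats is hypothetical, and it assumes a compiler with C++11 support for std::chrono, which is newer than the g++ used for the results in this message.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: call work() n_repeat times and return the
// elapsed wall clock time in seconds for all the repetitions together.
template <class Work>
double seconds_for_repeats(Work work, std::size_t n_repeat)
{
    assert(n_repeat > 0);
    typedef std::chrono::steady_clock clock;
    const clock::time_point begin = clock::now();
    for (std::size_t r = 0; r < n_repeat; ++r)
        work();
    const std::chrono::duration<double> elapsed = clock::now() - begin;
    return elapsed.count();
}
```

A wall clock is used here instead of CPU time because CPU time is typically accumulated over all threads and so would hide any parallel speedup. If the result of work() is never used, an optimizing compiler may remove the computation, so a real test should store or print the sum.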
g++ --version
g++ (GCC) 4.1.1 20070105 (Red Hat 4.1.1-53)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
g++ sum_i_inv.cpp -o sum_i_inv_no_openmp -DNDEBUG -O2
g++ sum_i_inv.cpp -o sum_i_inv_yes_openmp -fopenmp -DNDEBUG -O2
./sum_i_inv_no_openmp
n_thread=1, mega_n_sum=20, n_repeat=10, seconds=3.33223
./sum_i_inv_yes_openmp + dynamic thread adjust
n_thread=8, mega_n_sum=20, n_repeat=10, seconds=1.02021
./sum_i_inv_yes_openmp
n_thread=1, mega_n_sum=20, n_repeat=10, seconds=3.34029
n_thread=2, mega_n_sum=20, n_repeat=10, seconds=1.68343
n_thread=3, mega_n_sum=20, n_repeat=10, seconds=1.19978
n_thread=4, mega_n_sum=20, n_repeat=10, seconds=1.6894
n_thread=5, mega_n_sum=20, n_repeat=10, seconds=1.33842
n_thread=6, mega_n_sum=20, n_repeat=10, seconds=1.18616
n_thread=7, mega_n_sum=20, n_repeat=10, seconds=1.43184
n_thread=8, mega_n_sum=20, n_repeat=10, seconds=0.907609
./sum_i_inv_no_openmp
n_thread=1, mega_n_sum=40, n_repeat=10, seconds=6.66339
./sum_i_inv_yes_openmp + dynamic thread adjust
n_thread=8, mega_n_sum=40, n_repeat=10, seconds=1.74432
./sum_i_inv_yes_openmp
n_thread=1, mega_n_sum=40, n_repeat=10, seconds=6.676
n_thread=2, mega_n_sum=40, n_repeat=10, seconds=3.38764
n_thread=3, mega_n_sum=40, n_repeat=10, seconds=2.24893
n_thread=4, mega_n_sum=40, n_repeat=10, seconds=1.78959
n_thread=5, mega_n_sum=40, n_repeat=10, seconds=2.55187
n_thread=6, mega_n_sum=40, n_repeat=10, seconds=2.1632
n_thread=7, mega_n_sum=40, n_repeat=10, seconds=1.92769
n_thread=8, mega_n_sum=40, n_repeat=10, seconds=1.76401
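One way to read these timings is as speedup (one-thread seconds divided by n-thread seconds) and efficiency (speedup divided by n_thread). The small sketch below shows that arithmetic; the function names speedup and efficiency are just illustrative and are not part of the test program.

```cpp
#include <cassert>

// Speedup of an n-thread run relative to the one-thread run.
double speedup(double seconds_one_thread, double seconds_n_thread)
{
    assert(seconds_n_thread > 0.0);
    return seconds_one_thread / seconds_n_thread;
}

// Fraction of the ideal (linear) speedup that was achieved.
double efficiency(double seconds_one_thread, double seconds_n_thread,
                  int n_thread)
{
    assert(n_thread > 0);
    return speedup(seconds_one_thread, seconds_n_thread) / n_thread;
}
```

For example, speedup(3.34029, 0.907609) for the mega_n_sum=20, n_thread=8 run is roughly 3.7, and speedup(6.676, 1.76401) for the mega_n_sum=40, n_thread=8 run is roughly 3.8, so eight threads are running at a bit under half of ideal efficiency in both cases.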
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sum_i_inv.sh
Url: http://openmp.org/pipermail/omp/attachments/20070404/a03285c6/attachment.pl