Re: Re: [Omp] Could you help me about this problem?
Scott Paine
spaine at cfa.harvard.edu
Thu Jan 4 06:46:15 PST 2007
David,
This kind of behavior can happen if your computation doesn't fit
entirely in cache. Increasing the number of omp threads beyond the
number of cores can initially improve performance by breaking the
computation into cache-sized pieces. Possibly this is what happens in
your example when going from 3 to 4 threads. Beyond this point,
increasing the number of threads increases scheduling overhead and
increases the probability that one thread will evict another thread's
data from the cache.
If this is indeed what is happening, you might get better performance by
breaking the computation into blocks smaller than the total cache
available to all cores, then using nthreads=ncores threads to do the
work on one block at a time. If the number of accesses per data item
per loop iteration is high, it can even be fastest to do cache blocking
down to the L1 cache size.
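As a rough illustration of the blocking idea, here is a minimal sketch, not
your actual computation. The array size n, the block length block_len, the
number of passes npass, and the update inside the loop are all made-up
placeholders; block_len in particular would have to be tuned to the cache
size of your machine. It assumes a compiler with OpenMP enabled (for
example gfortran -fopenmp).

```fortran
! Hypothetical sketch of cache blocking with nthreads = ncores.
! n, block_len, npass and the loop body are illustrative assumptions.
program blocked_sweep
  use omp_lib
  implicit none
  integer, parameter :: n = 4000000        ! total number of elements (assumed)
  integer, parameter :: block_len = 65536  ! elements per block (assumed, tune to cache)
  integer, parameter :: npass = 8          ! accesses per data item (assumed)
  double precision, allocatable :: a(:)
  double precision :: t1, t2
  integer :: lo, hi, pass, i

  allocate(a(n))
  a = 1.0d0

  ! One thread per core, as suggested above.
  call omp_set_num_threads(omp_get_num_procs())

  t1 = omp_get_wtime()
  !$omp parallel private(lo, hi, pass, i)
  do lo = 1, n, block_len                  ! walk over the blocks one at a time
     hi = min(lo + block_len - 1, n)
     do pass = 1, npass                    ! finish all passes while the block is cached
        !$omp do schedule(static)
        do i = lo, hi
           a(i) = 0.5d0*a(i) + 1.0d0       ! placeholder for the real work
        end do
        !$omp end do
     end do
  end do
  !$omp end parallel
  t2 = omp_get_wtime() - t1

  print *, 'Elapsed time is :', t2
  print *, 'a(1), a(n) =', a(1), a(n)
end program blocked_sweep
```

The static schedule hands each thread the same part of the block on every
pass, so the data a thread reuses should still be in its own cache. Every
thread runs the two outer loops itself, and the implicit barrier at the end
of each omp do keeps the passes in order.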
Good luck!
Scott Paine
Smithsonian Astrophysical Observatory
spaine at cfa.harvard.edu
On Thu, 2007-01-04 at 16:15 +0800, 宋刚 wrote:
> Hi,Ruud,
> Thank you for your kind reply.
> I know what you mean. I will try it later.
> When I get good news, I will tell you!
> Yours sincerely,
> David Song
>
> -----Original Message-----
> From: Ruud.Vanderpas at Sun.COM [mailto:Ruud.Vanderpas at Sun.COM]
> Sent: Thursday, January 4, 2007 15:58
> To: 宋刚
> Cc: omp at openmp.org
> Subject: Re: Re: [Omp] Could you help me about this problem?
>
> Hi David,
>
> Thanks for the information.
>
> > Just in the subroutine, I get the time like this.
> >
> >
> > real e,etime,t(2)
> >
> > e = etime(t)
> >
> > do .....
> >
> > ........
> >
> > end do
> >
> > e = etime(t)
> >
> > print *,'Elapsed time is :',e,
>
> It seems that you assume "etime" returns the time elapsed
> since the previous call to it. Did you check the documentation
> to make sure this is the case?
>
> In any case it might be better to use the omp_get_wtime call.
> This returns the elapsed wall-clock time in seconds. It is an
> absolute value, so you need to subtract the value returned by
> the previous call from it.
> Something like this:
>
> t1 = omp_get_wtime()
> do ...
> ...
> end do
> t2 = omp_get_wtime() - t1
>
> It will be useful to give it a try using this timer instead and
> see what you get then.
>
> > The time I calculate is only in a subroutine.
>
> Thanks.
>
> > The result can be reproduced again. I tested it three more times.
>
> That is interesting.
>
> > Is there something wrong with the timer I used? Previously, I also wrote
> > an OpenMP program to compress data. On an SMP platform (2 cores), the
> > performance is best with 4 threads, and the speedup can sometimes reach
> > 5.89; I didn't know the reason. On another platform, an SMP server (4-way
> > dual core, 8 cores in total), the best performance is obtained with
> > 12 threads, and the speedup can also reach about 12. I measured the
> > time as shown below:
> >
> > time gzip data1
> >
> > time ./ompgzip data1
>
> There could be some superlinear behavior, but that is just a
> wild guess. You'll probably need to use a profiling tool to
> do some more in-depth analysis.
>
> Kind regards,
> Ruud
> ----------------------------------------------------------------
> Senior Staff Engineer Email: ruud.vanderpas at sun.com
> Systems Group Phone: +31-33-4515000 (x15920)
> Sun Microsystems Fax : +31-33-4515001
> ----------------------------------------------------------------
>
>
> _______________________________________________
> Omp mailing list
> Omp at openmp.org
> http://openmp.org/mailman/listinfo/omp
>