newbie probs

Postby enrico » Wed Jul 16, 2008 3:52 pm

Hello everyone,

I have just started using OpenMP with Fortran 90, so my problem may well be due to my own ignorance (sorry in that case).
I have a function that works fine when compiled serially and that calculates a sum of several terms. The calculation of each of these terms is independent,
so what I would like to do is compute each of them in parallel and add them to the sum as soon as they are ready. Unfortunately, I get an integer-divided-by-zero runtime error when I run the parallel version (as I said, the code works fine in serial mode). I have spent all day trying to fix the problem, and all I discovered is that if I add a few write(*,*) "Hi" statements in some independent parts of the code, the segmentation fault happens much later. I think this points to some stupid mistake of mine in defining the scope of the variables, but what I did looks alright to me.
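
To make the intent clearer, the pattern I am aiming for is roughly the toy sketch below (term_value is a made-up stand-in for one of the independent terms; this is not the real code):

program reduction_sketch
  implicit none
  integer, parameter :: dp = kind(1.d0)
  integer, parameter :: nterms = 1000
  real(dp) :: total
  integer :: k
  total = 0.d0
  !each term is independent, so the sum can be accumulated with a reduction
!$omp parallel do default(none) reduction(+:total)
  do k = 1, nterms
     total = total + term_value(k)
  end do
!$omp end parallel do
  print *, 'sum of independent terms = ', total
contains
  real(dp) function term_value(j) !stand-in for one independent term
    integer, intent(in) :: j
    term_value = 1.0_dp/real(j, dp)
  end function term_value
end program reduction_sketch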

I have pasted the actual function below. There are quite a few write statements that print to files, but I hope it is still understandable. Does anyone see any trivial mistakes in my use of OpenMP? Any hint is welcome...

function chi2(x)
USE NRTYPE
USE NRUTIL
use cosmological_parameters
USE cooling_and_collapse
USE DISK_FORMATION
use SMBH_qso_mode
use reservoir_size
use redshift_mod
use analytical_star_formation
use evolution
use mass_resolution
use accuracy_stiff
use SN_params
use bar_instability_thresholds
use supereddington
IMPLICIT NONE

#ifdef _OPENMP
include 'omp_lib.h' !needed for OMP_GET_NUM_THREADS()
#endif

real(dp), intent(in), dimension(:) :: x
real(dp) :: chi2
integer(I4B), parameter :: n=100
real(dp), parameter :: z_form = 10.d0
real(dp), parameter :: a_mah = 0.6d0
real(dp), parameter :: zmax=8.d0, zmin=0.d0
real(dp), parameter :: lambda = 0.04d0
real(dp), dimension(n) :: redshift, M_vir, R_vir, V_vir, M_hot, M_disk_gas, M_disk_stars, &
&M_bulge_gas, M_bulge_stars, M_res, M_bh, R_disk, R_res, rb, M_ej
real(dp) :: M_o, dz, j_d, m_d, m_r, m_b, mass, mass0, err, &
&htry, z_t, Mgas, Mstars, Mv, Mbh, Mtot, Mbulge, Mdisk, f_h, eps_sf_bulge, eps_SN_b, eps_SN_d, Clump
integer(I4B) :: i, nn, ndum, unit_test1, unit_test2, unit_test3, unit_evol, tid
character(len=50) :: mhalo_string, eps_SN_b_string, eps_SN_d_string, &
&eps_sf_bulge_string, alpha_string, Aedd_string, f_h_string, Clump_string

100 format(10(1X,E23.16))

!$omp parallel default(none) shared(chi2,f_h,&
& eps_sf_bulge, eps_SN_b, eps_SN_d, Clump,x,eps_SN_b_string, eps_SN_d_string, &
&eps_sf_bulge_string, alpha_string, Aedd_string, f_h_string, Clump_string,&
&unit_test1, unit_test2, unit_test3, ndum) &
&private(redshift, M_vir, R_vir, V_vir, M_hot, M_disk_gas, M_disk_stars, &
&M_bulge_gas, M_bulge_stars, M_res, M_bh, R_disk, R_res, rb, M_ej, M_o, dz,&
&j_d, m_d, m_r, m_b, mass, mass0, err, htry, z_t, Mgas, Mstars, Mv, Mbh, &
&Mtot, Mbulge, Mdisk,mhalo_string,nn,i,unit_evol, tid)


!$omp single

write(*,*) "no. of threads = ", OMP_GET_NUM_THREADS()

chi2=0.d0

f_h=abs(x(1))
eps_sf_bulge=abs(x(2))
eps_SN_b=abs(x(3))
eps_SN_d=abs(x(4))
Clump=abs(x(5))

write(eps_SN_b_string,'(E10.3)') eps_SN_b
write(eps_SN_d_string,'(E10.3)') eps_SN_d
write(eps_sf_bulge_string,'(E10.3)') eps_sf_bulge
write(alpha_string,'(E10.3)') alpha
write(Aedd_string,'(E10.3)') Aedd
write(f_h_string,'(E10.3)') f_h
write(Clump_string,'(E10.3)') Clump

unit_test1=1
unit_test2=2
unit_test3=3

open(unit=unit_test1,file='tests1_f_h_'//trim(adjustl(f_h_string))//'_eps_sf_bulge_'//trim(adjustl(eps_sf_bulge_string))//&
&'_eps_SN_b_'//trim(adjustl(eps_SN_b_string))//'_eps_SN_d_'//trim(adjustl(eps_SN_d_string))//'_clump_'//trim(adjustl(Clump_string))//'.txt')
open(unit=unit_test2,file='tests2_f_h_'//trim(adjustl(f_h_string))//'_eps_sf_bulge_'//trim(adjustl(eps_sf_bulge_string))//&
&'_eps_SN_b_'//trim(adjustl(eps_SN_b_string))//'_eps_SN_d_'//trim(adjustl(eps_SN_d_string))//'_clump_'//trim(adjustl(Clump_string))//'.txt')
open(unit=unit_test3,file='tests3_f_h_'//trim(adjustl(f_h_string))//'_eps_sf_bulge_'//trim(adjustl(eps_sf_bulge_string))//&
&'_eps_SN_b_'//trim(adjustl(eps_SN_b_string))//'_eps_SN_d_'//trim(adjustl(eps_SN_d_string))//'_clump_'//trim(adjustl(Clump_string))//'.txt')

write(unit_test1,'(a25,E23.16)') "# eps_SN_b = ", eps_SN_b
write(unit_test1,*)
write(unit_test2,'(a25,E23.16)') "# eps_SN_b = ", eps_SN_b
write(unit_test2,*)
write(unit_test3,'(a25,E23.16)') "# eps_SN_b = ", eps_SN_b
write(unit_test3,*)

!$omp end single

!$omp do

do nn=8,13

tid=OMP_GET_THREAD_NUM()
unit_evol=tid+4


M_o=10.d0**nn

write(mhalo_string,'(E10.3)') M_o

open(unit=unit_evol, file='mhalo_'//trim(adjustl(mhalo_string))//'_eps_SN_b_'//trim(adjustl(eps_SN_b_string))//&
&'_eps_SN_d_'//trim(adjustl(eps_SN_d_string))//'_eps_sf_bulge_'//trim(adjustl(eps_sf_bulge_string))&
&//'_alpha_'//trim(adjustl(alpha_string))//'_Aedd_'//trim(adjustl(Aedd_string))//'_f_h_'&
&//trim(adjustl(f_h_string))//'.txt')

write(unit_evol,'(a25,E23.16)') "# M_o = ", M_o
write(unit_evol,*)

dz = (zmax-zmin)/(n-1)
do i=1,n
redshift(i) = zmax-(i-1)*dz
M_vir(i) = M_o * ( 1.d0 - (log10(1.d0+redshift(i)))/(log10(1.d0+z_form)) )**(1.d0/a_mah) !PARAMETRIZED MAH FROM vdB.01
end do

write(*,*) "Halo mass = ", M_o

z_t = 120.d0/log10(M_o) - 9.d0

write(*,*) "Transition at z = ", z_t


M_hot(1) = f_b*M_vir(1) !HOT HALO GAS
M_disk_gas(1) = seed_mass !COLD DISK GAS
M_disk_stars(1) = seed_mass !DISK STARS
M_bulge_gas(1) = seed_mass !COLD BULGE GAS
M_bulge_stars(1) = seed_mass !BULGE STARS
M_res(1) = seed_mass !CIRCUMNUCLEAR RESERVOIR GAS
M_bh(1) = seed_mass !BLACK HOLE MASS
R_vir(1)=virial_rad(M_vir(1),redshift(1))
V_vir(1)= sqrt(G*M_vir(1)*m_sun/(R_vir(1)*1.d3*parsec))
m_d = (M_disk_stars(1) + M_disk_gas(1))/M_vir(1)
m_b = (M_bulge_gas(1) + M_bulge_stars(1))/M_vir(1)
j_d = m_d
m_r = M_res(1)/M_vir(1)
rb(1) = rbulge(M_bulge_stars(1) + M_bulge_gas(1))
v_vir(1) = sqrt(G*M_vir(1)*m_sun/(R_vir(1)*1.d3*parsec)) !virial velocity [m/s]
v_vir(1) = v_vir(1)/1.d3 ! km/s
R_res(1) = alpha*G*M_bh(1)*m_sun/(V_vir(1)*1.d3)**2 ! m
R_res(1) = R_res(1)/(1.d3*parsec) ! kpc

call scale_radius(M_vir(1), redshift(1), m_d, j_d, m_b, rb(1), m_r, R_res(1), lambda, R_disk(1))

M_ej(1)=0.d0

write(unit_evol,'(a130)') '# 1.redshift, 2.M_vir, 3.M_hot, 4.M_disk_gas, 5.M_disk_stars, 6.M_bulge_gas, 7.M_bulge_stars, 8.M_res, 9.M_bh, 10. errors'

mass0=M_hot(1)+M_ej(1)+M_res(1)+M_disk_gas(1)+M_disk_stars(1)+&
&M_bulge_gas(1)+M_bulge_stars(1)+M_bh(1)

htry=-accuracy*dz

write(unit_evol,100) redshift(1), M_vir(1), M_hot(1), M_disk_gas(1), M_disk_stars(1), M_bulge_gas(1), &
&M_bulge_stars(1), M_res(1), M_bh(1), 0.d0


do i=2,n

call evolve_masses_adapt(redshift(i-1), M_vir(i-1), redshift(i), M_vir(i), &
& M_hot(i-1), M_disk_gas(i-1), M_disk_stars(i-1), M_bulge_gas(i-1), M_bulge_stars(i-1), &
& M_res(i-1), M_bh(i-1), M_ej(i-1), &
& M_hot(i), M_disk_gas(i), M_disk_stars(i), M_bulge_gas(i), M_bulge_stars(i), &
& M_res(i), M_bh(i), M_ej(i), accuracy, htry, z_t, lambda, &
& f_h, eps_sf_bulge, eps_SN_b, eps_SN_d, Clump)

mass=max(M_hot(i),0.d0)+max(M_ej(i),0.d0)+max(M_res(i),0.d0)+max(M_disk_gas(i),0.d0)&
&+max(M_disk_stars(i),0.d0)+max(M_bulge_gas(i),0.d0)+max(M_bulge_stars(i),0.d0)+max(M_bh(i),0.d0)

err=(mass-(mass0+f_b*(M_vir(i)-M_vir(1))))


write(*,*) 'redshift= ', redshift(i), ' mass= ', M_vir(i)
write(*,*) 'i=', i

write(unit_evol,100) redshift(i), M_vir(i), M_hot(i), M_disk_gas(i), M_disk_stars(i), max(M_bulge_gas(i),0.d0), &
&M_bulge_stars(i), M_res(i), M_bh(i), err

end do


close(unit_evol)

Mgas=M_bulge_gas(n)+M_disk_gas(n)+M_res(n)!+M_hot(n)
Mstars=M_bulge_stars(n)+M_disk_stars(n)
Mbulge=M_bulge_stars(n)+M_bulge_gas(n)
Mdisk=M_disk_stars(n)+M_disk_gas(n)
Mv=M_vir(n)
Mbh=M_bh(n)
Mtot=Mbulge+Mdisk
! Mtot=Mv+Mgas+Mstars+Mbh

write(unit_test1,'(3(1X,E23.16))') log10(Mv), Mstars/(f_b*Mv), Mbulge/Mtot
write(unit_test2,'(3(1X,E23.16))') log10(Mstars), log10(Mgas/Mstars), log10(Mv/Mstars)
write(unit_test3,'(2(1X,E23.16))') log10(Mbulge), log10(Mbh)

!$omp critical
chi2=chi2+((Mstars/(f_b*Mv)-Mstars_over_f_b_Mv(Mv))/Mstars_over_f_b_Mv(Mv))**2&
&+((log10(Mgas/Mstars)-log10_Mgas_over_Mstars(Mstars))/err_log10_Mgas_over_Mstars(Mstars))**2&
&+((log10(Mv/Mstars)-log10_Mv_over_Mstars(Mstars))/err_log10_Mv_over_Mstars(Mstars))**2&
&+((log10(Mbh)-log10_Mbh(Mbulge))/err_log10_Mbh(Mbulge))**2&
&+((Mbulge/Mtot-Mbulge_over_Mtot(Mv))/err_Mbulge_over_Mtot(Mv))**2

!$omp end critical

end do

!$omp end do

!$omp single

close(unit=unit_test1)
close(unit=unit_test2)
close(unit=unit_test3)

!$omp end single

!$omp end parallel

end function

Re: newbie probs

Postby djo35 » Fri Jul 18, 2008 6:59 am

Could you tell us where the divide by zero happens? I'm no Fortran guru, so that would help speed things up...

Also, some people may be put off by vast tracts of dense code in posts. I often find that short, concise snippets get much faster and more helpful replies (and often, whilst I'm thinking about the snippet, I manage to solve my own problem!)

Dan

Re: newbie probs

Postby enrico » Fri Jul 18, 2008 7:48 am

Thanks for the reply, but I think I have already tracked down the problem. The serial version of the code was running fine, but when I ran it inside the debugger it complained about an overflow exception (which was not signalled when running outside the debugger, despite compiling with the -fpe0 -traceback options, and which did not produce any NaN values in the printed output). I then recompiled with 128-bit reals and the problem disappeared in both the serial and the parallel versions. So the problem was not related to OpenMP, although it seems that somehow OpenMP makes the program more "sensitive" to overflow, which seems a bit weird to me.
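
(In case it helps anyone else: the same precision switch can also be made in the source rather than through a compiler option. A minimal sketch, assuming the compiler actually offers a 128-bit real kind:)

module quad_precision
  implicit none
  !requesting ~33 significant digits selects a 128-bit real where one exists;
  !selected_real_kind returns a negative value if no such kind is available
  integer, parameter :: qp = selected_real_kind(33, 4931)
end module quad_precision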

cheers

enrico

Re: newbie probs

Postby djo35 » Fri Jul 18, 2008 8:02 am

Hey Enrico, I'm glad it's working out for you now.

Just a musing - in OpenMP each thread has its own private memory area as well as a shared area (ejd, correct me if I'm wrong). If we increase the number of threads, we reduce the total spare memory (since all the threads share the same total address space). That means that if there is a small memory leak, it could be repeated n-threads times (each time soaking up more memory), *or* one thread's leak could trample its neighbour's memory and we get chaos. Whatever happens, it seems that the more threads there are, the more damage memory tramples and leaks can do!
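
Something like the toy fragment below is what I mean by private versus shared storage (just a sketch, not Enrico's code): each thread gets its own copy of scratch in its private area, while total lives in the shared area and has to be protected.

program shared_vs_private
  use omp_lib
  implicit none
  real(8) :: total, scratch
  total = 0.d0
!$omp parallel default(none) shared(total) private(scratch)
  scratch = real(omp_get_thread_num() + 1, 8) !private: one copy per thread
!$omp critical
  total = total + scratch !shared: a single copy, updated under a critical
!$omp end critical
!$omp end parallel
  print *, 'total over all threads = ', total
end program shared_vs_private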

Dan

Re: newbie probs

Postby ejd » Mon Jul 21, 2008 8:14 am

Cutting the program down is always helpful - especially if you can give a version that can be run. Next best is to give something that can at least be compiled; at least then, static tools can be used on the code. However, a lot of the time this is difficult to do. In any case, I am glad that you found your problem. It could be that there is a bug in the OpenMP implementation you are using that makes the problem show up more, or it could be a difference caused by roundoff in a reduction operation, or .... There are quite a number of possible reasons, and the only way to tell is to take the application, work through it, and find the "problem" causing the difference. You didn't say what hardware and software you are using. If any of it is from Sun, then I am definitely interested in talking to you more; otherwise, you can decide whether you need to discuss it further here or with your vendor.

As for djo35's question - you are correct. If the program had a problem originally, then making it parallel usually only amplifies the problem (in my experience). One of the most common complaints I have heard is: "it worked when I ran it sequentially and fails when I run it in parallel - so parallel doesn't work and was a waste of time." That is one of the reasons I like OpenMP: in most cases you can keep the original program as it was, so you can go back and check whether it really worked. Better than 9 times out of 10, the problem was already there.
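
As a toy illustration of keeping the original program intact (just a sketch, not Enrico's code): to a compiler that is not processing OpenMP, the directives are plain comments, so the loop below builds and runs as the original serial code when the OpenMP option is dropped, and as a parallel reduction when it is switched on.

program keep_serial_version
  implicit none
  integer, parameter :: n = 1000
  integer :: i
  real(8) :: s
  s = 0.d0
!$omp parallel do reduction(+:s)
  do i = 1, n
     s = s + 1.d0/real(i, 8)
  end do
!$omp end parallel do
  print *, 'sum = ', s
end program keep_serial_version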

