copyin clause with ALLOCATABLE arrays

Postby nathanweeks » Fri Nov 13, 2009 7:34 pm

OpenMP 3.0, p. 102, lines 17-18, lists the following Fortran restriction
for the copyin clause:

An array with the ALLOCATABLE attribute must be in the allocated state.
Each thread's copy of that array must be allocated with the same bounds.


This seems to imply that it's the programmer's responsibility to allocate the
memory for each thread's copy of an allocatable array, whereas p. 101,
lines 22-24, state:

On entry to any parallel region, each thread’s copy of a variable that is
affected by a copyin clause for the parallel region will acquire the
allocation, association, and definition status of the master thread’s copy...


which seems to imply that the OpenMP implementation is responsible.
Could I have clarification regarding this? Thanks.
--
Nathan Weeks
Iowa State University HPC Group
http://weeks.public.iastate.edu/

Re: copyin clause with ALLOCATABLE arrays

Postby mwolfe » Fri Nov 20, 2009 12:33 pm

This points out some of the differences between Fortran ALLOCATABLE and Fortran POINTER. In many compilers, they are implemented using essentially the same mechanism (some kind of descriptor or "dope vector"), but the semantics are perhaps subtly different.

Page 101 states that if a variable has the POINTER attribute, the association status (associated or disassociated) will be copied from the master to each thread, and if associated, each thread's copy will be associated with the same target. This is probably clear.

The top of page 102 states that for any other variable (non-POINTER), each copy becomes defined with the value of the master thread's copy. For ALLOCATABLE, the question the committee struggled with is what to do if the master thread has allocated an array of size 10 and there are three worker threads: one has allocated the array with size 10, one has allocated it with only size 5, and the third has not allocated it at all. Should the implementation reallocate all the arrays and copy the data? If the implementation does the allocation, at what point do the arrays get deallocated? Or, if the programmer allocated the array with size 5, should the implementation copy only the first 5 elements?

In Fortran 90 and 95, allocation and deallocation of ALLOCATABLE arrays are fully under the control of the programmer. It would be wrong for a compiler to insert allocates, because there would be no place to put the matching deallocate. So, for OpenMP 3.0, the rules are as stated: the ALLOCATABLE arrays must be allocated by the programmer to the correct size.

When OpenMP moves to Fortran 2003, the rules change; assignment to an allocatable array in Fortran 2003 implies checking whether the array is allocated and is of the right size, and if not, reallocating the array. With the F2003 rules, OpenMP can change its copyin rules to allow the implicit allocate and matching implicit deallocation according to the language.
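
For illustration, the Fortran 2003 assignment rule mentioned above can be sketched in plain Fortran, with no OpenMP involved. This is only a minimal sketch; the program and variable names are made up, and it assumes a compiler with Fortran 2003 reallocation-on-assignment enabled:

```fortran
program realloc_on_assign
   implicit none
   integer, allocatable :: a(:)

   ! Fortran 2003: assigning to an unallocated allocatable array
   ! allocates it with the shape of the right-hand side.
   a = (/1, 2, 3/)
   print *, allocated(a), size(a)

   ! Assigning an expression of a different shape causes the array
   ! to be deallocated and reallocated automatically.
   a = (/1, 2, 3, 4, 5/)
   print *, allocated(a), size(a)
end program realloc_on_assign
```

Under Fortran 90/95 rules the first assignment would be invalid, because a would have to be allocated explicitly beforehand.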

Let me know if this still doesn't help.

-Michael Wolfe

Re: copyin clause with ALLOCATABLE arrays

Postby nathanweeks » Sat Nov 21, 2009 9:47 am

Thanks for the informative response. If I understand, I suppose this means that for
a program that has an ALLOCATABLE array in a copyin clause to be conforming, each
thread must have allocated its threadprivate copy in a previous parallel region, and
that previous parallel region and the parallel region containing the copyin clause
(and any parallel regions in between the two) must satisfy the constraints listed on
p. 82, lines 12-15? e.g.:

Code:
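program simple_alloc_copyin
   use omp_lib
   implicit none

   integer, allocatable, save :: A(:)
!$omp threadprivate(A)

   ! Fix the team size so that both parallel regions use the same threads
   call omp_set_dynamic(.false.)
   call omp_set_num_threads(4)

   ! Each thread allocates its own threadprivate copy of A
!$omp parallel
   allocate(A(3))
!$omp end parallel

   ! Define the master thread's copy
   A = (/1,2,3/)

   ! copyin copies the master thread's values into each thread's copy
!$omp parallel copyin(A)
   print *, omp_get_thread_num(), ':', A
!$omp end parallel

end program simple_alloc_copyin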


If this is the case, a simple example in the next version of the OpenMP API spec illustrating how
to use this new OpenMP 3.0 feature would be helpful (though perhaps unnecessary if that future
version used the Fortran 2003 semantics?).

If not, please clarify, and thanks for bearing with me, as Fortran "isn't my native tongue" ;)
--
Nathan Weeks
Iowa State University HPC Group
http://weeks.public.iastate.edu/

Re: copyin clause with ALLOCATABLE arrays

Postby mwolfe » Tue Dec 01, 2009 1:54 pm

Nathan: Your analysis is correct, and a good example is another good idea. Thanks.
-mw

Re: copyin clause with ALLOCATABLE arrays

Postby shiv4k » Thu May 27, 2010 7:32 pm

Why doesn't this solution work in a nested parallel region? Allocations don't persist between regions in this case...

Code:
program simple_alloc_copyin
   

   use omp_lib
   integer, allocatable, save :: A(:)
   !$omp threadprivate(A)

   call omp_set_num_threads(2)

   ALLOCATE(A(2))

   call omp_set_nested(.TRUE.)
   call omp_set_dynamic(.FALSE.)

   !$omp parallel

      !$omp parallel num_threads(2)
         if(.NOT.allocated(A))allocate(A(2))
      !$omp end parallel

      !$omp parallel   
         if(.NOT.allocated(A))print *, 'not allocated!!!'
      !$omp end parallel

      !$omp parallel copyin(A)
         print *, omp_get_thread_num(), ':', A
      !$omp end parallel


   !$omp end parallel

end program simple_alloc_copyin


Output:

not allocated!!!
not allocated!!!
Segmentation fault

Re: copyin clause with ALLOCATABLE arrays

Postby ejd » Fri May 28, 2010 4:25 am

This looks like it is possibly a compiler/run-time problem. What OS, compiler, and compiler version are you using?

Re: copyin clause with ALLOCATABLE arrays

Postby shiv4k » Fri May 28, 2010 7:41 am

Using: gcc version 4.4.4 (Debian 4.4.4-1) Target: x86_64-linux-gnu

Also tested on 64-bit Ubuntu.

Re: copyin clause with ALLOCATABLE arrays

Postby shiv4k » Sun May 30, 2010 5:21 pm

I asked on the gcc mailing list and received this answer:

See OpenMP 3.0 spec, 2.9.2, page 82, lines 9-18.
The guarantee that you are looking at the same thread is there only for
parallels not nested in another parallel, with nested parallels there is no
such guarantee. Note that you use num_threads(2) on the first nested parallel,
so even if the outer parallel is removed, the program would be guaranteed to
work only if it decides to use just 2 threads (say with OMP_NUM_THREADS=2
etc.).


Because of this, the solution in this thread wouldn't be valid in a nested region, so COPYIN cannot be used on a nested parallel directive.

Is there any other way to copy allocatable threadprivate variables in a nested region? It seems that it is not possible... Any ideas?

Re: copyin clause with ALLOCATABLE arrays

Postby ejd » Tue Jun 01, 2010 10:46 pm

The reason I said that it was possibly a compiler/run-time problem is because there is one compiler that this works with - though it isn't gcc. The answer you received from the gcc folks is correct. The OpenMP V3.0 spec doesn't allow it. I should have been more specific with my first response. Sorry about that. There was a great deal of discussion during the Version 3.0 work on this issue and it was decided that it needed more investigation before implementations could move forward. The problem is associating the threadprivate variables with the correct thread when you are using a thread pool to distribute the work.

Since most compilers don't support threadprivate in nested parallel regions, I am afraid that the only solution is to do it yourself (if you really need it). Basically you have to set up the variables yourself and then only use them from the nested regions with the appropriate thread. Arrays would be the most natural since you could use the nesting level and thread number to access an element.
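
As a rough illustration of the do-it-yourself approach described above, the following sketch replaces the threadprivate array with one shared array that has a slot per (outer thread, inner thread) pair. All names and sizes (buf, master, n_outer, n_inner) are hypothetical, and it assumes the implementation supports nested parallelism and delivers the requested team sizes; slots belonging to threads that are never created simply stay zero.

```fortran
program manual_nested_storage
   use omp_lib
   implicit none

   ! Hypothetical sizes, chosen only for illustration
   integer, parameter :: n = 3, n_outer = 2, n_inner = 2
   integer :: master(n)
   integer, allocatable :: buf(:,:,:)
   integer :: outer_id, inner_id

   call omp_set_dynamic(.false.)
   call omp_set_nested(.true.)

   master = (/1, 2, 3/)

   ! One slot per (outer thread, inner thread) pair, shared by all threads
   allocate(buf(n, 0:n_outer-1, 0:n_inner-1))
   buf = 0

!$omp parallel num_threads(n_outer) private(outer_id)
   outer_id = omp_get_thread_num()

!$omp parallel num_threads(n_inner) private(inner_id) firstprivate(outer_id)
   inner_id = omp_get_thread_num()
   ! Manual "copyin": each thread initializes only its own slot
   buf(:, outer_id, inner_id) = master
   ! Thread-specific work on that slot
   buf(:, outer_id, inner_id) = buf(:, outer_id, inner_id) + 10*outer_id + inner_id
!$omp end parallel
!$omp end parallel

   ! Print the slots in a fixed order after all threads have finished
   do outer_id = 0, n_outer-1
      do inner_id = 0, n_inner-1
         print *, outer_id, inner_id, ':', buf(:, outer_id, inner_id)
      end do
   end do

   deallocate(buf)
end program manual_nested_storage
```

Because every thread touches only the slot selected by its own pair of thread numbers, no threadprivate storage has to persist across the nested regions.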

