#pragma omp parallel for simd and *private* clauses

Forum for the public review of the OpenMP 4.0 API Release Candidates. (Read Only)

Post by jakub » Thu Apr 25, 2013 6:55 am

Section 2.10 says:
"Some combined constructs have clauses that are permitted on both constructs that were
combined. If applying the clause to one construct would result in different program
behavior than applying the clause to the other construct then the program’s behavior is
unspecified."

Does this also apply to #pragma omp parallel for simd? The thing is, the private and lastprivate
clauses behave quite differently depending on which of the two constructs they are applied to:
Code:
#include <cstdio>
#include <omp.h>

struct A {
  A () { printf ("%p ctor %d\n", (void *) this, omp_get_thread_num ()); }
  ~A () { printf ("%p dtor %d\n", (void *) this, omp_get_thread_num ()); }
};

void foo ()
{
  A a;
  int i;
  // private on the enclosing parallel construct
  #pragma omp parallel private (a)
    #pragma omp for simd
      for (i = 0; i < 100; i++)
        printf ("%d %p %d\n", i, (void *) &a, omp_get_thread_num ());
}

vs.
Code:
#include <cstdio>
#include <omp.h>

struct A {
  A () { printf ("%p ctor %d\n", (void *) this, omp_get_thread_num ()); }
  ~A () { printf ("%p dtor %d\n", (void *) this, omp_get_thread_num ()); }
};

void foo ()
{
  A a;
  int i;
  #pragma omp parallel
    // private on the inner for simd construct
    #pragma omp for simd private(a)
      for (i = 0; i < 100; i++)
        printf ("%d %p %d\n", i, (void *) &a, omp_get_thread_num ());
}


In the first case there should be one private variable for each thread, while in the second there is supposedly one for each SIMD lane in each thread. For #pragma omp parallel for it doesn't make much difference where the private clause is put, but for #pragma omp parallel for simd IMHO it always matters. So shouldn't the standard specify whether private on #pragma omp parallel for simd is meant to apply to #pragma omp for simd or to #pragma omp parallel?
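
To make the difference countable rather than something read off printed addresses, here is a minimal sketch. The Tracked type, the two wrapper functions and count_copies are hypothetical names, not anything from the spec. It counts how many objects get constructed and destroyed under each placement of the clause. How many copies the for simd placement creates is up to the implementation, so no particular number is claimed here; without OpenMP enabled the pragmas are ignored and only the original object is counted.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

// Hypothetical helper type: counts every construction and destruction.
struct Tracked {
  static std::atomic<int> ctors;
  static std::atomic<int> dtors;
  int value = 0;
  Tracked() { ++ctors; }
  ~Tracked() { ++dtors; }
};
std::atomic<int> Tracked::ctors{0};
std::atomic<int> Tracked::dtors{0};

// private on the parallel construct: one private copy per thread is expected.
void private_on_parallel(int *out, int n) {
  assert(out != nullptr || n == 0);
  Tracked a;
  #pragma omp parallel private(a)
  {
    #pragma omp for simd
    for (int i = 0; i < n; i++)
      out[i] = i + a.value;  // only reads a; private copies are default-constructed
  }
}

// private on the for simd construct: the number of copies is implementation-dependent.
void private_on_for_simd(int *out, int n) {
  assert(out != nullptr || n == 0);
  Tracked a;
  #pragma omp parallel
  {
    #pragma omp for simd private(a)
    for (int i = 0; i < n; i++)
      out[i] = i + a.value;
  }
}

// Runs region and returns {constructions, destructions} seen while it ran.
template <typename F>
std::pair<int, int> count_copies(F region) {
  Tracked::ctors = 0;
  Tracked::dtors = 0;
  region();
  return {Tracked::ctors.load(), Tracked::dtors.load()};
}
```

Calling count_copies with a lambda that invokes private_on_parallel, and again with private_on_for_simd, and comparing the two first members shows whether a given compiler creates more copies for the second placement.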

BTW, it seems the Cilk+ spec, instead of declaring private vars private to each SIMD lane, says they are private to each iteration, which is something different (though, as it only allows scalar vars, does that disallow classes and thus make the difference harder to spot?).

And one implementation question for those who have already implemented this in their compilers: what about loops that can't be vectorized? If private/lastprivate/reduction clauses are present on #pragma omp simd, do you always use a simd chunk size of 1 in those cases where it is observable by the user (e.g. through the addresses of the private vars and the order of reduction operators)? Or do you pick some expected simd chunk size, say based on the target ISA, compiler options and the safelen clause, perhaps with some light loop analysis, and stick to it (i.e. call that many ctors/dtors etc.), even if it is later discovered that SIMD instructions actually can't be used for something, or not as wide as the chosen simd chunk?
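
For concreteness, this is roughly how a user could observe the chosen simd chunk size through addresses. It is only a sketch: distinct_private_addresses is a hypothetical helper, the struct is a cut-down stand-in for A above, and it assumes the private copies for different lanes live at distinct addresses. The result is whatever the implementation picked, so nothing beyond "at least one" can be expected from it.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Stand-in for the class A above; the member is illustrative only.
struct A {
  int tag;
  A() : tag(0) {}
};

// Hypothetical helper: how many distinct addresses did the private copy
// of a have across n iterations of a simd loop?
std::size_t distinct_private_addresses(int n) {
  assert(n >= 0);
  std::vector<const void *> addrs(static_cast<std::size_t>(n));
  const void **slots = addrs.data();
  A a;
  #pragma omp simd private(a)
  for (int i = 0; i < n; i++)
    slots[i] = &a;  // record which copy this iteration saw
  return std::set<const void *>(addrs.begin(), addrs.end()).size();
}
```

A result of 1 would suggest a simd chunk size of 1 (or that the pragma was ignored), while a larger value would suggest the implementation kept several per-lane copies alive even though taking the address may have prevented actual vectorization.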