Questions about "omp teams" and "omp distribute"

Forum for the public review of the OpenMP 4.0 API Release Candidates. (Read Only)

Questions about "omp teams" and "omp distribute"

Postby Leo » Fri Mar 29, 2013 10:20 am

Dear committee,

I have some questions and comments about the teams construct introduced in RC2.

It looks like the "omp teams" construct divides the threads on an accelerator into separate teams, and "omp distribute" fine-tunes the scheduling of loop iterations among those teams.
These two constructs are the major additions to the accelerator support compared to the previous technical report (TR1).

Question 1:
---------------
Will the following code example roughly based on TR1 still be valid in RC2?

#pragma omp target device (gpu0) map(tofrom: y[0:n]) map(to: x[0:n],a,n)
#pragma omp parallel for shared(x, y, n, a) private(i)
for (i = 0; i < n; ++i) {
    y[i] += a * x[i];
}

Question 2:
---------------
Can you provide a few examples in Appendix A to showcase how users can use "omp target", "omp teams", and "omp distribute" together to port a loop to a GPU?

From a user's point of view, it is best to have an easy migration path from traditional OpenMP code to accelerator code.
My fear is that with "omp teams" and "omp distribute", users will have to replace their favorite "omp for" with something totally alien.

Comment 1:
---------------
I feel like it is too early to introduce thread teams on accelerators when the support for a single team is still being defined.
Could you share what the urgent needs are to define multiple teams on accelerators?

My understanding is that the threads on the host (CPUs) are mostly managed as a single team, though nested teams are allowed. Thread subteams on CPUs have been discussed for quite a while but are still not formally introduced.

Comment 2:
--------------------
For accelerator support, I think the number one issue is the lack of a clear canonical architecture model. Part of the success of OpenMP so far relies on its simple, generic SMP architecture model: identical cores/processors attached to a single shared memory. Users and compiler developers can immediately get the big picture and work together.

What is the generic architecture for accelerators? Without a generic architecture for accelerators in mind, it is very hard for users to program and compiler developers to implement the accelerator directives.

It is interesting that the OpenMP specification has sections on the execution model and the memory model, yet the assumptions about the targeted hardware architectures are never really articulated. That may not matter while OpenMP targets simple SMP machines, but it may become necessary when dealing with complex accelerators.

------------------------
Thanks for your attention!

Leo

Re: Questions about "omp teams" and "omp distribute"

Postby james » Tue Apr 09, 2013 10:19 am

Leo,

First let me say that anything that was legal with TR1 should remain legal with RC2.
Second, I am one of the co-chairs of the subcommittee responsible for this stuff.

I have tried to respond to your questions inline.

Leo wrote:
Question 1:
---------------
Will the following code example roughly based on TR1 still be valid in RC2?

#pragma omp target device (gpu0) map(tofrom: y[0:n]) map(to: x[0:n],a,n)
#pragma omp parallel for shared(x, y, n, a) private(i)
for (i = 0; i < n; ++i) {
    y[i] += a * x[i];
}

This code is legal and will work as expected on an Intel Phi. On a GPU, however, the implementation will likely confine the parallel region to a single block (a "thread block" in NVIDIA terms), which means the code may be leaving significant performance opportunities behind.

Leo wrote:Question 2:
---------------
Can you provide a few examples in Appendix A to showcase how users can use "omp target", "omp teams", and "omp distribute" together to port a loop to a GPU?

Yes, we are working on examples.

Leo wrote:From a user's point of view, it is best to have an easy migration path from traditional OpenMP code to accelerator code.
My fear is that with "omp teams" and "omp distribute", users will have to replace their favorite "omp for" with something totally alien.
Only users who want to exploit the power and performance of GPU-like devices will have to use the teams and distribute constructs. However, even host code could benefit from these new constructs if the implementation properly exploits the extra knowledge they provide.

Leo wrote:Comment 1:
---------------
I feel like it is too early to introduce thread teams on accelerators when the support for a single team is still being defined.
Could you share what the urgent needs are to define multiple teams on accelerators?

My understanding is that the threads on the host (CPUs) are mostly managed as a single team, though nested teams are allowed. Thread subteams on CPUs have been discussed for quite a while but are still not formally introduced.

See my response to Question 1 for why we felt we needed to add contention groups and thread groups. Without teams, OpenMP would be ignoring the current leading class of accelerator devices: GPUs. The teams construct does not create subteams; it creates "new" teams. Think of a team as an entirely new OpenMP instance: everything gets reset, and the team works completely independently of the spawning thread and its team.

Leo wrote:Comment 2:
--------------------
For accelerator support, I think the number one issue is the lack of a clear canonical architecture model. Part of the success of OpenMP so far relies on its simple, generic SMP architecture model: identical cores/processors attached to a single shared memory. Users and compiler developers can immediately get the big picture and work together.

What is the generic architecture for accelerators? Without a generic architecture for accelerators in mind, it is very hard for users to program and compiler developers to implement the accelerator directives.

It is interesting that the OpenMP specification has sections on the execution model and the memory model, yet the assumptions about the targeted hardware architectures are never really articulated. That may not matter while OpenMP targets simple SMP machines, but it may become necessary when dealing with complex accelerators.


OpenMP continues to assume a shared memory model. The only thing the target construct does is open a tunnel from one device type to another; here "device type" can be thought of as hardware type.

I hope this helps.

james

