Will the following code example, roughly based on TR1, still be valid in RC2?

#pragma omp target device (gpu0) map(fromto: y[0:n]) map(to: x[0:n],a,n)
#pragma omp parallel for shared(x, y, n, a) private(i)
for (i = 0; i < n; ++i)
    y[i] += a * x[i];
Leo wrote:
Question 2:
Can you provide a few examples in Appendix A to showcase how users can use "omp target", "omp teams", and "omp distribute" together to port a loop to a GPU?
Leo wrote:
From a user's point of view, it is best to have an easy migration path from traditional OpenMP code to accelerator code. My fear is that with "omp teams" and "omp distribute", users have to replace their favorite "omp for" with something totally alien.
Only users who want to exploit the power and performance of GPU-like devices will have to use the teams and distribute constructs. However, even host code could likely benefit from these new constructs if the implementation properly exploits the extra knowledge they provide.

Leo wrote:
Comment 1:
I feel like it is too early to introduce thread teams on accelerators when the support for a single team is still being defined.
Could you share what the urgent needs are to define multiple teams on accelerators?
My understanding is that the threads on the host (CPUs) are mostly managed as a single team, though nested teams are allowed. Thread subteams on CPUs have been discussed for quite a while but have still not been formally introduced.
See my response to Question 1 as to why we felt we needed to add contention groups and thread groups. Without teams, OpenMP would be ignoring the current leading group of accelerators, or devices: GPUs. The teams construct does not create subteams; it creates "new" teams. Think of a team as an entirely new OpenMP instance: everything gets reset, and the team works completely independently from the spawning thread and its team.

Leo wrote:
Comment 2:
For accelerator support, I think the number one issue is the lack of a clear canonical architecture model. Part of the success of OpenMP so far relies on its simple, generic SMP architecture model: identical cores/processors attached to a single shared memory. Users and compiler developers can immediately get the big picture and work together.
What is the generic architecture for accelerators? Without a generic architecture for accelerators in mind, it is very hard for users to program and compiler developers to implement the accelerator directives.
It is interesting that the OpenMP specification has sections about the execution model and memory model, yet the assumptions about the targeted hardware architectures are not really articulated. That may not have been necessary while OpenMP only had to support simple SMP machines, but it may become necessary when dealing with complex types of accelerators.