I understand OpenMP 4.0 must target a broad range of accelerators, but I should say up front that I will most likely be working with (and am thinking here of) GPUs.
The following C++ code shows a loop parallelised across a default accelerator device, say a GPU. I've kept the parallel and for constructs separate here for clarity (though I wonder whether a new combined construct could help reduce the verbosity).
#pragma omp target map(a[:4096])
#pragma omp teams
#pragma omp distribute
#pragma omp parallel
#pragma omp for
for (int i = 0; i < 4096; ++i)
  a[i] *= 2;
My first question: Is the order of the parallel and distribute constructs significant? Is the position of the target construct significant?
If I don't apply the "declare target" directive to a function that I then call within the scope of a target construct, should I receive a compile-time error, or might the implementation silently fall back to a host implementation?
Following on from the last question: if, say, the target device is not supported by the implementation, is there a way to tell whether my target region will (or did) run on the accelerator?
On page 46 of the RC2 it is stated for C/C++ "When the size of the array dimension is not known, the length must be specified explicitly." Would a C++11 std::array also be compatible here?
If I omit any use of the map clause, what will be the default map-type for variables referenced within a target region?
In Section 2.9.3, target update, there are references to "to" and "from" clauses. I can't find any reference to these clauses elsewhere. Shouldn't it be map that's used here?