On page 7 it only lists for, sections and single for C/C++, and do, sections, single and workshare for Fortran. E.g. the description of #pragma omp parallel for clearly documents that it behaves pretty much like #pragma omp parallel with a #pragma omp for closely nested in it, but #pragma omp for simd isn't described similarly, given that it actually isn't a straightforward equivalent of #pragma omp for with #pragma omp simd inside of it.
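To make the comparison concrete, here is a minimal sketch of the equivalence that the parallel for description does state. The function names and the array are made up for illustration; both functions are expected to leave the same values in the array.

```c
#include <stddef.h>

/* Combined construct: parallel region and worksharing loop in one directive. */
void scale_combined(double *a, size_t n, double k)
{
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++)
        a[i] *= k;
}

/* Spelled out: a worksharing for closely nested in a parallel region. */
void scale_nested(double *a, size_t n, double k)
{
    #pragma omp parallel
    {
        #pragma omp for
        for (long i = 0; i < (long)n; i++)
            a[i] *= k;
    }
}
```

No comparably simple rewrite is spelled out for #pragma omp for simd, which is the gap being pointed at here.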
E.g. in the description of the firstprivate clause, it is required that the original list item in a worksharing construct must not be private in the outer parallel region. That makes sense for worksharing regions and for omp for simd, but doesn't make sense for omp simd. Or take section 2.16, where worksharing regions are disallowed from being closely nested in a whole bunch of regions, while plain simd regions presumably are allowed there.
Also, what exactly does chunk-size mean on the schedule clause of #pragma omp for simd? Is the chunk-size still measured in loop iterations (so that e.g. if it is odd, we'd need to use scalar iterations), or is it measured in (implementation-defined) simd chunks instead?
#pragma omp for simd schedule(static, 5) safelen(4)
for (i = 0; i < 100; i++)
Assuming the body can be vectorized with 4 iterations per simd chunk, does the above mean that the first thread should be assigned 5 simd chunks (20 iterations), the second thread the next 20 iterations, and so on? Or does it mean that the loop is simply not vectorized, and that only if schedule(static, 4) were used instead would the first thread be given the first 4 iterations (one simd chunk), the second thread the next 4 iterations, etc.?
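The two readings can be written down as plain arithmetic. This is only a sketch of the question, not of what any implementation does: the helper names are hypothetical, and the only thing taken from the spec is that schedule(static, chunk) deals chunks to threads round-robin in thread-number order.

```c
/* Thread owning iteration i when chunks of chunk_iters iterations are
   dealt round-robin to threads 0..nthreads-1 (static schedule). */
int static_owner(int i, int chunk_iters, int nthreads)
{
    return (i / chunk_iters) % nthreads;
}

/* Reading A: chunk-size counts loop iterations. */
int owner_iterations(int i, int chunk_size, int nthreads)
{
    return static_owner(i, chunk_size, nthreads);
}

/* Reading B: chunk-size counts simd chunks of simd_width iterations each
   (simd_width is an assumed, implementation-chosen value). */
int owner_simd_chunks(int i, int chunk_size, int simd_width, int nthreads)
{
    return static_owner(i, chunk_size * simd_width, nthreads);
}
```

With schedule(static, 5), a simd width of 4 and, say, 4 threads, iteration 5 belongs to thread 1 under reading A but still to thread 0 under reading B, so the two readings hand out visibly different iteration sets.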