the same as safelen (N), or safelen (0)? In the latter case the compiler would be responsible for finding out whether any overlaps are possible and whether it needs, e.g., runtime versioning for aliasing, as it normally does during autovectorization when it can't prove there is no overlap.
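To make "runtime versioning for aliasing" concrete, here is a minimal sketch in C of roughly what a vectorizer does behind the scenes when it can't prove two pointers don't overlap. The names ranges_disjoint and add_one are hypothetical and only for illustration; a real compiler emits the check internally rather than in source.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero when the n-element int ranges starting
   at a and b share no bytes. */
static int ranges_disjoint(const int *a, const int *b, size_t n)
{
    uintptr_t ua = (uintptr_t) a;
    uintptr_t ub = (uintptr_t) b;
    uintptr_t bytes = n * sizeof(int);
    return ua + bytes <= ub || ub + bytes <= ua;
}

/* dst[i] = src[i] + 1, written out as two loop versions. */
void add_one(int *dst, const int *src, size_t n)
{
    if (ranges_disjoint(dst, src, n)) {
        /* No overlap: iterations are independent, so this copy of the
           loop is the one that may be vectorized. */
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] + 1;
    } else {
        /* Possible overlap: keep strict scalar order. */
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] + 1;
    }
}
```

Both branches compute the same thing; the only point of the check is that the first copy can be transformed freely while the second preserves the sequential semantics.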
2) What exactly is aligned when a variable is specified in an aligned clause?
extern int *p, q, r[1024];
void
foo (int *s)
{
  #pragma omp simd aligned (p, q, r, s : 8 * sizeof (int))
  for (int i = 0; i < 1024; i++)
    r[i] = p[i] + q + s[i];
}
Can the compiler assume that (((uintptr_t) p) & 15) == 0, or that (((uintptr_t) &p) & 15) == 0? For q, a non-pointer scalar, I bet it doesn't make sense to talk about alignment at all (should the standard forbid specifying it in an aligned clause?), but if it is valid there, the only possible meaning is (((uintptr_t) &q) & 15) == 0. For r, I guess (((uintptr_t) &r) & 15) == 0 is the only possibility, and for s there are again two possibilities, as for p.
For vectorization, I guess the first alternative for p (and similarly for s) and the only alternative for r are what is useful, but in that case I'd say the standard should have detailed wording on it.
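The two readings for a pointer variable really are different properties, as this small sketch shows. Everything here (is_aligned, buffer, ptr and the two query functions) is made up for illustration: the pointer object is deliberately aligned to 16 bytes while the address it holds is deliberately not.

```c
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical helper: nonzero when addr is a multiple of align,
   where align is a power of two. */
static int is_aligned(const void *addr, uintptr_t align)
{
    return ((uintptr_t) addr & (align - 1)) == 0;
}

/* Storage aligned to 16 bytes. */
static alignas(16) int buffer[8];

/* The pointer object itself is 16-byte aligned, but the address it
   holds is sizeof(int) bytes past a 16-byte boundary. */
static alignas(16) int *ptr = buffer + 1;

/* First reading: (((uintptr_t) ptr) & 15) == 0 */
int value_is_aligned(void)
{
    return is_aligned(ptr, 16);
}

/* Second reading: (((uintptr_t) &ptr) & 15) == 0 */
int object_is_aligned(void)
{
    return is_aligned(&ptr, 16);
}
```

Here the second reading holds and the first does not, yet only the first says anything about the loads and stores a vectorized loop would perform through the pointer.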
3) aligned clauses without a specified alignment: what do other compilers intend to do here, and what are users expected to write in their code to align variables, especially on targets with multiple vector sizes?
Say on i?86/x86_64, where we (currently) have 16-byte vectors and, with AVX and later, 32-byte vectors: with -mavx (or -mavx2 etc.) some loops are better vectorized using 32-byte vectors and some using 16-byte vectors. E.g. with -mavx when AVX2 isn't available, a loop that needs integer math has to use 16-byte vectors, while one that only does float/double math can use 32-byte vectors. Would the default alignment for aligned be 16 when -mavx isn't specified (or, more generally, when the CPU the code is generated for can never use 32-byte vectors) and 32 when -mavx is specified? Or 16 vs. 32 depending on whether the vectorizer actually finds it can use 32-byte vectors?
Or is the plan, say for Intel CC, to always use 16-byte alignment? Unfortunately this isn't just the compiler's private decision, because it requires the user to add alignas(16) vs. alignas(32) somewhere (or an aligned attribute, or whatever the compiler allows), or to use posix_memalign etc., or whatever else the language/extensions provide to align variables.
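To show why the user has to know the number, here is a rough sketch of what the user-side code could look like if the answer were "32 with -mavx, 16 otherwise". That rule is only an assumption for the example, not something any compiler is known to promise. VEC_ALIGN, table, alloc_floats and scale are hypothetical names, and C11 aligned_alloc stands in for posix_memalign.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed rule: 32 bytes when the compiler targets AVX (GCC, Clang and
   ICC define __AVX__ under -mavx), 16 bytes otherwise. */
#ifdef __AVX__
#define VEC_ALIGN 32
#else
#define VEC_ALIGN 16
#endif

/* Static storage aligned with C11 alignas. */
static alignas(VEC_ALIGN) float table[1024];

/* Hypothetical helper: heap array of n floats aligned to VEC_ALIGN.
   Release it with free(). */
float *alloc_floats(size_t n)
{
    size_t bytes = n * sizeof(float);
    /* aligned_alloc wants a size that is a multiple of the alignment,
       so round up. */
    bytes = (bytes + VEC_ALIGN - 1) / VEC_ALIGN * VEC_ALIGN;
    return aligned_alloc(VEC_ALIGN, bytes);
}

/* dst[i] = 2 * src[i]; both pointers must really be VEC_ALIGN-aligned. */
void scale(float *restrict dst, const float *restrict src, size_t n)
{
/* The pragma is only compiled in when the compiler defines _OPENMP. */
#ifdef _OPENMP
    #pragma omp simd aligned (dst, src : VEC_ALIGN)
#endif
    for (size_t i = 0; i < n; i++)
        dst[i] = 2.0f * src[i];
}
```

With an explicit alignment in the clause, the same macro keeps the allocation and the promise to the compiler in sync. Without one, the user would have to guess which value the compiler picks by default, which is exactly the problem raised in 3).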