Thanks to MarkB for the prompt and smart answer.
The solution you propose for n=2 nested loops is great. That said, it needs a slight modification to work properly: index "i" has to be handled a bit differently, otherwise it produces fractional values; see the attached spreadsheet. This assumes I have implemented it correctly.
Having said that, my gut feeling (and gut feeling is all I have left to tackle these problems) is that the case of n=2 nested loops is a special one. Consider the Mickey Mouse example in the spreadsheet for n=3. It is easy to spot a kind of cyclical behavior in the index triples, which I personally find difficult to express in closed form. The alternative would be some conditional branching, but I believe that becomes cumbersome as n grows, a sort of curse of dimensionality.
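To make the cyclical pattern concrete, here is a small C sketch that recovers a triple from a single counter using integer division and remainder. It is only a candidate form, not something I have seen confirmed anywhere: the extents NI, NJ, NK and the function names are made up for illustration (they are not taken from the spreadsheet), and it assumes zero-based indexes with the innermost index varying fastest.

```c
#include <assert.h>

/* Hypothetical extents of the three nested loops (illustration only). */
enum { NI = 2, NJ = 3, NK = 4 };

/* Recover (i, j, k) from a single counter t in [0, NI*NJ*NK),
   assuming zero-based indexes and k varying fastest. */
void unflatten3(int t, int *i, int *j, int *k)
{
    *k = t % NK;          /* repeats with period NK */
    *j = (t / NK) % NJ;   /* repeats with period NK*NJ */
    *i = t / (NK * NJ);   /* integer division, so no fractional values */
}

/* Walk the plain nested loops with a running counter t and assert that
   unflatten3 reproduces every triple in the same order. */
void check_unflatten3(void)
{
    int t = 0;
    for (int i = 0; i < NI; i++)
        for (int j = 0; j < NJ; j++)
            for (int k = 0; k < NK; k++, t++) {
                int a, b, c;
                unflatten3(t, &a, &b, &c);
                assert(a == i && b == j && c == k);
            }
}
```

Calling check_unflatten3() from a main function compares the formula against the brute-force loops for these particular extents. Whether the same pattern stays manageable for larger n is exactly what I am unsure about.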
Anyway, the questions remain:
1) How does the OpenMP collapse clause transform a nest of loops?
I guess the developers of the OpenMP parallel programming model know what they have built into implementation 3 and above. I would be very grateful if they could tell me what the collapse clause is actually doing, so that I can reconcile in my mind the two strands of literature that strive to optimize nested loops, i.e. the algorithmic one and the compiler-optimization one.
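To be concrete about the usage I am asking about, here is a minimal C sketch of collapse applied to three perfectly nested loops. The extents EXT_I, EXT_J, EXT_K and the function names are hypothetical. As far as I understand, the clause only states that the three loops form one iteration space to be shared among the threads; how the compiler recovers i, j and k inside that space is the part I cannot see from here.

```c
#include <assert.h>

/* Hypothetical extents, chosen only for illustration. */
enum { EXT_I = 3, EXT_J = 4, EXT_K = 5 };

/* Fill out[i][j][k] with a value that depends on all three indexes.
   With OpenMP enabled (e.g. gcc -fopenmp), collapse(3) asks the compiler
   to treat the three loops as one iteration space of EXT_I*EXT_J*EXT_K
   iterations. Without OpenMP the pragma is ignored and the loops run
   serially, giving the same result. */
void fill_collapsed(int out[EXT_I][EXT_J][EXT_K])
{
    #pragma omp parallel for collapse(3)
    for (int i = 0; i < EXT_I; i++)
        for (int j = 0; j < EXT_J; j++)
            for (int k = 0; k < EXT_K; k++)
                out[i][j][k] = 100 * i + 10 * j + k;
}

/* Serial check that every cell holds the value the plain nested
   loops would have written. */
void check_fill(void)
{
    int out[EXT_I][EXT_J][EXT_K];
    fill_collapsed(out);
    for (int i = 0; i < EXT_I; i++)
        for (int j = 0; j < EXT_J; j++)
            for (int k = 0; k < EXT_K; k++)
                assert(out[i][j][k] == 100 * i + 10 * j + k);
}
```

Each iteration writes a distinct cell, so the result does not depend on which thread runs which iteration; that is what makes the loop nest safe to collapse in this toy case.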
2) The example by MarkB is very interesting, but it would be even more interesting to read some references on general closed forms that generate the n-tuples of indexes for the kind of single collapsed loop MarkB suggests for the case n=2.
Do they exist and have they been published, or are they a well-kept secret?