Hi,
Just a small suggestion: on page 40, static scheduling allows only equal-size chunks. In many cases, variable-size chunks are preferable because they give better load balance when iterations do not all cost the same. It would be nice to parameterize the chunking so that the programmer can specify the chunk sizes; the default chunk size would be the total number of iterations divided by the total number of threads.
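To make the idea concrete, here is a rough sketch in Python of how the chunk boundaries could be computed. The function name static_chunks and its parameters are hypothetical and only illustrate the suggestion; they are not taken from the document under review. Rounding the default chunk size up when the division is not exact is also my assumption.

```python
from typing import List, Optional, Tuple


def static_chunks(n_iters: int, n_threads: int,
                  chunk_sizes: Optional[List[int]] = None) -> List[Tuple[int, int]]:
    """Return half-open (start, end) iteration ranges, one per chunk.

    Hypothetical helper: chunk i would be assigned statically to thread i.
    """
    if n_iters < 0 or n_threads <= 0:
        raise ValueError("need n_iters >= 0 and n_threads > 0")

    if chunk_sizes is None:
        # Default: iterations divided by threads, rounded up (assumption)
        # so that every iteration is covered.
        size = -(-n_iters // n_threads)
        chunk_sizes = [size] * n_threads

    if any(s < 0 for s in chunk_sizes):
        raise ValueError("chunk sizes must be non-negative")

    ranges = []
    start = 0
    for size in chunk_sizes:
        # Clamp the last chunk so it never runs past the iteration space.
        end = min(start + size, n_iters)
        ranges.append((start, end))
        start = end

    if start != n_iters:
        raise ValueError("chunk sizes do not cover all iterations")
    return ranges
```

For example, static_chunks(10, 4) gives the default equal-size split [(0, 3), (3, 6), (6, 9), (9, 10)], while static_chunks(10, 3, [5, 3, 2]) gives the programmer-defined split [(0, 5), (5, 8), (8, 10)].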
Thanks,
Arun
