I have a specific load-imbalance problem and am looking for a solution. I'm a physicist and don't want to reinvent the wheel.
double array[N], sum;
int i, j;
sum = 0.0;
for (i = 0; i < N; i++) {
    for (j = 0; j < i; j++) {
        sum += array[i] * array[j];
    }
}
As you can see, iteration i of the outer loop does i multiplications, so the work grows with i. My current solution is to interleave the outer loop with "schedule(static,1)". This certainly helps, but it is not perfect, and data locality is poor: iteration i = N-1 has to read the whole array, which may thrash the cache.
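For reference, the interleaved version described above could look roughly like the following. This is only a sketch under assumptions, not my exact code: the function name pair_sum_interleaved and the reduction(+:sum) clause are added here for illustration, the latter so that the threads do not race on sum.

```c
/* Sum of array[i] * array[j] over all pairs j < i.
 * Row i of the outer loop costs i multiplications. */
double pair_sum_interleaved(const double *array, int n)
{
    double sum = 0.0;

    /* schedule(static,1) hands out outer iterations round-robin:
     * with T threads, thread t gets i = t, t + T, t + 2T, ...
     * so every thread receives a mix of short and long rows.
     * reduction(+:sum) is an assumption made for this sketch: each
     * thread accumulates a private partial sum that is combined at the end. */
    #pragma omp parallel for schedule(static,1) reduction(+:sum)
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < i; j++) {
            sum += array[i] * array[j];
        }
    }
    return sum;
}
```

With gcc this would be built with -fopenmp; without that flag the pragma is ignored and the loop simply runs serially. Note that even with round-robin rows, the thread that gets the last rows still does slightly more work than the others, and every thread still streams over most of the array for its long rows.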
Does anybody know a better solution to this? Thank you.
