I have three exercises to do, all related to OpenMP. Instead of spamming the forum index with three threads at once, I'll post them one by one: the next will be posted once the previous one is done. Thanks in advance for any help and for your time.

Description:

Loops: Write and run two C/OpenMP programs for adding the elements of a square matrix a.

Implement two versions of loops as shown on this page.

The value of n should be 100 *(number of threads).

Time both codes. Which of the two versions runs faster? Explain why.

1)

    for (int j=0; j<n; j++)
        for (int i=0; i<n; i++)
            sum += a[i][j];

2)

    for (int i=0; i<n; i++)
        for (int j=0; j<n; j++)
            sum += a[i][j];

I'm using 64-bit Windows (x86-64) and Code::Blocks with the MinGW GCC compiler and its OpenMP library.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <omp.h>

    /* Allocate an nrows x nrows matrix as an array of row pointers.
       Returns NULL if an allocation fails. */
    int** getarray(int nrows)
    {
        int ncolumns = nrows, i;
        int **array;

        array = malloc(nrows * sizeof(int *));
        if(array == NULL)
            return NULL;

        for(i = 0; i < nrows; i++) {
            array[i] = malloc(ncolumns * sizeof(int));
            if(array[i] == NULL)
                return NULL;
        }
        return array;
    }

    int main (int argc, char *argv[]) {
        int nthreads, n,sum=0;
        int i,j,**a;

        srand (time(NULL));
        time_t start, end;
        time(&start);

        #pragma omp parallel private(i,j) shared(a, sum, nthreads)
        {
            /* One thread sets up the shared data; the others wait at the
               implicit barrier at the end of the single block. */
            #pragma omp single
            {
                nthreads = omp_get_num_threads();
                n = 100*nthreads;
                a = getarray(n);
            }

            /* Initializing matrix */
            #pragma omp for
            for(i=0; i<n; i++)
                for(j=0; j<n; j++)
                    a[i][j] = rand()%100;

            printf("Thread %d starting...\n",omp_get_thread_num());

            /* Restart the timer just before the summation loop */
            #pragma omp single
            time(&start);

            #pragma omp for reduction(+:sum)
            for (j=0; j<n; j++)
                for (i=0; i<n; i++)
                    sum += a[i][j];
        } /* end of parallel region */

        time(&end);
        printf("\nMethod \"a\". Elapsed time: %f; sum: %d threads %d\n", difftime(end, start), sum, nthreads);
        return 0;
    }

This code shows just one variant. Of course there is another version of the program with the loops switched, but it's pointless to paste it too; you get the point.

Is my approach correct? Is the timing done properly? Should the initialization of variables and time(&start) be done within a "single" construct?
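On the timing question, one alternative worth comparing against is omp_get_wtime(), since time() typically counts whole seconds and a matrix this small may finish well within one. The following is only a minimal sketch under stated assumptions: wall_seconds() and time_sum() are hypothetical helper names that are not part of my program, and the clock() fallback is an assumption added so the code still builds when OpenMP is not enabled.

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: current time in seconds.
   With OpenMP enabled (e.g. gcc -fopenmp) this is wall-clock time from
   omp_get_wtime(); otherwise it falls back to clock(), which reports
   CPU time and is only an approximation here. */
double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Hypothetical helper: times only the summation loop (version 1 order).
   The timestamps are taken outside the parallel region, so exactly one
   thread reads the clock and no "single" construct is needed for it. */
double time_sum(int **a, int n, long long *sum_out)
{
    long long sum = 0;
    int i, j;
    double start = wall_seconds();

    #pragma omp parallel for private(i) reduction(+:sum)
    for (j = 0; j < n; j++)
        for (i = 0; i < n; i++)
            sum += a[i][j];

    double elapsed = wall_seconds() - start;
    *sum_out = sum;
    return elapsed;
}
```

Calling time_sum(a, n, &total) after the matrix has been allocated and filled would then measure the summation alone, excluding allocation and initialization.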

Last but not least, which of the two is faster, and why?

I have been thinking about this. At first I thought it was similar to the question of parallelizing the outer versus the inner loop: by parallelizing the outer one we can usually avoid the overhead, but maybe I'm missing the point. It seems that the 1st one is faster, though.
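Another angle I may need to consider is memory layout rather than parallel overhead. In C the elements of one row, a[i][0], a[i][1], ..., sit next to each other in memory, so an inner loop over j walks consecutive addresses, while an inner loop over i jumps to a different row on every iteration, which is generally less cache-friendly. To compare the two orders side by side, here is a minimal sketch; the function names sum_column_order() and sum_row_order() are hypothetical and chosen only for illustration, and the sketch makes no claim about which one wins on a given machine.

```c
/* Version 1 from the exercise: outer loop over columns j.
   Consecutive inner iterations read a[0][j], a[1][j], ...,
   i.e. one element from each row. */
long long sum_column_order(int **a, int n)
{
    long long sum = 0;
    int i, j;

    #pragma omp parallel for private(i) reduction(+:sum)
    for (j = 0; j < n; j++)
        for (i = 0; i < n; i++)
            sum += a[i][j];

    return sum;
}

/* Version 2 from the exercise: outer loop over rows i.
   Consecutive inner iterations read a[i][0], a[i][1], ...,
   which are adjacent in memory within one row. */
long long sum_row_order(int **a, int n)
{
    long long sum = 0;
    int i, j;

    #pragma omp parallel for private(j) reduction(+:sum)
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            sum += a[i][j];

    return sum;
}
```

Both functions parallelize the outer loop, so the thread-management overhead should be about the same and any timing difference would come mainly from the access pattern. They must return the same total for the same matrix, which is a quick sanity check before comparing times.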