1- PAGE 244: A data race condition exists when two threads may concurrently access the same shared variable between synchronization points, without holding any common locks and with at least one thread modifying the variable. The order of these accesses is nondeterministic. The thread reading the value might get the old value or the updated one, or some other erroneous value if the update requires more than one store operation. This usually leads to indeterministic behavior, with the program producing different results from run to run.
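To make the definition concrete, here is a minimal sketch. The function names are hypothetical. The first function has exactly the pattern described: an unsynchronized read-modify-write of a shared variable by several threads. The second removes the race with a `reduction` clause. It assumes an OpenMP-enabled build (e.g. `-fopenmp` with GCC or Clang). Without OpenMP, the pragmas are ignored and both functions run serially.

```cpp
#include <cassert>
#include <vector>

// Racy: all threads read and write the shared variable `sum` without a
// common lock, so updates can be lost when compiled with OpenMP. Do not use.
long sum_racy(const std::vector<int>& v) {
    long sum = 0;
    #pragma omp parallel for
    for (long i = 0; i < (long)v.size(); i++) {
        sum += v[i];
    }
    return sum;
}

// Race-free: reduction(+ : sum) gives each thread a private copy of `sum`
// and adds the copies together after the loop.
long sum_reduction(const std::vector<int>& v) {
    long sum = 0;
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < (long)v.size(); i++) {
        sum += v[i];
    }
    return sum;
}
```

With OpenMP enabled, `sum_racy` may return a different value from run to run. `sum_reduction` always returns the element count for a vector of ones.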
2- PAGE 271: "VERIFICATION OF THE SEQUENTIAL VERSION": [...] Run the loops parallelized with OpenMP backwards. If the result is wrong, the loop(s) cannot be executed in parallel. The reverse is not true. If the result is okay, it does not automatically mean the loop can be parallelized.
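The backwards check can be tried serially, as in the following sketch. All names are hypothetical. If iteration i reads the value written by iteration i-1, the loop gives a different result when run in reverse. If each iteration touches only its own element, the result does not change. As the quote says, agreement is necessary but not sufficient.

```cpp
#include <cassert>
#include <vector>

// Runs body(i) serially for i = first .. last-1, forwards or in reverse.
template <typename Body>
void run_loop(int first, int last, bool backwards, Body body) {
    if (backwards) {
        for (int i = last - 1; i >= first; i--) body(i);
    } else {
        for (int i = first; i < last; i++) body(i);
    }
}

// Loop-carried dependence: iteration i reads a[i - 1], written by iteration i - 1.
std::vector<double> recurrence(std::vector<double> a, bool backwards) {
    run_loop(1, (int)a.size(), backwards, [&](int i) { a[i] = a[i - 1] + 1.0; });
    return a;
}

// Independent iterations: iteration i reads and writes only a[i].
std::vector<double> elementwise(std::vector<double> a, bool backwards) {
    run_loop(0, (int)a.size(), backwards, [&](int i) { a[i] = 2.0 * a[i] + 1.0; });
    return a;
}

// True if the forward and reverse runs agree (necessary, not sufficient,
// for the loop to be parallelizable).
bool same_result_backwards(std::vector<double> (*loop)(std::vector<double>, bool),
                           const std::vector<double>& input) {
    return loop(input, false) == loop(input, true);
}
```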
3- PAGE 272: "VERIFICATION OF THE PARALLEL CODE": [...] It is also good practice to find the lowest compiler optimization level for which the bug occurs. [...] Several scenarios are now worth exploring:
• Run the OpenMP version of the program on one thread. If the error shows up then, there is most likely a basic error in the code.
• Selectively enable/disable OpenMP directives to zoom in on the part of the program where the error originates.
• Check that the libraries used are thread-safe in case one or more of their functions are called within a parallel region.
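The first two suggestions can be applied without editing the pragmas each time by driving them from runtime switches. The following sketch uses a hypothetical `DebugConfig` struct. The `if` and `num_threads` clauses are standard OpenMP. When the `if` expression is false, the region runs with a team of one thread. You can then compare that result with the parallel one.

```cpp
#include <cassert>
#include <vector>

// Hypothetical debugging switches for a parallel loop.
struct DebugConfig {
    bool enable_parallel = true;  // false: the if clause serializes the region
    int num_threads = 2;          // 1: run the OpenMP version on one thread
};

// A simple independent loop whose parallel execution can be toggled at runtime.
std::vector<double> scale(const std::vector<double>& in, const DebugConfig& cfg) {
    assert(cfg.num_threads > 0);
    const int n = (int)in.size();
    std::vector<double> out(in.size());
    #pragma omp parallel for if (cfg.enable_parallel) num_threads(cfg.num_threads)
    for (int i = 0; i < n; i++) {
        out[i] = 3.0 * in[i];
    }
    return out;
}
```

A `num_threads(2)` clause, as in the loop shown in this post, takes precedence over the `OMP_NUM_THREADS` environment variable. Setting `OMP_NUM_THREADS=1` alone would therefore not make that region run on one thread.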
With these considerations in mind, I have some questions about a for loop that I'm trying to parallelize:
1- My for loop contains two other for loops, in which the program essentially reads data from some arrays and writes to two arrays. I listed all the variables used in the loop, and these are the results:
- Only two variables are modified by the loop. The first one, say "mpts_sfrt_a", is an array used to hold support data that are read from another array and written to "mpts_sfrt_a". "mpts_sfrt_a" is initialized outside the parallel region, so I declared it firstprivate rather than private, because otherwise its copies would be undefined. I declared it firstprivate to avoid a possible data race condition: if it were shared, multiple threads would write to the same memory locations of this array.
- The second variable modified inside the loop is a matrix that is filled here and used for subsequent computations outside the parallel region. The matrix is stored row-wise as a one-dimensional array, and I'm pretty sure that each thread writes to different memory locations. The matrix is accessed via the indexes of the for loops. Each of the two threads I use gets one chunk of the row indexes of the matrix (these are the indexes of the outermost loop, the parallel one): the first thread gets the first chunk and the second thread gets the second chunk, so each thread accesses distinct rows of the matrix.
- The other variables are either only read or are local variables created inside the parallel region. Can a data race condition occur in this situation? The book says that data race conditions can happen only if the threads modify a variable. Therefore, if two threads read the same memory location concurrently, there shouldn't be any problem, right?
3- I'm compiling without optimization while trying to find the bug.
Is anything I'm saying wrong? Based on what your book says, the loop should work, but it doesn't.
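On the first variable described in point 1, note that in C and C++ `firstprivate` privatizes the listed variable itself. Suppose "mpts_sfrt_a" is a pointer, for example to a `std::vector` or a dynamically allocated buffer, rather than a fixed-size array. Its declaration is not shown, so this is an assumption. Each thread would then get a private copy of the pointer, initialized to the same address, and the doubles it points to would still be shared. The sketch below (hypothetical function name) records the address each thread's copy holds.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Returns, per thread, the address held by that thread's firstprivate copy
// of the pointer p. Slots of threads that did not run stay nullptr.
std::vector<const double*> firstprivate_pointer_targets(double* buffer, int nthreads) {
    assert(nthreads > 0);
    std::vector<const double*> seen(nthreads, nullptr);
    double* p = buffer;
    // firstprivate(p) copies the pointer value, not the pointed-to doubles.
    #pragma omp parallel num_threads(nthreads) firstprivate(p)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        seen[tid] = p;  // each thread writes only its own slot
    }
    return seen;
}
```

Every non-null entry equals `buffer`, so writes through `p` from different threads land in the same memory. A C array declared with a fixed size, such as `double a[16]`, behaves differently: `firstprivate(a)` copies all of its elements.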
Here is the code of the loop:
```cpp
#pragma omp parallel num_threads(2) default(shared)
{
    #pragma omp for firstprivate(mpts_sfrt_a)
    for (int i = 0; i < (int)cxs_number; i++) {
        for (int j = 0; j < (int)cys_number; j++) {
            //cout << "i = " << i << " j = " << j << endl;
            int p = 0;
            //#pragma omp for
            //#pragma omp critical (write_)
            //{
            //cout << "i = " << i << " j = " << j << endl;
            double cx = cxs_a[i] - 1;
            double cy = cys_a[j] - 1;
            for (int k = 0; k < (int)cols_number; k++) { // problem is probably here but not sure
                mpts_sfrt_a[k] = mpts_sfr_a[k] + cx;
                mpts_sfrt_a[k + cols_number] = mpts_sfr_a[k + cols_number] + cy;
            }
            //}
            double corr_value = 0;
            int valid_index = 0;
            for (int l = 0; l < (int)cols_number; l++) {
                int x = (mpts_sfrt_a[l]);
                int y = (mpts_sfrt_a[l + cols_number]);
                bool valid = (x >= 1) && (x <= gradmod->cols) && (y >= 1) && (y <= gradmod->rows);
                if (!valid) continue;
                int xyind = (y - 1) * gradmod->cols + (x - 1);
                double gradmod_value = (double) gradmod_a[xyind];
                double arg0 = graddir_a[xyind] - dirsmc_a[l]; //graddir_ptr[num1 - 1] - dirsmc_ptr[l];
                corr_value += gradmod_value * fabs(cos(arg0));
                if (gradmod_value > 0) {
                    valid_index++;
                }
            }
            double val = corr_value * valid_index / (cols_number * cols_number);
            corr[i * (int)cys_number + j] = val;
        }
    }
}
```
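Here is a minimal sketch of one way to restructure this loop. It assumes "mpts_sfrt_a" points to a single buffer of `2 * cols_number` doubles that both threads would otherwise share. Under that assumption, the scratch buffer is allocated inside the parallel region, so each thread gets its own. To keep the sketch self-contained, it replaces the original variables with `std::vector` parameters and a hypothetical `Image` struct standing in for `gradmod`/`graddir`. These names and types are illustrations, not the original declarations.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for gradmod / graddir: row-major images of the same size.
struct Image {
    int rows = 0, cols = 0;
    std::vector<double> mod;  // gradient magnitude, rows * cols values
    std::vector<double> dir;  // gradient direction, rows * cols values
};

// Sketch of the posted loop with a per-thread scratch buffer.
// Assumes mpts_sfr holds 2 * ncols values (all x, then all y) and dirsmc holds ncols.
std::vector<double> correlate(const std::vector<double>& cxs, const std::vector<double>& cys,
                              const std::vector<double>& mpts_sfr,
                              const std::vector<double>& dirsmc, const Image& img) {
    const int nx = (int)cxs.size(), ny = (int)cys.size(), ncols = (int)dirsmc.size();
    assert(ncols > 0 && mpts_sfr.size() == 2 * (std::size_t)ncols);
    std::vector<double> corr((std::size_t)nx * ny, 0.0);

    #pragma omp parallel num_threads(2)
    {
        // Declared inside the region: each thread gets its own scratch storage.
        std::vector<double> mpts_sfrt(2 * (std::size_t)ncols);

        #pragma omp for
        for (int i = 0; i < nx; i++) {
            for (int j = 0; j < ny; j++) {
                const double cx = cxs[i] - 1, cy = cys[j] - 1;
                for (int k = 0; k < ncols; k++) {
                    mpts_sfrt[k] = mpts_sfr[k] + cx;
                    mpts_sfrt[k + ncols] = mpts_sfr[k + ncols] + cy;
                }
                double corr_value = 0;
                int valid_index = 0;
                for (int l = 0; l < ncols; l++) {
                    // Truncating conversion to int, as in the original code.
                    const int x = (int)mpts_sfrt[l];
                    const int y = (int)mpts_sfrt[l + ncols];
                    if (x < 1 || x > img.cols || y < 1 || y > img.rows) continue;
                    const int xy = (y - 1) * img.cols + (x - 1);
                    // Shared inputs are only read here, which is not a data race.
                    corr_value += img.mod[xy] * std::fabs(std::cos(img.dir[xy] - dirsmc[l]));
                    if (img.mod[xy] > 0) valid_index++;
                }
                // Element (i, j) is written only by the thread that runs iteration i.
                corr[(std::size_t)i * ny + j] = corr_value * valid_index / ((double)ncols * ncols);
            }
        }
    }
    return corr;
}
```

Allocating the scratch buffer once per thread, rather than once per iteration, keeps the allocation cost out of the inner loops.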
