OMP parallel for causes segmentation fault.


Post by MutantTurkey » Tue Apr 30, 2013 11:58 am

I'm having trouble using the #pragma omp parallel for directive.

Basically, I have several hundred DNA sequences that I want to run through an algorithm called NNLS (non-negative least squares).

I figured that doing it in parallel would give me a pretty good speedup, so I added the #pragma directive to the loop.

When I run it sequentially there is no issue and the results are fine, but when I run it with #pragma omp parallel for I get a segfault within the algorithm (sometimes at different points).

The segfault is within the "nnls" function.

Code:
#pragma omp parallel for
for(int i = 0; i < dir_count; i++ ) {

  int z = 0;
  int w = 0;
  struct dirent *directory_entry;
  char filename[256];

  directory_entry = readdir(input_directory_dh);

  if(strcmp(directory_entry->d_name, "..") == 0 || strcmp(directory_entry->d_name, ".") == 0) {
    continue;
  }

  sprintf(filename, "%s/%s", input_fasta_directory, directory_entry->d_name);

  double *count_matrix = load_count_matrix(filename, width, kmer);

  //normalize_matrix(count_matrix, 1, width)
  for(z = 0; z < width; z++)
    count_matrix[z] = count_matrix[z] * lambda;

  // output our matrices if we are in debug mode
  printf("running NNLS on %s, %d, %d\n", filename, i, z);
  double *trained_matrix_copy = malloc(sizeof(double) * sequences * width);
  for(w = 0; w < sequences; w++) {
    for(z = 0; z < width; z++) {
      trained_matrix_copy[w*width + z] = trained_matrix[w*width + z];
    }
  }

  double *solution = nnls(trained_matrix_copy, count_matrix, sequences, width, i);


  normalize_matrix(solution, 1, sequences);
  for(z = 0; z < sequences; z++ )  {
    solutions(i, z) = solution[z];
  }

  printf("finished NNLS on %s\n", filename);

  free(solution);
  free(trained_matrix_copy);
}


gdb always stops at a different point in my thread, so I can't figure out what is going wrong.

What I have tried:

-allocating a copy of each matrix, so that the threads would not be writing on top of each other
-using a mixture of private/shared clauses on the #pragma line
-using different input sequences
-writing out my trained_matrix and count_matrix prior to calling NNLS, to check that they look OK (they do!)

I'm sort of out of ideas. Does anyone have some advice?

Post by MarkB » Wed May 01, 2013 3:04 am

Hi there,

A couple of possibilities:

1) The function nnls isn't thread-safe, and some race condition is occurring inside it. This can happen if, for example, there are variables inside the function which are declared as static, or declared at file scope, which will be shared between threads inside the parallel region. Do you have access to the source code for nnls, or is it from a library?

2) The worker threads are running out of stack space inside nnls. You can use the OMP_STACKSIZE environment variable to try increasing the stack size for worker threads.

One useful test would be to enclose the call to nnls in a #pragma omp critical construct and see if it works (which suggests a race condition) or not (which suggests the stack space issue).
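
As a rough illustration of both points, here is a minimal sketch in C. The functions sum_unsafe, sum_safe and sum_rows are hypothetical stand-ins and are not taken from the nnls source. The first keeps its accumulator in a static variable, the second uses an automatic variable, and the loop shows the #pragma omp critical test wrapped around the suspect call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a routine with a static local.
 * The accumulator lives in one shared location, so concurrent
 * calls from several threads race on it. */
double sum_unsafe(const double *v, size_t n)
{
    static double acc;          /* shared between all threads */
    acc = 0.0;
    for (size_t k = 0; k < n; k++)
        acc += v[k];
    return acc;
}

/* Same computation with an automatic variable: every call, and
 * therefore every thread, gets its own accumulator. */
double sum_safe(const double *v, size_t n)
{
    double acc = 0.0;
    for (size_t k = 0; k < n; k++)
        acc += v[k];
    return acc;
}

/* Diagnostic version of a parallel loop: the suspect call is
 * serialized with a critical section. If the crash disappears,
 * a race inside the callee is the likely cause. */
void sum_rows(const double *m, int rows, int cols, double *out)
{
    assert(m != NULL && out != NULL);

    #pragma omp parallel for
    for (int i = 0; i < rows; i++) {
        double r;               /* declared in the loop body, so private */
        #pragma omp critical
        {
            r = sum_unsafe(m + (size_t)i * cols, (size_t)cols);
        }
        out[i] = r;
    }
}
```

Once the critical test points at a race, replacing the call with a version like sum_safe and removing the critical section restores the parallelism. With GCC, the pragmas only take effect when compiling with -fopenmp.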

Hope that helps,
Mark.

Post by MutantTurkey » Wed May 01, 2013 6:43 am

I figured this out last night. It turns out that a few erroneous static declarations of variables inside the NNLS function were causing the mishap!

Note to self: never use any code that was automatically translated from Fortran ;)

Post by MarkB » Wed May 01, 2013 7:47 am

Great, glad it's now working!

Post by MutantTurkey » Wed May 01, 2013 11:28 am

Mark, I'm still having issues.

Removing all the static declarations solved the nnls problems as far as I can see, but now I get errors at this section:

This is part of the same loop, where we want to add our solution from each NNLS run to an array containing all the solutions:

Code:
#pragma omp critical
for(z = 0; z < sequences; z++ )  {
    solutions[i*sequences + z] = solution[z];
}


gdb isn't being very helpful either:

Code:
Starting program: /home/calvin/quikr-c/multifasta_to_otu -i input/ -f /home/calvin/quikr/gg94_training_input.fasta -t gg94_trained.txt -o output -k 6 -l 10000 -j 1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffda557700 (LWP 27774)]
[New Thread 0x7fffd9d56700 (LWP 27775)]
[New Thread 0x7fffd9555700 (LWP 27776)]
executed "count-kmers -r 6 -1 -u input//sample=700015250.fa"
executed "count-kmers -r 6 -1 -u input//sample=700015268.fa"
executed "count-kmers -r 6 -1 -u input//sample=700015289.fa"
executed "count-kmers -r 6 -1 -u input//sample=700015009.fa"
running NNLS on input//sample=700015289.fa, 14, 4097
running NNLS on input//sample=700015268.fa, 21, 4097
running NNLS on input//sample=700015250.fa, 7, 4097
running NNLS on input//sample=700015009.fa, 0, 4097

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd9d56700 (LWP 27775)]
0x00000000004023c9 in main._omp_fn.0 () at multifasta_to_otu.c:169
169         solutions[i*sequences + z] = solution[z];
(gdb) print solutions
No symbol "solutions" in current context.
(gdb) print solution
No symbol "solution" in current context.
(gdb) print z
No symbol "z" in current context.
(gdb) print i
No symbol "i" in current context.
(gdb) print sequences
No symbol "sequences" in current context.
(gdb)


Apparently none of my variables exist to debug!

Post by MutantTurkey » Wed May 01, 2013 12:41 pm

Well, it turns out I should malloc something larger than zero!

I'm still having problems with nnls though. Here is the source:

https://github.com/mutantturkey/quikr-c ... ter/nnls.c
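
The zero-size allocation mentioned above is the kind of mistake a small checked allocator can catch early. The following is only a sketch under assumptions: alloc_solutions and store_row are hypothetical helpers, not functions from the quikr source, and the row-major layout simply mirrors the solutions[i*sequences + z] indexing shown earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a zero-filled rows x cols array of
 * doubles. Returns NULL if either dimension is zero, if the element
 * count would overflow size_t, or if calloc fails. */
double *alloc_solutions(size_t rows, size_t cols)
{
    if (rows == 0 || cols == 0) {
        fprintf(stderr, "alloc_solutions: refusing %zu x %zu allocation\n",
                rows, cols);
        return NULL;
    }
    if (rows > SIZE_MAX / cols)
        return NULL;
    return calloc(rows * cols, sizeof(double));
}

/* Copy one result vector into row i of the array. Each i owns a
 * disjoint slice, so different loop iterations never write to the
 * same element. */
void store_row(double *solutions, size_t cols, size_t i,
               const double *solution)
{
    assert(solutions != NULL && solution != NULL);
    for (size_t z = 0; z < cols; z++)
        solutions[i * cols + z] = solution[z];
}
```

Because every iteration writes to its own row, the copy probably does not need a critical section, provided the array really holds one row per iteration.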

Post by MarkB » Thu May 02, 2013 2:34 am

MutantTurkey wrote: still having problems with nnls though, here is the source:


If you've taken out all the statics, I can't immediately see any other problems, I'm afraid.

Post by ftinetti » Thu May 02, 2013 3:32 am

Hi,

MutantTurkey wrote: Well turns out I should malloc something larger than zero!

still having problems with nnls though, here is the source:

https://github.com/mutantturkey/quikr-c ... ter/nnls.c


Maybe that's not the latest version? I would like to play around a little with the code that uses OpenMP. Would you post it and/or update the links? I've seen
Code:
char *usage = "Usage: quikr-train -i <fasta-file> -f <trained-database-fasta> -t <trained-database> -o <output-file> -k <kmer-size> -l <lambda>"

and maybe the input files are too large...?

Fernando.

Post by jakub » Thu May 02, 2013 9:12 am

Note that, for example, readdir is neither reentrant nor thread-safe, so you can't safely call it inside a parallel region.
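
One way to act on this, sketched here under assumptions, is to read the directory in a single thread first and let the parallel loop index into the resulting list of names. list_directory and build_paths are hypothetical helpers, not part of the original program, and the per-file work is reduced to building the path so the example stays short. This also avoids two side effects of calling readdir on the shared input_directory_dh inside the loop: the file an iteration receives no longer depends on thread timing, and skipping "." and ".." no longer leaves rows of the result array unfilled.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: serially collect every entry name in `path`
 * except "." and "..". Returns a malloc'd array of malloc'd strings
 * and stores the number of names in *count. Returns NULL if the
 * directory cannot be opened or has no other entries. If an
 * allocation fails, the names collected so far are returned. */
char **list_directory(const char *path, size_t *count)
{
    char **names = NULL;
    size_t n = 0, cap = 0;
    struct dirent *entry;
    DIR *dh = opendir(path);

    *count = 0;
    if (dh == NULL)
        return NULL;

    while ((entry = readdir(dh)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 ||
            strcmp(entry->d_name, "..") == 0)
            continue;
        if (n == cap) {                       /* grow the array */
            size_t new_cap = cap ? cap * 2 : 16;
            char **tmp = realloc(names, new_cap * sizeof *tmp);
            if (tmp == NULL)
                break;
            names = tmp;
            cap = new_cap;
        }
        names[n] = malloc(strlen(entry->d_name) + 1);
        if (names[n] == NULL)
            break;
        strcpy(names[n], entry->d_name);
        n++;
    }
    closedir(dh);
    *count = n;
    return names;
}

/* Parallel part: iteration i only reads names[i] and only writes
 * lengths[i], so no directory stream is shared between threads.
 * Storing the path length is a placeholder for the real per-file
 * work (loading the count matrix, running nnls, and so on). */
void build_paths(const char *dir, char **names, size_t count,
                 size_t *lengths)
{
    #pragma omp parallel for
    for (int i = 0; i < (int)count; i++) {
        char filename[4096];
        snprintf(filename, sizeof filename, "%s/%s", dir, names[i]);
        lengths[i] = strlen(filename);
    }
}
```

The caller frees each names[i] and then names once the loop has finished. Sizing the result array from the returned count, rather than from a separately computed dir_count, keeps the loop bounds and the allocation consistent.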