How optimized this code using only pragma directive?

General OpenMP discussion

How optimized this code using only pragma directive?

Postby alessandro » Sat Jun 15, 2013 1:10 am

Convert RGB image to grayscale using CImg library

$ gcc -v
Thread model: posix
gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)

How said from the wiki:
OpenMP
for the above version is supported OpenMP 3.0.

I have already did some test as adding the follow code:
#pragma omp parallel for
for(int i=0;i<width;i++){
but naturally the result is wrong.
Then i tried using the ordered:
#pragma omp parallel for ordered
for(int i=0;i<width;i++){
for(int j=0;j<height;j++){
#pragma omp ordered
then work but I suppose that a similar use isn't the proper way for a well optimized multi-thread applications.

I'm already take a look at OpenMP v3.0 specification (pdf) but since is possible use differents approachs for the same issue and actually i don't understand sufficient well the flow of the code on a multithread applications I prefer ask here, on this way I can speed up the learning curve. :)
alessandro
 
Posts: 26
Joined: Tue Jun 11, 2013 2:14 am

Re: How optimized this code using only pragma directive?

Postby ftinetti » Sat Jun 15, 2013 3:19 am

Hi,

I paste the code here in order to avoid using the link:

Code: Select all
   /* Convert RGB image to grayscale image */
   for(int i=0;i<width;i++){
      for(int j=0;j<height;j++){

         //Return a pointer to a located pixel value.
         r = src(i,j,0,0); // First channel RED
         g = src(i,j,0,1); // Second channel GREEN
         b = src(i,j,0,2); // Third channel BLUE


         //PAL and NTSC
         //Y = 0.299*R + 0.587*G + 0.114*B
         gr1 = round(0.299*((double)r) + 0.587*((double)g) + 0.114*((double)b));

         //HDTV
         //Y = 0.2126*R + 0.7152*G + 0.0722*B
         gr2 = round(0.2126*((double)r) + 0.7152*((double)g) + 0.0722*((double)b));

         gray1(i,j,0,0) = gr1;
         gray2(i,j,0,0) = gr2;
      }

   }

I don't know why you used the clause "ordered"... anyway, what you have to identify in prospective parallel(izable) code are private/shared/reduction/etc. variables. In this specific case, the outer for is parallelizable taking into account that variables i, j, r, g, b, gr1, and gr2 should be private (each thread its own copy).

HTH,

Fernando.
ftinetti
 
Posts: 582
Joined: Wed Feb 10, 2010 2:44 pm

Re: How optimized this code using only pragma directive?

Postby alessandro » Sat Jun 15, 2013 5:49 am

Code: Select all
#pragma omp parallel for private(r,g,b,gr1,gr2,gray1,gray2)

I obtain the follow issue during the running:
Code: Select all
Errore di segmentazione (core dump creato)


Translate:
Segmentation fault (core dump created)
alessandro
 
Posts: 26
Joined: Tue Jun 11, 2013 2:14 am

Re: How optimized this code using only pragma directive?

Postby ftinetti » Sat Jun 15, 2013 3:42 pm

Hi,

CODE: SELECT ALL
#pragma omp parallel for private(r,g,b,gr1,gr2,gray1,gray2)

I obtain the follow issue during the running:
CODE: SELECT ALL
Errore di segmentazione (core dump creato)


Translate:
Segmentation fault (core dump created)

Include variable j in the list of private variables.

Fernando.
ftinetti
 
Posts: 582
Joined: Wed Feb 10, 2010 2:44 pm

Re: How optimized this code using only pragma directive?

Postby alessandro » Sat Jun 15, 2013 11:26 pm

Code: Select all
#pragma omp parallel for private(j,r,g,b,gr1,gray1)
   for(int i=0;i<width;i++){
      for(int j=0;j<height;j++){


During the compilation:
Code: Select all
error: ‘j’ has not been declared

while if i declare j even outside the <for>:
Code: Select all
int j;
#pragma omp parallel for private(j,r,g,b,gr1,gray1)
   for(int i=0;i<width;i++){
      for(int j=0;j<height;j++){

Again:
Code: Select all
Errore di segmentazione (core dump creato)
alessandro
 
Posts: 26
Joined: Tue Jun 11, 2013 2:14 am

Re: How optimized this code using only pragma directive?

Postby alessandro » Sun Jun 16, 2013 11:20 am

Using something like this:


Makefile
Code: Select all
CC = g++
CFLAGS = -fopenmp
LDFLAGS = -lX11 -lpthread
all: convert.cpp CImg.h
   $(CC) CImg.h convert.cpp $(LDFLAGS) -o grayscale
   @echo "Done"
noomp: all # just for test the use of this make feautures
omp: convert.cpp CImg.h
   $(CC) $(CFLAGS) CImg.h convert.cpp $(LDFLAGS) -o grayscale
debug: convert.cpp CImg.h
   $(CC) $(CFLAGS) -g CImg.h convert.cpp $(LDFLAGS) -o grayscale
profiling: convert.cpp CImg.h
   $(CC) -pg CImg.h convert.cpp $(LDFLAGS) -o grayscale


Code: Select all
#include "CImg.h"
#include  <iostream>
#ifdef _OPENMP
   #include <omp.h>
#endif /* _OPENMP */
using namespace cimg_library;
using namespace std;

int main() {

   //Construct image from reading an image file.
   CImg<unsigned char> src("img.jpg");

   int width = src.width();
   int height = src.height();
   int depth = src.depth();
   //New grayscale images.
   CImg<unsigned char> gray1(width,height,depth,1);
   
   unsigned char r,g,b;
   unsigned char gr1 = 0;
   
   clock_t clock_timer;
       double wall_timer;
   /* Convert RGB image to grayscale image */
   #pragma omp parallel for ordered
   for(int i=0;i<width;i++){
      for(int j=0;j<height;j++){
         //Return a pointer to a located pixel value.
         r = src(i,j,0,0); // First channel RED
         g = src(i,j,0,1); // Second channel GREEN
         b = src(i,j,0,2); // Third channel BLUE

         //PAL and NTSC
         //Y = 0.299*R + 0.587*G + 0.114*B
         gr1 = round(0.299*((double)r) + 0.587*((double)g) + 0.114*((double)b));

         #pragma omp  ordered
         gray1(i,j,0,0) = gr1;
      }    
   }
   std::cout << " time on clock(): " << (double) (clock() - clock_timer) / CLOCKS_PER_SEC;
   #ifdef _OPENMP   
      std::cout << " time on wall: " <<  omp_get_wtime() - wall_timer << "\n";
   #endif /* _OPENMP */          
   gray1.save("gray1.jpg");
   //show all images
   (src,gray1).display("RGB to Grayscale");

   return 0;
}

I removed few line on the initial code because for what I want to understand I don't need it.
One of my previous attempt was putting every code inside the nested loop on an ordered block now i noticed that is sufficient put gray1 so seem that the issue reside on gray1 but i don't understand the reasons and i don't like this solutions since defect the purpose of a multi-thread applications.

Also i'm problems at obtain a proper use of gdb for debugging purpose.
I'm looking at:
How to Debug Using GDB
work until fix the breakpoint but during the print and previously during the event of a core dump no.
alessandro
 
Posts: 26
Joined: Tue Jun 11, 2013 2:14 am

Re: How optimized this code using only pragma directive?

Postby ftinetti » Mon Jun 17, 2013 4:56 am

Hi,
a) About:

CODE: SELECT ALL
#pragma omp parallel for private(j,r,g,b,gr1,gray1)
for(int i=0;i<width;i++){
for(int j=0;j<height;j++){


During the compilation:
CODE: SELECT ALL
error: ‘j’ has not been declared

while if i declare j even outside the <for>:
CODE: SELECT ALL
int j;
#pragma omp parallel for private(j,r,g,b,gr1,gray1)
for(int i=0;i<width;i++){
for(int j=0;j<height;j++){


I apologize, my mistake: I didn't realize in my previous post that j is declared in the corresponding for... actually, the j of "int j" outside (before) the for is different from that of the for, the code runtime threaded code is finally almost the same.

b) I do not understand the reason/s why you initially had and still have segmentation faults... does the code run without any problem when you compile without the OpenMP compiler option?

c) I've not used GDB, so I'm not able to help/comment on that.

HTH,

Fernando.
ftinetti
 
Posts: 582
Joined: Wed Feb 10, 2010 2:44 pm

Re: How optimized this code using only pragma directive?

Postby alessandro » Mon Jun 17, 2013 6:55 am

ftinetti wrote:does the code run without any problem when you compile without the OpenMP compiler option?

No problem without use OpenMP.
alessandro
 
Posts: 26
Joined: Tue Jun 11, 2013 2:14 am

Re: How optimized this code using only pragma directive?

Postby chabachull » Mon Jun 17, 2013 9:13 am

I suggest that you use this line:

Code: Select all
    #pragma omp parallel for private(r,g,b,gr1,gr2)


You don't have to put gray1 and gray2 in the private clause.

When you put variables in the private clause, you mean for each thread to have its own copy of the variable.
Here, if you put gray1 and gray2 in the clause, it will be copied for each thread.
If they are too big, you might run out of memory for all the copies. Trying to access a copy which could not be allocated, you might stumble on a seg fault.

Also, it seems to me that you want for the different threads to modify cells in the same array (or pixel in the same image). So your strcture should be shared accross the threads (it is the default behavior). If it is private, each thread will only write in its own copy. At the end of the parallel region, the master thread will keep its own private copy, and all the work done by the other threads will be lost.
chabachull
 
Posts: 5
Joined: Mon Jun 17, 2013 3:02 am

Re: How optimized this code using only pragma directive?

Postby ftinetti » Mon Jun 17, 2013 9:48 am

@alessandro:
Ups... my mistake again, gray1 and gray2 should not be private. @chabachull has given the right list of private variables.

@chabachull:
When you put variables in the private clause, you mean for each thread to have its own copy of the variable.
Here, if you put gray1 and gray2 in the clause, it will be copied for each thread.
might stumble on a seg fault.

Take into account there is no data copy, just data allocation. Data copy is implied by the copyin clause.

Fernando.
ftinetti
 
Posts: 582
Joined: Wed Feb 10, 2010 2:44 pm

Next

Return to Using OpenMP

Who is online

Users browsing this forum: No registered users and 9 guests