Threads that create other Threads

Postby da_rio » Mon May 19, 2008 6:52 am

Hello!
I'm a new user of OpenMP and I really like it!
Unfortunately, I still have a lot of trouble understanding exactly what is going on when parallel threads are launched...

My problem is:
I need to solve two different systems Ax1=b1 and Ax2=b2.
I wrote two different solvers, one running on the CPU and the other running on the GPU. The solver running on the CPU is parallelized with OpenMP.
I have an 8-core Mac Pro and I was wondering if I can use 1 thread (core) to launch the GPU solver and the other 7 to solve the second system on the CPU, all in parallel.

I tried this solution:

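#include <omp.h>

void solve_cpu();  // OpenMP-parallel CPU solver
void solve_gpu();  // CUDA-based GPU solver

void solve()
{
    omp_set_num_threads(2);
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            solve_cpu();

            #pragma omp section
            solve_gpu();
        }
    }
}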

The method solve_cpu() calls omp_set_num_threads(7) whenever it has to execute something in parallel, i.e. just before its own #pragma omp parallel...
Unfortunately, when I call solve() my application stops responding...

I tested both the CPU and GPU solvers independently, but I cannot run them "together".
I hope someone can help me to understand my situation!
Thanks in advance!!

Best,

dario

Re: Threads that create other Threads

Postby ejd » Mon May 19, 2008 9:05 am

In theory you should be able to do this. I am afraid that I don't have the equipment to try to reproduce the problem and see what is going on. Could you please give more information about your setup: OS level, compiler (name and level), the GPU you are trying to use, and anything else that would be pertinent. Thanks.

Re: Threads that create other Threads

Postby da_rio » Tue May 20, 2008 12:16 am

Hello!
I just discovered the problem:

I'm working on an 8-core Mac Pro with Mac OS X 10.5 Leopard. For the CPU part I'm coding in C++ and compiling with the Intel compiler ICC 10.1.012. For the GPU part I'm using NVIDIA CUDA, and my graphics card is an NVIDIA GeForce 8800 GT.

To measure performance, I first allocate memory on the CPU and on the GPU and then launch my multi-threaded computation. The problem is that the thread that initializes the GPU and allocates its memory must be the same thread that drives the GPU kernels (the host thread). I cannot initialize the GPU with one thread and let another thread "host" the GPU computation.

Now my code is:

void solve()
{
    omp_set_num_threads(2);
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            {
                initialize_gpu();
                solve_gpu();
            }

            #pragma omp section
            {
                solve_cpu();
            }
        }
    }
}


... and I can run my two solvers simultaneously!
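A minimal way to confirm that this layout keeps all GPU work on one host thread is sketched below. The CUDA calls are replaced by hypothetical stubs (`initialize_gpu_stub`, `solve_gpu_stub`, `solve_cpu_stub`) that only record which OS thread ran them. Since an OpenMP `section` is executed from start to finish by a single thread, both GPU stubs should report the same thread. (The one-thread-per-context rule reflects CUDA at the time; later CUDA runtimes share a device's context across host threads.)

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-ins for the real CUDA setup and solver: each one only
// records the OS thread that executed it.
static std::thread::id g_init_thread;
static std::thread::id g_solve_thread;

void initialize_gpu_stub() { g_init_thread = std::this_thread::get_id(); }
void solve_gpu_stub() { g_solve_thread = std::this_thread::get_id(); }
void solve_cpu_stub() { /* the OpenMP CPU solver would run here */ }

// Same layout as the post: GPU initialization and GPU solve share a section,
// so whichever thread picks up that section runs both calls.
bool gpu_work_stays_on_one_thread()
{
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        {
            initialize_gpu_stub();
            solve_gpu_stub();
        }

        #pragma omp section
        {
            solve_cpu_stub();
        }
    }
    // The implicit barrier at the end of the region makes both writes visible.
    return g_init_thread == g_solve_thread;
}
```

`gpu_work_stays_on_one_thread()` should return `true` on any conforming runtime. Moving `initialize_gpu_stub()` in front of the parallel region would instead make the result depend on which thread happens to pick up the GPU section.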

Unfortunately, something is still wrong...
It seems that the thread responsible for launching the CPU computation doesn't create the other 7 threads needed to accelerate the solver.
The method solve_cpu() does something like this:

void solve_cpu()
{
    omp_set_num_threads(7);
    #pragma omp parallel ...
    {
        ...
        ...
    }
}


I was expecting to see all 8 cores and my GPU "on fire", but the profiler showed just 2 cores running...
I hope you can help me!!

Best,

dario

Re: Threads that create other Threads

Postby ejd » Tue May 20, 2008 5:23 am

Have you turned on "nested" parallel support (i.e., set the environment variable OMP_NESTED to "true" or called omp_set_nested)? The default is "FALSE".

Re: Threads that create other Threads

Postby da_rio » Tue May 20, 2008 8:55 am

That was the problem!
Thank you very much! It works smoothly now!

A small question about performance: does nesting degrade the performance of the application?
Solving the system on the GPU takes about 3 seconds, and solving the system on the CPU with 7 cores takes about 3.5 seconds. Running both solvers in parallel, I expected to solve both systems in about 3.5 seconds, but instead it takes 5.5 seconds...

Is this due to the nesting? Or simply due to cache effects? (I don't really know how the processor drives the computation on the GPU... maybe it uses a lot of L2 cache and slows down memory accesses for the other 3 threads on the same CPU...)

Thank you again; I hope someday I can return the favor!
Best,

dario

Re: Threads that create other Threads

Postby ejd » Tue May 20, 2008 11:03 am

Nesting does have a little more overhead, since each parallel region has a barrier. However, it shouldn't be very much in your case. If the 7 cores take 3.5 seconds and the GPU only takes 3 seconds, you would expect that the thread that ran the GPU segment would be waiting at the barrier and it shouldn't take another 2 seconds for the barrier. Generally the added parallel overhead and barrier time is measured in milliseconds, so there is most likely something else going on. Unfortunately I don't know enough about how the GPU is driven either, so I am not really sure. You could try running a profiler on it and maybe it would give you an idea of where the extra time is being spent.

