Using ompP to Profile Large Parallel Programs


Post by fearailhold » Mon Jun 16, 2008 8:55 am

Hello,

I am currently parallelizing a fairly large particle simulation program written in C++ using OpenMP. While searching for profiling tools, I came across ompP (information can be accessed here), which looked quite helpful. As a first test, I put together a quick TestMain.cpp file:
/* TestMain.cpp */
#include <omp.h>
#include <stdio.h>

int main()
{
    printf("Testing threads...\n");
    #pragma omp parallel
        printf("Hello from thread %d.\n", omp_get_thread_num());
}
/* end of file */


Compiling and running this with the ompP profiler yields an output file that correctly identifies the parallel regions in main() and provides useful analysis on where the program is spending its time.

However, once I use ompP with slightly less trivial code, problems arise: the profiler seems to recognize only OpenMP directives that appear directly within main(). Consider the following code:
/* Functions.h */
void testThreads()
{
    #pragma omp parallel
        printf("Hello from thread %d.\n", omp_get_thread_num());
}
/* end of file */

/* TestMain.cpp */
#include <omp.h>
#include <stdio.h>
#include "Functions.h"

int main()
{
    printf("Testing threads...\n");
    testThreads();
}
/* end of file */

With this code, the ompP profiler does not generate an output file. According to the usage guide, ompP starts profiling when it first encounters a parallel region, and placing "#pragma pomp inst init" at the beginning of main() forces the profiler to start immediately. Doing this in the above example does cause ompP to generate the following output file:
Total runtime (wallclock) : 0.00 sec [2 threads]
Number of parallel regions : 0
Parallel coverage : 0.00 sec ( 0.00%)

While the profiler does indeed recognize the correct number of threads in the program (TestMain was run with OMP_NUM_THREADS=2), it seems to completely miss the OpenMP directives in testThreads() and incorrectly reports that there are no parallel regions in the program.
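For readers following along, here is a minimal single-file sketch of where the directive was placed. It is an illustration under assumptions, not the exact file used above: the function body from Functions.h is copied into the same file so the example is self-contained, runTestMain() is a hypothetical stand-in for main(), and the thread counter and the serial fallback for omp_get_thread_num() are additions so the sketch also builds without OpenMP enabled.

```cpp
/* TestMainInit.cpp: single-file sketch of the forced-initialization variant */
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback so the sketch still builds when OpenMP is not enabled. */
static int omp_get_thread_num() { return 0; }
#endif

/* Same parallel region as testThreads() in Functions.h. Returns how many
   threads printed a greeting, so the caller can confirm the region ran. */
int testThreads()
{
    int greeted = 0;
    #pragma omp parallel reduction(+ : greeted)
    {
        printf("Hello from thread %d.\n", omp_get_thread_num());
        greeted += 1;
    }
    return greeted;
}

/* Hypothetical stand-in for main(); in the real program this body is main(). */
int runTestMain()
{
    /* Directive quoted from the ompP usage guide: start profiling here rather
       than at the first parallel region. A compiler that is not driven by the
       ompP instrumenter treats it as an unknown pragma and ignores it. */
    #pragma pomp inst init
    printf("Testing threads...\n");
    return testThreads();
}
/* end of file */
```

With OpenMP enabled, the return value should equal the number of threads in the team, which gives a quick check that the parallel region really executes even when the profile reports zero regions.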

Is anyone here familiar with ompP? Have you seen this problem before? I am wondering if this behavior is intentional; I feel like the profiler wouldn't be very useful if it only worked with programs that run solely from main().

The ompP webpage lists support for gcc 4.2.0 with a remark that it "should work for other platforms" as well. We are currently using gcc 4.1.2 - could this be causing the problem?

Re: Using ompP to Profile Large Parallel Programs

Post by ejd » Mon Jun 16, 2008 12:12 pm

I am not familiar with this profiler. However, the author Karl Fuerlinger (karl@cs.utk.edu) is quite nice and I am sure would be happy to help. The only problem is that he is moving from Tennessee to (I believe) Texas this summer and may be busy - so expect a small delay in getting an answer.

Re: Using ompP to Profile Large Parallel Programs

Post by fearailhold » Mon Jun 16, 2008 2:45 pm

ejd wrote: Karl Fuerlinger (karl@cs.utk.edu) is quite nice and I am sure would be happy to help.

Indeed, I emailed Dr. Fuerlinger and received a quick, helpful reply. Once my issue is resolved, I will post my solution here for anyone else who encounters the same problem in the future. To update my previous post in light of what he has told me so far: ompP does recognize OpenMP constructs outside of main(). I suspect that our problem is related to our code being in header files instead of source files (sadly, this is required for our application, which includes highly templated code), but we will see.

