[Omp] [RFC]: Enhancement Proposals for OpenMP (EPO)

Michael Suess mike_ml at suessnetz.de
Mon Sep 12 05:29:03 PDT 2005


Dear Colleagues,

I would like to introduce a couple of my ideas regarding OpenMP, and I would
very much like your feedback on them. My doctoral thesis currently centers
on making parallel programming easier, and I have chosen OpenMP as a
starting point. I am now at the point where I am experimenting with ways to
improve its functionality while retaining its superior ease of use.

Since I do not want to do this in a vacuum, I am asking for your feedback
here. I have prepared two "Enhancement Proposals for OpenMP" so far; more
will probably follow. I have used the "Python Enhancement Proposals (PEP)"
as a model (http://www.python.org/peps/), as the Python community uses
these kinds of proposals quite successfully to gather community feedback
and to enhance their language.

The first proposal deals with scheduling, or more precisely with the ability 
to influence the scheduler by yielding the processor or putting threads to 
sleep (scheduling.txt).

The second one is a little bigger and is about the ability to cancel parallel 
regions and the threads involved (thread-cancellation.txt).

Neither of them is very long, so I have attached them to this mail. I have
also set up a short page describing my goals in a little more detail here:
http://www.plm.eecs.uni-kassel.de/plm/index.php?id=ukomp

If you have some time, I would appreciate any feedback you can provide.
Before I leave you alone, let me make one thing perfectly clear: this is the
topic of my thesis, so technically I do not need the feedback. I do need to
write papers about my ideas, though, and I would like these papers to be well
thought out and to include your feedback (if possible). I also do not want
to pressure the ARB into accepting anything; these are merely ideas, and I
am presenting them here in the hope that they are useful. Of course, I also
hope to stir up some discussion about OpenMP in the process :-).

If you want to play with my ideas, I can set you up with a special prerelease
version of the OMPi compiler, in which all of the presented ideas are
already implemented.

Thank you very much for your time and feedback,

Best Regards from Germany,
Michael Suess

P.S.: Does anybody know whether the feedback at openmp.org email address is
still working? I sent a couple of questions there a while ago (twice) and
have been waiting for an answer ever since...
-------------- next part --------------
EPO: 2
Title: Scheduling
Author: Michael Süß <msuess at uni-kassel.de>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 17-Aug-2005
OpenMP-Version: 3.0
Post-History:


Abstract

    OpenMP currently gives the programmer no way to put a thread to
    sleep or to influence the scheduling process. These capabilities
    are especially useful for testing and for avoiding busy waiting.
    This proposal therefore suggests two simple additions to the
    OpenMP specification: the yield and sleepuntil directives.


Terms

    Sometimes a thread has to wait for a condition to be met (for
    example, a task being put into a taskpool) before it can
    continue. The way to wait that consumes the fewest computing
    resources is to put the thread to sleep until the condition
    becomes true. The alternative is called "busy waiting": the
    thread polls the condition until it becomes true, wasting
    computing resources in the process. Busy waiting is therefore
    best avoided, especially on embedded systems or when power
    consumption is an issue.


Specification

    This proposal suggests two new directives:

    #pragma omp yield
    Similar to the POSIX function sched_yield (), this directive
    tells the scheduler to pick a new thread to run on the current
    processor. If no other thread is available, the calling thread
    continues immediately. The directive gives the programmer a
    simple way to tell the runtime system and the operating system
    scheduler which work is important at the moment and which is
    not.
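
    For comparison, the following minimal sketch shows what a
    programmer has to write today to get a similar effect with plain
    POSIX. It is not part of the proposal; the helper name
    spin_with_yield and the max_polls limit are hypothetical and
    exist only for illustration. The thread still polls, but it
    hands the processor to other runnable threads between polls.

```c
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical helper: poll `flag` until it becomes non-zero or until
 * `max_polls` polls have been made, calling sched_yield() between
 * polls so that other runnable threads can use the processor.
 * Returns the number of times the processor was yielded. */
static long spin_with_yield(atomic_int *flag, long max_polls)
{
    long yields = 0;

    while (!atomic_load(flag) && yields < max_polls) {
        sched_yield();   /* roughly where "#pragma omp yield" would go */
        ++yields;
    }
    return yields;
}
```

    With the proposed directive, the call to sched_yield () would
    presumably be replaced by #pragma omp yield, which keeps the
    code independent of the underlying thread library.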

    The second proposed directive:

    #pragma omp sleepuntil (scalar-expression)
    This directive puts the current thread to sleep until the
    specified scalar expression becomes true. Flushes are carried out
    automatically while waiting on the expression to keep the
    temporary view of memory consistent with memory. The sleeping
    thread does not have to wake up immediately after the expression
    becomes true, nor does it have to wake up if the expression
    becomes true and becomes false again shortly afterwards. Not all
    threads waiting on the same expression have to wake up at the
    same time either. It is unspecified how many times any
    side-effects of the evaluation of the scalar-expression occur.
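
    The semantics above can be approximated today by a polling loop
    that sleeps between evaluations of the condition. The sketch
    below is only an emulation under stated assumptions, not the
    reference implementation: sleep_until, wake_condition,
    deadline_reached and wall_time are hypothetical names, the
    condition is passed as a function pointer instead of a scalar
    expression, and nanosleep () with a fixed polling interval
    stands in for a real wake-up mechanism. As in the proposal, the
    thread may wake up late, and the number of evaluations of the
    condition is not fixed.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Monotonic wall-clock time in seconds (stand-in for omp_get_wtime). */
static double wall_time(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical condition type: returns non-zero once the thread may wake. */
typedef int (*wake_condition)(const void *state);

/* Sleep in steps of poll_ns nanoseconds (must be below one second)
 * until cond(state) is true. Returns the number of naps taken.
 * In an OpenMP program a flush would be needed before each evaluation. */
static long sleep_until(wake_condition cond, const void *state, long poll_ns)
{
    struct timespec pause = { 0, poll_ns };
    long naps = 0;

    while (!cond(state)) {
        nanosleep(&pause, NULL);
        ++naps;
    }
    return naps;
}

/* Example condition without side effects: has the deadline passed? */
static int deadline_reached(const void *state)
{
    const double *deadline = state;
    return wall_time() >= *deadline;
}
```

    Calling sleep_until (deadline_reached, &deadline, 1000000L) with
    deadline set to wall_time () + 0.1 stalls the calling thread for
    at least 100 milliseconds while polling once per millisecond.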


Motivation

    We have felt the need for both new directives multiple times.
    #pragma omp yield offers an easy-to-use way to influence the
    scheduling policies of the operating system. This can be
    important when computing resources are scarce and the programmer
    wants to optimize program output. An example would be using the
    yield directive at the end of every step of a pipelined
    application to get values through the pipeline as fast as
    possible.

    A second use case for both directives is testing. When testing
    OpenMP compilers or performing unit tests for OpenMP programs,
    it is often useful to "force" the scheduler into certain patterns
    that could not be tested otherwise (e.g. stalling one thread
    while all other members of the team go ahead and run into a
    barrier). The present OpenMP specification offers no way to do
    this, although it can be very useful for finding hard-to-catch
    errors. Example code to stall execution of one thread for 100
    milliseconds follows:

        double now = omp_get_wtime ();
        #pragma omp sleepuntil (omp_get_wtime () >= now + 0.1)

    Finally, the sleepuntil directive offers a convenient way to
    avoid busy waiting without the need to introduce complicated
    constructs like condition variables.
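
    To make the comparison concrete, the sketch below shows roughly
    what the taskpool example from the Terms section looks like with
    POSIX condition variables. The type taskpool and the functions
    taskpool_put and taskpool_take are hypothetical and reduced to a
    simple task counter; they only illustrate how much machinery is
    needed for one thread to sleep until a task arrives.

```c
#include <pthread.h>

/* Hypothetical taskpool, reduced to a counter of available tasks. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;   /* signalled whenever a task is added */
    int             tasks;
} taskpool;

#define TASKPOOL_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }

/* Producer side: add one task and wake up one sleeping consumer. */
static void taskpool_put(taskpool *p)
{
    pthread_mutex_lock(&p->lock);
    p->tasks++;
    pthread_cond_signal(&p->changed);
    pthread_mutex_unlock(&p->lock);
}

/* Consumer side: sleep until a task is available, then take it.
 * Returns the number of tasks left in the pool. */
static int taskpool_take(taskpool *p)
{
    int remaining;

    pthread_mutex_lock(&p->lock);
    while (p->tasks == 0)                      /* guards against spurious wakeups */
        pthread_cond_wait(&p->changed, &p->lock);
    remaining = --p->tasks;
    pthread_mutex_unlock(&p->lock);
    return remaining;
}
```

    Under the semantics given in the Specification section, the
    waiting loop in taskpool_take would presumably shrink to a
    single sleepuntil on the task count, although taking the task
    would still need mutual exclusion.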


Rationale

    The yield directive is inspired by its POSIX counterpart,
    sched_yield () [1]. We know of no primitive in any other
    parallel programming system that is as powerful and as easy to
    use as the proposed sleepuntil directive. It can be emulated by
    polling the condition in a loop, but this would be busy waiting
    and therefore wasteful of the available computing resources.


Backwards Compatibility

    The proposed changes are fully backwards compatible with the
    existing OpenMP specification.


Reference Implementation

    A reference implementation can be found in the next release of
    the OMPi compiler [2] (or in a special prerelease version on
    request).


References

    [1] IEEE. Information Technology - Portable Operating System
        Interface (POSIX) - Part 1: System Application Program
        Interface. IEEE/ANSI Std 1003.1, 1996 Edition.

    [2] OMPi, http://www.cs.uoi.gr/~ompi/


Copyright

    This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:
-------------- next part --------------
EPO: 1
Title: Thread Cancellation
Author: Michael Süß <msuess at uni-kassel.de>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 13-Jun-2005
OpenMP-Version: 3.0
Post-History:


Abstract

    This document proposes changes to the OpenMP specification that
    allow for thread cancellation in parallel regions. Thread
    cancellation in this context means the ability to end the
    execution of a parallel region prematurely for all threads in a
    team with a simple directive. This capability is often needed,
    e.g. in irregular parallel algorithms. This proposal uses C as
    the base language, but all the suggested constructs can be
    applied to C++ and Fortran as well.


Terms

    Cancellation is called "forceful" when a thread has the ability
    to cancel another thread from the outside. The cancelled thread
    may get the opportunity to clean up after itself, yet it has no
    power to decide when it is cancelled, nor to prevent
    cancellation altogether. Asynchronous cancellation in POSIX
    Threads is an example of forceful cancellation.

    "Deferred" cancellation is an important subcase of forceful
    cancellation. The cancelled thread is not ended immediately, but
    only at certain predefined cancellation points. Deferred
    cancellation is supported in POSIX Threads as well.

    With "cooperative" cancellation on the other hand, a thread can
    only request the cancellation of another thread. The cancelled
    thread has the opportunity to decide to honor this request and
    cancel itself, or to process it at a later time, or even to
    ignore it altogether. Java supports cooperative cancellation.


Specification

    The following new directives to support thread cancellation in
    OpenMP are proposed:

    #pragma omp cancelregion
    Asks all threads in the team to stop their parallel work and go
    to the end of the parallel region, where only the master thread
    will continue execution as usual. The emphasis here is on "asks".
    The threads in the team are not cancelled immediately, but merely
    a cancel flag is set for each of them (cooperative cancellation).
    An exception is the thread that called the directive: it is
    cancelled immediately by an implicit call of the exitregion
    directive (explained below). It is the task of the programmer to
    check if the cancel flag has been set, using a new OpenMP runtime
    library function:

    int omp_get_cancelled (void)
    This function returns 1 (true), if the cancellation of the
    enclosing parallel region was requested and 0 (false) otherwise.

    #pragma omp exitregion
    This directive is not only useful for thread cancellation, but
    can be used at any point in a parallel region to immediately end
    the execution of the calling thread. This is accomplished by
    jumping to the end of the present parallel region, right into its
    closing implicit barrier (which is of course honored). Together,
    both new directives can be used in the following way to achieve
    thread cancellation:

    cancelling thread:
        #pragma omp cancelregion

    all other threads in the team:
        if (omp_get_cancelled ()) {

            /* this jumps directly into the closing, implicit barrier
             * at the end of the parallel region
             */
            #pragma omp exitregion
        }
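
    For readers who want to compare this with what is possible
    today, the following sketch emulates the pattern by hand with
    constructs from existing OpenMP versions (atomic read and write
    require OpenMP 3.1 or later). It is an illustration only, not
    the reference implementation: the function find_any and the
    shared variable cancelled are hypothetical. The code also
    compiles and runs serially when OpenMP is disabled.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical example: return the index of some occurrence of `key`
 * in data[0..n), or -1 if there is none. The first thread that finds
 * the key asks all other threads to stop searching. */
static long find_any(const int *data, long n, int key)
{
    long found = -1;     /* shared result */
    int cancelled = 0;   /* shared, hand-rolled cancel flag */

    #pragma omp parallel shared(found, cancelled)
    {
        int tid = 0, nthreads = 1;
        long i;
#ifdef _OPENMP
        tid = omp_get_thread_num();
        nthreads = omp_get_num_threads();
#endif
        for (i = tid; i < n; i += nthreads) {
            int stop;

            /* plays the role of omp_get_cancelled () */
            #pragma omp atomic read
            stop = cancelled;
            if (stop)
                break;   /* plays the role of exitregion */

            if (data[i] == key) {
                #pragma omp critical
                { found = i; }

                /* plays the role of cancelregion */
                #pragma omp atomic write
                cancelled = 1;
                break;
            }
        }
    }   /* implicit barrier: all threads arrive here, cancelled or not */
    return found;
}
```

    The emulation works here because every thread can simply leave
    its loop. It stops being this simple once the region contains
    barriers or the check has to happen deep inside nested function
    calls.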

    There is a problem with the proposal so far: barriers. If a
    region containing barriers is cancelled, at least one thread (the
    one calling the cancelregion directive) will never reach that
    barrier. Without further adjustment, one or more of the other
    threads in the region could hang in the barrier and never
    recover, since the barrier is never finished.

    A solution to this problem is proposed in the form of the
    oncancel clause for the barrier directive:

        #pragma omp barrier oncancel
        {
            /* this code is executed only when the region is
             * cancelled while the thread is waiting on the barrier
             */

            /* free thread resources */

            #pragma omp exitregion
        }

    As the example above shows, barriers can now be used in
    combination with thread cancellation without affecting backwards
    compatibility. It remains the task of the programmer, though, to
    do "the right thing" when a thread waiting on a barrier is
    cancelled; most of the time this means freeing the resources
    associated with the thread and then ending the thread. Note also
    that if the programmer does not end the thread with exitregion,
    the thread will wait on the barrier again as soon as the
    oncancel block has finished (phrased differently: there is an
    implicit barrier at the end of the oncancel block). This is
    useful only when the barrier is inside a nested parallel region
    and the cancel signal has been sent by an upper-level thread
    (more on this later). The oncancel code is carried out at most
    once per barrier and thread. If the region is already cancelled
    when a thread enters a barrier, the thread immediately proceeds
    with the oncancel code.
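
    The behaviour described above can be illustrated with a small
    hand-written barrier that releases its waiters when a cancel
    flag is set. This is a sketch under several assumptions and not
    how the reference implementation works: the type
    cancellable_barrier and its functions are hypothetical, waiting
    is done by polling with sched_yield () for brevity, and the
    barrier must not be reused after a cancellation. A return value
    of 1 marks the point where the oncancel code would run.

```c
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical cancellable barrier for a fixed team size. */
typedef struct {
    atomic_int arrived;     /* threads that reached the current phase */
    atomic_int phase;       /* incremented each time the barrier completes */
    atomic_int cancelled;   /* set by the cancelling thread */
    int        team_size;
} cancellable_barrier;

/* Request cancellation: waiting threads leave the barrier. */
static void barrier_cancel(cancellable_barrier *b)
{
    atomic_store(&b->cancelled, 1);
}

/* Wait on the barrier. Returns 0 if all threads arrived, or 1 if the
 * region was cancelled before or while this thread was waiting. */
static int barrier_wait_cancellable(cancellable_barrier *b)
{
    int my_phase = atomic_load(&b->phase);

    if (atomic_load(&b->cancelled))
        return 1;                        /* already cancelled on entry */

    if (atomic_fetch_add(&b->arrived, 1) + 1 == b->team_size) {
        atomic_store(&b->arrived, 0);    /* last arriver resets the count */
        atomic_fetch_add(&b->phase, 1);  /* and releases the waiters */
        return 0;
    }

    while (atomic_load(&b->phase) == my_phase) {
        if (atomic_load(&b->cancelled))
            return 1;                    /* cancelled while waiting */
        sched_yield();
    }
    return 0;
}
```

    A caller would free its resources and leave the region whenever
    barrier_wait_cancellable returns 1, which corresponds to the
    body of the oncancel block shown earlier.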

    For implicit barriers (at the end of worksharing constructs) a
    similar construct is suggested, as shown below:

        #pragma omp for
        for (...) {

            /* for-loop code */

        }   /* implicit barrier at end of for loop */
        #pragma omp onbarriercancel
        {
            /* this code is executed only when the region is
             * cancelled while the thread is waiting on the implicit
             * barrier above
             */

            /* free thread resources */

            #pragma omp exitregion
        }

    The nowait clause and the onbarriercancel directive are mutually
    exclusive. The onbarriercancel directive cannot be specified
    after a combined parallel worksharing construct (e.g.
    #pragma omp parallel for).

    Concerning nested parallelism: when a member of a team inside a
    parallel region encounters a new parallel construct, a new
    subteam is formed. Cancellation requests from inside the subteam
    only cause members of the subteam to have their cancel flag set.
    If another member of the original team requests cancellation,
    however, the cancel flags of all members of the subteam are set
    as well, although technically they are not in the same
    team.
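
    One possible way to model this rule is a chain of teams in which
    each subteam points to the team that created it. The sketch
    below is an assumption made for illustration, not the mechanism
    of the reference implementation: the type team and its functions
    are hypothetical, there is one flag per team rather than per
    thread, and instead of eagerly setting the flags of all
    subteams, the query walks up the chain. The observable result is
    the same: a cancellation in the subteam stays local, while a
    cancellation in the original team is seen by the subteam.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical team descriptor. */
typedef struct team {
    struct team *parent;       /* team whose member created this team, or NULL */
    atomic_int   cancel_flag;  /* set when this team's region is cancelled */
} team;

/* Request cancellation of the region executed by team t. */
static void team_cancel(team *t)
{
    atomic_store(&t->cancel_flag, 1);
}

/* Returns 1 if team t or any enclosing team has been cancelled, and
 * 0 otherwise (analogous to the proposed omp_get_cancelled). */
static int team_get_cancelled(team *t)
{
    for (; t != NULL; t = t->parent) {
        if (atomic_load(&t->cancel_flag))
            return 1;
    }
    return 0;
}
```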


Motivation

    The ability to cancel threads is one of the areas where OpenMP is
    lacking capabilities, compared to other parallel programming
    systems. Yet, for many parallel applications this is important,
    especially for irregular algorithms such as searching. The
    ability to react to user interruptions (e.g. someone pushing a
    cancel button) is not to be underestimated either.


Rationale

    Some of the suggested changes could be emulated manually by an
    experienced OpenMP programmer (e.g. by keeping track of the
    cancel state of each thread). But this is an unnecessary burden,
    and it becomes difficult as soon as barriers are involved.
    Therefore our proposal introduces the new directives and the
    omp_get_cancelled () runtime library call, as well as the
    additional oncancel clause for barriers.

    The exitregion directive is nothing more than a convenient
    shortcut, but even without thread cancellation it is useful as
    soon as one gets into deeply nested functions inside parallel
    regions.

    We have decided against forceful cancellation as in POSIX Threads
    [1], as asynchronous cancellation makes safe resource allocation
    practically impossible, since one never knows when a thread is
    cancelled. The concept of cancellation points and deferred
    cancellation, on the other hand, seemed like overkill for
    OpenMP, as the number of functions that are cancellation points
    is difficult for programmers to keep track of. Therefore this
    proposal suggests cooperative cancellation, which can be found
    in a similar form e.g. in Java [2].

    Barrier constructs are a big problem for cooperative
    cancellation. The suggested solution (oncancel clause,
    onbarriercancel directive) may seem like a lot of overhead to
    cope with barriers, but the proposal is still easier and more
    natural than the possible alternatives (e.g. disallowing
    barriers with thread cancellation, letting the programmer take
    care of them, or cancelling barriers forcefully).

    We have also decided against automatically including an
    exitregion directive at the end of an oncancel or onbarriercancel
    scope. The main reason for this is consistency, as
    automatically including the directive would cancel the threads
    waiting on barriers forcefully, which in turn would be
    inconsistent with the rest of the proposal.

    The onbarriercancel directive is not allowed after combined
    parallel worksharing constructs because the two main reasons
    for applying the directive are not valid there. There is no need
    to take care of leftover threads hanging in the implicit barrier
    at the end of the combined construct, as these threads are
    exactly where they would be if an exitregion directive had been
    specified anyway. There is also no need to clean up any
    resources, as the programmer must have already done this before
    the end of the parallel region.

    During our internal discussions on the topic of thread
    cancellation, we worked out a checklist that every proposal we
    came up with had to pass. This checklist, with explanations of
    why our proposal passes it, is spelled out here:

    1. Backwards Source Compatibility
    Old code must run unchanged when translated with a compiler that
    understands thread cancellation. This is the case, as the
    behaviour of existing OpenMP constructs is not changed; only new
    clauses and directives are added. This includes complicated
    cases involving nested parallelism, e.g. when a program spawns a
    parallel region, which in turn calls a function from an "old"
    library that was written before thread cancellation was an
    issue. This library may then spawn a new parallel region (via
    nested parallelism). When a thread from the original parallel
    region is cancelled, the cancel flag for the threads of the
    library parallel region is set as well, yet the library region
    may include code that relies on the correct termination and
    cleanup of all included threads and must therefore not be
    cancelled. With our proposal, the threads of the library region
    are not cancelled (as they do not know about cancellation and
    never check the cancel flag) and can therefore clean up after
    themselves undisturbed. This would not be the case if forceful
    cancellation were used.

    2. Nested Parallelism
    Each proposal must clearly state how thread cancellation and
    nested parallelism interact. Our proposal does so by declaring
    that when a parallel region is cancelled, all parallel regions
    that were created by a thread from the cancelled region have
    their cancel flag set as well.

    3. Barriers
    Each proposal must cope with the case that a region is
    cancelled, while one or more threads are waiting on a barrier
    (including implicit barriers), without producing deadlocks. Our
    proposal does so with the introduction of the oncancel clause and
    the onbarriercancel directive.

    4. No Resource Leaks
    The programmer must have the option to free any allocated
    resources before a thread is cancelled. Our proposal takes care
    of this by advocating cooperative cancellation, where the
    programmer checks whether a cancellation request has been issued
    and can therefore free all resources before exiting from a
    thread. Even resource deallocation while waiting on barriers is
    possible, thanks to the new oncancel clause and onbarriercancel
    directive.

    5. C / C++ / Fortran Compatibility
    Each proposal must apply to all three supported languages of the
    OpenMP specification. Although our proposal (mostly) only spells
    out the C-syntax of the proposed changes, we believe that these
    are adaptable to C++ and Fortran as well.

    Our proposal is really only useful if one goes beyond
    parallelizing simple loops and short regions of code. As soon as
    one starts to write irregular algorithms with OpenMP, we believe
    the proposed functionality can save the programmer a lot of time
    and eliminate many sources of errors.


Backwards Compatibility

    The proposed changes are fully backwards compatible with the
    current OpenMP specification, and therefore all current programs
    would run without problems if the changes were included in a
    future OpenMP specification. No performance degradations are to
    be expected either.


Reference Implementation

    A reference implementation can be found in the next release of
    the OMPi compiler [3] (or in a special prerelease version on
    request).


References

    [1] IEEE. Information Technology - Portable Operating System
        Interface (POSIX) - Part 1: System Application Program
        Interface. IEEE/ANSI Std 1003.1, 1996 Edition.

    [2] Java 1.5 Documentation,
        http://java.sun.com/j2se/1.5.0/docs/index.html

    [3] OMPi, http://www.cs.uoi.gr/~ompi/


Copyright

    This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:

