[Omp] [RFC]: Enhancement Proposals for OpenMP (EPO)
Michael Suess
mike_ml at suessnetz.de
Mon Sep 12 05:29:03 PDT 2005
Dear Colleagues,
I would like to introduce you to a couple of ideas of mine regarding OpenMP,
and I would very much like your feedback on these. My doctoral thesis is
presently centered around making parallel programming easier and I have
chosen OpenMP as a starting point. I am now at the point where I am playing
around with ideas on how to improve its functionality while at the same time
retaining its superior ease of use.
Since I do not want to do this "in the void", I am asking for your feedback
here. I have prepared two "Enhancement Proposals for OpenMP" so far, more
will probably follow. I have taken the "Python Enhancement Proposals (PEP)"
as a model (http://www.python.org/peps/), as the Python community is
using these kinds of proposals to gather community feedback and to enhance
their language quite successfully.
The first proposal deals with scheduling, or more precisely with the ability
to influence the scheduler by yielding the processor or putting threads to
sleep (scheduling.txt).
The second one is a little bigger and is about the ability to cancel parallel
regions and the threads involved (thread-cancellation.txt).
Neither of them is very long, so I have attached them to this mail. I have also
set up a short page describing my goals in a little more detail here:
http://www.plm.eecs.uni-kassel.de/plm/index.php?id=ukomp
If you have some time, I would appreciate any feedback you can provide. Before
I leave you alone, let me make one thing perfectly clear: This is the topic
of my thesis, so technically I do not need the feedback. I need to write
papers about my ideas though, and I would like these papers to be well
thought out and to include your feedback (if possible). I also do not
want to pressure the ARB into accepting anything; these are merely ideas, and
I am presenting them here in the hope that they are useful. Of course, I also
hope to stir up some discussion about OpenMP in the process :-).
If you want to play with my ideas, I can set you up with a special prerelease
version of the OMPi compiler, in which all of the presented ideas are
already implemented.
Thank you very much for your time and feedback,
Best Regards from Germany,
Michael Suess
P.S.: Does anybody know whether the feedback at openmp.org email address is
still working? I sent a couple of questions there a while ago (twice) and have
been waiting for an answer since then...
-------------- next part --------------
EPO: 2
Title: Scheduling
Author: Michael Süß <msuess at uni-kassel.de>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 17-Aug-2005
OpenMP-Version: 3.0
Post-History:
Abstract
OpenMP currently does not give the programmer the power to put a
thread to sleep or influence the scheduling process in any way.
These capabilities are useful especially for testing purposes and
to avoid busy waiting, and this proposal therefore suggests two
simple additions to the OpenMP specification: the yield and the
sleepuntil directives.
Terms
There are times when a thread has to wait for a condition to be
met (example: a task being put into a task pool) before it can
continue. The way to do this that is easiest on the computing
resources is to put the thread to sleep until the condition
becomes true. Another way to wait for the condition is called
"busy waiting": the thread polls the condition until it
becomes true, wasting computing resources in the process.
Busy waiting is therefore best avoided, especially when dealing
with embedded systems or when power consumption is an issue.
Specification
This proposal suggests two new directives:
#pragma omp yield
Similar to the POSIX function sched_yield () [1], this directive
tells the scheduler to pick a new thread to run on the current
processor. If no other thread is available, execution continues
immediately. The directive provides a simple way for the
programmer to tell the runtime system and the operating system
scheduler what is important at the moment and what is not.
The second proposed directive:
#pragma omp sleepuntil (scalar-expression)
This directive puts the current thread to sleep until the
specified scalar expression becomes true. Flushes are carried out
automatically while waiting on the expression to keep the
temporary view of memory consistent with memory. The sleeping
thread does not have to wake up immediately after the expression
becomes true, nor does it have to wake up if the expression
becomes true and becomes false again shortly afterwards. Not all
threads waiting on the same expression have to wake up at the
same time either. It is unspecified how many times any
side-effects of the evaluation of the scalar-expression occur.
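The semantics above can be approximated with standard means. The
following is a minimal sketch under stated assumptions, not the
OMPi implementation: the macro EPO_SLEEPUNTIL, the helper epo_nap
and the demo functions are hypothetical names, and the polling
interval of one millisecond is an arbitrary choice. The sketch
flushes before every evaluation of the expression and sleeps
between evaluations.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep */
#include <time.h>

/* Hypothetical helper: sleep for roughly one millisecond. */
static void epo_nap(void)
{
    struct timespec ts = { 0, 1000000L };  /* 0 s + 1,000,000 ns */
    nanosleep(&ts, NULL);
}

/* Hypothetical stand-in for "#pragma omp sleepuntil (cond)":
 * flush, evaluate cond, and sleep a little if it is still false. */
#define EPO_SLEEPUNTIL(cond)          \
    do {                              \
        for (;;) {                    \
            _Pragma("omp flush")      \
            if (cond)                 \
                break;                \
            epo_nap();                \
        }                             \
    } while (0)

/* Demo condition: becomes true on its third evaluation. */
static int epo_evaluations = 0;

static int epo_ready(void)
{
    return ++epo_evaluations >= 3;
}

/* Waits on the demo condition and returns how often it was evaluated. */
int epo_demo(void)
{
    epo_evaluations = 0;
    EPO_SLEEPUNTIL(epo_ready());
    return epo_evaluations;
}
```

Note that this emulation still polls, only less aggressively than
a tight loop. A runtime system implementing the directive would be
free to block the thread instead, which is why the proposal leaves
the number of evaluations of the expression unspecified.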
Motivation
We have felt the need for both new directives multiple times.
#pragma omp yield offers an easy-to-use way to influence the
scheduling policies of the operating system. This can be important
when computing resources are scarce and the programmer wants to
optimize program output. An example of this would be calling
the yield directive at the end of every pipeline step in a
pipelined application to get values through the pipeline as fast
as possible.
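As a rough illustration of the pipeline case, the sketch below
uses the POSIX function sched_yield () as a stand-in for the
proposed directive. The function name epo_pipeline_step and the
"work" it performs are hypothetical placeholders.

```c
#include <sched.h>

/* Hypothetical pipeline stage: process one value, then offer the
 * processor to another runnable thread so that the next stage can
 * pick the value up quickly. */
int epo_pipeline_step(int value)
{
    int result = value + 1;  /* placeholder for the real work of the stage */

    (void) sched_yield();    /* stand-in for "#pragma omp yield" */
    return result;
}
```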
A second use case for both directives is testing. When testing
OpenMP compilers or performing unit tests for OpenMP programs,
it is often useful to "force" the scheduler into certain patterns
that could not be tested otherwise (e.g. stalling one thread,
while all other members of the team go ahead and run into a
barrier). This is not possible with the present OpenMP
specification, yet it can be very useful for finding
hard-to-catch errors. Example code to stall execution of one
thread for 100 milliseconds follows:
    double now = omp_get_wtime ();
    #pragma omp sleepuntil (omp_get_wtime () >= now + 0.1)
And finally, the sleepuntil directive can be very conveniently
used to avoid busy waiting, without the need to introduce
complicated constructs like condition variables.
Rationale
The yield directive is inspired by its POSIX threads counterpart.
We know of no primitive in any other parallel programming system
that is as powerful AND easy to use as the proposed sleepuntil
directive. It can be emulated by wasting time in a loop, but this
would be busy waiting and wasteful of the available computing
resources.
Backwards Compatibility
The proposed changes are fully backwards compatible with the
existing OpenMP specification.
Reference Implementation
A reference implementation can be found in the next release of
the OMPi compiler [2] (or in a special prerelease version on
request).
References
[1] IEEE. Information Technology - Portable Operating System
Interface (POSIX) - Part 1: System Application Program
Interface (API) [C Language]. IEEE/ANSI Std 1003.1, 1996 Edition.
[2] OMPi http://www.cs.uoi.gr/~ompi/
Copyright
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:
-------------- next part --------------
EPO: 1
Title: Thread Cancellation
Author: Michael Süß <msuess at uni-kassel.de>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 13-Jun-2005
OpenMP-Version: 3.0
Post-History:
Abstract
This document proposes changes to the OpenMP specification
to allow for thread cancellation in parallel regions. Thread
cancellation in this context means the ability to end execution
of parallel regions prematurely for all threads in a team with a
simple directive. This capability is often needed, e.g. in
irregular parallel algorithms. This proposal uses C as the base
language, but all the suggested constructs can be applied to C++ /
Fortran as well.
Terms
One speaks of "forceful" cancellation when a thread has the
ability to cancel another thread from the outside.
The cancelled thread may get the opportunity to clean up after
itself, yet it does not have the power to decide when it is
cancelled, nor to prevent cancellation altogether. Asynchronous
cancellation in POSIX Threads is an example of forceful
cancellation.
"Deferred" cancellation is an important subcase of forceful
cancellation. The cancelled thread is not ended immediately, but
only at certain predefined cancellation points. Deferred
cancellation is supported in POSIX Threads as well.
With "cooperative" cancellation, in contrast, a thread can
only request the cancellation of another thread. The cancelled
thread has the opportunity to decide to honor this request and
cancel itself, or to process it at a later time, or even to
ignore it altogether. Java supports cooperative cancellation.
Specification
The following new directives to support thread cancellation in
OpenMP are proposed:
#pragma omp cancelregion:
Asks all threads in the team to stop their parallel work and go
to the end of the parallel region, where only the master thread
will continue execution as usual. The emphasis here is on "asks".
The threads in the team are not cancelled immediately, but merely
a cancel flag is set for each of them (cooperative cancellation).
An exception is the thread that called the directive: it is
cancelled immediately by an implicit call of the exitregion
directive (explained below). It is the task of the programmer to
check if the cancel flag has been set, using a new OpenMP runtime
library function:
int omp_get_cancelled (void):
This function returns 1 (true) if the cancellation of the
enclosing parallel region was requested, and 0 (false) otherwise.
#pragma omp exitregion:
This directive is not only useful for thread cancellation, but
can be used at any point in a parallel region to immediately end
the execution of the calling thread. This is accomplished by
jumping to the end of the present parallel region, right into its
closing implicit barrier (which is of course honored). Together,
both new directives can be used in the following way to achieve
thread cancellation:
    cancelling thread:
        #pragma omp cancelregion
    all other threads in the team:
        if (omp_get_cancelled ()) {
            /* this jumps directly into the closing, implicit barrier
             * at the end of the parallel region
             */
            #pragma omp exitregion
        }
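For comparison, the same cooperative pattern can be emulated with
existing OpenMP constructs and a shared flag. The following is
only a sketch under stated assumptions: the function epo_find and
the variables cancelled and found are hypothetical names, the key
is assumed to occur at most once, and since a thread cannot jump
out of a worksharing loop, "continue" stands in for exitregion by
skipping the remaining work.

```c
/* Returns the index i with data[i] == key, or -1 if key is absent.
 * The key is assumed to occur at most once. */
int epo_find(const int *data, int n, int key)
{
    int cancelled = 0;  /* plays the role of the cancel flag */
    int found = -1;

    #pragma omp parallel for shared(cancelled, found)
    for (int i = 0; i < n; i++) {
        int stop;

        /* counterpart of omp_get_cancelled () */
        #pragma omp atomic read
        stop = cancelled;
        if (stop)
            continue;   /* counterpart of exitregion: skip the rest */

        if (data[i] == key) {
            #pragma omp atomic write
            found = i;

            /* counterpart of cancelregion */
            #pragma omp atomic write
            cancelled = 1;
        }
    }
    return found;
}
```

A call such as epo_find (data, n, 23) returns as soon as all
threads have noticed the flag and run through their remaining
iterations without doing any work. This is exactly the kind of
manual bookkeeping the proposed directives are meant to replace.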
There is a problem with the proposal so far: barriers. If a
region containing barriers is cancelled, at least one thread (the
one calling the cancelregion directive) will never reach that
barrier. Without further adjustment, one or more of the other
threads in the region could hang in the barrier and never
recover, since the barrier is never finished.
A solution to this problem is proposed in the form of the
oncancel clause for the barrier directive:
    #pragma omp barrier oncancel
    {
        /* this code is executed only when the region is
         * cancelled while the thread is waiting on the barrier
         */
        /* free thread resources */
        #pragma omp exitregion
    }
As the example above shows, it is now possible to use barriers in
combination with thread cancellation, without affecting backwards
compatibility. It remains the task of the programmer, though, to
do "the right thing" when a thread waiting on a barrier is
cancelled. Most of the time this will mean freeing the
resources associated with the thread and cancelling the thread
afterwards. Also note that if the thread is not ended with
exitregion, it will wait on the barrier again as soon as
the oncancel scope has been finished (or phrased differently:
there is an implicit barrier at the end of the oncancel clause).
This is useful only in the case when the barrier is inside a
nested parallel region and the cancel signal has been sent by an
upper level thread (more on this later). The oncancel code is
carried out at most once per barrier and thread. If the region
is already cancelled when a thread enters a barrier, it will
immediately proceed with the oncancel code.
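To make the intended behaviour of a barrier with an oncancel
clause more concrete, here is a minimal sketch of a barrier that
can be abandoned when cancellation is requested. It is an
illustration only and says nothing about how OMPi implements the
clause: the type epo_barrier and both functions are hypothetical,
the barrier is single-use, and waiting is done by polling.

```c
#include <sched.h>

/* Hypothetical single-use barrier that honors cancellation. */
typedef struct {
    int expected;   /* number of threads that must arrive */
    int arrived;    /* number of threads that have arrived so far */
    int cancelled;  /* nonzero once cancellation was requested */
} epo_barrier;

/* Request cancellation for all threads using this barrier. */
void epo_barrier_cancel(epo_barrier *b)
{
    #pragma omp atomic write
    b->cancelled = 1;
}

/* Returns 0 if all expected threads arrived, and 1 if the wait was
 * abandoned because of a cancellation request. The latter is the
 * case in which the oncancel code would run. */
int epo_barrier_wait(epo_barrier *b)
{
    #pragma omp atomic update
    b->arrived++;

    for (;;) {
        int arrived, cancelled;

        /* a region that is already cancelled is detected at once */
        #pragma omp atomic read
        cancelled = b->cancelled;
        if (cancelled)
            return 1;

        #pragma omp atomic read
        arrived = b->arrived;
        if (arrived >= b->expected)
            return 0;

        (void) sched_yield();  /* let other threads make progress */
    }
}
```

A caller would free its resources and leave the region when
epo_barrier_wait () returns 1, mirroring the oncancel block shown
above.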
For implicit barriers (at the end of worksharing constructs) a
similar construct is suggested, as shown below:
    #pragma omp for
    for (...) {
        /* for-loop code */
    } /* implicit barrier at end of for loop */
    #pragma omp onbarriercancel
    {
        /* this code is executed only when the region is
         * cancelled while the thread is waiting on the implicit
         * barrier above
         */
        /* free thread resources */
        #pragma omp exitregion
    }
The nowait clause and the onbarriercancel directive are mutually
exclusive. The onbarriercancel directive cannot be specified
after a combined parallel worksharing construct (e.g.
#pragma omp parallel for).
Concerning nested parallelism: when a member of a team inside a
parallel region encounters a new parallel construct, a new
subteam is formed. Cancellation requests from inside the subteam
will only cause members of the subteam to have their cancel-flag
set. If another member of the original team requests cancellation
however, the cancellation flags for all members of the subteam
are set as well, although technically they are not in the same
team.
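The propagation rule for nested regions can be sketched with a
small data structure. This is an illustration under stated
assumptions, not the OMPi implementation: the type epo_region and
the functions epo_cancel and epo_get_cancelled are hypothetical,
and the sketch ignores thread safety (a real runtime would need
atomic accesses to the flags).

```c
#include <stddef.h>

/* Hypothetical bookkeeping record for one parallel region. */
typedef struct epo_region {
    struct epo_region *parent;  /* enclosing parallel region, or NULL */
    int cancel_requested;       /* set by cancelregion in this region */
} epo_region;

/* counterpart of "#pragma omp cancelregion" inside region r */
void epo_cancel(epo_region *r)
{
    r->cancel_requested = 1;
}

/* counterpart of omp_get_cancelled () for a thread in region r:
 * true if r itself or any enclosing region was cancelled */
int epo_get_cancelled(const epo_region *r)
{
    for (; r != NULL; r = r->parent)
        if (r->cancel_requested)
            return 1;
    return 0;
}
```

The sketch looks up the enclosing regions when the flag is
queried instead of setting the flags of all subteams when the
request is made. Both variants give the behaviour described
above: a request from inside a subteam stays inside the subteam,
while a request from the original team reaches all subteams.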
Motivation
The ability to cancel threads is one of the areas where OpenMP is
lacking capabilities, compared to other parallel programming
systems. Yet, for many parallel applications this is important,
especially for irregular algorithms such as searching. The
ability to react to user interruptions (e.g. someone pushing a
cancel button) is not to be underestimated either.
Rationale
Some of the suggested changes could be emulated manually by the
experienced OpenMP programmer (such as keeping track of the
cancel state of each thread). But this is an unnecessary burden,
and it becomes difficult at the latest when barriers are involved.
Therefore our proposal introduces the new directives and the
omp_get_cancelled() runtime library call, as well as the
additional oncancel clause for barriers.
The exitregion directive is nothing more than a convenient
shortcut, but even without thread cancellation it is useful as
soon as one gets into deeply nested functions inside parallel
regions.
We have decided against forceful cancellation as in POSIX Threads
[1], as asynchronous cancellation makes resource management
practically impossible, since one never knows when a thread is
cancelled. The concept of having cancellation points and deferred
cancellation in OpenMP, on the other hand, seemed like overkill, as
the number of functions that are cancellation points is
difficult to handle for programmers. Therefore this proposal
suggests cooperative cancellation, which can be found in a
similar way e.g. in Java [2].
The barrier constructs are a big problem for cooperative
cancellation. The suggested solution (oncancel clause,
onbarriercancel directive) may seem like a lot of overhead to
cope with barriers, but the proposal is still easier and more
natural than the possible alternatives (such as disallowing
barriers with thread cancellation, letting the programmer take
care of them, or cancelling barriers forcefully).
We have also decided against automatically including an
exitregion directive at the end of an oncancel or onbarriercancel
scope. The main reason for this is consistency, as
automatically including the directive would cancel the threads
waiting on barriers forcefully, which in turn would be
inconsistent with the rest of the proposal.
The reason for not allowing the onbarriercancel directive after
combined parallel worksharing constructs is that the two main
reasons for applying the directive are not valid after a
combined directive. There is no need to take care of leftover
threads hanging in the implicit barrier at the end of the
combined construct, as these threads are exactly where they would
be if an exitregion directive were specified anyway. There is also
no need to clean up any resources, as the programmer must have
already done this before the end of the parallel region.
During our internal discussions on the topic of thread
cancellation, we have worked out a checklist that each and every
proposal we came up with had to pass. This checklist and some
explanations of why our proposal passes it are spelled out here:
1. Backwards Source Compatibility
Old code must run unchanged when translated with a compiler that
understands thread cancellation. This is the case, as the
behaviour of existing OpenMP constructs is not changed, except by
adding new clauses or directives. This includes complicated cases
involving nested parallelism, e.g. when a program spawns a
parallel region, which in turn calls a function from an "old"
library that was written at a time before thread cancellation
was an issue. This library may then spawn a new parallel region
(via nested parallelism). When a thread from the original
parallel region is cancelled, the cancel flag for the library
parallel region threads is set as well, yet the library region
may include code that relies on the correct termination and
cleanup of all included threads and must therefore not be
cancelled. With our proposal, the threads from the library region
are not cancelled (as they do not know about the possibility of
cancellation at all) and can therefore clean up after themselves
undisturbed. This would not be the case if forceful cancellation
were used.
2. Nested Parallelism
Each proposal must clearly state how thread cancellation and
nested parallelism play together. Our proposal does so by
declaring that when a parallel region is cancelled, all
parallel regions that were created by a thread from the cancelled
region have their cancel-flag set as well.
3. Barriers
Each proposal must cope with the case that a region is
cancelled, while one or more threads are waiting on a barrier
(including implicit barriers), without producing deadlocks. Our
proposal does so with the introduction of the oncancel clause and
the onbarriercancel directive.
4. No Resource Leaks
The programmer must have the option to free any allocated
resources before a thread is cancelled. Our proposal takes care
of this by advocating cooperative cancellation, where the
programmer checks if a cancellation request has been made and
can therefore deallocate / free all resources before
exiting from a thread. Even resource deallocation while waiting
on barriers is allowed with the introduction of the new oncancel
clause and onbarriercancel directive.
5. C / C++ / Fortran Compatibility
Each proposal must apply to all three supported languages of the
OpenMP specification. Although our proposal (mostly) only spells
out the C-syntax of the proposed changes, we believe that these
are adaptable to C++ and Fortran as well.
Our proposal is really only useful if one goes beyond
parallelizing simple loops and short regions of code. As soon as
one starts to write irregular algorithms with OpenMP, we believe
the proposed functionality can save a lot of time for the
programmer and eliminate many sources of errors.
Backwards Compatibility
The proposed changes are fully backwards compatible with the
current OpenMP specification, and therefore all current programs
would run without problems if the change were included in a
future OpenMP specification. No performance degradations are to be
expected either.
Reference Implementation
A reference implementation can be found in the next release of
the OMPi compiler [3] (or in a special prerelease version on
request).
References
[1] IEEE. Information Technology - Portable Operating System
Interface (POSIX) - Part 1: System Application Program
Interface (API) [C Language]. IEEE/ANSI Std 1003.1, 1996 Edition.
[2] Java 1.5 Documentation,
http://java.sun.com/j2se/1.5.0/docs/index.html
[3] OMPi http://www.cs.uoi.gr/~ompi/
Copyright
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End: