Michael Wong, CEO of OpenMP ARB, reflects on Supercomputing 13 and recent OpenMP advances:
I attended Supercomputing in my third year as OpenMP CEO, representing both IBM and OpenMP. This was a big year for us, as we closed it with many milestones in what I call a significant paradigm shift in parallelism. The most significant milestone was that the OpenMP Consortium released OpenMP 4.0 in 2013, with new parallelism features that are productive, portable, and performant across C, C++, and Fortran. OpenMP 4.0 contains significant additions for accelerators, standardized across a broad set of architectures, and industry-first support for SIMD vectorization. All of this was showcased at SC13.
The OpenMP ARB Consortium now has 26 members and is still growing, having added three new members in the past year:
- Red Hat/GCC
- Barcelona Supercomputing Center
- University of Houston
Coming implementations of OpenMP 4.0 include GNU, and the Intel 13.1 compiler with support for accelerators. Clang has started with support for OpenMP 3.1.
Another major shift coming will be our mission statement which the members have been working on for a year now. It supports a move towards parallelism for a broader field, heterogeneous computing, and multiple memory models, while retaining strong support for shared memory and HPC.
At SC13, I gave the opening keynote at the OpenMP Birds of a Feather (BoF) session. (The BoF videos and slides are online.) In this session, I introduced the upcoming International Workshop on OpenMP (IWOMP) for 2014, which will be held in Salvador, Brazil, hosted by Cray and a local group that specializes in computer education in Brazil. The location promises to be wonderful, and there is significant OpenMP interest in Brazil, as well as in South America in general. Although it is early, the conference will likely be held in September, coinciding with spring in the Southern Hemisphere. The cutoff for submissions will be in May 2014, so please be prepared. In general, we have moved IWOMP from its previous June date to September, starting with this year's meeting in Canberra, Australia. (My BoF presentation slides (PDF))
Our staff ran the OpenMP booth, where many visitors inquired about OpenMP, which is used on the current fastest Top500 supercomputer, Tianhe-2.
The BoF showcased the 4.0 specification, presented by the Chair of the OpenMP Language Committee, LLNL CTO Bronis de Supinski, as well as a humorous look at OpenWound by Oracle’s Ruud van der Pas. This was followed by a look at the Clang implementation of OpenMP and its status, which is backed by an Intel open-sourced runtime, presented by Intel’s Jim Cownie. This prompted a flurry of discussion about the implementation status of OpenMP 4.0 compilers; GNU and Intel already have significant implementations ready for release by mid-2014. Finally, Professor John Mellor-Crummey of Rice University gave a talk and demonstration of HPCToolkit, covering the upcoming Technical Report on a common tools API for profilers and debuggers.
I also gave a 30-minute Exhibitor’s Forum talk on OpenMP 4.0 and the future of OpenMP to about 40 people. The core of my message was that OpenMP is now more agile than ever. We achieved that with the introduction of a Technical Report process, which enables us to publish work in progress, giving users a look ahead and implementers a chance to verify feasibility. This is actually a fairly common facility in standards bodies. We also now hold more frequent face-to-face meetings, which are longer and enable more work to be done. Next, I showed the kinds of features we are preparing for the next release of OpenMP:
- OpenMP Tools: Profilers and Debuggers
- Consumer style parallelism: event/async/futures
- Enhanced Accelerator support
- Additional Looping constructs
- Transactional Memory, Speculative Execution
- Task Model refinements
- CPU Affinity
- Common Array Shaping
- Full Error Model
- Rebase to new C/C++/Fortran Standards
This is clearly a much broader set of parallelism capabilities than we have traditionally been involved in. Indeed, we are moving beyond our traditional focus on shared-memory parallelism: we have already started addressing non-shared-memory architectures with the accelerator support in 4.0, and now we are looking to reach beyond traditional High Performance Computing.
OpenMP is a living language, and as such it will continue to grow. OpenMP is more agile, merging additional features such as common tool support, more affinity control, an error model, interoperability with other models, new forms of loop parallelism, additional support for tasks, accelerators, event-driven programming, transactional memory, and speculative execution, as well as rebasing on new base-language standards. The OpenMP revision cycle is increasing in speed and predictability, while delivering concurrent technical reports and language extensions.
– Michael Wong (IBM)
A new article, “Full Throttle: OpenMP 4.0” by Michael Klemm, Senior Application Engineer, Intel and Christian Terboven, Deputy Head of HPC Group, RWTH Aachen University, appears in the current issue of Intel’s Parallel Universe magazine.
“Multicore is here to stay.” This single sentence accurately describes the situation of application developers and the hardware evolution they are facing. Since the introduction of the first dual-core CPUs, the number of cores has kept increasing. The advent of the Intel® Xeon Phi™ coprocessor has pushed us into the world of manycore, where up to 61 cores with 4 threads each impose new requirements on the parallelism of applications to exploit the capabilities of the hardware.
It is not only the ever-increasing number of cores that requires more parallelism in an application. Over the past years, the width of SIMD (Single Instruction Multiple Data) registers has been growing. While the early SIMD instructions of Intel® MMX™ technology used 64-bit registers, our newest family member, Intel® Advanced Vector Extensions 512 (Intel® AVX-512), runs with 512-bit registers. That’s an awesome 16 floating-point numbers in single precision, or eight double-precision numbers, that can be computed in one go. If your application does not exploit these SIMD capabilities, you can easily lose a factor of 16x or 8x compared to the peak performance of the CPU.
To read the entire article, download the magazine in PDF. The article starts on page 6.
Intel’s Tim Mattson’s Introduction to OpenMP video tutorial is now available.
Thanks go to the University Program Office at Intel for making this tutorial available.
Michael Wolfe, at PGI, writes about programming standards for the next generation of HPC systems.
Having just returned from SC13, one burning issue is the choice of a standard approach for programming the next generation HPC systems. While not guaranteed, these systems are likely to be large clusters of nodes with multicore CPUs and some sort of attached accelerators. A standard programming approach is necessary to convince developers, and particularly ISVs, to start adoption now in preparation for this coming generation of systems. John Barr raised the same question in a recent article at Scientific Computing World from a more philosophical point of view. Here I address this question from a deeper technical perspective.
Read the complete article at »HPCWire.
Videos of the five in-booth talks and the Birds of a Feather session at Supercomputing 2013 (November 2013, Denver CO) are now »online.
The first release of the OpenMP 4.0 API Examples document is now available and can be downloaded from the Specifications page. This is a work in progress — additional examples are under development and will be released in later editions.
Also, a discussion forum for the 4.0 Examples document is now open.
The Clang/LLVM compiler now supports OpenMP 3.1.
- Linux Journal: Advanced OpenMP
- Embedded Online Journal: A portable OpenMP runtime library based on MCA APIs for embedded systems – Part 3
- ACM Digital Library: Portable mapping of OpenMP to multicore embedded systems using MCA APIs